Hello, I have made significant changes to one of my sites and, as a result, many pages have been removed. I would like to get them removed from the Google index to avoid duplicate content (the pages have not been moved, so I can't use a 301 redirect here). At the same time I want to get rid of some query strings because they are causing duplicate content as well. So the end result is that I have a 301 redirect in the .htaccess file to strip the query strings, and a 410 Gone status returned for the old, non-existent pages (Google supposedly chokes on 404). The question I have is: when Google finds a 301 redirect and the destination page returns 410, does it 1) remove the old page from the index (which is what I want), or 2) declare that there is a problem with the redirect, since the destination page returns 410, and keep the old page in the index? I guess I could just see what happens, but I'm afraid of causing long-term problems with Google. Does anybody have any recommendations or thoughts about this? I think the safest route is to take the redirect out for now and, once the old pages are removed from the index, add the redirect back and get rid of the query strings. Any input will be highly appreciated.
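For anyone trying to picture the setup described above, here is a minimal sketch in Python of the response chain a crawler would see. It does not talk to a real server and says nothing about how Google treats the result. The function names, the example URL, and the `gone_paths` set are all made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url):
    """Return the URL without its query string (the target of the 301 rule)."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def follow(url, gone_paths, max_hops=5):
    """Simulate the (url, status) pairs a crawler would see.

    Assumed rules, mirroring the post: any URL with a query string gets a
    301 to the same URL without it; any path in gone_paths returns 410;
    everything else returns 200.
    """
    chain = []
    for _ in range(max_hops):
        parts = urlsplit(url)
        if parts.query:
            chain.append((url, 301))
            url = strip_query(url)
            continue
        status = 410 if parts.path in gone_paths else 200
        chain.append((url, status))
        break
    return chain


# Hypothetical removed page that is still linked with a query string.
for hop in follow("http://example.com/old-page?ref=1", {"/old-page"}):
    print(hop)
```

The point of the sketch is only that the old URL answers 301 first and the 410 is seen one hop later, which is exactly the situation the question is about.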
First of all, Google does not "choke" on a 404 Not Found status. They simply display 404s in Webmaster Tools so that you as the webmaster are aware that there is a link (either on your site or on another site) that points to a page on your site that does not exist. If it's an internal link that they are following when getting the 404, you can change it so that it points to a page that DOES exist instead. If it's external, you can implement a 301 redirect to another page on your site. Google WANTS your site to throw a 404 (not a 410) when a page is requested that no longer exists AND there is no other page on your site with information similar or related to that of the missing page. Personally, I hate allowing 404 statuses to be returned. 301 Moved Permanently redirects are not ONLY for when you change the URL for a page. If you eliminate a page from your site, IMO you should ALWAYS 301 redirect that page to another page on your site whose content most closely resembles the content of the old page being redirected. So if you have a page about "winter car maintenance" that you eliminated but you have another page about "general car maintenance", it's OK to 301 redirect requests for the old "winter car maintenance" page to the "general car maintenance" page... Hell, if you don't have a "general car maintenance" page, it would be fine IMO to even redirect it to a page about "cars" in general. Purists will say that if you don't have another "winter car maintenance" page you should simply 404. I disagree. Inbound links are too hard to come by. So as long as I can find a page even remotely related to the page that was eliminated, I will 301 redirect it to preserve the inbound link text and the link juice/PageRank being passed in. An inbound link with link text "winter car maintenance" will STILL help a page that targets the more general keyword "car" or "cars" to rank for its targeted keyword(s).
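To make the "redirect to the closest related page" idea concrete, here is a rough sketch in Python. The paths, the `REDIRECTS` table, and the `respond` function are hypothetical and only illustrate the decision; on a real site this would normally live in .htaccess rules or the application's routing.

```python
# Hypothetical mapping from removed pages to the closest related live page.
REDIRECTS = {
    "/winter-car-maintenance": "/car-maintenance",
    "/summer-car-maintenance": "/cars",
}


def respond(path, live_pages, redirects=REDIRECTS):
    """Return (status, location) for a requested path.

    200 if the page exists, 301 to a related page if one is mapped and
    still live, otherwise 404 because nothing related is left.
    """
    if path in live_pages:
        return 200, None
    target = redirects.get(path)
    if target is not None and target in live_pages:
        return 301, target
    return 404, None


live = {"/", "/cars", "/car-maintenance"}
print(respond("/winter-car-maintenance", live))
print(respond("/doughnut-recipes", live))
```

Checking that the target is still live avoids redirecting visitors to yet another missing page.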
The ONLY time I would allow a page to 404 is possibly if my site was previously about "cars" but now it is about something totally unrelated like "doughnuts"! LOL. In other words, I would only return a 404 if there is no page on the site that is even remotely related to the topic of the old page. If you are NOT going to 301 redirect a page and do want it to 404, then I would highly suggest creating a custom 404 page that clearly tells the user that the page they requested no longer exists. Include links to all of the important pages of your site on this custom 404 page. Typically you would include the same global navigation and footers that the rest of the site has, possibly with a mini sitemap to several important inner pages as well, in hopes that the user will click around your site instead of hitting the back button to return to the search engine. Be SURE this page returns a 404... NOT a 200.
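As an illustration of that last point, here is a small sketch using Python's standard `http.server` module: the custom "not found" page is friendly HTML with links, but the status line is still 404. The page contents, `LIVE_PAGES`, `render`, and `serve` are placeholders invented for this example, not a recommendation for how to host a real site.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical site content.
LIVE_PAGES = {"/": "<h1>Home</h1>", "/cars": "<h1>Cars</h1>"}

# Custom error page with links to important pages.
NOT_FOUND_HTML = (
    "<h1>Page not found</h1>"
    "<p>The page you requested no longer exists.</p>"
    '<ul><li><a href="/">Home</a></li><li><a href="/cars">Cars</a></li></ul>'
)


def render(path):
    """Return (status, html). Missing pages get the custom page AND a 404."""
    if path in LIVE_PAGES:
        return 200, LIVE_PAGES[path]
    return 404, NOT_FOUND_HTML


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, html = render(self.path)
        data = html.encode("utf-8")
        self.send_response(status)  # 404 for missing pages, never 200
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port=8000):
    """Run the demo server locally until interrupted."""
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

Calling `serve()` and requesting a missing path with something like `curl -I` should show a 404 status line alongside the custom page.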
Thanks for your reply, Canonical. I think I will follow your advice and redirect non-existing pages. As a sidenote, can you elaborate on this point: Is this documented anywhere by Google? From what I read, it's better to throw a 410 because that will result in quick removal from the index. When you throw a 404, Googlebot will attempt to read the page again and again, and it will take a long time to remove the page from the index (this I know from experience). Here's what I found on Google Webmaster forums: I think Google's interpretation is correct. If you read RFC 2616, a 404 status code gives no indication of whether the condition is temporary or permanent, so the resource might come back. 410 means that the resource is gone and the condition is expected to be permanent.