NOINDEX valid in robots.txt?


aardvark
Sep 21st 2005, 12:49 pm
Hello,

This is my first post in the forum and so far I'm liking what I see. There seem to be a lot of knowledgeable people here!

I have a bunch of pages from the old version of my website that are still in Google's cache even after about 2.5 months of Googlebot receiving 404s for these pages.

I understand the logic behind the Disallow command and I use it in my current robots.txt file, but what I want to happen is for the pages that are no longer valid to be removed from the index as quickly as possible.

I understand that the NOINDEX meta tag will work for pages that don't return a 404, but what about pages that aren't there anymore?

The pages that are no longer available do not have any counterparts. That is, there aren't any pages that replaced them; otherwise I'd use a 301 redirect in my .htaccess file.

I was hoping that NOINDEX: /printable.asp would work, but the robots.txt validator says that is no good.

Is my only option to wait it out?

Also, I should mention that printable.asp is a dynamic page, so it uses the query string.
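
For illustration, here is a minimal Python sketch of how a standard robots.txt parser treats the two lines in question. It uses the standard library's `urllib.robotparser`, which is not Googlebot and may not match Google's behaviour exactly. The `www.example.com` domain, the `id=5` query string, and the `can_crawl` helper are assumptions made up for the example.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_lines, url, agent="Googlebot"):
    """Return True if the given robots.txt lines let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, url)


# Hypothetical URL on a placeholder domain; only the path and query matter.
url = "http://www.example.com/printable.asp?id=5"

# "Noindex" is not a field this parser recognises, so the line is skipped
# and no rule is recorded for the page.
noindex_rules = ["User-agent: *", "Noindex: /printable.asp"]

# Disallow matches by path prefix, so every query-string variant of
# printable.asp is covered by the single rule.
disallow_rules = ["User-agent: *", "Disallow: /printable.asp"]

print(can_crawl(noindex_rules, url))
print(can_crawl(disallow_rules, url))
```

With this parser the unknown NOINDEX line changes nothing, while the Disallow line blocks the dynamic page along with its query strings. Note that Disallow only stops crawling; it does not by itself remove a URL that is already in the index.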


Thanks!
Chris

aeiouy
Sep 21st 2005, 12:56 pm
I don't think you can use robots.txt to accomplish that.

Google does have a URL removal tool that you can use to submit requests to remove pages from the index.

But looking at it, I'm not sure you can use it for pages that no longer exist. It seems that if you want pages that no longer exist to go away, you just have to wait it out.

Perhaps someone else has better information on the removal tool in terms of what you can and cannot do with it.

aardvark
Sep 23rd 2005, 3:07 pm
That's what I was afraid of. I've used that tool in the past (as an experiment) to remove three pages and it worked. But it took about a week and I had to add each URL one at a time. I've got hundreds to add.

I guess I'll just keep waiting. Thanks.

ResaleBroker
Sep 23rd 2005, 3:11 pm
Use Google's automatic URL removal system. It doesn't matter if the page no longer exists.

johnt
Sep 24th 2005, 4:37 am
The URL removal tool is only a temporary fix. After 180 days the pages will reappear in the index, regardless of what is set in robots.txt.
You could try returning a 410 (Gone) status code for the missing pages; that may tell Google to remove them from their index. I must confess, though, that after many attempts, with many different methods, to get pages removed from their index permanently, I have yet to see any success.

Has anyone else here managed to do this?
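
As a rough sketch of the 410 idea, assuming the site could be fronted by a small Python WSGI application (the original site runs ASP, so this is only an illustration of the status codes, not a drop-in fix). The `GONE_PATHS` set and the `app` function are hypothetical names; `/printable.asp` is the page mentioned earlier in the thread.

```python
# Hypothetical set of retired paths that should answer 410 instead of 404.
GONE_PATHS = {"/printable.asp"}


def app(environ, start_response):
    """WSGI app: 410 Gone for retired paths, 404 Not Found for the rest."""
    # PATH_INFO excludes the query string, so /printable.asp?id=5 and
    # /printable.asp?id=6 are both matched by the same entry.
    path = environ.get("PATH_INFO", "")
    headers = [("Content-Type", "text/plain; charset=utf-8")]
    if path in GONE_PATHS:
        start_response("410 Gone", headers)
        return [b"Gone\n"]
    start_response("404 Not Found", headers)
    return [b"Not Found\n"]
```

To try it locally, the app can be passed to `wsgiref.simple_server.make_server` from the standard library. Whether a 410 actually speeds up removal compared with a 404 is not something this sketch can show.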

John

ResaleBroker
Sep 24th 2005, 6:17 am
johnt wrote: "The URL removal tool is only a temporary fix. After 180 days the pages will reappear in the index, regardless of what is set in robots.txt."

Where does this information come from?

johnt
Sep 24th 2005, 6:31 am
From Google's removal tool page:
"Please keep in mind that submitting via the automatic URL removal system will cause a temporary, six months, removal of your site from the Google index."
It actually says 6 months; it was 180 days last time I used it.

I added the "regardless of what is set in robots.txt" bit based on personal experience. I emailed them asking why the files were allowed back into the index even if robots.txt still banned them after the temporary period elapsed, but just got a standard "thanks for your comments" response.

webmistress
Sep 24th 2005, 6:42 am
Chris, those people have given you enough information for you to start getting good rankings from Google in 2007 or 2008 perhaps, if you're lucky.

Your problem is a mild one and you should in no way try to tamper with the Google database, even with a tool provided by Google. That tool is not relevant to your current problem. You say that you have had pages in their cache for 2.5 months now. Pages that keep returning a 404 are eventually devalued after a certain amount of time, six months or so. Googlebot is now smart enough to see that such a page is stale and will visit it less often after the timeframe I just mentioned. Absolutely don't use the Disallow exclusion rule for Googlebot. This could jeopardize your whole site.