NOINDEX valid in robots.txt?

Discussion in 'robots.txt' started by aardvark, Sep 21, 2005.

  1. #1
    Hello,

    This is my first post in the forum and so far I'm liking what I see. There seem to be a lot of knowledgeable people here!

    I have a bunch of pages from the old version of my website that are still in Google's cache even after about 2.5 months of Googlebot receiving 404s for these pages.

    I understand the logic behind the Disallow command and I use it in my current robots.txt file, but what I want to happen is for the pages that are no longer valid to be removed from the index as quickly as possible.

    I understand that the NOINDEX meta tag will work for pages that don't return a 404 but what about pages that aren't there anymore?
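    For reference, the meta tag being discussed goes in the <head> of a page that still resolves (i.e., not a 404); a minimal example:

    ```html
    <!-- Only works on pages that still return content, since the crawler must fetch the page to see it -->
    <meta name="robots" content="noindex">
    ```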

    The pages that are no longer available do not have any counterparts. That is, there aren't any pages that replaced them; otherwise I'd use a 301 redirect in my .htaccess file.
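    For anyone following along, the 301 approach mentioned above would look something like this in .htaccess, assuming Apache with mod_alias (the replacement URL here is hypothetical):

    ```apache
    # Hypothetical: only applicable if a replacement page existed
    Redirect 301 /printable.asp /new-printable.asp
    ```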

    I was hoping that NOINDEX: /printable.asp would work but the robots.txt validator says that is no good.

    Is my only option to wait it out?

    Also I should mention that printable.asp is a dynamic page, so it takes parameters on the query string.


    Thanks!
    Chris
     
    aardvark, Sep 21, 2005 IP
  2. aeiouy

    aeiouy Peon

    Messages:
    2,876
    Likes Received:
    275
    Best Answers:
    0
    Trophy Points:
    0
    #2
    I don't think you can use robots.txt to accomplish that.

    Google does have a URL removal tool that you can use to submit requests to remove pages from the index.

    But looking at it, I'm not sure you can use it for pages that no longer exist. It seems that if you want pages that no longer exist to go away, you just have to wait it out.

    Perhaps someone else has better information on the removal tool in terms of what you can and cannot do with it.
     
    aeiouy, Sep 21, 2005 IP
  3. aardvark

    aardvark Peon

    Messages:
    2
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #3
    That's what I was afraid of. I've used that tool in the past (as an experiment) to remove three pages and it worked. But it took about a week and I had to add each URL one at a time. I've got hundreds to add.

    I guess I'll just keep waiting. Thanks.
     
    aardvark, Sep 23, 2005 IP
  4. ResaleBroker

    ResaleBroker Active Member

    Messages:
    1,665
    Likes Received:
    50
    Best Answers:
    0
    Trophy Points:
    90
    #4
    Use Google's automatic URL removal system. It doesn't matter if the page no longer exists.
     
    ResaleBroker, Sep 23, 2005 IP
  5. johnt

    johnt Peon

    Messages:
    178
    Likes Received:
    21
    Best Answers:
    0
    Trophy Points:
    0
    #5
    The URL removal tool is only a temporary fix. After 180 days the pages will reappear in the index, regardless of what is set in robots.txt.
    You could try returning a 410 (Gone) status code for the missing pages; that may tell Google to remove them from its index. I must confess, though, that after many attempts with many different methods to get pages removed from their index permanently, I have yet to see any success.
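    A sketch of the 410 approach in .htaccess, assuming Apache with mod_rewrite enabled (the [G] flag sends 410 Gone, and the rewrite pattern matches the path regardless of the query string, which a plain Redirect directive would not):

    ```apache
    RewriteEngine On
    # Return 410 Gone for the removed page, with or without a query string
    RewriteRule ^printable\.asp$ - [G]
    ```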

    Has anyone else here managed to do this ?

    John
     
    johnt, Sep 24, 2005 IP
  6. ResaleBroker

    ResaleBroker Active Member

    Messages:
    1,665
    Likes Received:
    50
    Best Answers:
    0
    Trophy Points:
    90
    #6
    Where does this information come from?
     
    ResaleBroker, Sep 24, 2005 IP
  7. johnt

    johnt Peon

    Messages:
    178
    Likes Received:
    21
    Best Answers:
    0
    Trophy Points:
    0
    #7
    From Google's removal tool page. It actually says 6 months; it was 180 days the last time I used it.

    I added the "regardless of what is set in robots.txt" bit based on personal experience. I emailed them asking why the files were allowed back into the index after the temporary period elapsed, even though robots.txt still banned them, but just got a standard "thanks for your comments" response.
     
    johnt, Sep 24, 2005 IP
  8. webmistress

    webmistress Guest

    Messages:
    485
    Likes Received:
    36
    Best Answers:
    0
    Trophy Points:
    0
    #8
    Chris, those people have given you enough information to start getting good rankings from Google in 2007 or 2008 perhaps, if you're lucky.

    Your problem is a mild one, and you should in no way try to tamper with the Google database, even with a tool provided by Google. The use of this tool is not relevant to your current problem. You say that you have pages in their cache for 2.5 months now. Pages that return a 404 for too long are eventually devalued after a certain amount of time, 6 months or so. Googlebot is now smart enough to see that such a page is stale and will visit it less often after the timeframe I just mentioned. Absolutely do not use the Disallow exclusion rule for Googlebot. That could jeopardize your whole site.
     
    webmistress, Sep 24, 2005 IP