Please give me a robots.txt file to exclude as many search engines as possible!!

Discussion in 'robots.txt' started by jgjg, May 18, 2007.

  1. #1
    Hi there... I was wondering if anyone has a sample of a good robots.txt file to put on my server to make sure no pages get spidered (by as many search engines as possible).

    I had a development site set up and somehow it got indexed (the dev site wasn't at index.html). I think it got spidered because I asked for feedback on it in a forum post.

    Also, once I put this up, will Google delist it the next time it crawls?

    Thanks.
     
    jgjg, May 18, 2007
  2. kentuckyslone

    kentuckyslone Notable Member

    #2
    Someone correct me if I am wrong, but I think this is all you need:

    User-agent: *
    Disallow: *
     
    kentuckyslone, May 18, 2007
  3. sweetfunny

    sweetfunny Banned

    #3
    Disallow: /

    You got the User-agent line right. The / means the site root, so it covers every page on the site.
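Why "/" covers everything can be shown with a minimal sketch, assuming only the plain-prefix matching of the original robots.txt convention (no wildcards): a URL path is blocked when it starts with a Disallow value. The helper `is_disallowed` is hypothetical and ignores any wildcard extensions individual crawlers may support.

```python
from typing import Iterable


def is_disallowed(path: str, disallow_values: Iterable[str]) -> bool:
    """Return True if `path` starts with any non-empty Disallow value.

    Models only plain prefix matching; an empty Disallow value blocks nothing.
    """
    return any(value and path.startswith(value) for value in disallow_values)


# Every URL path begins with "/", so "Disallow: /" blocks the whole site.
for p in ["/", "/index.html", "/dev/page.html"]:
    print(p, is_disallowed(p, ["/"]))

# A narrower prefix only blocks paths underneath it.
print(is_disallowed("/dev/page.html", ["/dev/"]))  # True
print(is_disallowed("/about.html", ["/dev/"]))     # False
```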
     
    sweetfunny, May 18, 2007
  4. kentuckyslone

    kentuckyslone Notable Member

    #4
    DOH! I knew that. I don't know what I was thinking when I typed that in. The wildcard isn't used in the Disallow line; the URL path or directory is.

    Thanks for correcting me on that silly mistake.

    So all you need in the robots.txt is

    User-agent: *
    Disallow: /
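One way to check a file like this before uploading it is Python's standard-library `urllib.robotparser`, which parses robots.txt rules and answers whether a given user agent may fetch a URL. This is a sketch: the user-agent names and the example.com URLs are placeholders, and a real crawler's own parser may differ in edge cases.

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt from the post above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse from memory; no network access

# Illustrative agents and URLs; "*" applies to all of them, so every result is False.
for agent in ["Googlebot", "Slurp", "msnbot", "SomeOtherBot"]:
    for url in ["http://example.com/", "http://example.com/dev/index.php"]:
        print(agent, url, parser.can_fetch(agent, url))
```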
     
    kentuckyslone, May 18, 2007
  5. Dudibob

    Dudibob Peon

    #5
    Just remember that not all robots obey the robots.txt file ;)
     
    Dudibob, May 18, 2007
  6. seoperson

    seoperson Peon

    #6
    Yes, you only need:
    user-agent: *
    Disallow: /

    I agree with the above.
     
    seoperson, May 18, 2007
  7. hmansfield

    hmansfield Guest

    #7
    Is this firm, or do some spiders crawl whatever the hell they want?
     
    hmansfield, May 18, 2007
  8. panerai

    panerai Active Member

    #8
    Not all robots obey the robots.txt file. I also have a dev site that's listed on no-name search engines. It gets traffic too, lol.
     
    panerai, May 18, 2007
  9. kentuckyslone

    kentuckyslone Notable Member

    #9
    That is true. Not all robots will obey such 'commands' as Disallow or even rel="nofollow". There really is no way to guarantee 100% that your pages will not be crawled.
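The reason is that robots.txt is only a request: the check runs inside the crawler, not on your server. A toy model makes this concrete; `will_crawl`, "PoliteBot" and "RudeBot" are hypothetical names, and the rule check uses Python's standard `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser


def will_crawl(robots_lines, agent, url, obeys_robots_txt):
    """Toy model of a crawler deciding whether to fetch `url`.

    The robots.txt check happens in the crawler's own code, so a crawler
    that skips it is not stopped by anything written in the file.
    """
    if not obeys_robots_txt:
        return True  # this crawler never consults robots.txt
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, url)


rules = ["User-agent: *", "Disallow: /"]
url = "http://example.com/dev/"
print(will_crawl(rules, "PoliteBot", url, obeys_robots_txt=True))  # False
print(will_crawl(rules, "RudeBot", url, obeys_robots_txt=False))   # True
```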
     
    kentuckyslone, May 18, 2007
  10. hmansfield

    hmansfield Guest

    #10
    That's what I kind of figured on my own, after checking the file a million times.
    Thanks for confirming.
     
    hmansfield, May 18, 2007
  11. jgjg

    jgjg Peon

    #11
    Do you think the search engines may delist the site, or just not update it?
     
    jgjg, May 18, 2007
  12. jgjg

    jgjg Peon

    #12
    Also, does Google obey robots.txt?
     
    jgjg, May 20, 2007
  13. Christine8

    Christine8 Peon

    #13
    Google will obey robots.txt. However, it may take quite a while for the site to be dropped from the index.
     
    Christine8, May 20, 2007