Hi there... I was wondering if anyone has a sample of a good robots.txt file to put on my server to make sure no pages get spidered (by as many search engines as possible). I had a development site set up and somehow it got indexed (the dev site wasn't at index.html). I think it got linked from a forum post, so that's how it got spidered. Also, once I put this up, will Google delist the site the next time it crawls? Thanks.
DOH! I knew that. I don't know what I was thinking when I typed that in. The wildcard isn't used in the Disallow line; the URL or directory path is. Thanks for correcting me on that silly mistake. So all you need in robots.txt is User-agent: * followed by Disallow: /
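Right, and since the original post asked for a sample file, here's the whole thing written out. Save it as robots.txt in the site's document root (the comment line is optional):

# Block all compliant crawlers from the entire site
User-agent: *
Disallow: /

The * matches every user agent that honors the standard, and Disallow: / covers every path on the site.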
Not all robots obey the robots.txt file. I have a dev site too that's listed on no-name search engines. It gets traffic too lol
That is true. Not all robots will obey 'commands' like Disallow or even rel="nofollow". There is really no way to absolutely guarantee 100% that your pages will not be crawled.
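If you want an extra layer on top of robots.txt, you can also put a robots meta tag in the <head> of each page; the major engines honor it, though the same caveat applies, since rogue bots ignore it too:

<meta name="robots" content="noindex, nofollow">

One subtlety worth knowing: a crawler that is blocked by robots.txt never fetches the page at all, so it will never see this tag. The two are really alternative mechanisms rather than a combination.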
That's what I kind of figured on my own, after checking the file a million times. Thanks for confirming.
Google will listen to the robots.txt. However, it may take quite a while until the site gets dropped from the index.