What bots ignore robots.txt?
Jon12345
Aug 16th 2005, 7:35 am
Ok, so there are various bots and crawlers parading around the internet. I am using this...
User-agent: *
Disallow: red.php
...in my robots.txt file. I presume this is correct if I don't want any bots to follow links to the red.php page, yes?
But what percentage of bots actually ignore such a request? Any idea?
Also, should I use a no-follow tag instead?
Thanks,
Jon
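On the rule quoted in the question: the 1994 standard treats a Disallow value as a prefix of the URL path, and URL paths begin with a slash. A strict parser may therefore never match a value written without one. Here is a minimal sketch that checks this with Python's standard urllib.robotparser. The helper name blocks, the agent name ExampleBot, and the example.com URL are placeholders, and other crawlers may be more lenient than this parser.

```python
from urllib.robotparser import RobotFileParser

def blocks(rule_path, url, agent="ExampleBot"):
    """Return True if 'Disallow: rule_path' for all agents blocks url."""
    parser = RobotFileParser()
    # parse() accepts the robots.txt content as an iterable of lines
    parser.parse(["User-agent: *", f"Disallow: {rule_path}"])
    return not parser.can_fetch(agent, url)

url = "http://www.example.com/red.php"  # hypothetical site

print(blocks("red.php", url))   # rule as posted, no leading slash -> False
print(blocks("/red.php", url))  # rule with a leading slash -> True
```

With this parser, only the form with the leading slash keeps the page out, so writing Disallow: /red.php is the safer choice.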
Willy
Aug 16th 2005, 8:47 am
All reputable, major bots honor robots.txt. If a crawler doesn't honor it, it's likely to ignore no-follow as well, so I don't think you need to bother about that.
You can set up a spider trap to catch bad-mannered bots: http://www.fleiner.com/bots/
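The general idea of a spider trap is to disallow a path in robots.txt and then watch for clients that request it anyway. What follows is only a rough sketch of the log-checking half, not the method used on the linked page. It assumes access logs in Common Log Format and a made-up trap path /bot-trap/, and the sample log lines are invented for illustration.

```python
import re
from collections import Counter

# Hypothetical trap path; it would be listed under Disallow in robots.txt
TRAP_PREFIX = "/bot-trap/"

# Start of a Common Log Format line: client IP, two fields, [timestamp], "METHOD path"
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(?:GET|HEAD|POST) (\S+)')

def trap_hits(log_lines, trap_prefix=TRAP_PREFIX):
    """Count requests per client IP whose path starts with the trap prefix."""
    hits = Counter()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match and match.group(2).startswith(trap_prefix):
            hits[match.group(1)] += 1
    return hits

# Invented sample lines using documentation-only IP addresses
sample_log = [
    '203.0.113.5 - - [16/Aug/2005:08:47:00 +0000] "GET /index.html HTTP/1.1" 200 512',
    '198.51.100.7 - - [16/Aug/2005:08:47:02 +0000] "GET /bot-trap/page.html HTTP/1.1" 200 128',
    '198.51.100.7 - - [16/Aug/2005:08:47:03 +0000] "GET /bot-trap/other.html HTTP/1.1" 200 128',
]

print(dict(trap_hits(sample_log)))  # {'198.51.100.7': 2}
```

Any address that shows up here either ignored robots.txt or never read it, which makes it a candidate for blocking.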
lorien1973
Aug 16th 2005, 8:53 am
I think Ask Jeeves ignores robots.txt, but I'm not sure.
minstrel
Oct 15th 2005, 6:52 pm
No. They claim they honor it:
http://sp.ask.com/docs/about/aj/teoma.htm#6
Teoma Search Technology: The Engine That Drives the Search
Teoma, which means 'expert' in Gaelic, is unlike any other search engine out there. Now, we could throw a lot of fancy terms at you, like refinement and relevance and advanced algorithms. And all of these describe what makes Teoma so powerful. But, what's really important for you to know is that Teoma adds a new dimension to your search results: authority. Instead of ranking results based upon the sites with the most links leading to them, Teoma analyzes the Web as it naturally occurs - in its subject-specific communities - to determine which sites are most relevant. In December 2001, we integrated Teoma's search technology into Ask Jeeves, and within one year searchers' satisfaction with the site increased 45 percent. Through Teoma, we continue to advance our technologies, extending our reach to the outer fringes of the Web to become the most relevant search engine online.
The Teoma Web Crawler FAQ
The Teoma Crawler is Ask Jeeves' Web-indexing robot (or, crawler/spider, as they are typically referred to in the search world). The crawler collects documents from the Web to build the ever-expanding index for our advanced search functionality at Ask Jeeves at Ask.com, Ask.co.uk and Teoma.com (among other Web sites that license the proprietary Teoma search technology).
Q: Does Teoma observe the Robot Exclusion Standard?
A: Yes, we obey the 1994 Robots Exclusion Standard (RES), which is part of the Robot Exclusion Protocol. The Robots Exclusion Protocol is a method that allows Web site administrators to indicate to robots which parts of their site should not be visited by the robot. For more information on the RES, and the Robot Exclusion Protocol, please visit http://www.robotstxt.org/wc/exclusion.html
Q: Can I prevent the Teoma crawler from indexing all or part of my site/URL?
A: Yes. The Teoma crawler will respect and obey commands that "ask" it not to index all or part of a given URL. To specify that the Teoma crawler visit only pages whose paths begin with /public, include the following lines:
# Allow only specific directories
User-agent: Teoma
Disallow: /
Allow: /public
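The Allow line is an extension that is not part of the 1994 standard, and the FAQ does not say how the crawler resolves the overlap between Disallow: / and Allow: /public. One common reading is that the longest matching prefix wins, with Allow winning a tie. Below is a minimal sketch of that reading. The function is_allowed and the test paths are illustrative, and this is not a description of how the Teoma crawler itself was implemented.

```python
def is_allowed(path, rules):
    """Longest-match evaluation; rules is a list of (directive, prefix) pairs."""
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for directive, prefix in rules:
        if not path.startswith(prefix):
            continue
        is_allow = directive.lower() == "allow"
        # A longer prefix is more specific; on a tie, Allow wins.
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len, allowed = len(prefix), is_allow
    return allowed

# The two rules from the Teoma example, in the order given
teoma_rules = [("Disallow", "/"), ("Allow", "/public")]

print(is_allowed("/public/index.html", teoma_rules))   # True
print(is_allowed("/private/index.html", teoma_rules))  # False
```

Parsers that stop at the first matching rule (Python's urllib.robotparser is one) would block /public as well with the lines in this order, so putting the Allow line before the Disallow line is the more portable way to write it.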