Google Not Consistent With robots.txt


digitalpoint
Mar 22nd 2006, 11:46 am
Here's something interesting I found... Googlebot does not interpret the robots.txt file the same way as Google's robots.txt validator inside Google Sitemaps...

http://www.digitalpoint.com/~shawn/2006/03/google-not-interpreting-robotstxt-consistently.html

rustybrick
Mar 22nd 2006, 12:02 pm
It is amazing that Google would crawl pages that it clearly should not.

alifan
Mar 30th 2006, 4:00 pm
I would agree with digitalpoint. I have had Google look at link folders that it was not supposed to access.

minstrel
Apr 2nd 2006, 4:15 pm
Two points:

1. Google has said previously that they interpret robots.txt a bit more liberally than many spiders, in that they will try to figure out what you "meant" when there are errors (much like MSIE tries to work around coding errors). The reply from Google noted in your blog is correct, I think - the fact that a different bot is doing what you wanted it to do is testimony to Google's ability to read between the lines.

2. There is a difference between crawling and indexing. Googlebot and other spiders do seem to crawl Disallowed folders and files - this isn't new. That's not necessarily a problem, though. It's only a problem if the content starts showing up in the search indices. If it's really sensitive or private information, it should be password protected.
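
For anyone who wants to see what a Disallow rule actually asks of a spider, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt content, the folder names and the example.com URLs are made up for illustration. It only shows how one strict parser reads the rules; it says nothing about how Googlebot or the Sitemaps validator behaves, and it cannot stop a spider that chooses to ignore the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only (not taken from any real site).
ROBOTS_TXT = """\
User-agent: *
Disallow: /links/
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if this parser thinks user_agent may fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/index.html", "/links/page.html", "/private/notes.txt"):
        url = "http://example.com" + path
        verdict = "allowed" if is_allowed(ROBOTS_TXT, "Googlebot", url) else "disallowed"
        print(path, "->", verdict)
```

A check like this is advisory only, which is why anything truly private still belongs behind a password.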

Jean-Luc
Apr 2nd 2006, 4:36 pm
In your example, Googlebot respected the standard and the validator didn't.

And, according to Matt Cutts (http://www.mattcutts.com/blog/googlebot-keep-out/): "If you want to try other experiments with robots.txt without any risk at all, use our robots.txt checker built into Sitemaps. It uses the same logic that the real Googlebot uses..."
Jean-Luc :confused: