Google Not Consistent With robots.txt

digitalpoint Overlord of no one Staff

#1

Here's something interesting I found... Googlebot does not interpret the robots.txt file the same way as Google's robots.txt validator inside Google Sitemaps...

http://www.digitalpoint.com/~shawn/2006/03/google-not-interpreting-robotstxt-consistently.html

digitalpoint, Mar 22, 2006
rustybrick User ID 3

#2

It is amazing that Google would crawl pages that it clearly should not.

rustybrick, Mar 22, 2006
alifan Peon

#3

I agree with digitalpoint. I have had Google look at link folders that it was not supposed to access.

alifan, Mar 30, 2006
minstrel Illustrious Member

#4

Two points:

1. Google has said previously that they interpret robots.txt a bit more liberally than many spiders, in that they will try to figure out what you "meant" when there are errors (much like MSIE tries to work around coding errors). The reply from Google noted in your blog is correct, I think - the fact that a different bot is doing what you wanted it to do is testimony to Google's ability to read between the lines.

2. There is a difference between crawling and indexing. Googlebot and other spiders do seem to crawl Disallowed folders and files; this isn't new. That's not necessarily a problem, though. It's only a problem if those pages start showing up in the search indices. If the information is really sensitive or private, it should be password protected.
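
To make the second point concrete, here is a minimal sketch of how a well-behaved crawler decides whether to fetch a URL. It uses Python's standard `urllib.robotparser`, not Google's parser, and the robots.txt content, the `ExampleBot` name, and the `is_allowed` helper are all made up for illustration. The check happens inside the crawler, so robots.txt is only a request: it does not block access and says nothing about what ends up in an index.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the paths are invented for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /links/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if Python's parser says user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/index.html", "/links/page1.html", "/private/report.html"):
        url = "http://www.example.com" + path
        verdict = "allowed" if is_allowed(ROBOTS_TXT, "ExampleBot", url) else "disallowed"
        print(path, verdict)
```

A bot that skips this check can still request `/private/report.html`, which is why password protection is the right tool for anything sensitive.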

minstrel, Apr 2, 2006
Jean-Luc Peon

#5

In your example, Googlebot respected the standard and the validator didn't.

And, according to Matt Cutts:

"If you want to try other experiments with robots.txt without any risk at all, use our robots.txt checker built into Sitemaps. It uses the same logic that the real Googlebot uses..."
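
For local experiments, a rough sketch like the one below compares how a single parser treats several user agents. It relies on Python's standard `urllib.robotparser`, which is not Google's code, so its answers may differ from both Googlebot and the Sitemaps checker. The robots.txt content, the `OtherBot` name, and the `check` helper are invented for illustration and are not the file from the blog post.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt with a general record and a Googlebot-specific record.
ROBOTS_TXT = """\
User-agent: *
Disallow: /archive/

User-agent: Googlebot
Disallow: /drafts/
"""


def check(robots_txt, user_agents, paths):
    """Return {(agent, path): allowed} according to Python's parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        (agent, path): parser.can_fetch(agent, path)
        for agent in user_agents
        for path in paths
    }


if __name__ == "__main__":
    results = check(
        ROBOTS_TXT,
        ["Googlebot", "OtherBot"],
        ["/archive/a.html", "/drafts/b.html"],
    )
    for (agent, path), allowed in results.items():
        print(f"{agent:10} {path:18} {'allowed' if allowed else 'blocked'}")
```

With this parser, a bot that has its own record follows only that record, and the `*` record applies to bots that matched nothing else, which is how the original robots.txt standard describes `*`. A tool that merges the two records instead would report different results for the same file.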

Jean-Luc, Apr 2, 2006
