#1
Google Not Consistent With robots.txt
Here's something interesting I found... Googlebot does not interpret the robots.txt file the same way as Google's robots.txt validator inside Google Sitemaps...
http://www.digitalpoint.com/~shawn/2...sistently.html
- Shawn
#2
It is amazing that Google would crawl pages that it clearly should not.
#3
I would agree with Digipoint. I have had Google look at link folders that it was not supposed to access.
#4
Two points:
1. Google has said previously that they interpret robots.txt a bit more liberally than many spiders, in that they will try to figure out what you "meant" when there are errors (much like MSIE tries to work around coding errors). The reply from Google noted in your blog is correct, I think - the fact that a different bot is doing what you wanted it to do is testimony to Google's ability to read between the lines.
2. There is a difference between crawling and indexing. Googlebots and also other spiders do seem to crawl Disallowed folders and files - this isn't new. That's not necessarily a problem, though. It's only a problem if it starts showing up in the search indices. If it's really sensitive or private information, it should be password protected.
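If anyone wants a baseline to compare against, here is a rough sketch of how a strict, by-the-book parser reads a robots.txt file, using Python's standard `urllib.robotparser`. The file contents, the `is_allowed` helper, and the example.com URLs are all made up for illustration (this is not the file from the blog post), and it says nothing about what Googlebot or the Sitemaps validator actually do internally.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, invented for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /links/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if this parser would let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "OtherBot"):
        for path in ("/links/a.html", "/private/a.html"):
            url = "http://example.com" + path
            print(agent, path, is_allowed(ROBOTS_TXT, agent, url))
```

With this parser, a bot that has its own named record follows only that record and ignores the `*` record, so two tools can disagree about the same file depending on whether they merge the records or not.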
#5
In your example, Googlebot respected the standard and the validator didn't.
And, according to Matt Cutts: Quote: