Digital Point Forums
#1
Mar 22nd 2006, 10:46 am
digitalpoint
Google Not Consistent With robots.txt

Here's something interesting I found... Googlebot does not interpret the robots.txt file the same way as Google's robots.txt validator inside Google Sitemaps...

http://www.digitalpoint.com/~shawn/2...sistently.html
__________________
~ Shawn @Twitter
#2
Mar 22nd 2006, 11:02 am
rustybrick
It is amazing that Google would crawl pages that it clearly should not.
__________________
Barry Schwartz, CEO of RustyBrick - Web Development
#3
Mar 30th 2006, 3:00 pm
alifan
I would agree with Digipoint. I have had Google look at link folders that it was not supposed to access.
#4
Apr 2nd 2006, 3:15 pm
minstrel
Two points:

1. Google has said previously that they interpret robots.txt a bit more liberally than many spiders, in that they will try to figure out what you "meant" when there are errors (much like MSIE tries to work around coding errors). The reply from Google noted in your blog is correct, I think - the fact that a different bot is doing what you wanted it to do is testimony to Google's ability to read between the lines.

2. There is a difference between crawling and indexing. Googlebot and other spiders do seem to crawl Disallowed folders and files; this isn't new. That's not necessarily a problem, though. It's only a problem if that content starts showing up in the search indices. If it's really sensitive or private information, it should be password protected.
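
For anyone who wants to check how a parser reads a given set of rules before worrying about what a particular spider does, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `www.example.com` URLs and the `is_allowed` helper are assumptions made up for illustration; this shows how one standards-style parser reads the rules, not how Googlebot or the Sitemaps validator is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /links/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules in robots_txt permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A URL under the Disallowed folder versus one outside it.
blocked = is_allowed(ROBOTS_TXT, "Googlebot", "http://www.example.com/links/page.html")
open_page = is_allowed(ROBOTS_TXT, "Googlebot", "http://www.example.com/index.html")
print(blocked, open_page)
```

Note that a result like this only says what a well-behaved crawler should do. The file does nothing to stop a request, which is why password protection is the right tool for private content.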
#5
Apr 2nd 2006, 3:36 pm
Jean-Luc
In your example, Googlebot respected the standard and the validator didn't.

And, according to Matt Cutts:
Quote:
If you want to try other experiments with robots.txt without any risk at all, use our robots.txt checker built into Sitemaps. It uses the same logic that the real Googlebot uses...
Jean-Luc
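
As a hedged illustration of what "respecting the standard" can mean: under the original robots exclusion standard, a robot that finds a record naming it follows that record and falls back to the `*` record only when no record matches. The sketch below shows that rule with Python's standard `urllib.robotparser`. The rules, the paths and the `OtherBot` name are hypothetical, and nothing here confirms that this is the exact case described in the linked blog post.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one general record and one Googlebot-specific record.
GROUPED_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /links/
"""

parser = RobotFileParser()
parser.parse(GROUPED_ROBOTS_TXT.splitlines())

# Ask the same two questions for a named bot and for an unnamed one.
results = {
    (agent, path): parser.can_fetch(agent, path)
    for agent in ("Googlebot", "OtherBot")
    for path in ("/private/page.html", "/links/page.html")
}

for (agent, path), allowed in results.items():
    print(f"{agent:10} {path:20} allowed={allowed}")
```

With this parser, Googlebot is governed only by its own record, so `/private/` is not blocked for it unless the rule is repeated there. A tool that merged the `*` rules into every record would report something different for the same file.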
All times are GMT -8.