I have installed a "bad bot trap" on my web sites to catch those crawlers that do not obey the robots.txt file directives. I was under the impression that all the "big boys" (Google, Yahoo, et al.) DO FOLLOW the rules. I was surprised to find the following in my Bad Bots Report: 66.249.65.161, agent is Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) and 66.98.160.93, agent is Jayde Crawler. http://www.jayde.com I checked the IP addresses, and the first one does indeed belong to Google, but the second comes up as belonging to Everyones Internet of Houston, TX. Does this second one even belong to Jayde.com? Does anybody know? Is this possible: Google and Jayde NOT FOLLOWING the robots.txt rules? Or are these spoofed IPs and/or agent IDs? I certainly don't want to turn away the valid spiders. What should I do with these?
Wild guess, but it could be that Google visits the site but doesn't index it. Why they would crawl it if they are not going to put it in the index, I don't know, but don't forget they are doing some work on AI that doesn't have much to do with their search engine. Or it could be one of the Google employees messing around in their spare time. I am just guessing though!
...and some very good guesses at that! Thanks so much for the enlightenment. To err on the side of caution, I did NOT add those IPs to my block list. I think I will keep it that way. Thanks again for your insight.
Yeah, as cyberhacker says, it's hard to spoof Google's IP. Google is not a one-person company; there are lots of people who are just investigating and testing the system, so it would be nearly impossible.
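For anyone wanting to check whether a hit like 66.249.65.161 really came from Google rather than from a spoofed agent string, here is a rough Python sketch of the reverse-then-forward DNS check that Google describes for verifying Googlebot. The function name, the suffix list, and the injectable lookup parameters are my own assumptions for illustration, not anything taken from the bot trap software discussed above.

```python
import socket

# Assumed list of hostname suffixes treated as Google-owned; adjust as needed.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if ip reverse-resolves to a Google hostname
    and that hostname resolves back to the same ip."""
    # Default to real DNS lookups; callers may inject fakes for testing.
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]

    try:
        host = reverse_lookup(ip)  # step 1: IP -> hostname (PTR record)
    except OSError:
        return False  # no reverse record, so it cannot be verified

    if not host.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False  # hostname is not under a Google domain

    try:
        # step 2: hostname -> IPs; the original IP must be among them
        return ip in forward_lookup(host)
    except OSError:
        return False
```

Calling `is_verified_googlebot("66.249.65.161")` performs live DNS queries, so the result depends on the records at the time you run it. A hit that fails this check but carries a Googlebot agent string is a reasonable candidate for the block list.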
I actually got it. Open up your command prompt and type in "ping google.com". It will take a moment, then it will show you some output, including the IP: 34.233.187.99. Hope this helps. You can try it with any other website!
I've had that before: a brand new site, just uploaded, that only I knew about, and within 10 minutes I had a member. It turned out he was from Google, worked on Googlebot, and was just looking at some of the new sites around.
Are you sure that your software collects the true data for you? I cannot imagine that Google does not obey robots.txt.
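One way to rule out a mistake in the robots.txt file itself is to test the rules offline. Below is a small sketch using Python's standard `urllib.robotparser`. The `/bot-trap/` and `/private/` paths and both sample files are made up for illustration; substitute your own robots.txt text and trap URL.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, user_agent, path):
    """Return True if the given robots.txt text lets user_agent fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


# Hypothetical file: the trap is disallowed for every crawler.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /bot-trap/
"""

# Hypothetical file: Googlebot has its own group, so it follows only
# that group and the "*" rules (including the trap) do not apply to it.
SAMPLE_WITH_GOOGLE_GROUP = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /bot-trap/
"""

print(is_allowed(SAMPLE_ROBOTS, "Googlebot", "/bot-trap/index.html"))
print(is_allowed(SAMPLE_WITH_GOOGLE_GROUP, "Googlebot", "/bot-trap/index.html"))
```

The first call prints `False` and the second prints `True`. If your real file looks like the second sample, a well-behaved Googlebot could end up in the trap without breaking any rule, because a crawler that finds a group naming it uses only that group.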