Bad Bot Advice

Discussion in 'robots.txt' started by ArtfulWebSites, Mar 23, 2007.

  1. #1
    I have installed a "bad bot trap" on my web sites to catch those crawlers that do not obey the robots.txt file directives. I was under the impression that all the "big boys" (Google, Yahoo, et al) DO FOLLOW the rules.

    I was surprised to find the following in my Bad Bots Report:

    66.249.65.161, agent is Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

    and 66.98.160.93, agent is Jayde Crawler. http://www.jayde.com

    I checked the IP addresses, and the first one does indeed belong to Google, but the second comes up as belonging to: Everyones Internet of Houston, TX. Does this second one even belong to Jayde.com? Does anybody know?

    Is this possible - Google and Jayde NOT FOLLOWING the robots.txt rules? Or, are these spoofed IPs and/or agent IDs?

    I certainly don't want to turn away the valid spiders. What should I do with these?
     
    ArtfulWebSites, Mar 23, 2007
  2. ArtfulWebSites

    #2
    Anyone have any advice or opinions???
     
    ArtfulWebSites, Mar 29, 2007
  3. trafficnotice

    #3
    Wild guess, but it could be that Google visits the site but doesn't index it. Why they would crawl it if they are not going to put it in the index, I don't know, but don't forget they are doing some work on AI that doesn't have much to do with their search engine. Or it could be one of the Google employees messing around in their spare time. I am just guessing though!

    trafficnotice, Mar 29, 2007
  4. ArtfulWebSites

    #4
    ...and, some very good guesses at that! Thanks so much for the enlightenment.

    On the side of caution, I did NOT add those IPs to my block list. I think I will keep it that way.

    Thanks again for your insight. :)
     
    ArtfulWebSites, Mar 29, 2007
  5. helleborine

    #5
    The bad bots might be spoofing Google's IP, no?
     
    helleborine, Apr 12, 2007
  6. cyberhacker665

    #6
    No way, Google's IPs can't be spoofed.

    cyberhacker665, Apr 14, 2007
  7. kirby009

    #7
    It doesn't sound good. You may just have to live with it.

    kirby009, Jun 12, 2007
  8. rJBee

    #8
    Yeah, as cyberhacker says, it's hard to spoof Google's IP. Google is not just a one-person company; there are lots of people investigating and testing the system, so it would be nearly impossible.

    rJBee, Jun 14, 2007
  9. abedelrahman

    #9
    I actually got it: you use your command prompt.

    Open it up and put in "ping google.com".

    It will take a moment and then it will tell you some stuff, including the IP: 34.233.187.99

    Hope this helps. You can try it with all the other websites!!
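
    Pinging a hostname only looks up name to IP. To check an address from the bad bot report (such as 66.249.65.161) the lookup can also be run the other way: reverse DNS on the IP, then a forward lookup on the returned name to confirm it maps back to the same IP. Below is a rough Python sketch of that idea, not a definitive implementation. The function name is_verified_googlebot and the injectable reverse_lookup / forward_lookup parameters are made up for illustration, and the accepted domain suffixes are an assumption based on Google's published crawler verification guidance.

```python
import socket

# Assumed hostname suffixes for genuine Google crawlers.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Forward-confirmed reverse DNS check for a claimed Googlebot IP.

    reverse_lookup and forward_lookup are hypothetical hooks so the
    logic can be exercised without touching the network.
    """
    if reverse_lookup is None:
        # IP -> primary hostname (PTR record)
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        # hostname -> list of IPv4 addresses
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]

    try:
        host = reverse_lookup(ip)
    except OSError:
        # No PTR record or DNS failure: cannot confirm.
        return False

    if not host.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False

    try:
        # The name must resolve back to the same IP, otherwise the
        # PTR record could have been set by anyone.
        return ip in forward_lookup(host)
    except OSError:
        return False
```

    Calling is_verified_googlebot("66.249.65.161") with the defaults does live DNS lookups, so the answer depends on what DNS returns at the time. The user-agent string is not checked at all here, since any client can send whatever agent it likes.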
     
    abedelrahman, Jun 27, 2007
  10. st_jimi

    #10
    I've had that before: a brand new site, which was just uploaded and only I knew about, and in 10 minutes I had a member. Turned out he was from Google and worked on the Googlebot, and was just looking at some of the new sites about.

    st_jimi, Jun 29, 2007
  11. trichnosis

    #11
    Are you sure that your software collects the true data for you?

    I cannot imagine that Google does not obey the robots.txt.
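
    One thing that might be worth ruling out is a mistake in the robots.txt itself, so that the trap is not actually disallowed for the bot that got caught. Here is a small Python sketch using the standard library's urllib.robotparser. The rules and the /bot-trap/ path are invented for the example and are not the original poster's real file; is_allowed is just a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; swap in the site's real rules.
ROBOTS_TXT = """\
User-agent: *
Disallow: /bot-trap/
"""


def is_allowed(robots_txt, user_agent, path):
    """Return True if user_agent may fetch path under the given rules."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Check the trap URL and a normal page for the same agent.
    for path in ("/bot-trap/index.html", "/index.html"):
        print(path, is_allowed(ROBOTS_TXT, "Googlebot", path))
```

    If the trap path comes back as allowed for an agent, then a visit from that agent does not prove it ignored the rules. Python's parser is only one reading of the file, so a real crawler may interpret edge cases differently.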
     
    trichnosis, Aug 6, 2007