Google Ignored instructions in robots.txt

Discussion in 'Google' started by visioninfotech, May 7, 2007.

  1. #1
    I am really amazed by the stats from Google Webmaster Tools.

    I had a crawl-delay instruction in my robots.txt file.

    But the following is the output from Google's tools:


    Parsing results
    Value              Result
    Crawl-delay: 20    Rule ignored by Googlebot

    I thought reputable companies like Google, Yahoo and MSN obeyed robots.txt instructions, but that isn't happening here.
     
    visioninfotech, May 7, 2007 IP
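For readers following along: Crawl-delay is an extension that individual crawlers choose whether to honor, which fits the "Rule ignored by Googlebot" result quoted above. The sketch below shows how a crawler that does honor it might read the value, using Python's standard `urllib.robotparser`. The robots.txt content and the `ExampleBot` user agent are made up for illustration; only the `Crawl-delay: 20` line comes from the post.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; only the Crawl-delay value is taken from the post.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 20
"""


def read_crawl_delay(robots_txt, user_agent):
    """Return the Crawl-delay (in seconds) that applies to user_agent, or None."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


# A polite crawler would wait this many seconds between requests.
delay = read_crawl_delay(ROBOTS_TXT, "ExampleBot")
```

Nothing forces a crawler to make this call at all, so the directive only helps against bots that choose to respect it.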
  2. joshbond

    #2
    I've noticed the same thing. I'm starting to think that Google and all the others just make up rules and then bend them whenever they feel like it.
     
    joshbond, May 7, 2007 IP
  3. visioninfotech

    #3
    Not good. How can webmasters stop their websites from being hammered by crawling bots, then? :confused: :confused:

    I had a script from Robert Plank on WMW, anticrawl,
    but I could not find anything similar for ASP websites.
     
    visioninfotech, May 10, 2007 IP
  4. Robert Allen

    #4
    I have the same issue on nodp.info, except I haven't got a robots.txt. I am hammered every day, and it lags the server so badly that pages take twice as long to load.

    Rob
     
    Robert Allen, May 10, 2007 IP
  5. grg

    #5
    Unfortunately, they probably have some bugs in their software (who doesn't?). There have been some big issues with robots.txt rules being ignored.
     
    grg, May 10, 2007 IP
  6. visioninfotech

    #6
    We can't do anything about Google, Yahoo and MSN.
    But I really want to block the other bots.

    Does anyone have a good solution for ASP-based websites?
    Please send the URL of the software or application. We can spend around $200 to stop this on our site.
     
    visioninfotech, May 10, 2007 IP
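One common way to keep a site from being hammered, whatever the server platform, is to throttle each client at the application level. The following is a minimal sliding-window sketch in Python, not an ASP component and not the anticrawl script mentioned earlier. The class name `CrawlerThrottle`, its parameters, and the example limits are all assumptions made for illustration.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class CrawlerThrottle:
    """Allow at most max_requests per client within window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client IP -> timestamps of recent requests

    def allow(self, client_ip: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[client_ip]
        # Forget requests that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit: the caller could answer with HTTP 429 or 503
        hits.append(now)
        return True


# Example limits (hypothetical): 2 requests per 10 seconds per client.
throttle = CrawlerThrottle(max_requests=2, window_seconds=10.0)
```

The request handler would call `throttle.allow(ip)` before doing any expensive work and return an error status when it gets `False`. The same idea can be ported to whatever language the site runs on.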
  7. Small Fry

    #7
    Put your wallet away. If you use "noindex" meta tags, they won't be ignored the way a robots.txt may be.
     
    Small Fry, May 10, 2007 IP
  8. sweetfunny

    #8
    I had Google ignore the robots.txt as well.

    I purchased the domain and immediately put in a Disallow for / (the whole site), as it was a private site for my mother and 20 of her friends.

    I installed the forum software, and a week later Google was trying to crawl the calendar and getting the not-logged-in error. I confirmed it with AWStats and wasn't impressed.
     
    sweetfunny, May 10, 2007 IP
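Before blaming the crawler, it can be worth confirming that the robots.txt really says what was intended. Here is a rough sketch using Python's standard `urllib.robotparser`; the rules string and the example.com URL are placeholders, not the poster's actual file or site.

```python
from urllib.robotparser import RobotFileParser

# Placeholder rules that disallow the whole site for every user agent.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if these robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical URL standing in for the forum calendar page.
calendar_ok = is_allowed(BLOCK_ALL, "Googlebot", "http://example.com/calendar.php")
```

If this reports the URL as blocked and a compliant crawler still requests it, other causes are worth ruling out, such as the file not being served from the site root. Since the site is private anyway, the login wall is the real protection.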
  9. trichnosis

    #9
    As far as I know, all the big search engines obey robots.txt. I believe your problem is a small bug :)
     
    trichnosis, May 11, 2007 IP
  10. manish.chauhan

    #10
    Not every bot is obliged to follow your robots.txt, and many spammy bots ignore it. To block those bots, you can find their IP addresses in your traffic logs and block them by IP using .htaccess... :)
     
    manish.chauhan, Apr 9, 2008 IP
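The log-then-block approach described above could be sketched as follows. This assumes an access log where the client IP is the first whitespace-separated field (as in the Common Log Format) and emits Apache 2.2-style `Deny from` lines; newer Apache versions use `Require` directives instead. The function names and the threshold are invented for the example.

```python
from collections import Counter


def busiest_ips(log_lines, threshold):
    """Return, sorted, the IPs that appear in at least `threshold` log lines."""
    counts = Counter()
    for line in log_lines:
        fields = line.split(maxsplit=1)
        if fields:  # skip blank lines
            counts[fields[0]] += 1
    return sorted(ip for ip, hits in counts.items() if hits >= threshold)


def htaccess_rules(ips):
    """Build an Apache 2.2-style .htaccess fragment that denies the given IPs."""
    lines = ["Order Allow,Deny", "Allow from all"]
    lines += [f"Deny from {ip}" for ip in ips]
    return "\n".join(lines)
```

A typical run would read the log file line by line, call `busiest_ips` with a threshold suited to the site's normal traffic, and paste the output of `htaccess_rules` into .htaccess after checking that none of the listed addresses belong to a crawler you want to keep.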
  11. Pixelrage

    #11
    This isn't good news for anyone who created a spider trap.
     
    Pixelrage, Apr 9, 2008 IP