psbot and gigabot won't stop even when banned in robots.txt

Discussion in 'robots.txt' started by SaN-DeeP, Feb 28, 2006.

  1. #1
    Any possible solutions to get rid of these bots permanently?
    I find 100+ spiders crawling the entire server at any time!
     
    SaN-DeeP, Feb 28, 2006
  2. mkeen

    mkeen Peon

    #2
    They ignored my robots.txt too.

    I ended up putting this at the top of my sites. It's PHP, btw. Hope it helps.

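    // Grab the user agent; it can be missing, so fall back to an empty string.
    $checknaughty = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
    // Substring match, so a new version number in the UA string doesn't slip past.
    if (stripos($checknaughty, 'psbot') !== false) {
        header('HTTP/1.1 403 Forbidden');
        echo "PISS OFF PSBOT STOP IGNORING robots.txt, you're clearly disallowed, now learn to read";
        die();
    }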
     
    mkeen, Feb 28, 2006
  3. SaN-DeeP

    SaN-DeeP Well-Known Member

    #3
    Thanks Matt,
    I'm asking myself what's the use of robots.txt if spiders don't honor it...
     
    SaN-DeeP, Feb 28, 2006
  4. wkd

    wkd Peon

    #4
    For Gigablast you could also add this to the page:

    <meta name="gigabot" content="noindex,nofollow" />
     
    wkd, Mar 1, 2006
  5. SaN-DeeP

    SaN-DeeP Well-Known Member

    #5
    gigabot has completely stopped after I disallowed it in robots.txt, but psbot sucks big time, I should say!

    Thanks for all your comments above :)
     
    SaN-DeeP, Mar 1, 2006
  6. Nintendo

    Nintendo ♬ King of da Wackos ♬

    #6
    You can't ban anyone using robots.txt. It's only a suggestion, like saying 'Please don't go here, but we can't stop you.'

    Blocking them in the script or in .htaccess is probably the only way to actually ban them.
     
    Nintendo, Mar 1, 2006
  7. Carl Sarnstrand

    Carl Sarnstrand Peon

    #7
    Picsearch search spiders always respect robots.txt, and we will immediately address any problems that you inform us about. Please send an e-mail to info@picsearch.com and we will see that any problems get handled as soon as possible.

    Please go to http://www.picsearch.com to try our service.

    Picsearch takes robots.txt seriously and has a short text about it on our website at http://www.picsearch.com/menu.cgi?item=Psbot.

    Best Regards

    Carl Sarnstrand
    Communications Manager
    Picsearch
     
    Carl Sarnstrand, Jun 19, 2007
  8. trichnosis

    trichnosis Prominent Member

    #8
    Blocking the IPs of those bots is a better option.

    Neither of them is following robots.txt.
     
    trichnosis, Aug 6, 2007
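
    For anyone landing on this thread later: here is a minimal .htaccess sketch of the server-level block suggested in #6 and #8. It assumes Apache with mod_rewrite enabled and .htaccess overrides allowed, which may not match your host. The psbot substring comes from the user agent quoted in #2; the 192.0.2.0/24 range is only a documentation placeholder, not a real Picsearch address, so swap in the IPs you actually see in your own logs.

    ```apache
    # Assumes Apache with mod_rewrite available; adjust to your setup.
    RewriteEngine On
    # Match the crawler's user agent anywhere in the header, case-insensitively.
    RewriteCond %{HTTP_USER_AGENT} psbot [NC,OR]
    # Placeholder range (192.0.2.0/24 is reserved for documentation); replace with IPs from your logs.
    RewriteCond %{REMOTE_ADDR} ^192\.0\.2\.
    # Answer any matching request with 403 Forbidden.
    RewriteRule .* - [F,L]
    ```

    Unlike robots.txt, this is enforced by the server rather than left to the bot's good manners. A bot that fakes its user agent will still get past the first condition, which is where the IP condition helps.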