Yahoo Spidering With Default PHP user agent?

Discussion in 'Yahoo' started by digitalpoint, Nov 4, 2009.

    So I was skimming through our web server logs to check which bots we are blocking, and I saw a *ton* of these...

    67.212.163.122 - - [04/Nov/2009:16:31:58 -0800] "GET /forumdisplay.php?f=35&page=29 HTTP/1.1" 403 - "-" "PHP/5.2.9"
    67.212.163.122 - - [04/Nov/2009:16:32:00 -0800] "GET /forumdisplay.php?f=35&page=18 HTTP/1.1" 403 - "-" "PHP/5.2.9"
    67.212.163.122 - - [04/Nov/2009:16:32:00 -0800] "GET /forumdisplay.php?f=35&page=17 HTTP/1.1" 403 - "-" "PHP/5.2.9"
    67.212.163.122 - - [04/Nov/2009:16:32:00 -0800] "GET /forumdisplay.php?f=35&page=16 HTTP/1.1" 403 - "-" "PHP/5.2.9"
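A quick way to spot who is piling up 403s like this is to tally blocked requests by client IP and user agent. Below is a minimal PHP sketch, assuming the standard Apache "combined" log format shown above; the function name countBlockedRequests() and the log path in the usage comment are made up for illustration.

```php
<?php
// Tally requests answered with 403, grouped by client IP and User-Agent.
// Assumes Apache "combined" format:
//   host ident user [time] "request" status bytes "referer" "user-agent"
// A custom LogFormat would need a different pattern.

/**
 * @param iterable<string> $lines Raw access-log lines.
 * @return array<string, int> "IP | User-Agent" => count, highest first.
 */
function countBlockedRequests(iterable $lines): array
{
    $pattern = '/^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (\d{3}) \S+ "[^"]*" "([^"]*)"/';
    $counts = [];
    foreach ($lines as $line) {
        if (!preg_match($pattern, $line, $m)) {
            continue; // not a combined-format line
        }
        if ($m[2] !== '403') {
            continue; // only count requests we blocked
        }
        $key = $m[1] . ' | ' . $m[3];
        $counts[$key] = ($counts[$key] ?? 0) + 1;
    }
    arsort($counts); // busiest IP/UA pairs first
    return $counts;
}

// Usage (hypothetical log path; adjust for your server):
// $lines = new SplFileObject('/var/log/apache2/access.log');
// foreach (countBlockedRequests($lines) as $who => $n) {
//     echo "$n\t$who\n";
// }
```

Grouping on IP plus user agent (rather than IP alone) is what makes a pattern like "one crawl IP, thousands of hits, UA of PHP/5.2.9" jump out.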
    So, curious about who would be spidering thousands of pages of this site with PHP, I did a reverse DNS lookup on the IP address:

    b3091196.crawl.yahoo.net
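A PTR record on its own can be set to anything by whoever controls the IP block, so the usual way to confirm a crawler is forward-confirmed reverse DNS: look up the PTR name, check it sits under a domain you trust, then resolve that name forward and make sure it maps back to the same IP. A minimal PHP sketch (IPv4 only), assuming '.crawl.yahoo.net' is an acceptable suffix based on the hostname above; the function name isVerifiedCrawler() is hypothetical, and the resolvers are injectable so the logic can be tested without live DNS.

```php
<?php
/**
 * Forward-confirmed reverse DNS check (IPv4).
 *
 * @param string        $ip              Client address from the log.
 * @param string[]      $trustedSuffixes e.g. ['.crawl.yahoo.net'] (assumption; verify against the engine's docs)
 * @param callable|null $reverse         IP -> hostname; defaults to gethostbyaddr()
 * @param callable|null $forward         hostname -> IPv4 list; defaults to gethostbynamel()
 */
function isVerifiedCrawler(string $ip, array $trustedSuffixes, ?callable $reverse = null, ?callable $forward = null): bool
{
    $reverse = $reverse ?? 'gethostbyaddr';
    $forward = $forward ?? 'gethostbynamel';

    // gethostbyaddr() returns the IP unchanged when there is no PTR record, false on malformed input.
    $host = $reverse($ip);
    if (!is_string($host) || $host === $ip) {
        return false;
    }
    $host = strtolower(rtrim($host, '.'));

    // The PTR name must end in one of the trusted suffixes.
    $trusted = false;
    foreach ($trustedSuffixes as $suffix) {
        $suffix = strtolower($suffix);
        if (strlen($host) > strlen($suffix) && substr($host, -strlen($suffix)) === $suffix) {
            $trusted = true;
            break;
        }
    }
    if (!$trusted) {
        return false;
    }

    // gethostbynamel() returns the name's IPv4 addresses, or false on failure.
    // The original IP must be among them, otherwise the PTR record is not trustworthy.
    $addresses = $forward($host);
    return is_array($addresses) && in_array($ip, $addresses, true);
}

// Usage (performs live DNS lookups):
// var_dump(isVerifiedCrawler('67.212.163.122', ['.crawl.yahoo.net']));
```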

    Going a step further, a check of who owns the IP address confirms it's indeed Yahoo...

    http://ws.arin.net/whois/?queryinput=67.195.112.124
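The same ownership check can be scripted against ARIN's WHOIS server on TCP port 43 (the WHOIS protocol in RFC 3912 is just one query line ending in CRLF, after which the server sends its answer and closes). A minimal PHP sketch, assuming the address falls in ARIN's region and the response labels the holder with ARIN's "OrgName:" field; whoisQuery() and orgNames() are hypothetical helper names.

```php
<?php
// Send one WHOIS query over TCP port 43 and return the raw response.
function whoisQuery(string $ip, string $server = 'whois.arin.net', float $timeout = 10.0): string
{
    $fp = fsockopen($server, 43, $errno, $errstr, $timeout);
    if ($fp === false) {
        throw new RuntimeException("WHOIS connection failed: $errstr ($errno)");
    }
    fwrite($fp, $ip . "\r\n");            // the query: one line terminated by CRLF
    $response = stream_get_contents($fp); // read until the server closes the connection
    fclose($fp);
    return $response === false ? '' : $response;
}

// Extract every "OrgName:" value from an ARIN-style WHOIS response.
function orgNames(string $whoisResponse): array
{
    preg_match_all('/^OrgName:[ \t]*(.+?)[ \t\r]*$/m', $whoisResponse, $m);
    return $m[1];
}

// Usage (performs a live network query):
// print_r(orgNames(whoisQuery('67.195.112.124')));
```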

    So why in the hell are they running (at least a portion of) their spiders on PHP without even bothering to change the user agent?

    Come on guys... at least take the time to do this:

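    // Set the User-Agent header PHP sends on http:// stream requests (file_get_contents(), fopen(), ...)
    ini_set('user_agent', 'Yahoo! Slurp/4.0; We are a sucky search engine, help Microsoft!');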
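ini_set() changes the user agent for the whole script. A spider can also set it per request through an http stream context, which keeps the identifying header next to the fetch itself. A minimal sketch; the bot name, info URL, and crawlerContext() helper are hypothetical placeholders.

```php
<?php
// Hypothetical crawler identity: a bot name plus a URL explaining the bot.
const CRAWLER_UA = 'ExampleBot/1.0 (+http://www.example.com/bot.html)';

// Build an http stream context that sends our User-Agent on every request using it.
function crawlerContext(string $userAgent = CRAWLER_UA)
{
    return stream_context_create([
        'http' => [
            'user_agent' => $userAgent, // sent as the User-Agent request header
            'timeout'    => 10,         // read timeout, in seconds
        ],
    ]);
}

// Usage (performs a live HTTP request):
// $html = file_get_contents('http://www.example.com/', false, crawlerContext());
```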
    digitalpoint, Nov 4, 2009