Yahoo Crawler Authentication - Help Please

Discussion in 'PHP' started by AHA7, Sep 18, 2007.

  1. #1
    Hello,

    I have tried using robots.txt to exclude parts of my site from Yahoo's index, but Yahoo seems to ignore the robots.txt rules in some cases, and some of the excluded pages still get indexed. So I figure the best way to keep those pages out of Yahoo is to return a 404 (Not Found) response to the Yahoo crawler only.

    I've heard that to authenticate the Yahoo crawler I need a reverse DNS lookup on the visitor's IP to make sure the host name belongs to Yahoo, and then a forward DNS lookup on that host name to make sure the resulting IP matches the original one...

    Could someone please help me with the PHP code to do this? How do I do a reverse/forward DNS lookup with PHP?
     
    AHA7, Sep 18, 2007
  2. nico_swd

    #2
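    All Yahoo! crawlers send the same (or a similar) user agent string, which you can check with a single line of code. Keep in mind that any client can fake this header, so it identifies the crawler but does not authenticate it.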

    
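    PHP:

    // Send a 404 to any client whose user agent mentions Yahoo or Slurp.
    // The header may be missing, so fall back to an empty string.
    $userAgent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
    if (preg_match('/(yahoo|slurp)/i', $userAgent))
    {
        header('HTTP/1.0 404 Not Found', true, 404);
        exit();
    }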

    You might also want to have a look at this:

    http://www.ysearchblog.com/archives/000372.html

    My guess is that there's something wrong with your robots.txt file.
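
    As for the reverse/forward DNS check asked about in the first post, a minimal sketch (assuming PHP 7.1 or later) could look like the following. The function names hostMatchesDomain() and isVerifiedCrawler() are made up for illustration, and the crawl.yahoo.net suffix in the usage comment is an assumption that should be checked against Yahoo's own documentation. gethostbyaddr() and gethostbynamel() are PHP's built-in lookups; note that gethostbynamel() only returns IPv4 addresses.

    ```php
    <?php
    // Sketch only: reverse DNS lookup, domain check, then forward confirmation.
    // Function names and the domain list are illustrative assumptions.

    // True if $host is a subdomain of one of the trusted domains.
    function hostMatchesDomain(string $host, array $domains): bool
    {
        $host = strtolower(rtrim($host, '.'));
        foreach ($domains as $domain) {
            // Require a leading dot so "fakecrawl.yahoo.net" does not match.
            $suffix = '.' . strtolower($domain);
            if (substr($host, -strlen($suffix)) === $suffix) {
                return true;
            }
        }
        return false;
    }

    // $reverse and $forward default to PHP's resolvers; they are parameters
    // only so the logic can be tried without live DNS.
    function isVerifiedCrawler(
        string $ip,
        array $domains,
        ?callable $reverse = null,
        ?callable $forward = null
    ): bool {
        $reverse = $reverse ?? 'gethostbyaddr';
        $forward = $forward ?? 'gethostbynamel';

        // Step 1: reverse lookup. gethostbyaddr() returns the unmodified IP
        // on failure and false on malformed input.
        $host = $reverse($ip);
        if ($host === false || $host === $ip) {
            return false;
        }

        // Step 2: the host name must belong to a trusted domain.
        if (!hostMatchesDomain($host, $domains)) {
            return false;
        }

        // Step 3: forward lookup must include the original IP.
        $ips = $forward($host);
        return is_array($ips) && in_array($ip, $ips, true);
    }

    // Possible usage (the domain suffix is an assumption, verify it first):
    // if (isVerifiedCrawler($_SERVER['REMOTE_ADDR'], ['crawl.yahoo.net'])) {
    //     header('HTTP/1.0 404 Not Found', true, 404);
    //     exit();
    // }
    ```

    DNS lookups can be slow, so if you run this on every request it is probably worth doing the cheap user agent check first and caching the result per IP.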
     
    nico_swd, Sep 19, 2007