Spot SE bots with PHP (simple but doesn't work...)

Discussion in 'PHP' started by retSaMbew, Feb 2, 2006.

  1. #1
    Hi everyone,

    This should work for detecting an SE bot from its user agent, shouldn't it?

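    // preg_match with the /i flag does the case-insensitive match that eregi did (eregi was removed in PHP 7)
    if (preg_match('/(google|msnbot|slurp)/i', $_SERVER['HTTP_USER_AGENT']))
    {
    // send crawlers a page that asks not to be indexed or followed, then stop
    print '<html><head><meta name="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
    </head></html>';
    exit;
    }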

    After detecting the bot I exit the whole script.

    However, when I check the page with the spider simulator tool at http://www.webconfs.com/search-engine-spider-simulator.php, the bot still sees everything it is not supposed to see.

    Maybe the tool is unreliable?

    Or is there a mistake in my condition?

    Thanks for your help

    RS
     
    retSaMbew, Feb 2, 2006
  2. dave487

    #2
    Your check only matches the Google, MSN and Yahoo (Slurp) user agents. It won't spot whatever user agent the spider simulator is using.

    The best way to test it? Make it live on the site and see.
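
    A way to see this without going live is to run the same pattern against a few user-agent strings by hand. The sketch below is only an illustration: the helper name is_search_bot and the sample strings (including the made-up "SomeSpiderSimulator") are assumptions, not something taken from the simulator tool.

```php
<?php
// Hypothetical helper: true when the user agent contains one of the three
// crawler tokens from the first post (case-insensitive).
function is_search_bot(string $userAgent): bool
{
    return preg_match('/google|msnbot|slurp/i', $userAgent) === 1;
}

// Sample user-agent strings, for illustration only.
$samples = [
    'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
    'msnbot/1.0 (+http://search.msn.com/msnbot.htm)',
    'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) Firefox/1.5',
    'SomeSpiderSimulator/1.0', // made-up name standing in for a testing tool
];

foreach ($samples as $ua) {
    // Label each string with what the check decides about it.
    echo (is_search_bot($ua) ? 'bot:     ' : 'visitor: ') . $ua . PHP_EOL;
}
```

    Any tool whose user agent lacks those three tokens lands in the "visitor" branch, so it is served the full page.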
     
    dave487, Feb 2, 2006
  3. hdogan

    #3
    You can use robots.txt instead of checking with a PHP script. But if you'd like to use PHP, your regular expression will work properly (I've used your technique before and it worked well).
     
    hdogan, Feb 3, 2006
  4. drugoon

    #4
    I think you are trying to make a cloaked page: one set of content for visitors, another for bots. There is a catch: bots usually run some tests in which they don't identify themselves as bots, and if the cloaking is discovered your pages are banned from the SE for life.

    I do not know what hdogan means by using robots.txt, but I think the only really good method is to identify the bots by IP.
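
    One common way to identify a bot by IP is a reverse DNS lookup followed by a forward lookup that has to return the same address. The sketch below is a rough illustration under stated assumptions: the function name is_verified_crawler_ip, the injectable resolver parameters and the sample address are hypothetical, and the list of allowed hostname suffixes would have to be checked against each search engine's own documentation. Note that gethostbynamel only returns IPv4 addresses.

```php
<?php
// Sketch: verify a claimed crawler IP by reverse DNS, then forward DNS.
// $reverse and $forward are hypothetical hooks so the logic can be tried
// without network access; they default to PHP's own resolver functions.
function is_verified_crawler_ip(
    string $ip,
    array $allowedSuffixes,
    ?callable $reverse = null,
    ?callable $forward = null
): bool {
    $reverse = $reverse ?? 'gethostbyaddr';
    $forward = $forward ?? 'gethostbynamel';

    // Step 1: reverse lookup. gethostbyaddr() returns the IP unchanged on failure.
    $host = $reverse($ip);
    if (!is_string($host) || $host === '' || $host === $ip) {
        return false;
    }
    $host = strtolower(rtrim($host, '.'));

    // Step 2: the hostname must end with one of the expected suffixes.
    $suffixOk = false;
    foreach ($allowedSuffixes as $suffix) {
        $suffix = strtolower($suffix);
        if (substr($host, -strlen($suffix)) === $suffix) {
            $suffixOk = true;
            break;
        }
    }
    if (!$suffixOk) {
        return false;
    }

    // Step 3: the forward lookup of that hostname must include the original IP.
    $ips = $forward($host);
    return is_array($ips) && in_array($ip, $ips, true);
}

// Usage with stubbed resolvers (made-up data, no real DNS queries):
$fakeReverse = function ($ip) { return 'crawl-66-249-66-1.googlebot.com'; };
$fakeForward = function ($host) { return ['66.249.66.1']; };
var_dump(is_verified_crawler_ip('66.249.66.1', ['.googlebot.com'], $fakeReverse, $fakeForward));
```

    The leading dot in each suffix keeps a hostname such as "fakegooglebot.com" from passing the check, and the forward lookup guards against someone who controls the reverse DNS record for their own IP.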
     
    drugoon, Feb 5, 2006