Setting User Agent

Discussion in 'PHP' started by T0PS3O, Apr 11, 2005.

  1. #1
    If I write my own bot, a very basic one using just file_get_contents etc., can I set the user agent somewhere so the sites my bot/script visits know it's a bot?

    Because it's cool, because it lets me tell its hits apart in my own stats, and because it's good manners, I'd like to name my bot, no matter how basic it is.

    Can I do that in PHP in combination with file reads (fopen) and file content grabbing (file_get_contents)?
     
    T0PS3O, Apr 11, 2005
  2. digitalpoint

    digitalpoint Overlord of no one Staff

    #2
    ini_set('user_agent', 'Name of your bot');
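A minimal sketch of two ways to do this, assuming PHP 7 or later; the bot string `MyBot/0.1 (+http://www.example.com/bot.html)` and the helper names `bot_context` and `bot_fetch` are made up for illustration. `ini_set('user_agent', ...)` changes the User-Agent for every later request made through PHP's http:// stream wrapper (fopen, file_get_contents, file), while a stream context carrying the `user_agent` http option only applies to the calls it is passed to.

```php
<?php
// Hypothetical bot name plus a link to a page describing the bot.
const BOT_UA = 'MyBot/0.1 (+http://www.example.com/bot.html)';

// Option 1: script-wide. Every later http:// request made through the
// stream wrappers (fopen, file_get_contents, file) sends this User-Agent.
ini_set('user_agent', BOT_UA);

// Option 2: per request, via a stream context with the http "user_agent" option.
function bot_context(string $ua = BOT_UA)
{
    return stream_context_create(['http' => ['user_agent' => $ua]]);
}

// Fetch a page with the bot's User-Agent; returns the body, or false on failure.
function bot_fetch(string $url)
{
    return file_get_contents($url, false, bot_context());
}
```

For example, `bot_fetch('http://www.digitalpoint.com/')` would request that page identifying itself as `MyBot/0.1`. The context route is handy if one script needs to send different user agents to different sites.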
     
    digitalpoint, Apr 11, 2005
  3. T0PS3O

    T0PS3O Feel Good PLC

    #3
    And that goes in the actual PHP file where the script is running?

    I'm talking about a script that I start from the browser and that runs after a form is submitted.

    I shall give that a try. Thanks.
     
    T0PS3O, Apr 11, 2005
  4. digitalpoint

    digitalpoint Overlord of no one Staff

    #4
    Yeah... it goes in the script that calls file_get_contents(), on any line before the file_get_contents() call.
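For a script triggered by a form post, a rough sketch of the ordering might look like this; the form field name `url`, the function name `handle_submission` and the bot string are assumptions for illustration. The only real requirement is that `ini_set()` runs before `file_get_contents()`. The sketch also refuses anything that is not an http or https URL, since the address comes from a form.

```php
<?php
// Hypothetical handler for a form that posts a field called "url".
function handle_submission(array $post): string
{
    // Must run before file_get_contents(): it only affects requests made afterwards.
    ini_set('user_agent', 'MyBot/0.1 (+http://www.example.com/bot.html)');

    $url = filter_var($post['url'] ?? '', FILTER_VALIDATE_URL);
    // Only allow plain web URLs, not file://, ftp:// and so on.
    if ($url === false || !in_array(parse_url($url, PHP_URL_SCHEME), ['http', 'https'], true)) {
        return 'Invalid URL';
    }

    $html = file_get_contents($url);  // sent with the User-Agent set above
    return $html === false ? 'Fetch failed' : 'Fetched ' . strlen($html) . ' bytes';
}

// Only act on an actual form submission (nothing happens when run from the command line).
if (($_SERVER['REQUEST_METHOD'] ?? '') === 'POST') {
    echo htmlspecialchars(handle_submission($_POST));
}
```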
     
    digitalpoint, Apr 11, 2005
  5. T0PS3O

    T0PS3O Feel Good PLC

    #5
    Thanks. You might see some weird ass bots around soon :) (No doubt there already are...)
     
    T0PS3O, Apr 11, 2005
  6. sarahk

    sarahk iTamer Staff

    #6
    Using PHP with cURL makes it even easier.

    But please, please, please don't let your ego get out of control. A good netizen will give the bot a cute name (short and sweet) plus a link to the robot information page,
    e.g. "Tops30 www.tops30.com/robots.html"

    That way, when we see it we don't scratch our heads and consider banning it.
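One way to keep the string consistent is a tiny helper; this is only a sketch, and `bot_user_agent` is a hypothetical name. It uses the `Name/version (+info URL)` layout many crawlers use, a slight variant of the "Tops30 www.tops30.com/robots.html" example above, so the name stays fixed and only the version changes between releases.

```php
<?php
// Hypothetical helper: short stable name, version, and a link to the bot's info page.
function bot_user_agent(string $name, string $version, string $infoUrl): string
{
    return sprintf('%s/%s (+%s)', $name, $version, $infoUrl);
}

// e.g. "Tops30/1.1 (+http://www.tops30.com/robots.html)"
ini_set('user_agent', bot_user_agent('Tops30', '1.1', 'http://www.tops30.com/robots.html'));
```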

    I used to run a thing called botspotter, and one of the checks I did was for multiple user agents on a single IP. If it came up with "Tops30 v1.1" and then "Tops30 v1.2", all was well. If they were radically different, I'd wonder what games you were playing, and then I might block you.
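A rough guess at what such a check could look like; this is not the actual botspotter code, and the function names and the log format (a list of `[ip, user_agent]` pairs) are assumptions. It treats a user agent's first token, split on "/" or whitespace, as its base name, so version bumps pass but a different name on the same IP gets flagged. Note the simplification: most browsers start with "Mozilla", so this only really distinguishes self-named bots.

```php
<?php
// Base name = first token of the UA, split on "/" or whitespace:
// "Tops30 v1.1" and "Tops30/1.2" both give "Tops30".
function ua_base_name(string $ua): string
{
    $parts = preg_split('~[/\s]+~', trim($ua));
    return $parts[0];
}

// $hits: list of [ip, user_agent] pairs, e.g. parsed from an access log.
// Returns ip => list of distinct base names, only for IPs that used more than one.
function suspicious_ips(array $hits): array
{
    $names = [];
    foreach ($hits as [$ip, $ua]) {
        $names[$ip][ua_base_name($ua)] = true;  // keys act as a set
    }
    $flagged = [];
    foreach ($names as $ip => $set) {
        if (count($set) > 1) {
            $flagged[$ip] = array_keys($set);
        }
    }
    return $flagged;
}
```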

    Sarah

    cURL example:
    <?php
    // Fetch a URL while pretending to be a browser (sends an MSIE 5.01 user agent by default)
    function download_pretending($url, $user_agent = 'Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)')
    {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);  // User-Agent header to send
        curl_setopt($ch, CURLOPT_HEADER, false);           // leave response headers out of the result
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);    // return the body instead of printing it
        curl_setopt($ch, CURLOPT_REFERER, 'http://www.pcpropertymanager.com/wsnlinks/');
        $result = curl_exec($ch);                          // false on failure
        curl_close($ch);
        return $result;
    } // function download_pretending($url, $user_agent)

    echo download_pretending('http://www.digitalpoint.com/');
    ?>
     
    sarahk, Apr 12, 2005
  7. T0PS3O

    T0PS3O Feel Good PLC

    #7
    I have no malicious intent :)

    Initially I need one that crawls my own sites for my automated Froogle feed creator. It already works, but I just wanted to give the little baby a name.

    I've got some other ideas for the little critter, so by then I'll make it a real gentleman's bot: checking robots.txt, slowing it down to one hit per second, making sure it doesn't crawl anchor URLs, etc.
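A simplified sketch of those three habits, assuming PHP 7.0 or later; `strip_fragment`, `robots_allows`, `crawl_politely` and the `$fetch` callback are hypothetical names. The robots.txt handling is deliberately cut down: it only honours `Disallow` prefix rules from the `*` group and from any group naming the bot exactly, and it ignores `Allow` lines, wildcards and query strings, so a real crawler would want a proper parser.

```php
<?php
// Drop the "#anchor" part so page.html#top and page.html#bottom count as one URL.
function strip_fragment(string $url): string
{
    $pos = strpos($url, '#');
    return $pos === false ? $url : substr($url, 0, $pos);
}

// Simplified robots.txt check (not a full parser): honours Disallow prefixes
// from the "*" group and from any group naming $botName; ignores Allow and wildcards.
function robots_allows(string $robotsTxt, string $botName, string $url): bool
{
    $path = parse_url($url, PHP_URL_PATH) ?: '/';
    $applies = false;       // does the current group cover this bot?
    $lastWasAgent = false;  // consecutive User-agent lines share one group
    foreach (preg_split('/\R/', $robotsTxt) as $line) {
        $line = trim(preg_replace('/#.*/', '', $line));  // strip comments
        if (!preg_match('/^([A-Za-z-]+)\s*:\s*(.*)$/', $line, $m)) {
            continue;
        }
        $field = strtolower($m[1]);
        $value = trim($m[2]);
        if ($field === 'user-agent') {
            $match = ($value === '*' || strcasecmp($value, $botName) === 0);
            $applies = $lastWasAgent ? ($applies || $match) : $match;
            $lastWasAgent = true;
            continue;
        }
        $lastWasAgent = false;
        if ($field === 'disallow' && $applies && $value !== '' && strpos($path, $value) === 0) {
            return false;
        }
    }
    return true;
}

// Fetch each allowed URL once, waiting $delay seconds between requests.
// $fetch is any callable taking a URL, e.g. a wrapper around file_get_contents().
function crawl_politely(array $urls, string $robotsTxt, string $botName, callable $fetch, float $delay = 1.0): array
{
    $results = [];
    foreach ($urls as $url) {
        $url = strip_fragment($url);
        if (array_key_exists($url, $results) || !robots_allows($robotsTxt, $botName, $url)) {
            continue;
        }
        if ($results) {
            usleep((int) ($delay * 1000000));  // at most one hit per $delay seconds
        }
        $results[$url] = $fetch($url);
    }
    return $results;
}
```

Passing `'file_get_contents'` as `$fetch`, together with an `ini_set('user_agent', ...)` earlier in the script, would give a one-hit-per-second crawler that announces its name.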

    I'll see whether that code snippet works better than my own, thanks!
     
    T0PS3O, Apr 12, 2005