Setting User Agent

Discussion in 'PHP' started by T0PS3O, Apr 11, 2005.

  1. #1
    If I write my own bot, very basic with just file_get_contents etc., can I set the user agent somewhere so the sites my bot/script visits know it's a bot?

    Because it's cool, because I can differentiate its hits in my own stats, and because it's good manners, I'd like to name my bot, no matter how basic it is.

    Can I do that in PHP in combination with file reads (fopen) and file content grabbing (file_get_contents)?
     
    T0PS3O, Apr 11, 2005 IP
  2. digitalpoint

    digitalpoint Overlord of no one Staff

    Messages:
    38,333
    Likes Received:
    2,613
    Best Answers:
    462
    Trophy Points:
    710
    Digital Goods:
    29
    #2
    PHP:
    ini_set('user_agent', 'Name of your bot');
     
    digitalpoint, Apr 11, 2005 IP
  3. T0PS3O

    T0PS3O Feel Good PLC

    Messages:
    13,219
    Likes Received:
    777
    Best Answers:
    0
    Trophy Points:
    0
    #3
    And that goes in the actual php file where the script is running?

    I'm talking about a script that runs in the browser and runs after a form is submitted.

    I shall give that a try. Thanks.
     
    T0PS3O, Apr 11, 2005 IP
  4. digitalpoint

    #4
    Yeah... it goes in the script you are doing the file_get_contents() in (on some line before the file_get_contents() line).
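A minimal sketch of that ordering, assuming a hypothetical bot name and info URL (`ExampleBot` and `www.example.com` are placeholders, not anything from this thread). It also shows a per-request alternative using a stream context, for cases where changing the setting for the whole script is not wanted. The actual fetches are commented out so the snippet runs without network access.

```php
<?php
// Hypothetical bot name plus info page; replace with your own.
$agent = 'ExampleBot/1.0 (+http://www.example.com/robots.html)';

// Option 1: set it once for the whole script. This must run before any
// fopen() or file_get_contents() call on an http:// URL.
ini_set('user_agent', $agent);
// $html = file_get_contents('http://www.example.com/');

// Option 2: set it for a single request through a stream context.
$context = stream_context_create(['http' => ['user_agent' => $agent]]);
// $html = file_get_contents('http://www.example.com/', false, $context);

echo ini_get('user_agent'), "\n";
```

Either way the string ends up in the User-Agent request header; the ini_set() route affects every later HTTP stream request in the script, while the context only affects calls it is passed to.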
     
    digitalpoint, Apr 11, 2005 IP
  5. T0PS3O

    #5
    Thanks. You might see some weird ass bots around soon :) (No doubt there already are...)
     
    T0PS3O, Apr 11, 2005 IP
  6. sarahk

    sarahk iTamer Staff

    Messages:
    28,500
    Likes Received:
    4,460
    Best Answers:
    123
    Trophy Points:
    665
    #6
    Using PHP and cURL makes it even easier.

    But please, please, please don't let your ego go out of control. A good netizen gives the bot a cute name (short and sweet) and a link to its robot information page,
    e.g. "Tops30 www.tops30.com/robots.html"

    That way, when we see it we don't scratch our heads and consider banning it.

    I used to run a thing called botspotter, and one of the checks I did was for multiple user agents on a single IP. If it came up with "Tops30 v1.1" and then "Tops30 v1.2", all was well. If they were radically different, I'd wonder what games you were playing, and then I might block you from access.
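A rough sketch of how such a check might look. This is not botspotter's actual code; the function names, the `[ip, userAgent]` input shape, and the rule of comparing only the first token of the user agent are all assumptions made for illustration.

```php
<?php
// Reduce a user agent to its base name, e.g. "Tops30 v1.1" -> "tops30".
function ua_base_name(string $userAgent): string
{
    // Keep only the first token before whitespace or a slash.
    $parts = preg_split('/[\s\/]+/', trim($userAgent));
    return strtolower($parts[0]);
}

// $hits is a list of [ip, userAgent] pairs, e.g. parsed from an access log.
// Returns the IPs that were seen with more than one distinct base name.
function suspicious_ips(array $hits): array
{
    $namesByIp = [];
    foreach ($hits as [$ip, $userAgent]) {
        $namesByIp[$ip][ua_base_name($userAgent)] = true;
    }

    $flagged = [];
    foreach ($namesByIp as $ip => $names) {
        if (count($names) > 1) {
            $flagged[] = $ip;
        }
    }
    return $flagged;
}

$hits = [
    ['192.0.2.1', 'Tops30 v1.1'],
    ['192.0.2.1', 'Tops30 v1.2'],   // same bot, new version: fine
    ['192.0.2.2', 'Tops30 v1.1'],
    ['192.0.2.2', 'Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)'],
];
print_r(suspicious_ips($hits));
```

With the sample data only 192.0.2.2 is flagged. A real check would need to allow for proxies and shared IPs, where several different browsers legitimately appear behind one address.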

    Sarah

    Curl example
    <?php
    // Pretend to be a browser by sending a browser-style User-Agent header.
    function download_pretending($url, $user_agent = 'Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)')
    {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);   // User-Agent header to send
        curl_setopt($ch, CURLOPT_HEADER, 0);                // leave response headers out of the result
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);        // return the body instead of printing it
        curl_setopt($ch, CURLOPT_REFERER, 'http://www.pcpropertymanager.com/wsnlinks/');
        $result = curl_exec($ch);                           // false on failure
        curl_close($ch);
        return $result;
    }

    echo download_pretending('http://www.digitalpoint.com/');
    ?>
     
    sarahk, Apr 12, 2005 IP
  7. T0PS3O

    #7
    I have no malicious intent :)

    Initially I need one that crawls my own sites for my automated Froogle feed creator. It already works; I just wanted to give the little baby a name.

    I've got some other ideas for the little critter, so by then I'll make it a real gentleman's bot: checking robots.txt, slowing it down to one hit per second, making sure it doesn't crawl anchor URLs, etc.
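Two of those manners, skipping anchor URLs and keeping to one hit per second, can be sketched roughly as below. The function names are hypothetical, the fetcher is passed in as a callable so any download function can be used, and robots.txt handling is deliberately left out of this sketch.

```php
<?php
// Drop the "#fragment" part so page.html#top and page.html count as one URL.
function strip_fragment(string $url): string
{
    $pos = strpos($url, '#');
    return $pos === false ? $url : substr($url, 0, $pos);
}

// Return unique, fragment-free URLs in the order first seen.
function unique_targets(array $urls): array
{
    return array_values(array_unique(array_map('strip_fragment', $urls)));
}

// Fetch each URL at most once, pausing between requests.
function polite_crawl(array $urls, callable $fetch, int $delaySeconds = 1): array
{
    $pages = [];
    foreach (unique_targets($urls) as $i => $url) {
        if ($i > 0) {
            sleep($delaySeconds); // wait before every request after the first
        }
        $pages[$url] = $fetch($url);
    }
    return $pages;
}
```

Something like `polite_crawl($urls, 'file_get_contents')` would then fetch each distinct page once, one second apart, using whatever user agent was set earlier in the script.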

    I'll have a look if that code snippet works better than my own, thanks!
     
    T0PS3O, Apr 12, 2005 IP