If I write my own bot, a very basic one using just file_get_contents() etc., can I set the user agent somewhere so the sites my bot/script visits know it's a bot? I'd like to name my bot, no matter how basic it is: because it's cool, because I can differentiate its hits in my own stats, and because it's good manners. Can I do that in PHP in combination with file reads (fopen) and file content grabbing (file_get_contents)?
And that goes in the actual PHP file where the script is running? I'm talking about a script that runs in the browser and runs after a form is submitted. I shall give that a try. Thanks.
Yeah... it goes in the same script that calls file_get_contents(), on some line before that call.
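For illustration, here is a minimal sketch of what that line could look like. The bot name `MyFeedBot/1.0` and the info URL are made-up placeholders, not anything from this thread. The `user_agent` ini setting is used by PHP's http:// stream wrapper, which is what both fopen() and file_get_contents() go through; a stream context is an alternative if you only want it on specific requests.

```php
<?php
// Hypothetical bot name and info page; substitute your own.
$botUserAgent = 'MyFeedBot/1.0 (+http://www.example.com/bot.html)';

// Option 1: set it once for the whole script. It applies to every later
// fopen()/file_get_contents() call that goes through the http:// wrapper.
ini_set('user_agent', $botUserAgent);

// Option 2: set it per request with a stream context.
$context = stream_context_create([
    'http' => ['user_agent' => $botUserAgent],
]);

// Either call below would then send the bot's User-Agent header
// (left commented out so the sketch makes no network requests):
// $html = file_get_contents('http://www.example.com/');
// $html = file_get_contents('http://www.example.com/', false, $context);
```

Option 1 is probably the simplest fit for a small form-driven script, since it is one line and covers every fetch that follows it.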
Using PHP and Curl makes it even easier. But please, please, please don't let your ego go out of control. A good netizen will give the cute name (short and sweet) and a link to the robot information page, e.g. "Tops30 www.tops30.com/robots.html". That way, when we see it we don't scratch our heads and consider banning it.

I used to run a thing called botspotter, and one of the checks I did was for multiple user agents on a single IP. If it came up with "Tops30 v1.1" then "Tops30 v1.2", all was well. If they were radically different then I'd wonder what games you were playing, and then I might block you from access.

Sarah

Curl example:

    <?php
    // pretending to be a browser
    function download_pretending($url, $user_agent = 'Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)')
    {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);   // the User-Agent header to send
        curl_setopt($ch, CURLOPT_HEADER, 0);                // leave response headers out of the result
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);        // return the body instead of printing it
        curl_setopt($ch, CURLOPT_REFERER, 'http://www.pcpropertymanager.com/wsnlinks/');
        $result = curl_exec($ch);
        curl_close($ch);
        return $result;
    } // function download_pretending($url, $user_agent)

    echo download_pretending('http://www.digitalpoint.com/');
    ?>
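The same Curl approach can announce a bot instead of imitating a browser. The sketch below builds a user agent in the "name, version, info page" style described above, reusing the Tops30 example strings from this post; the function names `bot_user_agent` and `bot_download` and the 10-second timeout are assumptions made for illustration only.

```php
<?php
// Build a User-Agent like "Tops30 v1.1 www.tops30.com/robots.html".
function bot_user_agent(string $name, string $version, string $infoUrl): string
{
    return sprintf('%s v%s %s', $name, $version, $infoUrl);
}

// Fetch a URL while identifying as the bot.
// Returns the response body as a string, or false on failure.
function bot_download(string $url, string $userAgent)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // assumed limit: give up after 10 seconds
    // The handle is released when $ch goes out of scope at the end of the function.
    return curl_exec($ch);
}

$ua = bot_user_agent('Tops30', '1.1', 'www.tops30.com/robots.html');

// Left commented out so the sketch makes no network requests:
// echo bot_download('http://www.example.com/', $ua);
```

Keeping the name and info URL fixed and changing only the version number between releases matches the "Tops30 v1.1" then "Tops30 v1.2" pattern that a check like botspotter's would treat as harmless.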
I have no malicious intent. Initially I need one that crawls my own sites for my automated Froogle feed creator. It already works, but I just wanted to give the little baby a name. I've got some other ideas for the little critter, so by then I'll make it a real gentleman's bot: checking robots.txt, slowing it down to one hit per second, making sure it doesn't crawl anchor URLs, etc. I'll have a look to see if that code snippet works better than my own, thanks!
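As a rough sketch of those three courtesies (skipping anchor URLs, honouring robots.txt, and limiting to one hit per second), something like the following might do as a starting point. All three function names are hypothetical, and the robots.txt check is deliberately simplified: it only reads Disallow rules under `User-agent: *` and does plain prefix matching, so it is not a full parser.

```php
<?php
// Drop the "#fragment" so page.html#top and page.html count as one URL.
function strip_fragment(string $url): string
{
    $pos = strpos($url, '#');
    return $pos === false ? $url : substr($url, 0, $pos);
}

// Simplified robots.txt check: only Disallow rules that follow a
// "User-agent: *" line are considered, using plain prefix matching.
function is_path_allowed(string $robotsTxt, string $path): bool
{
    $applies = false;
    foreach (preg_split('/\r\n|\r|\n/', $robotsTxt) as $line) {
        $line = trim(preg_replace('/#.*$/', '', $line)); // strip comments
        if ($line === '' || strpos($line, ':') === false) {
            continue;
        }
        [$field, $value] = array_map('trim', explode(':', $line, 2));
        $field = strtolower($field);
        if ($field === 'user-agent') {
            $applies = ($value === '*');
        } elseif ($applies && $field === 'disallow' && $value !== ''
            && strpos($path, $value) === 0) {
            return false;
        }
    }
    return true;
}

// Sleep just long enough to keep at least $minInterval seconds between hits.
// Pass in the time of the previous hit; returns the time of this one.
function throttle(float $lastHit, float $minInterval = 1.0): float
{
    $wait = $minInterval - (microtime(true) - $lastHit);
    if ($wait > 0) {
        usleep((int) round($wait * 1000000));
    }
    return microtime(true);
}
```

In a crawl loop, the idea would be to call `strip_fragment()` on each link before queueing it, skip any path for which `is_path_allowed()` returns false, and call `throttle()` with the previous return value before each request.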