I make a few direct connections to other websites for legitimate, authorised screen scraping. You can set your own user agent so those sites can see exactly how many requests are coming from you. This is similar to Google sending its user agent of "Googlebot/2.1 (+http://www.google.com/bot.html)".
PHP:
ini_set('user_agent', 'My Web Site'); // the setting name must be a quoted string
$filestring = file_get_contents("http://www.domain.com");
echo $filestring;
I just found this out today and thought it might be useful to some people. Obviously, if you are doing unauthorised screen scraping you may not want to identify yourself, so you could claim to be Googlebot/2.1 (+http://www.google.com/bot.html)...
EDIT: It just occurred to me that if the user agent can be changed this easily, then anybody cloaking by matching the string "googlebot" could easily be detected if Google changed their user agent to MSIE.
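If changing the setting for the whole script is more than you need, a per-request stream context is one alternative. This is only a minimal sketch: the helper name `fetch_context`, the agent string, and the timeout value are assumptions made for illustration, while `user_agent` is the standard HTTP context option.

```php
<?php
// Hypothetical helper: builds a stream context carrying a custom user agent.
function fetch_context(string $userAgent)
{
    return stream_context_create([
        'http' => [
            'user_agent' => $userAgent, // sent as the User-Agent header
            'timeout'    => 10,         // seconds; an assumed value
        ],
    ]);
}

$context = fetch_context('My Web Site');

// The third argument applies the context to this one request only:
// $filestring = file_get_contents('http://www.domain.com', false, $context);
```

The request itself is left commented out so the snippet runs without network access; uncomment it and point it at a site you are authorised to fetch.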
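You can try something like this:
PHP:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "siteurl");
curl_setopt($ch, CURLOPT_USERPWD, "username:password"); // HTTP authentication credentials
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.1) Gecko/20060111 Firefox/1.5.0.1");
curl_setopt($ch, CURLOPT_HEADER, 1); // include the response headers in the output
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE); // disables certificate verification; use with care
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE); // follow redirects
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // return the response instead of printing it
$response = curl_exec($ch); // headers and body as a string, or false on failure
curl_close($ch);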
The code I posted works fine; I have tested it from one site to another. It seems quite an elegant solution to something that is not really a problem for most people.
Never thought of that - you could send automated traffic with the MSIE user agent and people would think it was legitimate visitors (unless they looked at the IP).
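Looking at the IP can be sketched as a reverse DNS lookup followed by a forward lookup, which is the check Google describes for verifying its crawler. This is a rough sketch only: both function names are hypothetical, the lookups need network access, and `gethostbynamel()` covers IPv4 addresses only.

```php
<?php
// Hypothetical helper: true if the hostname sits under a Google crawler domain.
function hostname_is_google(string $host): bool
{
    return (bool) preg_match('/\.(googlebot|google)\.com$/i', $host);
}

// Hypothetical helper: reverse lookup, domain check, then forward lookup.
function ip_is_googlebot(string $ip): bool
{
    // gethostbyaddr() returns the IP unchanged when there is no reverse
    // record, and false on malformed input.
    $host = gethostbyaddr($ip);
    if ($host === false || $host === $ip || !hostname_is_google($host)) {
        return false;
    }

    // The forward lookup of that hostname must include the original IP.
    $addresses = gethostbynamel($host);
    return $addresses !== false && in_array($ip, $addresses, true);
}
```

A request that sends the Googlebot user agent but fails `ip_is_googlebot($_SERVER['REMOTE_ADDR'])` is probably not Google.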
I already like it and have started using it for good. I was just commenting because this opens the door wide to misuse.