Hi there,

I've been working on a little script that crawls the web, but I can't find a way to add a download size limit to it.

My script:

```php
$master = curl_multi_init();
$curl_arr = array();

// add additional curl options here
$std_options = array(
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_FOLLOWLOCATION => true
);
$options = ($custom_options) ? ($std_options + $custom_options) : $std_options;

// start the first batch of requests
foreach ($urls as $uId => $url) {
    $ch = curl_init();
    $options[CURLOPT_URL] = $url['url'];
    // tag the handle with its key so we can find the related data later
    // (a curl handle itself cannot be used as an array key on current PHP)
    $options[CURLOPT_PRIVATE] = (string) $uId;
    curl_setopt_array($ch, $options);
    curl_multi_add_handle($master, $ch);
}

do {
    while (($execrun = curl_multi_exec($master, $running)) == CURLM_CALL_MULTI_PERFORM);
    if ($execrun != CURLM_OK)
        break;

    // a request was just completed -- find out which one
    while ($done = curl_multi_info_read($master)) {
        $info = curl_getinfo($done['handle']);
        $curHandle = curl_getinfo($done['handle'], CURLINFO_PRIVATE);
        $urls[$curHandle]['code'] = $info['http_code'];

        switch ($info['http_code']) {
            case 200:
                $output = curl_multi_getcontent($done['handle']);
                break;
            case 301:
            case 302:
                break;
            case 404:
                break;
            default:
                // curl_errno() and curl_error() are functions, not variables
                $urls[$curHandle]['errno'] = curl_errno($done['handle']);
                $urls[$curHandle]['error'] = curl_error($done['handle']);
                break;
        }

        // remove the curl handle that just completed
        curl_multi_remove_handle($master, $done['handle']);
    }
} while ($running);

curl_multi_close($master);
print_r($urls);
```

I've found a piece of PHP code that would do the job, but I don't know how to add it so that it works as expected.

URL: http://www.phpkode.com/source/s/multicurl-class-library/multicurl-class-library/MultiCurl.class.php
Line: 136
Code:

```php
if (!$active || $mrc != CURLM_OK || curl_getinfo($ch, CURLINFO_SIZE_DOWNLOAD) >= $this->maxSize) {
    $this->closeSession($i);
}
```

I'm missing something, but I can't seem to find a way to add this (I've been looking for over two days now). Can anyone help me here?
As I explained to someone else who had the same problem on SO, as far as I know there is no way to do this with PHP's built-in curl functions without making a separate request to the web server hosting the file.

How about calling file_get_contents on the URL you're currently looping over and just checking its length?

You could potentially make a request with curl_setopt($ch, CURLOPT_NOBODY, true); and read the Content-Length header, and then make a second request to download the file only if Content-Length is smaller than your maximum. This wouldn't be foolproof anyway, since a server can omit the header or report a wrong value.
Replace the `if ($execrun != CURLM_OK)` check (line 26 of your script) with:

```php
if ($execrun != CURLM_OK || curl_getinfo($ch, CURLINFO_SIZE_DOWNLOAD) >= size_to_replace)
```
That fetches one URL at a time, while curl handles multiple connections. I see, but it should be possible, IMHO. Hopefully it is, somehow...

Thanks, but it doesn't work. I need to close the connection to stop the data flow. When a handle has reached the limit, that handle should be ended and the content received so far should still be available.
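For what it's worth, one approach that may fit this requirement is a write callback. libcurl aborts a transfer as soon as the function set with CURLOPT_WRITEFUNCTION returns a byte count different from the size of the chunk it was given, so the callback can keep what it has received and stop the handle once a limit is hit. The sketch below is only an illustration under that assumption: `make_limited_writer`, `$maxSize` and `$bodies` are made-up names, not part of the original script, and it has not been tried against the crawler above.

```php
<?php
// Hypothetical helper (not from the original script): returns a callback for
// CURLOPT_WRITEFUNCTION that stores at most $maxBytes in $buffer.
function make_limited_writer(int $maxBytes, string &$buffer): callable
{
    return function ($ch, string $data) use ($maxBytes, &$buffer): int {
        // Keep only as many bytes as still fit under the limit.
        $remaining = $maxBytes - strlen($buffer);
        $chunk = substr($data, 0, max(0, $remaining));
        $buffer .= $chunk;

        // Returning anything other than strlen($data) makes cURL abort this
        // handle with CURLE_WRITE_ERROR (error number 23); the bytes already
        // appended to $buffer stay available.
        return strlen($chunk);
    };
}

// Possible use inside the foreach loop of the script ($maxSize and $bodies
// are assumed names):
//
//     $bodies[$uId] = '';
//     $options[CURLOPT_WRITEFUNCTION] = make_limited_writer($maxSize, $bodies[$uId]);
```

With a write callback installed, the body should be read from `$bodies[$uId]` rather than from curl_multi_getcontent(). A handle that was cut off still shows up in curl_multi_info_read(), with its `result` field set to the write error instead of CURLE_OK, so the other handles in the multi stack keep running.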
It might just be me misreading your request, but your code already calls curl_getinfo. Is there a reason why you can't just check the size there as well?

```php
$info = curl_getinfo($done['handle']);
$size = curl_getinfo($done['handle'], CURLINFO_CONTENT_LENGTH_DOWNLOAD);
```
In case you are wondering whether you can get the size before you make the request with curl: that is not possible. curl_getinfo only works after exec. As edduvs said, you can take a different route and, in your first foreach loop over $urls, make a HEAD request for the content length. This will be lighter and faster than downloading the body:

```php
foreach ($urls as $uId => $url) {
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url['url']);
    curl_setopt($curl, CURLOPT_FILETIME, true);
    curl_setopt($curl, CURLOPT_NOBODY, true);          // HEAD request, no body
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_exec($curl);
    $size = curl_getinfo($curl, CURLINFO_CONTENT_LENGTH_DOWNLOAD);
    curl_close($curl);

    // $maxSize is your limit in bytes; $size is -1 when the server
    // sends no Content-Length header
    if ($size <= $maxSize) {
        // your old code
    }
}
```

Additionally, I noticed an issue with your existing code. The way you are setting $options will cause a notice when $custom_options is not defined; it should be isset($custom_options) ? ... instead.
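To illustrate the point about `$options`, here is a minimal sketch of one way to build the array without the notice. The helper name `build_curl_options` is made up for this example. It also assumes that custom options are meant to override the defaults, which the original `$std_options + $custom_options` does not do, because array union keeps the left operand's value when both arrays share a key.

```php
<?php
// Hypothetical helper (not from the original script): layers caller-supplied
// cURL options over a set of defaults.
function build_curl_options(array $defaults, ?array $custom = null): array
{
    if (empty($custom)) {
        return $defaults;
    }

    // Array union keeps the left operand's value for keys present in both
    // arrays, so $custom goes first to let it win. array_merge() is avoided
    // because it would renumber the integer CURLOPT_* keys.
    return $custom + $defaults;
}
```

In the script this could be called as `$options = build_curl_options($std_options, isset($custom_options) ? $custom_options : null);`. If the defaults should win instead, swapping the operands back to `$defaults + $custom` restores the original behaviour.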
Thanks, but the documents I'm trying to load don't send a Content-Length header, so that's not an option. Too bad. So I see that this isn't available in PHP (yet).

And about the notice: I agree, this was just a quick example.
How exactly would you get the size of a document in any programming or scripting language if you don't have the content length headers?