How to make cURL download faster?

Discussion in 'PHP' started by Smaug, Sep 28, 2009.

  1. #1
    Hello,
    I have a script that downloads and parses about 250 pages in a row. Each page takes roughly 0.7 s to 1 s to download. That wouldn't be a problem, but if the 250 pages soon become 2,500 or more, I will need to make it faster, especially since I need to download them every few hours.

    Roughly, the script is structured like this:

    PHP:
    while ($item = mysql_fetch_array($result)) {
        // download one page, then parse it before moving on to the next
        $html = $curl->fetchPage($item["address"]);
        parse($html);
        ...
    }
    I found a cURL class that supports multi-threading here: http://semlabs.co.uk/journal/object-oriented-curl-class-with-multi-threading

    I could implement it somehow, but before I do that (because the script is much more complex than what I just wrote) I want to ask: will it be faster? I mean, can it cut the download time at least in half?
     
    Smaug, Sep 28, 2009 IP
  2. premiumscripts

    premiumscripts Peon

    Messages:
    1,062
    Likes Received:
    48
    Best Answers:
    0
    Trophy Points:
    0
    #2
    Well, yes, multi cURL runs requests in parallel (strictly speaking, as concurrent transfers rather than real threads). That means it fetches multiple pages at the same time, which could reduce your total time by a whole lot. It is definitely worth your time to implement this. It will probably cut the time by much more than half, but only a test will tell.
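
    For reference, here is a minimal sketch of the idea using PHP's built-in curl_multi_* functions rather than the class linked in the first post. The helper name fetchAll() and the 30-second timeout are assumptions made for illustration, so adapt them to your own script.

```php
<?php
/**
 * Fetch several URLs concurrently with PHP's curl_multi_* functions.
 * fetchAll() is a hypothetical helper name, not part of the original script.
 * Returns an array with the same keys as $urls; a failed transfer yields null.
 */
function fetchAll(array $urls, int $timeoutSeconds = 30): array
{
    $multi = curl_multi_init();
    $handles = [];

    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // keep the body instead of printing it
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);
        curl_multi_add_handle($multi, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers until none of them is still running.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running > 0) {
            curl_multi_select($multi, 1.0); // wait for activity rather than spinning
        }
    } while ($running > 0 && $status === CURLM_OK);

    // Per-transfer result codes are only available through curl_multi_info_read().
    $succeeded = [];
    while (($info = curl_multi_info_read($multi)) !== false) {
        $key = array_search($info['handle'], $handles, true);
        if ($key !== false) {
            $succeeded[$key] = ($info['result'] === CURLE_OK);
        }
    }

    $results = [];
    foreach ($handles as $key => $ch) {
        $results[$key] = !empty($succeeded[$key]) ? curl_multi_getcontent($ch) : null;
        curl_multi_remove_handle($multi, $ch);
    }

    // The handles are released automatically when they go out of scope.
    return $results;
}
```

    In a loop like yours, you would first collect the addresses from the database into an array, call fetchAll() once, and then run parse() on each returned body that is not null.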
     
    premiumscripts, Sep 28, 2009 IP
  3. Smaug

    Smaug Peon

    Messages:
    374
    Likes Received:
    12
    Best Answers:
    0
    Trophy Points:
    0
    #3
    Thanks, I needed to confirm this because this is my first time using cURL extensively.
     
    Smaug, Sep 28, 2009 IP
  4. Smaug

    #4
    Multi-threading really is faster. For example, 50 requests run one after another took about 55 seconds; with multi-threading (50 threads) they took only about 8 seconds. However, if I use more than about 50 threads, some pages don't load completely or at all. E.g. when I tried 100 threads, about 10% of the requests failed.

    Anyway, I must say that cURL multi-threading for multiple requests is awesome!
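
    One possible way to live with the limit of about 50 is to cap how many requests run at once and retry whatever failed. The sketch below is untested against the real pages, and I don't know the actual cause of the failures, so treat it as an assumption-laden starting point. fetchInBatches() and the $fetchBatch callable are hypothetical names: $fetchBatch stands for any function that takes an array of URLs and returns an array with the same keys, using null for a failed download.

```php
<?php
/**
 * Run $fetchBatch over $urls in chunks of at most $batchSize, retrying failures.
 * $fetchBatch is a hypothetical callable: it receives an array of URLs (keys
 * preserved) and returns an array with the same keys, null marking a failure.
 */
function fetchInBatches(array $urls, callable $fetchBatch, int $batchSize = 50, int $maxRetries = 2): array
{
    $results = [];
    $pending = $urls;

    // The first pass plus up to $maxRetries extra passes over whatever failed.
    for ($attempt = 0; $attempt <= $maxRetries && $pending !== []; $attempt++) {
        $failed = [];
        foreach (array_chunk($pending, $batchSize, true) as $chunk) {
            foreach ($fetchBatch($chunk) as $key => $body) {
                if ($body === null) {
                    $failed[$key] = $chunk[$key]; // queue this URL for the next pass
                } else {
                    $results[$key] = $body;
                }
            }
        }
        $pending = $failed;
    }

    // Anything still pending after the last pass is reported as null.
    foreach ($pending as $key => $url) {
        $results[$key] = null;
    }

    return $results;
}
```

    With a batch size of 50, 2,500 pages would be downloaded as 50 rounds of 50 parallel requests, and the handful of failed pages would get up to two more attempts.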
     
    Smaug, Oct 1, 2009 IP