Pulling down a large remote file (150-200 MB) in PHP

Discussion in 'PHP' started by markowe, Oct 13, 2009.

  1. #1
    I want to pull a remote file (the CJ product catalogue in CSV) down to my hosting directory at regular intervals, say daily. I figure the way to go about it would be to set up a cron job, but what about the download itself? This can't be done from PHP, right? As far as I know, all PHP methods (fget, cURL) involve reading the file into memory and then writing it to disk, which I can't do with a 150 MB file that could take minutes to download and is too big for memory (?).
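On the memory question: PHP does not have to hold the whole file in memory. A minimal sketch, assuming PHP 7+ and `allow_url_fopen` enabled for remote URLs, is to copy from one stream to another with `stream_copy_to_stream()`. The helper name `download_to_file()` is hypothetical, not something from this thread.

```php
<?php
// Sketch only: copy a source stream to a file on disk in chunks,
// so memory use stays small regardless of file size.
// download_to_file() is a hypothetical helper name.
function download_to_file(string $source, string $dest): int
{
    // For http:// sources this requires allow_url_fopen=On (assumption).
    $in = fopen($source, 'rb');
    if ($in === false) {
        throw new RuntimeException("Cannot open $source");
    }
    $out = fopen($dest, 'wb');
    if ($out === false) {
        fclose($in);
        throw new RuntimeException("Cannot open $dest for writing");
    }
    // Copies between the two streams without building the whole
    // file as a PHP string.
    $bytes = stream_copy_to_stream($in, $out);
    fclose($in);
    fclose($out);
    if ($bytes === false) {
        throw new RuntimeException('Copy failed');
    }
    return $bytes; // number of bytes written
}
```

With cURL, the comparable approach is to pass an open file handle via the `CURLOPT_FILE` option so the response body is written straight to disk. Either way the script still blocks until the transfer ends, so the time limit needs raising when it is not run from the command line.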

    So are shell commands like wget the way to go? I need to automate this, but I don't want to mess around with lots of cron jobs and shell scripts - I want the flexibility of PHP. If I wget the file using "shell_exec('wget mybigfile')" or whatever from PHP, will that hand the job to a shell process and let my PHP script finish immediately (I can cron in again later and see if it succeeded)? Or will my PHP script wait for the wget to finish, thus defeating the object? Or is there another way to have tasks like this run in the background without tying up the PHP script?

    Hope the question makes sense. First time I have tried to do stuff like this, so am a bit lost. Thanks for any insight.
     
    markowe, Oct 13, 2009
  2. sunchiqua

    #2
    exec will wait for the task to finish, but does that make any difference if it's initiated by cron?
     
    sunchiqua, Oct 13, 2009
  3. premiumscripts

    #3
    PHP will wait for it to finish. I suppose this doesn't matter much if you just call set_time_limit(0) and ignore_user_abort(true).

    However, I'd just go with a one-line cron job:

    /usr/bin/wget -O /home/user/cj.txt 'http://cj.com/?bla'

    (assuming it's possible to do this without authenticating; the URL is quoted so the shell leaves the ? alone)

    wget also has options for handling cookies (--load-cookies, --save-cookies, --keep-session-cookies, etc.) in case you do need to log in.
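For the blocking variant run from PHP, a rough sketch might look like the following. It assumes wget is at /usr/bin/wget and a POSIX shell; `build_wget_command()` and `fetch_blocking()` are hypothetical names used only for illustration.

```php
<?php
// Sketch only: run wget from PHP and wait for it to finish.
// Assumes /usr/bin/wget exists; both function names are hypothetical.
function build_wget_command(string $url, string $dest): string
{
    // escapeshellarg() quotes each argument so characters such as
    // ? and & in the URL are not interpreted by the shell.
    return sprintf(
        '/usr/bin/wget -q -O %s %s',
        escapeshellarg($dest),
        escapeshellarg($url)
    );
}

function fetch_blocking(string $url, string $dest): bool
{
    set_time_limit(0);        // no execution time limit
    ignore_user_abort(true);  // keep going if the client disconnects
    $output = [];
    $status = 1;
    exec(build_wget_command($url, $dest), $output, $status);
    return $status === 0;     // wget exits with 0 on success
}
```

Checking the exit status gives the calling script a simple success flag to act on before it moves to the next step.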
     
    premiumscripts, Oct 13, 2009
  4. markowe

    #4
    Thanks for the pointers. So PHP will wait even if I do a shell_exec with &? Seems to be the case from my experiments - I was hoping that would set up the wget and finish the PHP script...

    Right, I can do the whole thing with a one-liner cron job, but I need to do other stuff with PHP after (dump the data into my database etc.) and I would have preferred to have just one PHP script that is called regularly with just one cron job. Then I could do like (imagine PHP):

    first time, start the wget

    next time, if wget was successful do (say) unTAR

    next time, if unTAR was successful, do LOAD DATA INFILE (dump into database)... and so on.
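The staged approach in the list above could be sketched roughly as follows: one script, one cron job, and a small state file that remembers which stage comes next. The function name `run_next_stage()`, the stage names, and the state-file idea are all assumptions for illustration; each stage is a callable that returns true on success.

```php
<?php
// Sketch only: perform at most one stage per run and record progress
// in a state file. All names here are hypothetical.
function run_next_stage(string $stateFile, array $stages): string
{
    $names = array_keys($stages);
    if ($names === []) {
        return 'done';
    }
    // Read the saved stage; start at the first stage if none is saved.
    $saved = is_file($stateFile) ? trim((string) file_get_contents($stateFile)) : '';
    $current = $saved !== '' ? $saved : (string) $names[0];
    if ($current === 'done') {
        return 'done';
    }
    if (!isset($stages[$current])) {
        throw new RuntimeException("Unknown stage: $current");
    }
    // Run the stage; on failure stay put so the next run retries it.
    if (!$stages[$current]()) {
        return $current;
    }
    // Advance to the following stage, or mark the pipeline as done.
    $pos = array_search($current, $names, true);
    $next = (string) ($names[$pos + 1] ?? 'done');
    file_put_contents($stateFile, $next);
    return $next;
}
```

A cron-driven script would call this once per run with stages such as 'download', 'untar' and 'load', then reset the state file when a new cycle should begin.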

    Otherwise I have to have several cron jobs running successively, or can most of this be done with a single shell script? I still need at least two cron jobs (one to do the shell stuff, one to hand off to PHP to see if the shell stuff is finished so it can get on with PHP stuff), so it just gets a bit complicated. And I just can't face learning yet another script language..!
     
    markowe, Oct 13, 2009
  5. markowe

    #5
    Ooo, right - if you redirect the output of wget (with '>') to some dummy file (which ends up 0 bytes long anyway), and use &, it WILL start as a background process and PHP can continue, though oddly it will report a failure, but I can ignore that!
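The redirect matters because PHP's exec() and shell_exec() keep waiting while the started program still holds the output pipe open. The dummy file is probably empty because wget writes its progress messages to stderr rather than stdout. A minimal sketch of the trick, assuming a POSIX shell and the `ps` utility, with hypothetical helper names:

```php
<?php
// Sketch only: start a command in the background and return its PID.
// start_in_background() and is_running() are hypothetical helpers.
function start_in_background(string $command, string $logFile = '/dev/null'): int
{
    // Redirect stdout and stderr away from PHP's pipe, background the
    // job with &, then echo $! so the shell prints the job's PID.
    $line = sprintf('%s > %s 2>&1 & echo $!', $command, escapeshellarg($logFile));
    $output = [];
    exec($line, $output);
    return (int) ($output[0] ?? 0);
}

function is_running(int $pid): bool
{
    // ps -p exits with status 0 only if the process exists.
    $out = [];
    $status = 1;
    exec('ps -p ' . $pid, $out, $status);
    return $status === 0;
}
```

Storing the PID somewhere lets a later cron run check whether the download is still going before it moves on to the next step.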

    Hope that helps someone - getting closer to my ideal of running all this from PHP. Easier if I ever migrate hosting too, so I don't have to migrate loads of cron jobs.
     
    markowe, Oct 13, 2009
  6. premiumscripts

    #6
    Hmm cool, thanks for reporting that, may be useful for me as well ;)
     
    premiumscripts, Oct 13, 2009