
I need a solution

Discussion in 'PHP' started by ija61, Nov 3, 2011.

  1. #1
    Hi.

    I have a script that crawls a page and extracts some information from it. My problem is that I cannot do more than 10 pages per run, because I get an error; if I remember right, it is a timeout error. I tried increasing the timeout of the procedure, and even of the whole program, but no luck.
    The idea is that I use PHP cURL to request a URL and then extract some information from the page; the page number is passed in the URL.

    Any ideas? I'm in a hurry, so no code or image posting for now, but if you need it I will be happy to post it later.
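    (Not the original script, but for reference, a rough sketch of the kind of cURL fetch described above; the URL pattern, page count and timeout values are assumptions. CURLOPT_TIMEOUT only caps each individual transfer, which is separate from PHP's own execution time limit.)

    <?php
    // Hypothetical paginated URL; swap in the real one.
    $baseUrl = 'http://example.com/list.php?page=';

    for ($page = 1; $page <= 50; $page++) {
        $ch = curl_init($baseUrl . $page);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the HTML instead of printing it
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);   // give up connecting after 10 seconds
        curl_setopt($ch, CURLOPT_TIMEOUT, 30);          // cap each transfer at 30 seconds
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);

        $html = curl_exec($ch);
        if ($html === false) {
            echo 'Page ' . $page . ' failed: ' . curl_error($ch) . "\n";
            curl_close($ch);
            continue;
        }
        curl_close($ch);

        // ... extract the information from $html here ...
    }
    PHP: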

    Thank you
     
    Solved!
    ija61, Nov 3, 2011 IP
  2. rainborick (Well-Known Member)

    #2
    You are probably correct that your script is stopping because of the timeout error. If your site is on a shared host, you won't be able to change the time limit: shared hosts can't afford to have scripts that run for a long time, because they slow down all of the other sites on the same server.

    If you can run cron jobs, you should be able to rewrite your script so that it runs periodically and sends its results to a data file that you can read with another script as needed. Good luck!
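    (A rough sketch of that cron approach, assuming the crawler is turned into a standalone script run from cron; the file names, batch size and extraction step are made up.)

    <?php
    // crawl_batch.php -- run from cron, e.g.  */5 * * * * php /path/to/crawl_batch.php
    // Processes a small batch of URLs per run and appends the results to a data
    // file that another script can read later.

    $queueFile  = __DIR__ . '/queue.txt';    // one URL per line, waiting to be crawled
    $resultFile = __DIR__ . '/results.txt';  // extracted data, appended on each run
    $batchSize  = 5;                         // small enough to finish well inside the time limit

    $urls = file($queueFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($urls === false || count($urls) === 0) {
        exit; // nothing left to do
    }

    $batch     = array_slice($urls, 0, $batchSize);
    $remaining = array_slice($urls, $batchSize);

    foreach ($batch as $url) {
        $html = file_get_contents($url);   // or the cURL fetch from the first post
        if ($html === false) {
            continue;                      // skip failures for now
        }
        $info = strip_tags($html);         // stand-in for the real extraction logic
        file_put_contents($resultFile, $info . "\n", FILE_APPEND);
    }

    // Write the unprocessed URLs back so the next cron run picks them up.
    file_put_contents($queueFile, implode("\n", $remaining) . "\n");
    PHP: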
     
    rainborick, Nov 3, 2011 IP
  3. MyVodaFone (Well-Known Member)

    #3
    Just in case you're not on shared hosting, the code to remove the timeout limit is below.

    Put this at the top of your script:
    set_time_limit(0);
    PHP:
     
    MyVodaFone, Nov 3, 2011 IP
  4. SerjSagan (Member)

    #4
    Most shared hosting deals that I've encountered do not restrict
    set_time_limit(0);
    Code (markup):
    But just in case, you should be able to create a php.ini (or edit one if it already exists) IN THE SAME FOLDER as your script, and add this line to it:
    max_execution_time = 3600
    Code (markup):
    No semicolons or quotes, nothing else on the line. By the way, this is for 1 hour (60 * 60 = 3600 seconds); you can make it longer if you need to...
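    (One caveat: a per-directory php.ini is typically only picked up under CGI/suPHP-style setups, so it's worth checking whether the override actually took. A quick check from the script itself:)

    <?php
    // Sanity check that the per-directory php.ini was actually picked up.
    // If this still prints the host's default (often 30), the override is not being read.
    echo 'max_execution_time = ' . ini_get('max_execution_time') . "\n";
    echo 'loaded php.ini     = ' . php_ini_loaded_file() . "\n";
    PHP: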
     
    SerjSagan, Nov 3, 2011 IP
  5. Fruktkaka (Greenhorn)

    #5
    If you're outputting anything from the script, try flushing the output buffer after each iteration with flush(). It's a quick and dirty way to do things, but I've managed to get around timeouts by using it.
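    (A minimal sketch of that trick, assuming the script echoes some progress text on each iteration; whether it actually helps with the timeout depends on the host.)

    <?php
    for ($page = 1; $page <= 50; $page++) {
        // ... fetch and parse page $page ...

        echo "Finished page $page\n";
        if (ob_get_level() > 0) {
            ob_flush();   // empty PHP's own output buffer first, if one is active
        }
        flush();          // then ask the web server to push the output to the client
    }
    PHP: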
     
    Fruktkaka, Nov 3, 2011 IP
  6. samyak (Active Member)

    #6
    Add this at the top of your script:
    ini_set('max_execution_time', '3600');

    This should allow your script to run for a whole hour without giving a timeout error.
     
    samyak, Nov 3, 2011 IP
  7. ija61 (Member)

    #7
    ini_set('max_execution_time', '3600');
    I have already tried this, but it does not work.

    I think it is because of what rainborick said: I'm on shared hosting and the server stops me, and I think ini_set() does not work because I do not have the required privileges. I tried it both outside the for loop and inside it, and nothing...

    I will try flush(), and if that does not work I think I will switch to Perl.

    Thank you very much.
     
    ija61, Nov 3, 2011 IP
  8. #8
    There used to be a project called phpfork which allowed you to run threaded processes. I don't know its current status or alternatives.

    What's really going on is that indexing isn't really a job for a web page.

    You need a cron job starting up every minute (for example) that takes a link from a table, retrieves the page, parses it, adds any new links back into the table, and then updates the original link's status to "crawled". If it hasn't timed out yet, it can take another link...

    Then if the script times out, another will start in a minute and continue processing. The table should grow much faster than you can process it, so it's up to you how many processes you want running and what the server capacity is. You may need to throttle it or risk the wrath of your hosting company. There is a reason Hostgator doesn't like OpenX installs, and that's the server load. A spider would be many times more demanding than a busy OpenX install.
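    (A rough sketch of that pattern, assuming a MySQL "links" table with id, url and status columns and a unique index on url, accessed through PDO; the names, credentials and link extraction here are illustrative, not a drop-in crawler.)

    <?php
    // crawl_worker.php -- started every minute by cron. Take a pending link, fetch it,
    // queue any new links found on the page, then mark the original link as crawled.

    $pdo = new PDO('mysql:host=localhost;dbname=crawler', 'user', 'pass');

    $start = time();
    while (time() - $start < 50) {   // stop before the next cron run starts
        $row = $pdo->query("SELECT id, url FROM links WHERE status = 'pending' LIMIT 1")
                   ->fetch(PDO::FETCH_ASSOC);
        if (!$row) {
            break;                   // queue is empty
        }

        $html = @file_get_contents($row['url']);
        if ($html !== false) {
            // Very naive link extraction; queue anything that looks like an absolute URL.
            if (preg_match_all('/href="(https?:\/\/[^"]+)"/i', $html, $matches)) {
                $insert = $pdo->prepare("INSERT IGNORE INTO links (url, status) VALUES (?, 'pending')");
                foreach ($matches[1] as $newUrl) {
                    $insert->execute(array($newUrl));
                }
            }
            // ... extract and store whatever information you actually need here ...
        }

        $update = $pdo->prepare("UPDATE links SET status = 'crawled' WHERE id = ?");
        $update->execute(array($row['id']));

        sleep(1);                    // throttle so the host (and the target site) stay happy
    }
    PHP: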
     
    sarahk, Nov 5, 2011 IP
  9. ija61 (Member)

    #9
    Thank you... I will use this for my next project.
     
    ija61, Nov 6, 2011 IP
  10. rusianace (Peon)

    #10
    You might need to sleep before the next crawl. Some servers have implemented throttling. FYI :)
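    (Something as simple as a pause between requests is usually enough; the URLs and the two-second delay below are placeholders, adjust to whatever the target site tolerates.)

    <?php
    // Hypothetical list of pages to crawl.
    $urls = array(
        'http://example.com/list.php?page=1',
        'http://example.com/list.php?page=2',
    );

    foreach ($urls as $url) {
        $html = file_get_contents($url);
        // ... parse $html ...
        sleep(2);   // wait a couple of seconds before hitting the next page
    }
    PHP: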
     
    rusianace, Nov 7, 2011 IP