Hi. I have a script that crawls a page and extracts some information from it. My problem is that I can't do more than 10 pages per run because I get an error; if I remember right it is a timeout error. I tried to increase the timeout of the procedure, and even of the whole program, but no luck. The idea is that I use PHP cURL to fetch a URL and then extract some information from that page, and I pass the page number in the URL. Any ideas? I'm in a hurry, so no code or image posting now, but if you need it I will be happy to post it later. Thank you.
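For reference, the sort of loop being described probably looks something like the sketch below; the URL pattern, page range and extraction step are guesses to keep it self-contained, not the poster's actual code.

<?php
// Hypothetical reconstruction of the paginated cURL crawl described above.
for ($page = 1; $page <= 50; $page++) {
    $ch = curl_init('http://example.com/list?page=' . $page); // placeholder URL
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10); // give up on slow connections
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);        // per-request timeout
    $html = curl_exec($ch);
    if ($html === false) {
        echo "Page $page failed: " . curl_error($ch) . "\n";
        curl_close($ch);
        continue;
    }
    curl_close($ch);

    // Placeholder extraction step: grab every <h2> heading from the page.
    if (preg_match_all('/<h2[^>]*>(.*?)<\/h2>/si', $html, $matches)) {
        foreach ($matches[1] as $title) {
            echo trim(strip_tags($title)) . "\n";
        }
    }
}

Note that the per-request CURLOPT_TIMEOUT only limits each download; the overall script time limit is a separate PHP setting, which is what the replies below deal with.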
You are probably correct that your script is stopping because of the timeout error. If your site is on a shared host, you won't be able to change the time limit: shared hosts cannot afford to have long-running scripts because they slow down all of the other sites on the same server. If you can run cron jobs, you should be able to rewrite your script so that it runs periodically and sends its results to a data file that you can read with another script as needed. Good luck!
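A minimal sketch of that cron-and-data-file idea, assuming the crawl results can be flattened into simple rows; the file name and the sample data are placeholders.

<?php
// Run this from cron; it appends each run's results to a data file so the
// public page only has to read the file, never crawl anything itself.
$resultsFile = __DIR__ . '/crawl-results.jsonl'; // hypothetical file name

// Imagine these rows came out of the cURL crawl (placeholder data here).
$rows = array(
    array('page' => 1, 'title' => 'Example item'),
);

$fh = fopen($resultsFile, 'a');
foreach ($rows as $row) {
    fwrite($fh, json_encode($row) . "\n"); // one JSON object per line
}
fclose($fh);

The page that visitors actually hit then just loops over file($resultsFile) and json_decode()s each line instead of doing any crawling of its own.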
Just in case you're not on shared hosting, the timeout fix is to put this at the top of your script: set_time_limit(0);
Most shared hosting deals that I've encountered do not restrict set_time_limit(0); but just in case, you should be able to create a php.ini (or edit it if one already exists) IN THE SAME FOLDER as your script, and in there add this line: max_execution_time = 3600 (no semicolons or anything). By the way, that is for 1 hour (60*60 = 3600 seconds); you can change it to something longer if you need to...
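If the override doesn't seem to take effect, a quick sanity check like the one below shows what the host actually allows; this is just an illustration, not part of the crawler.

<?php
// Check whether the per-directory php.ini / set_time_limit() actually applied.
set_time_limit(0); // silently ignored on some shared hosts
// If this still prints the host default (often 30), the override was ignored.
echo 'max_execution_time = ' . ini_get('max_execution_time') . "\n";
// set_time_limit() is also disabled when older PHP versions run in safe mode.
echo 'safe_mode = ' . (ini_get('safe_mode') ? 'on' : 'off') . "\n";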
If you're outputting stuff from the script, try flushing the output buffer after each iteration with flush(). It's a quick and dirty way to do things, but I've managed to get around timeouts by using it.
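Something along these lines, with the URL pattern and page count as placeholders rather than the original script:

<?php
set_time_limit(0); // lift the script limit where the host allows it

for ($page = 1; $page <= 50; $page++) {
    $ch = curl_init('http://example.com/list?page=' . $page); // placeholder URL
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
    $html = curl_exec($ch);
    curl_close($ch);

    // ... extract whatever you need from $html here ...

    echo "finished page $page\n";
    if (ob_get_level() > 0) {
        ob_flush(); // flush PHP's output buffer if one is active
    }
    flush();        // then push the output through to the web server / browser
}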
Add this at the top of your script: ini_set('max_execution_time', '3600'); This should allow your script to run for a whole hour without giving a timeout error.
ini_set('max_execution_time', '3600'); I have already tried this but it does not work. I think it is because of what rainborick said: I'm on shared hosting and the machine stops me, and I think that ini_set does not work because I do not have the required privileges. I tried using it both outside the for loop and inside it, and nothing... I will test flush(), and if that fails I think I will switch to Perl. Thank you very much.
There used to be a project called phpfork which allowed you to run threaded processes. I don't know its current status or alternatives.

What's really going on is that indexing isn't really a job for a web page. You need a cron job starting up every minute (for example) to take a link from a table, retrieve the page, parse it, add any new links back into the table and then update the original link's status to "crawled". If it hasn't timed out yet it can take another link... Then if the script times out, another will start in a minute and continue processing. The table should grow much faster than you can process it, so it's up to you how many processes you want running and what the server capacity is. You may need to throttle it or risk the wrath of your hosting company. There is a reason Hostgator doesn't like OpenX installs, and that's the server load. A spider would be many times more demanding than a busy OpenX install.
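One possible shape for that worker, with the table layout, database credentials and the ten-links-per-run batch size all assumed rather than taken from this thread:

<?php
// Cron-driven crawl worker: claim a few pending links, fetch them, queue any
// newly discovered links, and mark the originals as crawled.
// Assumed table:
//   CREATE TABLE links (id INT AUTO_INCREMENT PRIMARY KEY,
//                       url VARCHAR(255) UNIQUE,
//                       status ENUM('pending','crawled') DEFAULT 'pending');

set_time_limit(50); // stay safely inside a one-minute cron interval

$db = new PDO('mysql:host=localhost;dbname=crawler', 'user', 'pass'); // placeholder credentials
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$pending = $db->query("SELECT id, url FROM links WHERE status = 'pending' LIMIT 10")
              ->fetchAll(PDO::FETCH_ASSOC);

$insert = $db->prepare("INSERT IGNORE INTO links (url) VALUES (?)");
$done   = $db->prepare("UPDATE links SET status = 'crawled' WHERE id = ?");

foreach ($pending as $link) {
    $ch = curl_init($link['url']);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 20);
    $html = curl_exec($ch);
    curl_close($ch);

    if ($html !== false) {
        // Queue any absolute links found on the page.
        if (preg_match_all('/href="(https?:\/\/[^"]+)"/i', $html, $matches)) {
            foreach (array_unique($matches[1]) as $url) {
                $insert->execute(array($url));
            }
        }
        // ... extract and store whatever data you actually need here ...
    }

    $done->execute(array($link['id']));
    sleep(1); // crude throttle so the host doesn't complain about the load
}

A crontab entry along the lines of "* * * * * php /path/to/crawl-worker.php" starts it once a minute; in practice you would also want some locking or a "claimed" status so overlapping runs don't fetch the same link twice.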