I have a page that runs many regexes against large files. It's not even finished yet, but it's already hitting the 60-second timeout. Is there a good way of avoiding this? I see many websites show a small "Loading" gif until the script finishes - something like that to keep it alive until it's done?
I can't do that on a hosting server. I need a solution that can manage and split the process up... something along those lines.
Hey, you can do it several ways. If the hosting server does not have safe mode enabled, you should be able to use ini_set('max_execution_time', ###); Otherwise, what you can do is have the script open the file, read 100 lines, stop, then reload and read the next 100 lines (rough sketch below). PM me if you want some more detail.
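Rough idea of the reload approach (untested sketch - the file name, the 'offset' parameter and the script name are just placeholders):

PHP:
// raise the limit if safe mode allows it
ini_set('max_execution_time', 300);   // or set_time_limit(300);

// process 100 lines per request, remembering where we stopped
$offset = isset($_GET['offset']) ? (int) $_GET['offset'] : 0;
$lines  = file('bigfile.txt');

$chunk = array_slice($lines, $offset, 100);
foreach ($chunk as $line) {
    // ... run your regexes on $line here ...
}

if ($offset + 100 < count($lines)) {
    // not done yet - reload to handle the next 100 lines
    header('Location: process.php?offset=' . ($offset + 100));
    exit;
}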
That would be interesting - and hard as well, I'm sure. I have a set of 10 textboxes with 10 URLs, and my page should extract their titles, metas and lots of other details using regex, and it times out. I'd really like to know if it's possible to run this job across reloads.
60 seconds for extracting data from 10 pages? If you haven't already, I'd suggest optimizing your regexes. If you know you're going to hit the time limit, you can set it to only process X pages at a time - depending on what you're doing, either storing your progress in the query string or using sessions. Alternatively, you could use ticks to run a function periodically that checks if you're close to the timeout and, if so, saves your progress and forces a page refresh. [url="http://uk2.php.net/manual/en/control-structures.declare.php#control-structures.declare.ticks"]See manual[/url].
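With sessions it could look roughly like this (just a sketch - extract_details() stands in for whatever regex work you do per URL):

PHP:
// process a few URLs per request and keep the progress in the session
session_start();

$urls    = $_SESSION['urls'];                 // the 10 URLs from the textboxes
$done    = isset($_SESSION['done']) ? $_SESSION['done'] : 0;
$perLoad = 3;
$stop    = min($done + $perLoad, count($urls));

for ($i = $done; $i < $stop; $i++) {
    $_SESSION['results'][$i] = extract_details($urls[$i]);   // your regex code
}

$_SESSION['done'] = $i;

if ($i < count($urls)) {
    // show the "Loading" message and refresh for the next batch
    echo 'Loading... ' . $i . '/' . count($urls);
    echo '<meta http-equiv="refresh" content="1">';
    exit;
}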
http://www.php.net/get_meta_tags get_meta_tags... Or only fread the first 1kb, or even 500 bytes, which should be enough... Edit: I use a regex to check over 8,300 websites in about 60 seconds.
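Something along these lines (sketch - example.com is just a placeholder URL):

PHP:
// meta tags without any regex at all
$meta = get_meta_tags('http://www.example.com/');   // $meta['description'], $meta['keywords'], ...

// or only read the first 1kb of the page - usually enough for <title> and the metas
$fp   = fopen('http://www.example.com/', 'r');
$head = fread($fp, 1024);
fclose($fp);

if (preg_match('/<title>([^<]+)<\/title>/i', $head, $m)) {
    $title = trim($m[1]);
}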
PHP:
if ( preg_match ( '/<title>([^<]+)<\/title>/i', addSpecialChars ( $row->page_source ), $title ) ) {
    return trim ( $title[1] );
}

This is the only regex used so far. To lower the time taken, I'm now saving the sources to the db and then performing the regex extractions on the stored data, and it still times out. Regex is very slow, and when it's called more than 20 times in various functions and loops it crashes. Damn, this is hard. When I eliminate this regex and set the result to some default, it finishes in no time, so I'm sure this is the buggy one.
I will use only half a kb for the title, but what happens when I expand this and want to extract image alt attributes, metadata, h1, h2, h3, links, keyword occurrences and more? This is a project for school and I need to extract everything from 10 pages.
I don't know. My web app will have to work with 10 pages that I won't choose - someone else will. Right now I'm playing with some that are between 20 and 40kb.
The regex looks fine - I can't think why you'd have any problems with it. Are you sure it's not just getting stuck in a loop somewhere? What does the addSpecialChars function do - or rather, why does it need to be applied to the subject before the regex?
Well... the pages are first saved using htmlspecialchars and then they need to be reverted using addSpecialChars in order for the regex to match the titles. Here's the function itself... nothing much:

PHP:
function addSpecialChars($string, $noQuotes = FALSE) {
    $string = eregi_replace("&amp;", "&", $string);
    if (!$noQuotes)
        $string = eregi_replace("&#039;", "'", $string);
    $string = eregi_replace('&quot;', '"', $string);
    $string = eregi_replace('&nbsp;', ' ', $string);
    $string = eregi_replace('&lt;', '<', $string);
    $string = eregi_replace('&gt;', '>', $string);
    return $string;
}
If possible, you should consider running the script locally on your own computer and then just uploading the results to your server.
The string functions will always be faster than regex (and even then, preg is usually quicker than ereg). If you don't need regex, avoid the regex functions. You could save a bit of time by using str_replace in your addSpecialChars function instead (sketch below). And assuming you're saving the result of addSpecialChars (rather than running it again and again for every preg_match), I can't see any reason why you'd need anywhere near 60 seconds to process 10 pages.
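e.g. something like this (untested - I'm assuming the quotes were stored as &#039;, adjust to whatever htmlspecialchars actually produced on your end):

PHP:
// same behaviour as addSpecialChars, but one str_replace call instead of six eregi_replace calls
function addSpecialChars($string, $noQuotes = FALSE) {
    $search  = array('&amp;', '&quot;', '&nbsp;', '&lt;', '&gt;');
    $replace = array('&',     '"',      ' ',      '<',    '>');

    if (!$noQuotes) {
        $search[]  = '&#039;';
        $replace[] = "'";
    }

    return str_replace($search, $replace, $string);
}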
addSpecialChars was eating it. It slipped past me - I was focusing on the title regex and never realised the bug was somewhere else. It worked well until it was confronted with something bigger in various loops. Now it's OK... let's hope it stays that way until I finish it.