Hello, I'm using something called DomDocument (I think it's a class). I don't know much about it, but it provides many handy functions for getting information out of HTML documents, such as tags and attributes. I want to call $domPage->loadHTMLFile($URL[$i]) (together with other DOM functions) in a for loop so that each iteration opens a different HTML page. My questions are: 1- Can I create the DOM document once, outside the loop, or do I have to create it on every iteration since it opens a new file each time? That is, can I place $domPage = new DomDocument() outside the loop? 2- Do I have to delete the DomDocument on each iteration, and if so, how? 3- Is the @ in front of @$domPage->loadHTMLFile($URL[$i]) necessary?
I would think that you could create $domPage outside the loop and reset it on each iteration with loadHTMLFile(), but I would normally create an array of DomDocument objects and load a new one on each iteration: for ($n=0;$n++;$n<(number of iterations)) { $domPage[$n] = new DomDocument(); $domPage[$n]->loadHTMLFile($URL[$i]); } As for the @, it should work without it; all it does is suppress any error messages that would otherwise be displayed. Hope this helps.
Oops, it was only a few lines of code, but I still managed to get it wrong. Sorry, try this: for ($n = 0; $n < count($URL); $n++) { $domPage[$n] = new DomDocument(); $domPage[$n]->loadHTMLFile($URL[$n]); }
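For reference, the loop discussed above can be sketched as follows. This is only a minimal sketch, not a definitive answer: the helper name countImagesPerPage and the $paths parameter are made up for illustration. It creates a fresh DomDocument on each iteration, releases it with unset() so PHP can free it, and uses libxml_use_internal_errors() in place of @ so that parse warnings are collected rather than displayed.

```php
<?php
// Hypothetical helper (not from the thread): count <img> tags in each HTML file.
function countImagesPerPage(array $paths): array
{
    // Collect libxml warnings internally instead of silencing them with @.
    $previous = libxml_use_internal_errors(true);
    $counts = [];

    foreach ($paths as $path) {
        $dom = new DOMDocument(); // fresh document for every page

        if ($dom->loadHTMLFile($path) === false) {
            $counts[$path] = null; // the page could not be loaded
        } else {
            $counts[$path] = $dom->getElementsByTagName('img')->length;
        }

        libxml_clear_errors(); // drop the warnings collected for this page
        unset($dom);           // release the document before the next iteration
    }

    // Restore whatever error handling was active before the call.
    libxml_use_internal_errors($previous);

    return $counts;
}
```

Keeping every page in an array, as in the post above, works too, but it holds all the documents in memory at once; processing and releasing one page at a time avoids that when the pages are only needed briefly.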
Thanks gandaliter! OK, here's another problem. Say I have 100 HTML pages, and from these I want to extract the src and title values of every img tag that has both a src and a title attribute. There are two ways to do this in a for loop: 1- Use cURL to load each page, then match the tags with regex and extract what I want. 2- Use DomDocument and its functions to load each page and get what I want, as follows: $dom = new DomDocument(); $dom->loadHTMLFile($url); foreach ($dom->getElementsByTagName('img') as $image) { if ($image->hasAttribute('src') && $image->hasAttribute('title')) { $image->getAttribute('src'); $image->getAttribute('title'); } } Which method is better in terms of performance and speed? Note that the first method requires making a new cURL connection on each iteration of the loop to load the page and then matching with regex, while the second requires creating a new DOM document on each iteration and using its functions.
I would probably use cURL. At a guess, I would imagine that it would be more efficient, and it probably has more support, tutorials, etc. It's just a guess, though.
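Note that the two options are not strictly either/or: cURL only fetches a page, while DomDocument parses it, so the HTML can be downloaded by any means and then handed to loadHTML(). Below is a minimal sketch of the parsing half, assuming the page source is already in a string. The function name extractTitledImages and the sample HTML are invented for illustration, and unlike the snippet in the question it stores the attribute values instead of discarding them.

```php
<?php
// Hypothetical helper (not from the thread): return the src/title pair of
// every <img> tag that carries both attributes.
function extractTitledImages(string $html): array
{
    // Real-world HTML is often malformed; collect warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $images = [];

    // Guard against an empty string, which loadHTML() rejects.
    if ($html !== '' && $dom->loadHTML($html)) {
        foreach ($dom->getElementsByTagName('img') as $image) {
            if ($image->hasAttribute('src') && $image->hasAttribute('title')) {
                $images[] = [
                    'src'   => $image->getAttribute('src'),
                    'title' => $image->getAttribute('title'),
                ];
            }
        }
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $images;
}

// Sample input: only the first image has both attributes.
$html = '<p><img src="a.png" title="A"><img src="b.png"><img title="C only"></p>';
print_r(extractTitledImages($html));
```

As for speed, downloading 100 pages will likely take far longer than parsing them either way, so timing both approaches on a handful of the real pages with microtime(true) would settle the question better than a guess.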