DomDocument in a loop

AHA7 Peon

Messages: 445

Likes Received: 5

Best Answers: 0

Trophy Points: 0

#1

Hello,

I'm using something called DOMDocument (I think it's a class). I don't know much about it, but it provides many handy functions for getting information out of HTML documents, such as tags and attributes.

I want to call $domPage->loadHTMLFile($URL[$i]) (together with other DOM functions) in a for loop, so that a different HTML page is opened at each iteration of the loop. My questions are:

1- Can I create the DOM document only once, outside the loop, or do I have to create it at each iteration since it opens a new file each time? I.e., can I place the following outside the loop: $domPage = new DOMDocument() ?

2- Do I have to delete the DOMDocument at each iteration, and if so, how?

3- Is the @ in front of $domPage->loadHTMLFile($URL[$i]) necessary, as in @$domPage->loadHTMLFile($URL[$i])?

AHA7, May 28, 2007 IP

gandaliter Peon

Messages: 64

Likes Received: 0

Best Answers: 0

Trophy Points: 0

#2

I would think that you could create $domPage once outside the loop and reset it on each iteration by calling loadHTMLFile(), but I would normally create an array of DOMDocument objects and load a new one on each iteration:
PHP:
for ($n = 0; $n++; $n < count($URL))
{
    $domPage[$n] = new DOMDocument();
    $domPage[$n]->loadHTMLFile($URL[$n]);
}
As for the @, it should work without it; all it does is suppress any error messages the call would otherwise display. loadHTMLFile() does tend to emit warnings when a page contains malformed HTML, which is why people often put the @ there.
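To make the single-object option concrete, here is a minimal sketch that reuses one DOMDocument across iterations. It assumes that each loadHTMLFile() call replaces the previously loaded document, and it swaps the @ for libxml_use_internal_errors(), which collects parser warnings instead of printing them. The temporary files and the `$texts` array are hypothetical stand-ins for the real pages in $URL and for whatever you want to extract.

```php
<?php
// Hypothetical stand-ins for the pages in $URL: two small temp files.
$URL = [];
foreach (['<p>first</p>', '<p>second</p>'] as $snippet) {
    $path = tempnam(sys_get_temp_dir(), 'dom');
    file_put_contents($path, "<html><body>$snippet</body></html>");
    $URL[] = $path;
}

// Collect libxml warnings internally instead of silencing them with @.
libxml_use_internal_errors(true);

$domPage = new DOMDocument();   // created once, outside the loop
$texts = [];
for ($i = 0; $i < count($URL); $i++) {
    // Each load replaces whatever document was loaded before.
    if (!$domPage->loadHTMLFile($URL[$i])) {
        continue;               // skip pages that could not be read
    }
    $texts[] = $domPage->getElementsByTagName('p')->item(0)->textContent;
    libxml_clear_errors();      // drop the warnings gathered for this page
}
unset($domPage);                // optional: release the last document

// Clean up the temp files created for this sketch.
foreach ($URL as $path) {
    unlink($path);
}
print_r($texts);
```

The unset() at the end is only there to show how the object could be released explicitly; PHP frees it on its own once nothing refers to it any more.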

Hope this helps.

gandaliter, May 29, 2007 IP

gandaliter Peon

#3

Oops, it was only a few lines of code, but I still managed to get it wrong. Sorry, try this:
PHP:
for ($n = 0; $n < count($URL); $n++) {
    $domPage[$n] = new DOMDocument();
    $domPage[$n]->loadHTMLFile($URL[$n]);
}

gandaliter, May 29, 2007 IP

AHA7 Peon

#4

Thanks gandaliter!

OK, here's another problem:

Say that I have 100 HTML pages, and from these I want to extract the src and title values of every img tag that has both a src and a title attribute. There are two ways to do this in a for loop:

1- Using cURL to load each page, then matching the tags with a regex and extracting what I want, or

2- Using DOMDocument and its functions to load each page and get what I want, as follows:
PHP:
$dom = new DOMDocument();
$dom->loadHTMLFile($url);
$images = array();
foreach ($dom->getElementsByTagName('img') as $image)
{
    // Keep only images that carry both attributes
    if ($image->hasAttribute('src') && $image->hasAttribute('title'))
    {
        $images[] = array(
            'src'   => $image->getAttribute('src'),
            'title' => $image->getAttribute('title'),
        );
    }
}
Which method is better in terms of performance and speed?

Note that the first method requires a new cURL request at each iteration of the loop to load the page, followed by regex matching, while the second method requires a new DOM document at each iteration, followed by calls to its functions.
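One way to answer this without guessing is to time both extraction styles on the same input. The sketch below is only an illustration under a few assumptions: the sample HTML, the function names viaDom() and viaRegex(), and the iteration count are all made up, and the regex only handles double-quoted attribute values. The download step is left out because both methods need to fetch the page, so only the parsing is compared.

```php
<?php
// Hypothetical sample page; real pages would come from cURL or loadHTMLFile().
$html = '<html><body>'
      . '<img src="a.png" title="A">'
      . '<img title="B" src="b.png">'
      . '<img src="c.png">'
      . '</body></html>';

// DOM approach: same attribute checks as in the question.
function viaDom($html) {
    $dom = new DOMDocument();
    @$dom->loadHTML($html);
    $out = [];
    foreach ($dom->getElementsByTagName('img') as $img) {
        if ($img->hasAttribute('src') && $img->hasAttribute('title')) {
            $out[] = [$img->getAttribute('src'), $img->getAttribute('title')];
        }
    }
    return $out;
}

// Regex approach: two lookaheads so the attributes may come in either order.
// Assumption: attribute values are always double-quoted.
function viaRegex($html) {
    $re = '/<img(?=[^>]*\ssrc="([^"]*)")(?=[^>]*\stitle="([^"]*)")[^>]*>/i';
    preg_match_all($re, $html, $matches, PREG_SET_ORDER);
    $out = [];
    foreach ($matches as $m) {
        $out[] = [$m[1], $m[2]];
    }
    return $out;
}

$domResult   = viaDom($html);
$regexResult = viaRegex($html);

// Time 1000 runs of each function on the same string.
foreach (['viaDom', 'viaRegex'] as $fn) {
    $start = microtime(true);
    for ($i = 0; $i < 1000; $i++) {
        $fn($html);
    }
    printf("%s: %.4f s\n", $fn, microtime(true) - $start);
}
```

Whatever the timings turn out to be, the regex version depends on how the attributes are written (quoting, spacing), while the DOM version reads whatever the parser recovered, so raw speed is not the only thing to weigh. With 100 remote pages, the time spent downloading may well outweigh either parsing method.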

AHA7, May 29, 2007 IP

gandaliter Peon

#5

I would probably use cURL. At a guess, I would imagine that it would be more efficient, and it would probably have more support, tutorials, etc. Just a guess, though.

gandaliter, May 31, 2007 IP
