DomDocument in a loop

Discussion in 'PHP' started by AHA7, May 28, 2007.

  1. #1
    Hello,

    I'm using DomDocument (I think it's a class :D ). I don't know much about it, but it provides many handy functions for getting information out of HTML documents, such as tags and attributes.

    I want to call $domPage->loadHTMLFile($URL[$i]) (together with other DOM functions) in a for loop to open a different HTML page at each iteration. My questions are:

    1- Can I create the DOM document only once, outside the loop, or do I have to create it at each iteration since it opens a new file each time? i.e. can I place the following outside the loop: $domPage = new DomDocument() ?

    2- Do I have to delete the DomDocument at each iteration and, if so, how?

    3- Is the @ in front of @$domPage->loadHTMLFile($URL[$i]) necessary?
     
    AHA7, May 28, 2007
  2. gandaliter

    #2
    I would think that you could create $domPage outside the loop and reset it on each iteration with loadHTMLFile(), but I would normally create an array of DomDocument objects and load a new one on each iteration:

    for ($n=0;$n++;$n<(number of iterations))
    {
    $domPage[$n] = new DomDocument();
    $domPage[$n]->loadHTMLFile($URL[$i]);
    }
    PHP:
    As for the @, it should work without it; all it does is suppress any error messages that would otherwise be displayed.
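
    A minimal sketch of the first option (one document created outside the loop and reloaded each time). It also shows libxml_use_internal_errors() as one alternative to @ for keeping parser warnings off the page. The $pages array and its markup are made up for illustration, and loadHTML() on strings stands in for loadHTMLFile($URL[$i]) so the snippet runs without network access:

```php
<?php
// Hypothetical sample pages; in the thread these would be fetched from $URL[$i].
$pages = [
    '<html><body><img src="a.png" title="A"><p>unclosed</body></html>',
    '<html><body><img src="b.png"><img src="c.png" title="C"></body></html>',
];

// Collect libxml parse warnings internally instead of printing them.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();   // created once, outside the loop
$imgCounts = [];

foreach ($pages as $i => $html) {
    // Loading again replaces whatever the document held before.
    $dom->loadHTML($html);
    $imgCounts[$i] = $dom->getElementsByTagName('img')->length;
    libxml_clear_errors();  // discard any warnings collected for this page
}

libxml_use_internal_errors($previous);  // restore the earlier setting

print_r($imgCounts);
```

    Nothing is deleted by hand here: each load call discards the previous tree, and the object itself is released when $dom goes out of scope.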

    Hope this helps.
     
    gandaliter, May 29, 2007
  3. gandaliter

    #3
    Oops, it was only a few lines of code, but I still managed to get it wrong. Sorry, try this:

    for ($n = 0; $n < count($URL); $n++)
    {
        // One DomDocument per page, indexed the same way as $URL
        $domPage[$n] = new DomDocument();
        $domPage[$n]->loadHTMLFile($URL[$n]);
    }
    PHP:
     
    gandaliter, May 29, 2007
  4. AHA7

    #4
    Thanks gandaliter!

    OK, here's another problem:

    Say that I have 100 HTML pages, and from these I want to extract the src and title values of every img tag that has both a src and a title attribute. There are two ways to do this in a for loop:

    1- Using cURL to load each page, then matching the tags with regex and extracting what I want, or

    2- Using DomDocument and its functions to load each page and get what I want as follows:

    $dom = new DomDocument();
    $dom->loadHTMLFile($url);
    foreach ($dom->getElementsByTagName('img') as $image)
    {
        if ($image->hasAttribute('src') && $image->hasAttribute('title'))
        {
            // Keep the values instead of discarding the return values
            $src   = $image->getAttribute('src');
            $title = $image->getAttribute('title');
        }
    }
    PHP:
    Which method is better in terms of performance and speed?

    Note that the first method would require making a new cURL connection at each iteration of the loop to load the page and then running the regex matching, while the second method would require creating a new DOM document at each iteration and using its functions.
     
    AHA7, May 29, 2007
  5. gandaliter

    #5
    I would probably use cURL. At a guess, I would imagine that it would be more efficient, and it would probably have more support, tutorials, etc. Just a guess, though.
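
    Rather than guessing, the two extraction approaches can be run side by side on the same markup and timed. The following is only a rough sketch: the sample $html, the function names imagesViaDom() and imagesViaRegex(), and the regex patterns are all made up for illustration, the regex handles double-quoted attribute values only, and the timing leaves out the page fetch that both approaches need.

```php
<?php
// Hypothetical sample markup standing in for a fetched page.
$html = '<html><body>'
      . '<img src="a.png" title="First">'
      . '<img src="b.png">'
      . '<img title="Third" src="c.png">'
      . '</body></html>';

// DOM approach: parse once, then ask each <img> for its attributes.
function imagesViaDom(string $html): array
{
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $found = [];
    foreach ($dom->getElementsByTagName('img') as $image) {
        if ($image->hasAttribute('src') && $image->hasAttribute('title')) {
            $found[] = [$image->getAttribute('src'), $image->getAttribute('title')];
        }
    }
    return $found;
}

// Regex approach: find each <img ...> tag, then look for both attributes.
// Illustrative only: double-quoted values, no entity decoding.
function imagesViaRegex(string $html): array
{
    $found = [];
    preg_match_all('/<img\b[^>]*>/i', $html, $tags);
    foreach ($tags[0] as $tag) {
        if (preg_match('/\bsrc="([^"]*)"/i', $tag, $src)
            && preg_match('/\btitle="([^"]*)"/i', $tag, $title)) {
            $found[] = [$src[1], $title[1]];
        }
    }
    return $found;
}

$start = microtime(true);
$viaDom = imagesViaDom($html);
$domSeconds = microtime(true) - $start;

$start = microtime(true);
$viaRegex = imagesViaRegex($html);
$regexSeconds = microtime(true) - $start;

printf("DOM: %.6fs, regex: %.6fs\n", $domSeconds, $regexSeconds);
```

    On a tiny sample like this the numbers mean little; running the same comparison over a few of the real pages would give a more useful answer.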
     
    gandaliter, May 31, 2007