Getting img childnodes from div nodes in a page

Discussion in 'PHP' started by AHA7, Jun 3, 2007.

  1. #1
    Hello,

    I have HTML pages loaded as DOMDocument objects:

    $page = new DOMDocument();

    If I want to get the titles and src values of all images in a page, I can do that with the following:

    $page->loadHTMLFile($page_address); // no quotes: single quotes would pass the literal string '$page_address'
    $i = 1; // just to number images
    foreach ($page->getElementsByTagName('img') as $image)
    {
       if ( $image->hasAttribute('title') )
       {
          echo "image #: $i \n";
          echo "image title: " . $image->getAttribute('title') . "\n";
          echo "image source: " . $image->getAttribute('src') . "\n";
          $i++;
       }
       echo "<hr>\n";
    }

    But that's not exactly what I want. My HTML pages contain several divs, and I want to get the images in each div separately, so that I know how many image-containing divs there are and which titles and src values the images in each div have.

    So, I want to search the document for all <div> nodes (elements), search each <div> node separately (in a loop) for all <img> childnodes and get their title and src attributes.

    An example:

    Here's the HTML page:

    <div>
    ...
    <img title="img-1" src="...">
    ...
    <img title="img-2" src="...">
    </div>
    
    ......
    
    <div>
    ....
    <img title="img-1" src="...">
    ....
    <img title="img-28" src="...">
    </div>
    
    ......
    
    <div>
    ....
    <img title="img-5" src="...">
    ....
    <img title="img-2" src="...">
    </div>
    ......

    Here's the output I'm looking for, produced with DOMDocument functions:

    image #: 1
    image title: img-1
    image source: ...
    image #: 2
    image title: img-2
    image source: ...
    div ended!
    ---------------------------------
    image #: 1
    image title: img-1
    image source: ...
    image #: 2
    image title: img-28
    image source: ...
    div ended!
    ---------------------------------
    image #: 1
    image title: img-5
    image source: ...
    image #: 2
    image title: img-2
    image source: ...
    div ended!
    --------------------------------
    ......

    I know this would require two loops: an outer loop that walks the page looking for div nodes, and an inner loop that goes through the current div node and reads the required attributes of all its img child nodes.
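
    A minimal sketch of that two-loop idea, assuming PHP 7 or later. The sample markup, the file names in it, and the helper name collectImagesByDiv are made up for illustration. DOMElement::getElementsByTagName returns all descendants, not only direct children, so an image inside nested divs would be listed once for every enclosing div.

```php
<?php
// Hypothetical sample markup standing in for the real page.
$html = <<<HTML
<html><body>
<div><p>text</p><img title="img-1" src="a.png"><img title="img-2" src="b.png"></div>
<div><img title="img-5" src="c.png"><img src="untitled.png"></div>
<div><p>no images here</p></div>
</body></html>
HTML;

/**
 * Collect title/src pairs of titled images, grouped by containing div.
 * (collectImagesByDiv is a made-up helper name, not part of the DOM extension.)
 */
function collectImagesByDiv(DOMDocument $page): array
{
    $groups = [];
    // Outer loop: every <div> in the document.
    foreach ($page->getElementsByTagName('div') as $div) {
        $images = [];
        // Inner loop: every <img> descendant of the current div.
        foreach ($div->getElementsByTagName('img') as $image) {
            if ($image->hasAttribute('title')) {
                $images[] = [
                    'title' => $image->getAttribute('title'),
                    'src'   => $image->getAttribute('src'),
                ];
            }
        }
        if ($images !== []) { // skip divs that contain no titled images
            $groups[] = $images;
        }
    }
    return $groups;
}

$page = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$page->loadHTML($html);

$groups = collectImagesByDiv($page);
foreach ($groups as $images) {
    foreach ($images as $n => $image) {
        echo "image #: " . ($n + 1) . "\n";
        echo "image title: {$image['title']}\n";
        echo "image source: {$image['src']}\n";
    }
    echo "div ended!\n---------------------------------\n";
}
echo count($groups) . " image-containing div(s)\n";
```

    For a real page, $page->loadHTMLFile($page_address) would replace the loadHTML call. If only direct img children of each div should count, one option is to iterate $div->childNodes and check each node's nodeName instead of calling getElementsByTagName on the div.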

    Any help would be much appreciated!
     
    AHA7, Jun 3, 2007