DOMDocument: using the class attribute to pick up the innerHTML of an HTML file

Discussion in 'PHP' started by gilgalbiblewheel, Jul 7, 2018.

  1. #1
    How do I use the class attribute to pick up the innerHTML of elements in an HTML page at a given URL? I've been looking all over the internet for a clear explanation.

    Where am I going wrong? (I'm not used to "->" and "=>", since I don't know what they represent or do.)
    
    <?php
    // Walk every element below $node and call $callback on each one whose tag
    // name is $tagName and whose class attribute contains $class as a word.
    function walkDOMForTagAndClass(DOMNode $node, $tagName, $class, callable $callback) {
       foreach ($node->childNodes as $child) {
         // only element nodes (type 1) have tag names and attributes
         if ($child->nodeType !== XML_ELEMENT_NODE) continue;
         // a class attribute can hold several space-separated names
         $classes = preg_split('/\s+/', trim($child->getAttribute('class')));
         // HTML tag names are case-insensitive, so compare without case
         if (strcasecmp($child->nodeName, $tagName) === 0 && in_array($class, $classes, true)) {
           $callback($child);
         }
         // recurse into this element's children
         walkDOMForTagAndClass($child, $tagName, $class, $callback);
       }
    }
    $file = "https://www.blueletterbible.org/lang/lexicon/lexicon.cfm?Strongs=H1&t=KJV";
    $doc = new DOMDocument();
    // keep libxml warnings about HTML5 markup from flooding the output
    libxml_use_internal_errors(true);
    $doc->loadHTMLFile($file);
    walkDOMForTagAndClass(
       $doc,
       'div',
       //'columns tablet-8 small-10 tablet-order-3 small-order-2',
       'nocrumbs',
       function (DOMElement $div) {
         // do whatever it is you want with the matches here.
       }
    );

    // (loadHTML() expects an HTML string, not a URL, which is why loadHTMLFile() is used above.)

    //Evaluate Anchor tag in HTML
    $xpath = new DOMXPath($doc);
    $hrefs = $xpath->query("/html/body//a");

    foreach ($hrefs as $href) {
      $url = $href->getAttribute('href');

      // setAttribute() overwrites an existing value, so removeAttribute() is not needed
      $href->setAttribute("target", "_blank");
      $href->setAttribute("href", $url . "/newurl");
    }

    // save html
    $file = $doc->saveHTML();

    echo $file;
    ?>
    
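A minimal sketch of the same task using DOMXPath instead of a hand-written walker, assuming the goal is the innerHTML of each `<div>` whose class list contains `nocrumbs`. The inline markup is a made-up stand-in for the lexicon page so the example runs offline, and `innerHTML()` and `innerHTMLByClass()` are hypothetical helper names, not built-in DOMDocument methods.

```php
<?php
// Return the serialized children of $node, i.e. its "innerHTML".
function innerHTML(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

// Collect the innerHTML of every <$tag> whose class attribute contains $class
// as a whole word. $tag and $class are assumed to be plain names (no quotes).
function innerHTMLByClass(string $markup, string $tag, string $class): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate HTML5 tags in real pages
    $doc->loadHTML($markup);
    libxml_clear_errors();

    // XPath 1.0 idiom: pad the class list with spaces and look for " name "
    $query = sprintf(
        "//%s[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $tag,
        $class
    );

    $results = [];
    foreach ((new DOMXPath($doc))->query($query) as $element) {
        $results[] = innerHTML($element);
    }
    return $results;
}

$markup = '<html><body>'
    . '<div class="nocrumbs wide"><b>bold</b> text</div>'
    . '<div class="other">skip me</div>'
    . '</body></html>';

print_r(innerHTMLByClass($markup, 'div', 'nocrumbs'));
```

The padded `contains()` test matches `nocrumbs` inside `class="nocrumbs wide"` but not a class such as `crumbs`, which a plain `@class = '...'` comparison or a bare `contains(@class, ...)` would get wrong.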

     
    gilgalbiblewheel, Jul 7, 2018
  2. NetStar

    #2
    You're probably not having much luck finding the answer on Google because you don't know what to ask in the first place. With an understanding of HTML and a basic understanding of PHP, DOMDocument is easy to use. However, if you have no idea how to use PHP objects and access their methods and properties, it's going to be a bit tedious. Did you even bother to search "What is -> in php"? That would be a good start to understanding what's going on.
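For anyone stuck on the same syntax: `->` reaches into an object (reads or sets a property, calls a method), while `=>` pairs a key with a value in an array literal and splits them apart again in `foreach`. A short illustration; the `$tagCounts` array is made-up sample data.

```php
<?php
// -> : object member access
$doc = new DOMDocument();                      // create an object
$doc->preserveWhiteSpace = false;              // set one of its properties
$doc->loadHTML('<p class="note">Hi</p>');      // call one of its methods
$p = $doc->getElementsByTagName('p')->item(0); // chain: a method call on the result of another
echo $p->getAttribute('class'), "\n";          // prints "note"

// => : key/value pairs in arrays
$tagCounts = ['div' => 2, 'a' => 5];           // key 'div' maps to 2, key 'a' to 5
foreach ($tagCounts as $tag => $count) {       // unpack each pair into $tag and $count
    echo "$tag: $count\n";
}
```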

    It's hard to answer your question since you aren't even sure what you're asking for or what the code you posted does. So I'm assuming you want to use DOMDocument to extract 1. the URL and 2. the link text between the <a> tags. Right? Here's an example of how to extract all links on craigslist:

    
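    <?php
    
    $url = "http://www.craigslist.org";
    
    $htmlParser = new DOMDocument();
    $htmlParser->preserveWhiteSpace = false;
    // collect libxml warnings about invalid markup instead of printing them
    libxml_use_internal_errors(true);
    $htmlParser->loadHTML(file_get_contents($url));
    
    foreach ($htmlParser->getElementsByTagName('a') as $aTag)
    {
        // a link's target is in href (src belongs to <img>, <script>, etc.)
        $linkHref = $aTag->getAttribute('href');
        // textContent also works for links with nested tags or no text at all
        $linkText = $aTag->textContent;
    
        echo "<a href=\"" . htmlspecialchars($linkHref) . "\">" . htmlspecialchars($linkText) . "</a>\n";
    }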
    
    
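The same idea wrapped in a function that takes markup as a string, so it can be tried without fetching a live page. `extractLinks()` is a hypothetical name; it returns each link's `href` and its text, using `textContent` so links with nested tags or no text still work.

```php
<?php
// Return [['href' => ..., 'text' => ...], ...] for every <a> in $markup.
function extractLinks(string $markup): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // real-world HTML rarely validates
    $doc->loadHTML($markup);
    libxml_clear_errors();

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $links[] = [
            'href' => $a->getAttribute('href'), // '' if the attribute is missing
            'text' => trim($a->textContent),    // text of the link and its children
        ];
    }
    return $links;
}

$markup = '<p><a href="/about">About <b>us</b></a> <a name="top"></a></p>';
foreach (extractLinks($markup) as $link) {
    echo $link['href'], ' => ', $link['text'], "\n";
}
```

Returning plain arrays keeps the parsing separate from the output, so the same function can feed an `echo`, a database insert, or a further crawl.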
     
    NetStar, Jul 8, 2018
  3. gilgalbiblewheel

    #3
    Thanks for the advice. I asked about -> and => in a new post.

    My coding skills are pretty basic: for loops, if statements and regex/preg_match_all. I'm not used to functions much, and DOMDocument is quite new to me. But I want to extract certain things from the Strong's pages to insert into my db table. The hard part is picking up the new Strong's numbers in the definitions section and picking up those definitions as well, then repeating the cycle with the Strong's numbers found in those definitions until there are none left to pick up. At that point I want to move on to the next Strong's page. The Strong's numbers run from H0 to H6090.
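The loop described above is a breadth-first crawl with a "seen" list, so no Strong's number is processed twice and definitions that point back at each other cannot loop forever. A minimal sketch, assuming a hypothetical `$pages` array that stands in for the definition text of each page; a real version would fetch and parse each lexicon page with DOMDocument, and the `/\bH\d+\b/` pattern is only a guess at how the numbers appear in the definitions.

```php
<?php
// Breadth-first crawl from $start, following every Strong's number found in
// each definition. $seen is shared across calls so nothing is visited twice.
function crawlFrom(string $start, array $pages, array &$seen): array
{
    $queue = [$start];
    $seen[$start] = true;
    $visited = [];
    while ($queue) {
        $number = array_shift($queue);
        $visited[] = $number;
        // $pages stands in for the fetched definition text (hypothetical)
        $text = $pages[$number] ?? '';
        preg_match_all('/\bH\d+\b/', $text, $matches);
        foreach ($matches[0] as $ref) {
            if (!isset($seen[$ref])) {
                $seen[$ref] = true; // mark when queued, not when visited
                $queue[] = $ref;
            }
        }
    }
    return $visited;
}

// Made-up sample data: definitions that refer to each other.
$pages = [
    'H1'  => 'see H25 and H2',
    'H2'  => 'compare H1',
    'H3'  => 'no references here',
    'H25' => 'from H1',
];

// Outer loop over starting pages (the real range would be much larger).
$seen = [];
$order = [];
for ($n = 1; $n <= 3; $n++) {
    $start = "H$n";
    if (!isset($seen[$start])) {
        $order = array_merge($order, crawlFrom($start, $pages, $seen));
    }
}
print_r($order); // H1, H25, H2, H3
```

Marking a number as seen when it is queued, rather than when it is processed, keeps the same number from being queued twice when several definitions mention it.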
     
    gilgalbiblewheel, Jul 8, 2018