DOMDocument of class attribute to pick up innerhtml of an html file

gilgalbiblewheel Well-Known Member

#1

How do I use DOMDocument to find an element by its class attribute and get its inner HTML, given the URL of an HTML page? I've been looking all over the internet for a clear explanation.

Where am I going wrong in the code below? (I'm not used to "->" and "=>", and I don't know what they represent or do.)


<?php
//should come back to here
function walkDOMForTagAndClass($element, $tagName, $class, $callback) {
   if ($element->nodeType !== 1) return false; // invalid element
   // we force case as XML vs. SGML are inconsistent on this
   $tagName = strtoupper($tagName);
   if ($walk = $element->firstChild) do {
     if (
       ($walk->nodeType == 1) &&
       (strtoupper($walk->nodeName) == $tagName) &&
       ($walk->attributes->getNamedItem('class') == $class)
     ) $callback($walk);
   } while (
     $walk = $walk->firstChild || $walk->nextSibling || (
       $walk->parentNode == $element ? false : $walk->parentNode.nextSibling
     )
   );
}
$file = "https://www.blueletterbible.org/lang/lexicon/lexicon.cfm?Strongs=H1&t=KJV";
$doc = new DOMDocument();
$doc->loadHTMLFile($file);
walkDOMForTagAndClass(
   $doc,
   'div',
   //'columns tablet-8 small-10 tablet-order-3 small-order-2',
   'nocrumbs',
   function($file) {
     // do whatever it is you want with the matches here.
   }
);



/*$html = "https://www.blueletterbible.org/lang/lexicon/lexicon.cfm?Strongs=H1&t=KJV";

$dom = new DOMDocument();
$dom->loadHTML($html);*/

//Evaluate Anchor tag in HTML
$xpath = new DOMXPath($doc);
$hrefs = $xpath->evaluate("/html/body//a");

for ($i = 0; $i < $hrefs->length; $i++) {
  $href = $hrefs->item($i);
  $url = $href->getAttribute('href');

  //remove and set target attribute  
  $href->removeAttribute('target');
  $href->setAttribute("target", "_blank");

  $newURL=$url."/newurl";

  //remove and set href attribute  
  $href->removeAttribute('href');
  $href->setAttribute("href", $newURL);
}

// save html
$file=$doc->saveHTML();

echo $file;
?>

Code (markup):

gilgalbiblewheel, Jul 7, 2018
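
One possible reading of why the walk in the post above finds nothing: the loop borrows JavaScript idioms that PHP treats differently. In PHP, `||` always produces a boolean rather than one of its operands, `.` is string concatenation rather than property access, `getNamedItem('class')` returns an attribute node rather than a string, and the document passed in is not an element node, so the first guard returns immediately. A minimal recursive sketch follows, assuming that a match means the class name appears as a whole word in the class attribute. The function name `walkForTagAndClass` and the inline test markup are illustrative and not from the original post.

```php
<?php
// Visit every descendant element of $node; call $callback on those whose tag
// name matches and whose class attribute contains $class as a whole word.
function walkForTagAndClass(DOMNode $node, $tagName, $class, callable $callback)
{
    foreach ($node->childNodes as $child) {
        if ($child->nodeType !== XML_ELEMENT_NODE) {
            continue; // skip text, comments, doctype, etc.
        }
        $classes = preg_split('/\s+/', trim($child->getAttribute('class')));
        if (strcasecmp($child->nodeName, $tagName) === 0 && in_array($class, $classes, true)) {
            $callback($child);
        }
        // Recurse so that nested matches are visited too.
        walkForTagAndClass($child, $tagName, $class, $callback);
    }
}

// Stand-in markup (an assumption); a real page would be loaded from its URL.
$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc->loadHTML('<div class="nocrumbs"><p>outer</p><div class="nocrumbs x">inner</div></div>');
libxml_clear_errors();

$found = [];
walkForTagAndClass($doc, 'div', 'nocrumbs', function (DOMElement $el) use (&$found) {
    $found[] = $el->textContent;
});
print_r($found);
```

Recursion replaces the hand-rolled "first child, else next sibling, else parent's sibling" stepping, which is easy to get wrong; starting from the document node works because the function only inspects the children of whatever it is given.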

NetStar Notable Member

#2

You're probably not having much luck finding an answer on Google because you don't yet know what to ask. With an understanding of HTML and a basic understanding of PHP, DOMDocument is easy to use. However, if you have no idea how PHP objects work or how to access their methods and properties, it's going to be a bit tedious. Did you even bother to search for "What is -> in PHP"? That would be a good place to start understanding what's going on.
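
For readers in the same position, here is a minimal sketch of what the two operators do. The class `Verse` and the array contents are made up purely for illustration.

```php
<?php
// Hypothetical class, for illustration only.
class Verse
{
    public $text = 'In the beginning';

    public function shout()
    {
        // Inside a method, $this->text reads this object's own property.
        return strtoupper($this->text);
    }
}

$verse = new Verse();
echo $verse->text, "\n";     // -> reads a property of the object
echo $verse->shout(), "\n";  // -> calls a method of the object

// => pairs a key with a value when building an array...
$entries = ['H1' => 'first entry', 'H2' => 'second entry'];

// ...and names the key and value variables in a foreach loop.
foreach ($entries as $number => $label) {
    echo $number, ': ', $label, "\n";
}
```

In short, `->` reaches into an object (as in `$doc->loadHTML(...)`), while `=>` only ever appears where array keys and values are paired.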

It's hard to answer your question, since you aren't sure what you are asking for or what the code you posted does. So I'm assuming you want to use DOMDocument to extract (1) the URL and (2) the link text between the tags. Right? Here's an example of how to extract all the links on craigslist:
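<?php

$url = "http://www.craigslist.org";

$htmlParser = new DOMDocument();
$htmlParser->preserveWhiteSpace = false;
// @ suppresses the warnings libxml raises on real-world, non-valid HTML
@$htmlParser->loadHTML(file_get_contents($url));

foreach ($htmlParser->getElementsByTagName('a') as $aTag)
{
    // a link's target lives in its href attribute (src belongs to img/script)
    $linkSrc  = $aTag->getAttribute('href');
    // textContent is safe even when the <a> has no child nodes
    $linkText = $aTag->textContent;

    // escape both values so they cannot break the generated markup
    echo "<a href=\"" . htmlspecialchars($linkSrc) . "\">" . htmlspecialchars($linkText) . "</a>\n";
}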
Code (markup):

NetStar, Jul 8, 2018
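
The question that opened the thread was about selecting by class and reading the inner HTML, which the link example does not cover. One common approach, sketched here under assumptions, is an XPath query that matches the class as a whole word, plus a small helper that serialises each child node. The helper name `innerHTML` and the stand-in markup are illustrative; DOMDocument itself has no built-in inner-HTML property.

```php
<?php
// Return the inner HTML of a node by serialising each of its children in turn.
function innerHTML(DOMNode $node)
{
    $html = '';
    foreach ($node->childNodes as $child) {
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

// Stand-in markup (an assumption); a real page would come from file_get_contents($url).
$markup = '<html><body>'
        . '<div class="nocrumbs wide"><p>First <b>match</b></p></div>'
        . '<div class="other"><p>Skipped</p></div>'
        . '</body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc->loadHTML($markup);
libxml_clear_errors();

$class = 'nocrumbs';
$xpath = new DOMXPath($doc);
// Pad with spaces so "nocrumbs" matches as a whole word in a multi-class attribute.
$query = "//div[contains(concat(' ', normalize-space(@class), ' '), ' $class ')]";

$matches = [];
foreach ($xpath->query($query) as $div) {
    $matches[] = innerHTML($div);
}
print_r($matches);
```

The padded `contains()` test is there because a plain `@class='nocrumbs'` comparison would miss elements that carry several classes, and a bare `contains(@class, 'nocrumbs')` would also match names such as `nocrumbs-old`.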

gilgalbiblewheel Well-Known Member

#3

Thanks for the advice. I asked about -> and => in a new post.

My coding skills are pretty basic: for loops, if statements, and regex with preg_match_all. I'm not very used to functions, and DOMDocument is quite new to me. But I want to extract certain things from the Strong's pages to insert into my db table. The hard part is picking up the new Strong's numbers in the definitions section, along with their definitions, and then repeating the cycle by following the Strong's numbers found in those definitions until there are no more Strong's numbers to pick up. At that point I want to move on to the next Strong's page. The Strong's numbers run from H0 to H6090.

gilgalbiblewheel, Jul 8, 2018
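
The cycle described in the post above (read a definition, pick up the Strong's numbers it mentions, follow each one, stop when nothing new turns up) can be sketched as a queue plus a "seen" set. This is only a sketch under assumptions: `fetchDefinition` is a hypothetical stand-in for downloading and parsing a page, its data is made up rather than real lexicon text, and the pattern `H` followed by digits is assumed to be how references appear in the definition text.

```php
<?php
// Stand-in for "fetch the page and extract its definition text".
// The entries are invented for illustration; they are not real lexicon data.
function fetchDefinition($number)
{
    $fake = [
        'H1' => 'placeholder text that mentions H2 and H3',
        'H2' => 'placeholder text that mentions H1',
        'H3' => 'placeholder text with no references',
    ];
    return isset($fake[$number]) ? $fake[$number] : '';
}

// Follow references breadth-first from $start until no unseen number remains.
function collectDefinitions($start)
{
    $queue = [$start];
    $seen  = [$start => true];
    $rows  = [];

    while ($queue) {
        $number = array_shift($queue);
        $definition = fetchDefinition($number);
        $rows[$number] = $definition; // one would-be database row

        // Pick up every Strong's number mentioned in this definition.
        preg_match_all('/\bH\d+\b/', $definition, $m);
        foreach ($m[0] as $ref) {
            if (!isset($seen[$ref])) { // the seen set stops circular references
                $seen[$ref] = true;
                $queue[] = $ref;
            }
        }
    }
    return $rows;
}

$rows = collectDefinitions('H1');
print_r(array_keys($rows));
```

Without the seen set, two entries that reference each other would keep the loop running forever; with it, each number is fetched at most once, and an outer loop over the page numbers can skip anything already collected.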
