I have two HTML files that use the same template; only some fields differ, and I need to get the full XPath to those differences using PHP.

1st) <html><body><div class="price">12,400</div><div class="make">Acura</div>
2nd) <html><body><div class="price">15,400</div><div class="make">Bmw</div>

As you can see from the example, the template is the same but the price and the make are different. So the PHP script is supposed to show these XPaths as results:

//div[@class='price']
//div[@class='make']

The script needs to find the differences between the two files and get the XPath to each difference. Obviously the template is unknown and could be different every time. Any help appreciated!
I'm not sure exactly what you're trying to do. What do you mean by XPath? That's an XML term. Is the HTML generated from PHP? PHP can't directly read the HTML on a page. It is executed before the page is rendered and is usually used to write the HTML. It's easy to do in JavaScript because JavaScript can access the HTML (the DOM). Can you tell us a bit more about what you're trying to do?
Those HTML files are not generated from PHP; they are static files. In any language, including PHP, you can reach any DOM node with XPath: load the HTML into $dom, create a DOMXPath from it, and then you can access any node with XPath queries. So the PHP script needs to find the dynamic parts that differ between those two HTML files (in the example above the dynamic parts are the price value and the make of the vehicle), and then I need to get the XPath to that dynamic content.
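For anyone following along, here is a minimal sketch of the load-then-query step described above. It assumes the first sample from the question with its tags closed, and the variable names ($html, $nodes) are only illustrative. It uses loadHTML() rather than loadXML() on the assumption that static HTML may not be well-formed XML.

```php
<?php
// First sample from the question, with the closing tags added (an assumption).
$html = '<html><body><div class="price">12,400</div><div class="make">Acura</div></body></html>';

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);              // tolerant HTML parser
libxml_clear_errors();

// Run an XPath query against the loaded document.
$xpath = new DOMXPath($dom);
$nodes = $xpath->query("//div[@class='price']");

foreach ($nodes as $node) {
    // getNodePath() gives a positional path to the node; textContent is its text.
    echo $node->getNodePath(), ' => ', $node->textContent, "\n";
}
```

This only shows how to go from a known XPath to a node. The question is the reverse direction: starting from a node that changed and getting a path for it.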
Not familiar with the functions (and too lazy to run after the docs right now), but if the content is read into the container variable ($dom in this case) as an array (or you can make it do that), you could just read each file into separate arrays and do something like:

function arrayDiff($A, $B) {
    $intersect = array_intersect($A, $B);
    return array_merge(array_diff($A, $intersect), array_diff($B, $intersect));
}

This will give you an array of the differences (the non-matching elements), which can then be parsed to get the XPath. Might be too complex for what you need, but it's at least one way to go about it.
Oh OK, static files and PHP's XML functions. You'd have to find the differences first and get an array of each string that is different. Then you can query for those in the XML. To get the differing phrases you have to extract all the text from the HTML into an array. The only other way I can see to do it is to compare individual words, which means you lose the phrases. So, I've written some code to extract the 'phrases' from the HTML. I think the XPath query is correct but I can't get it to return the actual path. I've read twice that it can't be done. If you can do it please let me know how you did it.

<?php
$strConstant = file_get_contents("Test1.htm");
$strVariable = file_get_contents("Test2.htm");
$arXPaths = getXPaths($strVariable, getDiffArray($strConstant, $strVariable));
foreach ($arXPaths as $value) {
    echo $value . "<br/>";
}

// Return the node path of every element whose text contains a changed phrase.
function getXPaths($strVariable, $arDiff) {
    $arXPaths = array();
    if (empty($arDiff) || !is_array($arDiff))
        return $arXPaths; // empty array, so the caller's foreach still works
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // keep parser warnings out of the output
    $doc->loadHTML($strVariable);     // loadXML() needs well-formed XML; static HTML often isn't
    libxml_clear_errors();
    $xpathvar = new DOMXPath($doc);
    foreach ($arDiff as $strDiff) {
        // Note: this query breaks if $strDiff contains a single quote.
        $query = "//*[text()[contains(.,'" . $strDiff . "')]]";
        $queryResult = $xpathvar->query($query);
        if ($queryResult === false) continue; // invalid expression
        foreach ($queryResult as $node) {
            // Positional path such as /html/body/div[1], not the //div[@class='price'] form
            $arXPaths[] = $node->getNodePath();
        }
    }
    return $arXPaths;
}

// Collect the phrases present in the variable file but not in the constant.
function getDiffArray($strConstant, $strVariable) {
    $arDiff = array();
    $arConstant = getElemTextArray($strConstant);
    $arVariable = getElemTextArray($strVariable);
    $diff = diff($arConstant, $arVariable);
    foreach ($diff as $k) {
        // Changed runs are arrays of the form array('d' => deleted, 'i' => inserted).
        if (is_array($k) && !empty($k['i'])) {
            foreach ($k['i'] as $value) {
                $arDiff[] = $value;
            }
        }
    }
    return $arDiff;
}

function diff($old, $new) {
    /* (C) Paul Butler 2007 <http://www.paulbutler.org/>
       May be used and distributed under the zlib/libpng license. */
    $matrix = array();
    $maxlen = 0;
    foreach ($old as $oindex => $ovalue) {
        $nkeys = array_keys($new, $ovalue);
        foreach ($nkeys as $nindex) {
            $matrix[$oindex][$nindex] = isset($matrix[$oindex - 1][$nindex - 1])
                ? $matrix[$oindex - 1][$nindex - 1] + 1
                : 1;
            if ($matrix[$oindex][$nindex] > $maxlen) {
                $maxlen = $matrix[$oindex][$nindex];
                $omax = $oindex + 1 - $maxlen;
                $nmax = $nindex + 1 - $maxlen;
            }
        }
    }
    if ($maxlen == 0)
        return array(array('d' => $old, 'i' => $new));
    return array_merge(
        diff(array_slice($old, 0, $omax), array_slice($new, 0, $nmax)),
        array_slice($new, $nmax, $maxlen),
        diff(array_slice($old, $omax + $maxlen), array_slice($new, $nmax + $maxlen)));
}

// Extract the non-blank text found between tags, in document order.
function getElemTextArray($html) {
    $arTexts = array();
    $reg = "/(?<=>)\s*(?=<)|(?<=>)\n*([^<]+)/";
    if (preg_match_all($reg, $html, $arMatches)) {
        // $arMatches[0] holds the full matches; skip the whitespace-only ones.
        foreach ($arMatches[0] as $value) {
            $value = trim($value); // surrounding whitespace is not part of the phrase
            // empty(trim(...)) is an error before PHP 5.5 and also treats "0" as empty
            if ($value === '') continue;
            $arTexts[] = $value;
        }
    }
    return $arTexts;
}
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Untitled Document</title>
</head>
<body>
</body>
</html>
You can't, at least not without developing some sort of intelligent detection (pretty sure it is not worth it for you). Every time the HTML changes you would need to re-work it.
You just need a constant to compare each file with. What I wrote expects the constant to have the same html as the variable. It looks for text that's different inside each element. If they're 2 completely different files the only way to do it is with a word by word comparison. In that case a diff would pretty much be meaningless anyway. It wouldn't matter if the template changes. You just need to update the constant so the html (not necessarily the text) for the constant & variable are the same.
@SoftLink: You're on the right track. First off, compile a list of the XPaths that contain text in doc1, then compare it to doc2.
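A rough sketch of that idea, for illustration only: build a map of node path to text for each document, then report the paths whose text differs. The function names textByPath() and changedPaths() are made up for this example, and the sample markup is the pair from the first post with the tags closed. It assumes both files really do share the same template, so the positional paths line up.

```php
<?php
// Map each text-bearing element's positional XPath to its trimmed text.
// (Hypothetical helper, not part of any library.)
function textByPath(string $html): array
{
    libxml_use_internal_errors(true);   // keep parser warnings out of the output
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $map = [];
    $xpath = new DOMXPath($dom);
    // Select every non-blank text node; its parent element is what we report.
    foreach ($xpath->query('//text()[normalize-space(.) != ""]') as $textNode) {
        $path = $textNode->parentNode->getNodePath();
        $map[$path] = ($map[$path] ?? '') . trim($textNode->nodeValue);
    }
    return $map;
}

// Return the paths whose text differs between the two documents,
// or which carry text in only one of them.
function changedPaths(string $htmlA, string $htmlB): array
{
    $a = textByPath($htmlA);
    $b = textByPath($htmlB);
    $changed = [];
    foreach (array_unique(array_merge(array_keys($a), array_keys($b))) as $path) {
        if (($a[$path] ?? null) !== ($b[$path] ?? null)) {
            $changed[] = $path;
        }
    }
    return $changed;
}

$first  = '<html><body><div class="price">12,400</div><div class="make">Acura</div></body></html>';
$second = '<html><body><div class="price">15,400</div><div class="make">Bmw</div></body></html>';

foreach (changedPaths($first, $second) as $path) {
    echo $path, "\n";
}
```

Note that getNodePath() returns positional paths, not the attribute-based //div[@class='price'] form from the original question. Turning one into the other (for example by reading the class attribute of the matched element) would be a separate step.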