I need to extract this from the HTML code below: 110.234.71.142:8080

```html
<td class="leftborder timestamp" rel="1342078386"><span class="updatets "> 18 secs</span></td>
<td><span><style> .wef6{display:none} .N6zc{display:inline} .ANrs{display:none} .qQPY{display:inline} .Cory{display:none} .Jgqn{display:inline} </style><span class="N6zc">110</span><span style="display:none">180</span><span class="ANrs">13</span><div style="display:none">176</div>.<span class="N6zc">234</span><span class="Cory">7</span>.<span style="display: inline">71</span><span class="wef6">63</span>.<span class="227">142</span></span></td>
<td> 8080</td>
```
Why not do it using DOMDocument? Try the following:

```php
libxml_use_internal_errors(TRUE);
$dom = new DOMDocument();
$dom->loadHTML($code);
$xml = simplexml_import_dom($dom);
libxml_use_internal_errors(FALSE);
foreach($xml->xpath("//span") as $item){
    echo (string)$item . PHP_EOL;
}
```
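The DOMDocument loop above prints every `<span>`, including the decoys hidden with `display:none`. Below is a minimal sketch that keeps the DOMDocument idea but also reads the `<style>` rules, assuming the input is only the IP cell and that every decoy is hidden either by a class whose rule says `display:none` or by an inline `display:none` style. `$hiddenClasses` and `$octets` are illustrative names, not part of any API. Only innermost `<span>` elements are considered, so the hidden `<div>` is skipped as well.

```php
<?php
// Sample input (assumption): the IP cell from the question.
$html = '<td><span><style> .wef6{display:none} .N6zc{display:inline} .ANrs{display:none}'
      . ' .qQPY{display:inline} .Cory{display:none} .Jgqn{display:inline} </style>'
      . '<span class="N6zc">110</span><span style="display:none">180</span>'
      . '<span class="ANrs">13</span><div style="display:none">176</div>.'
      . '<span class="N6zc">234</span><span class="Cory">7</span>.'
      . '<span style="display: inline">71</span><span class="wef6">63</span>.'
      . '<span class="227">142</span></span></td>';

libxml_use_internal_errors(true);   // the fragment is not a full HTML document
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors(false);
$xpath = new DOMXPath($dom);

// 1. Collect the class names that the <style> block hides.
$hiddenClasses = array();
foreach ($xpath->query('//style') as $style) {
    preg_match_all('/\.([\w-]+)\s*\{\s*display\s*:\s*none/i', $style->textContent, $m);
    $hiddenClasses = array_merge($hiddenClasses, $m[1]);
}

// 2. Keep innermost spans that are not hidden and contain only digits.
$octets = array();
foreach ($xpath->query('//span[not(*)]') as $span) {
    $inline = preg_replace('/\s+/', '', strtolower($span->getAttribute('style')));
    if (strpos($inline, 'display:none') !== false) {
        continue;   // hidden by an inline style
    }
    $classes = preg_split('/\s+/', trim($span->getAttribute('class')), -1, PREG_SPLIT_NO_EMPTY);
    if (array_intersect($classes, $hiddenClasses)) {
        continue;   // hidden by a rule in the <style> block
    }
    $text = trim($span->textContent);
    if (preg_match('/^\d+$/', $text)) {
        $octets[] = $text;
    }
}

// 3. Join the visible numbers into the address.
$ip = implode('.', $octets);
echo $ip, PHP_EOL;
```

Selecting leaf spans with `//span[not(*)]` rather than walking the wrapper's children means the result does not depend on exactly how libxml nests the `<div>` inside the `<span>`.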
Why not first remove everything between <style> and </style>, and then remove all the remaining tags? You can use stripos to find </style>, and strip_tags (php.net/strip_tags) is also nice!
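A minimal sketch of that suggestion, assuming the input is only the IP cell from the question. It shows the two steps (cut the `<style>` block using `stripos`, then `strip_tags`) and also their limit: `strip_tags` keeps the text of hidden elements, so the decoy numbers 180, 13, 176, 7 and 63 stay in the result unless the `display:none` rules are applied as well.

```php
<?php
// Sample input (assumption): the IP cell from the question.
$html = '<td><span><style> .wef6{display:none} .N6zc{display:inline} .ANrs{display:none}'
      . ' .qQPY{display:inline} .Cory{display:none} .Jgqn{display:inline} </style>'
      . '<span class="N6zc">110</span><span style="display:none">180</span>'
      . '<span class="ANrs">13</span><div style="display:none">176</div>.'
      . '<span class="N6zc">234</span><span class="Cory">7</span>.'
      . '<span style="display: inline">71</span><span class="wef6">63</span>.'
      . '<span class="227">142</span></span></td>';

// 1. Cut out everything from <style> to </style> (stripos ignores case).
$start = stripos($html, '<style>');
$end   = stripos($html, '</style>');
if ($start !== false && $end !== false) {
    $end += strlen('</style>');
    $html = substr($html, 0, $start) . substr($html, $end);
}

// 2. Strip the remaining tags, leaving only their text content.
$text = strip_tags($html);
echo $text, PHP_EOL;   // hidden decoys are still included here
```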
It looks like it contains some false IP address parts too. Can you show me where this code is displayed, if possible? I may be able to help you with something that parses it properly.
1) `<span[\s]class="N6zc">([0-9]+).*?<span[\s]class="N6zc">([0-9]+).*?="display:[\s]inline">([0-9]+).*?</span>.<span[\s]class=.*?>([0-9]+)`
2) `([0-9]+)</td>`

Above are the two regexes you can use: the first one extracts the IP address and the second one the port number. Then you can combine all the numbers.
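Here is a sketch of how those two regexes might be applied with `preg_match`, assuming the question's three cells arrive as one string. Note that the first pattern hard-codes the class name `N6zc`, so it only works while that class name stays the same between page loads. `$ipPattern`, `$portPattern` and `$result` are illustrative names.

```php
<?php
// Sample input (assumption): the three cells from the question on one line.
$html = '<td class="leftborder timestamp" rel="1342078386"><span class="updatets "> 18 secs</span></td> '
      . '<td><span><style> .wef6{display:none} .N6zc{display:inline} .ANrs{display:none}'
      . ' .qQPY{display:inline} .Cory{display:none} .Jgqn{display:inline} </style>'
      . '<span class="N6zc">110</span><span style="display:none">180</span>'
      . '<span class="ANrs">13</span><div style="display:none">176</div>.'
      . '<span class="N6zc">234</span><span class="Cory">7</span>.'
      . '<span style="display: inline">71</span><span class="wef6">63</span>.'
      . '<span class="227">142</span></span></td> <td> 8080</td>';

// Regex 1: captures the four visible octets (depends on the class name N6zc).
$ipPattern = '~<span\sclass="N6zc">([0-9]+).*?<span\sclass="N6zc">([0-9]+).*?="display:\sinline">([0-9]+).*?</span>.<span\sclass=.*?>([0-9]+)~s';

// Regex 2: the port is the first run of digits directly followed by </td>.
$portPattern = '~([0-9]+)</td>~';

$result = null;
if (preg_match($ipPattern, $html, $ip) && preg_match($portPattern, $html, $port)) {
    // Combine the captured groups into "a.b.c.d:port".
    $result = $ip[1] . '.' . $ip[2] . '.' . $ip[3] . '.' . $ip[4] . ':' . $port[1];
}
echo $result, PHP_EOL;
```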
OK, it looks like you are doing some scraping there. The problem I see (not sure if I am right) is that the CSS classes in the <style></style> tags are generated at random. If you want a script that handles random classes, look at this one. It might be a bit long, but I commented every step and kept it simple to understand.

```php
<?php
$code = '<td class="leftborder timestamp" rel="1342078386"><span class="updatets ">18 secs</span></td> <td><span><style>.wef6{display:none}.N6zc{display:inline}.ANrs{display:none}.qQPY{display:inline}.Cory{display:none}.Jgqn{display:inline}</style><span class="N6zc">110</span><span style="display:none">180</span><span class="ANrs">13</span><div style="display:none">176</div>.<span class="N6zc">234</span><span class="Cory">7</span>.<span style="display: inline">71</span><span class="wef6">63</span>.<span class="227">142</span></span></td> <td>8080</td>';

// remove all line feeds
$code = str_replace(array("\r", "\n"), '', $code);

// get the inline style block
preg_match('|<style>(.*?)</style>|', $code, $styleBlock);

// get each style rule; the piece before the first "." is empty, so drop it
$parts = explode('.', $styleBlock[1]);
unset($parts[0]);

// delete the style block from $code
$code = str_replace($styleBlock[0], '', $code);

// loop through all style rules
foreach ($parts as $part) {
    // get the class name (everything before "{", any length) and its display rule
    if (!preg_match('|^\s*([\w-]+)\s*\{\s*(display:[^}]*)\}|', $part, $rule)) {
        continue; // not a display rule, nothing to convert
    }
    // change the class into a style attribute on the span elements
    $code = str_replace('class="' . $rule[1] . '"', 'style="' . trim($rule[2]) . '"', $code);
}

// check if there are any span classes left
preg_match_all('|<span (class="[^"]*")|', $code, $leftover);

// loop through every leftover class
foreach ($leftover[1] as $attr) {
    // no rule is defined for them, so the browser shows them: treat as display:inline
    $code = str_replace($attr, 'style="display:inline"', $code);
}

// get all visible spans that contain only digits (this skips e.g. "18 secs")
preg_match_all('|<span style="display:\s*inline">([0-9]+)</span>|', $code, $visible);

// join them with a .
$ip = implode('.', $visible[1]);

// get the port number (also accepts spaces, as in "<td> 8080</td>")
preg_match('|<td>\s*([0-9]+)\s*</td>|', $code, $portMatch);
$port = $portMatch[1];

echo $ip . ':' . $port;
?>
```

Hope this helps you.