I have a script that scrapes a website, looks for a table, and fetches its data. I want to remove the span tags, along with their attributes, from the scraped content. The website content looks like this:

<table width="97%" cellspacing="3" cellpadding="0" border="0" align="center">
<tbody>
<tr valign="top">
<td width="15%" align="left" style="padding-left: 05px">
<span style="font-size:16px;font-weight:bold;" id="ctl00_cphPageContant_lalel1">Job Categories:</span>
</td>
<td>
<span style="font-size:10pt;" id="ctl00_cphPageContant_FunctionalAria">Consultancy</span>
</td>
</tr>

I don't need the span tags or their style attributes, just the content. Here is my DOM function code:
[CODE]
// Walk every row of the table that has cellspacing=3
foreach ($html_dom->find('table [cellspacing=3] tr') as $e) {
    $children = $e->children();
    $size = count($children);
    // Only rows with exactly two cells (label + value) are of interest
    if ($size == 2) {
        $tag = $children[0]->children(0);   // first child of the label cell (the span)
        $value = $children[1]->children(0); // first child of the value cell (the span)
        $createXML .= createRSSFile($tag);
    }
}
[/CODE]
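A minimal sketch of one way to drop the span wrappers, assuming each cell is available as an HTML string. `cellText()` is a hypothetical helper name, not part of your script, and it relies only on PHP's built-in `strip_tags()`. Your code looks like it uses PHP Simple HTML DOM Parser; if so, reading the element's `plaintext` property should serve a similar purpose, but I have not tested that against your page.

```php
<?php
// Hypothetical helper: reduce a scraped cell to its text content only.
function cellText(string $cellHtml): string
{
    // strip_tags() removes <td>, <span ...> and their attributes, keeping the inner text.
    $text = strip_tags($cellHtml);
    // Decode entities such as &amp; so the caller gets plain text.
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    // Collapse runs of whitespace and trim the ends.
    return trim(preg_replace('/\s+/', ' ', $text));
}

// The two cells from the question, as strings.
$labelCell = '<td width="15%" align="left" style="padding-left: 05px"> <span style="font-size:16px;font-weight:bold;" id="ctl00_cphPageContant_lalel1">Job Categories:</span> </td>';
$valueCell = '<td> <span style="font-size:10pt;" id="ctl00_cphPageContant_FunctionalAria">Consultancy</span> </td>';

echo cellText($labelCell), "\n"; // Job Categories:
echo cellText($valueCell), "\n"; // Consultancy
```

The result is plain text, so it still needs escaping (for example with `htmlspecialchars()`) before it goes into the XML.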
Thank you. I am close to the result. Here is what it returned without the span tags:
[CODE]
<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"><detail><title>Job List</title><link>http://acbar.org/</link><description>Job List</description><lastBuildDate>$latestBuild</lastBuildDate><language>en</language><item><sourceurl></sourceurl><jobid>1712</jobid><job_categories></job_categories><minimum_education_level></minimum_education_level><></><vacancy_number></vacancy_number><></><position></position><></><organization></organization><></><duty_station></duty_station><></><city></city><></><duration></duration><></><gender></gender><></><salary_range></salary_range><></><announcing_date></announcing_date><></><closing_date></closing_date><nationality></nationality><no_of_jobs></no_of_jobs><></><job_type></job_type><></><shift></shift><></><job_status></job_status><></><experience></experience><></><></><></><duties_and_responsibilities></duties_and_responsibilities><></><qualifications></qualifications><></><submission_guideline></submission_guideline><></><></></item></detail></rss><?/xml>
[/CODE]
Now I am getting some empty tags:
[CODE]
<></>
[/CODE]
I am sure this happens because of the following code. For example, $tag comes out as <> while $value is "submission guideline":
[CODE]
$returnITEM = "<".$tag.">".htmlspecialchars(str_replace("br","<br/>",$value))."</".$tag.">";
[/CODE]
I found that some fields in the source table are empty, which is why I am getting the empty tags shown above. Is there a way to avoid emitting them?
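One possible approach is to check the tag name before building the element and return an empty string when there is nothing to name it with. The sketch below assumes the label and value arrive as plain strings. `toTagName()` and `buildItem()` are hypothetical helpers, the rows are made-up sample data, and the `str_replace()` step from your line is left out to keep the example short.

```php
<?php
// Hypothetical helper: turn a label such as "Job Categories:" into "job_categories".
function toTagName(string $label): string
{
    $name = strtolower(trim($label));
    // Replace every run of characters outside a-z and 0-9 with one underscore.
    $name = preg_replace('/[^a-z0-9]+/', '_', $name);
    // Drop leading/trailing underscores left over from punctuation like ":".
    return trim($name, '_');
}

// Hypothetical helper: builds one element, or returns '' when the label is empty.
function buildItem(string $label, string $value): string
{
    $tag = toTagName($label);
    if ($tag === '') {
        return ''; // empty row in the source table: emit nothing instead of <></>
    }
    return '<' . $tag . '>' . htmlspecialchars(trim($value)) . '</' . $tag . '>';
}

// Sample rows for illustration only; the middle one mimics an empty table row.
$rows = [
    ['Job Categories:', 'Consultancy'],
    ['', ''],
    ['Gender:', 'Any'],
];

$xml = '';
foreach ($rows as [$label, $value]) {
    $xml .= buildItem($label, $value);
}
echo $xml, "\n";
```

In your loop the same check could sit right before the call that appends to `$createXML`, so rows with an empty label never reach the string concatenation. Note that this does not guard against labels that begin with a digit, which would also make an invalid XML element name.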