PHP DOM - Scraping a table and removing span tags

Discussion in 'PHP' started by maihannijat, Dec 26, 2011.

  1. #1
    I have a script that scrapes a website, looks for a table, and fetches the data. I want to remove the span tags, along with their attributes, from the scraped content.

    The website's markup looks like this:

    <table width="97%" cellspacing="3" cellpadding="0" border="0" align="center">
      <tbody>
        <tr valign="top">
          <td width="15%" align="left" style="padding-left: 05px">
            <span style="font-size:16px;font-weight:bold;" id="ctl00_cphPageContant_lalel1">Job Categories:</span>
          </td>
          <td>
            <span style="font-size:10pt;" id="ctl00_cphPageContant_FunctionalAria">Consultancy</span>
          </td>
    
    I don't need the span tags or their style attributes, just the content.
    
    Here is my DOM code.
    
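    [CODE]
    // Walk the table rows and keep only the two-cell (label, value) rows.
    foreach ($html_dom->find('table [cellspacing=3] tr') as $e) {
        $children = $e->children();
        $size = count($children);
        if ($size == 2) {
            $tag = $children[0]->children(0);   // first child of the label cell (the span)
            $value = $children[1]->children(0); // first child of the value cell (the span)
            $createXML .= createRSSFile($tag);
        }
    }
    [/CODE]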
     
    maihannijat, Dec 26, 2011 IP
  2. WPC

    #2
    Just use preg_replace() and replace the opening span tags (attributes included) and the closing </span> tags with an empty string.
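
    The suggestion above could look roughly like the sketch below. The helper name stripSpanTags is hypothetical (it does not appear in the thread), and the pattern assumes the span tags carry no ">" inside attribute values, which holds for the markup shown in the first post.

```php
<?php
// Hypothetical helper (name is an assumption, not from the thread):
// removes opening and closing span tags but keeps the text between them.
function stripSpanTags(string $html): string
{
    // </?span  matches "<span" or "</span"
    // \b       stops the match from also hitting longer tag names
    // [^>]*>   swallows any attributes up to the closing ">"
    // The i flag makes the match case-insensitive.
    return preg_replace('~</?span\b[^>]*>~i', '', $html);
}

// Sample cell content taken from the markup in the first post.
$cell = '<span style="font-size:10pt;" id="ctl00_cphPageContant_FunctionalAria">Consultancy</span>';
echo stripSpanTags($cell), "\n"; // prints "Consultancy"
```

    If the script is built on PHP Simple HTML DOM Parser (the find() and children() calls suggest so, but that is an assumption), reading the node's plaintext property instead of the node itself should give the same text without any regex.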
     
    WPC, Dec 26, 2011 IP
  3. maihannijat

    #3
    Thank you. I am close to the result.

    Here is what it returned without the span tags. Wow.

    <?xml version="1.0" encoding="UTF-8"?><rss version="2.0"><detail><title>Job List</title><link>http://acbar.org/</link><description>Job List</description><lastBuildDate>$latestBuild</lastBuildDate><language>en</language><item><sourceurl></sourceurl><jobid>1712</jobid><job_categories></job_categories><minimum_education_level></minimum_education_level><></><vacancy_number></vacancy_number><></><position></position><></><organization></organization><></><duty_station></duty_station><></><city></city><></><duration></duration><></><gender></gender><></><salary_range></salary_range><></><announcing_date></announcing_date><></><closing_date></closing_date><nationality></nationality><no_of_jobs></no_of_jobs><></><job_type></job_type><></><shift></shift><></><job_status></job_status><></><experience></experience><></><></><></><duties_and_responsibilities></duties_and_responsibilities><></><qualifications></qualifications><></><submission_guideline></submission_guideline><></><></></item></detail></rss><?/xml>
    Code (markup):
    Now I am getting some empty tags:
    <></>
    Code (markup):
    I am sure this happens because of the following code.
    $tag= is <>
    $value= is submission guideline
    $returnITEM = "<".$tag.">".htmlspecialchars(str_replace("br","<br/>",$value))."</".$tag.">";
    Code (markup):
     
    maihannijat, Dec 26, 2011 IP
  4. maihannijat

    #4
    I found that the source table has some empty fields, which is why I am getting the empty tags above.
    Is there any way to avoid the empty tags?
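
    One possible way to avoid them is to skip any row whose tag name is empty before building the element. The sketch below assumes $tag and $value have already been reduced to plain strings; the function name buildItem is hypothetical, and the "br" replacement from the original line is left out to keep the example short.

```php
<?php
// Hypothetical helper (name and signature are assumptions, not from the thread):
// builds one XML element, or returns an empty string when there is no tag name.
function buildItem(string $tag, string $value): string
{
    $tag = trim($tag);

    // An empty tag name is what produced the <></> pairs, so emit nothing.
    if ($tag === '') {
        return '';
    }

    // Escape the value so characters such as & and < stay valid inside XML.
    return '<' . $tag . '>' . htmlspecialchars($value) . '</' . $tag . '>';
}

echo buildItem('position', 'Consultant & Trainer'), "\n"; // <position>Consultant &amp; Trainer</position>
echo buildItem('', 'orphan value'), "\n";                 // prints only the newline
```

    Inside the scraping loop, the same check could be applied before appending to $createXML, so rows with an empty first cell never reach the output.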
     
    maihannijat, Dec 26, 2011 IP