Hi, I am having a problem scraping data from a website. After scraping the page, I can't get my PHP script to output the data; it just shows an empty page.

Here is the HTML source I want to scrape:

    <span id="row3Time" class="zc-ssl-pg-time">11:00 AM</span>
    <a id="rowTitle3" class="zc-ssl-pg-title" href='http://tvlistings.zap2it.com/tv/sportscenter/EP00019917'>SportsCenter</a>
    <ul class="zc-icons"> <li class="zc-ic zc-ic-span"><span class="zc-ic-live">LIVE</span></li></ul>
    </li>
    <li class="zc-ssl-pg" id="row1-4" style="">
    <span id="row4Time" class="zc-ssl-pg-time">12:00 PM</span>
    <a id="rowTitle4" class="zc-ssl-pg-title" href='http://tvlistings.zap2it.com/tv/sportscenter/EP00019917'>SportsCenter</a>
    <ul class="zc-icons"> <li class="zc-ic zc-ic-span"><span class="zc-ic-live">LIVE</span></li></ul>
    </li>
    <li class="zc-ssl-pg" id="row1-5" style="">
    <span id="row5Time" class="zc-ssl-pg-time">1:00 PM</span>
    <a id="rowTitle5" class="zc-ssl-pg-title" href='http://tvlistings.zap2it.com/tv/sportscenter/EP00019917'>SportsCenter</a>
    <ul class="zc-icons"> <li class="zc-ic zc-ic-span"><span class="zc-ic-live">LIVE</span></li></ul>

Here is the PHP source:

    <?php
    $contents = file_get_contents('http://tvlistings.zap2it.com/tvlistings/ZCSGrid.do?stnNum=10179');
    preg_match('/<a id="rowTitle3" class="zc-ssl-pg-title"[.*]<\/a>/i', $data, $matches);
    $rowtitle = $matches[1];
    echo $rowtitle."<br>\n";
    ?>

And here is the PHP output:

    <br>

Does anyone know how I can scrape the data from that website, starting at <a id=rowTitle3 and going to the end of the page? Any advice would be much appreciated.

Thanks in advance
Lol. Just from looking at your code, I can see that:

You're matching your pattern against the wrong string! You save the page to $contents but try to match against $data.

You're not escaping special characters in your pattern (=, - etc.).

If you want to grab the string matched by part of your pattern, you enclose that part in regular brackets () and not square brackets []. Square brackets define a character class, so [.*] matches a single literal "." or "*" and captures nothing.

Without testing, and provided the HTML your pattern was written against is accurate, the following should work:

    <?php
    $contents = file_get_contents('http://tvlistings.zap2it.com/tvlistings/ZCSGrid.do?stnNum=10179');
    // Match against $contents (the fetched page) and capture with (...)
    preg_match('/<a id\="rowTitle3" class\="zc\-ssl\-pg\-title"(.*)<\/a>/i', $contents, $matches);
    $rowtitle = $matches[1];
    echo $rowtitle."<br>\n";
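One thing to watch with the pattern above: (.*) captures everything between the class attribute and </a>, so $matches[1] would also contain the href attribute, not just the title. Here is a minimal sketch of a tighter pattern. It runs against a copy of the HTML from the question rather than the live page (which may differ), and the $html variable and the check on the return value of preg_match are my own additions.

```php
<?php
// Sample markup copied from the question; the live page may differ.
$html = <<<'HTML'
<span id="row3Time" class="zc-ssl-pg-time">11:00 AM</span>
<a id="rowTitle3" class="zc-ssl-pg-title" href='http://tvlistings.zap2it.com/tv/sportscenter/EP00019917'>SportsCenter</a>
HTML;

// [^>]* skips the remaining attributes of the <a> tag;
// ([^<]*) captures only the link text up to the closing tag.
$pattern = '/<a id="rowTitle3" class="zc-ssl-pg-title"[^>]*>([^<]*)<\/a>/i';

$rowtitle = '';
// preg_match returns 1 on a match, 0 on no match, false on error.
if (preg_match($pattern, $html, $matches) === 1) {
    $rowtitle = $matches[1];
}
echo $rowtitle . "<br>\n";
```

Checking the return value before reading $matches[1] avoids an undefined-index notice when the page layout changes and the pattern stops matching.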
Try the PHP library Snoopy. It simulates a browser and automates the task of retrieving web page content and posting forms.
You do not need to escape the equals sign, and you can get away with not escaping the dash in almost all cases (it is only special inside a character class).

I am assuming you want all the times, statuses and links on the page. If so, the following should do it:

    // $contents holds the page HTML fetched with file_get_contents()

    // Get times/statuses: the spans alternate between time and status
    preg_match_all('/>(.*)<\/span>/iU', $contents, $matches);
    $times = array_chunk($matches[1], 2);

    // Get links
    preg_match_all('/href=.(.*).>/iU', $contents, $matches);

    foreach ($times as $sub => $play) {
        list($time, $status) = $play;
        echo "Time: $time <br />";
        echo "Status: $status <br />";
        echo "Link: {$matches[1][$sub]} <br /><br />";
    }
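The two preg_match_all calls above rely on the spans and links appearing in lockstep across the whole page. As an alternative sketch, a single pattern can capture each row's time, link and title together, so the values cannot drift out of step. It runs against sample markup from the question, not the live page. The $html and $rows names are mine, and it assumes each time span is immediately followed by its title link, as in that sample.

```php
<?php
// Sample markup copied from the question; the live page may differ.
$html = <<<'HTML'
<span id="row3Time" class="zc-ssl-pg-time">11:00 AM</span>
<a id="rowTitle3" class="zc-ssl-pg-title" href='http://tvlistings.zap2it.com/tv/sportscenter/EP00019917'>SportsCenter</a>
<ul class="zc-icons"> <li class="zc-ic zc-ic-span"><span class="zc-ic-live">LIVE</span></li></ul>
</li>
<li class="zc-ssl-pg" id="row1-4" style="">
<span id="row4Time" class="zc-ssl-pg-time">12:00 PM</span>
<a id="rowTitle4" class="zc-ssl-pg-title" href='http://tvlistings.zap2it.com/tv/sportscenter/EP00019917'>SportsCenter</a>
HTML;

// Group 1: row number, group 2: time, group 3: link, group 4: title.
// The \1 backreference ties each title link to the time span of the same row.
$pattern = '/<span id="row(\d+)Time"[^>]*>([^<]*)<\/span>\s*'
         . '<a id="rowTitle\1"[^>]*href=[\'"]([^\'"]*)[\'"][^>]*>([^<]*)<\/a>/i';

// PREG_SET_ORDER gives one array per matched row instead of one per group.
preg_match_all($pattern, $html, $rows, PREG_SET_ORDER);

foreach ($rows as $row) {
    list(, $num, $time, $link, $title) = $row;
    echo "Row $num: $time, $title ($link)<br />\n";
}
```

For anything beyond a quick one-off, an HTML parser is usually less fragile than regular expressions, since small markup changes (attribute order, extra whitespace) can stop a pattern like this from matching.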