Scraping data from a website

Discussion in 'PHP' started by mark103, Apr 15, 2013.

  1. #1
    Hi,

    I am having a problem scraping data from a website. After scraping the page, I can't get the data to output in my PHP script; the page just shows up empty.

    Here is the HTML source I want to scrape:

    
    
    <span id="row3Time" class="zc-ssl-pg-time">11:00 AM</span>
    <a id="rowTitle3" class="zc-ssl-pg-title" href='http://tvlistings.zap2it.com/tv/sportscenter/EP00019917'>SportsCenter</a>
    <ul class="zc-icons">
    <li class="zc-ic zc-ic-span"><span class="zc-ic-live">LIVE</span></li></ul>
    </li>
    <li class="zc-ssl-pg" id="row1-4" style="">
    
    <span id="row4Time" class="zc-ssl-pg-time">12:00 PM</span>
    <a id="rowTitle4" class="zc-ssl-pg-title" href='http://tvlistings.zap2it.com/tv/sportscenter/EP00019917'>SportsCenter</a>
    <ul class="zc-icons">
    <li class="zc-ic zc-ic-span"><span class="zc-ic-live">LIVE</span></li></ul>
    </li>
    <li class="zc-ssl-pg" id="row1-5" style="">
    
    <span id="row5Time" class="zc-ssl-pg-time">1:00 PM</span>
    <a id="rowTitle5" class="zc-ssl-pg-title" href='http://tvlistings.zap2it.com/tv/sportscenter/EP00019917'>SportsCenter</a>
    <ul class="zc-icons">
    <li class="zc-ic zc-ic-span"><span class="zc-ic-live">LIVE</span></li></ul>
    
    
    Code (markup):
    Here is the PHP source:

    
    
    <?php
    
    $contents = file_get_contents('http://tvlistings.zap2it.com/tvlistings/ZCSGrid.do?stnNum=10179');
    preg_match('/<a id="rowTitle3" class="zc-ssl-pg-title"[.*]<\/a>/i', $data, $matches);
    $rowtitle = $matches[1];
    echo $rowtitle."<br>\n";
    ?>
    
    PHP:
    And here is the PHP output:
    <br>
    PHP:
    Does anyone know how I can scrape the data from that website, starting at <a id="rowTitle3" and going to the end of the page?

    Any advice would be much appreciated.

    Thanks in advance
     
    Last edited: Apr 15, 2013
    mark103, Apr 15, 2013
  2. Alex Roxon

    #2
    Lol. Just from looking at your code, I can see that:
    1. You're matching your pattern against the wrong string! (you save to $contents and try to match against $data)
    2. You're not escaping special characters in your pattern (=, -, etc.)
    3. If you want to capture the text matched by part of your pattern, enclose that part in regular brackets () and not square brackets [] (square brackets define a character class, so [.*] matches just one '.' or '*' character)
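    Point 3 can be checked directly. In this sketch the sample line is copied from the question's HTML: [.*] is a character class that matches exactly one '.' or '*', so the question's pattern finds nothing, while (.*) is a capture group whose text lands in $matches[1]. The = and - are left unescaped here; PCRE treats them as literals outside a character class, so escaping them is optional.

```php
<?php
// Sample <a> line copied from the question's HTML.
$html = "<a id=\"rowTitle3\" class=\"zc-ssl-pg-title\" href='http://tvlistings.zap2it.com/tv/sportscenter/EP00019917'>SportsCenter</a>";

// [.*] is a character class: exactly ONE character that is '.' or '*'.
$classHits = preg_match('/<a id="rowTitle3" class="zc-ssl-pg-title"[.*]<\/a>/i', $html, $classMatches);

// (.*) is a capture group: any run of characters, stored in $groupMatches[1].
$groupHits = preg_match('/<a id="rowTitle3" class="zc-ssl-pg-title"(.*)<\/a>/i', $html, $groupMatches);

echo "character class: $classHits match(es)\n"; // 0: the next character is a space
echo "capture group: $groupHits match(es)\n";    // 1
echo "group 1: " . $groupMatches[1] . "\n";      // includes the href attribute
```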
    I haven't tested it, but provided the HTML your pattern was written against is accurate, the following should work:
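    <?php
     
    $contents = file_get_contents('http://tvlistings.zap2it.com/tvlistings/ZCSGrid.do?stnNum=10179');
    // Match against $contents, the variable that actually holds the page.
    // (.*) is greedy, so group 1 also includes the href attribute and the closing '>'.
    if (preg_match('/<a id\="rowTitle3" class\="zc\-ssl\-pg\-title"(.*)<\/a>/i', $contents, $matches)) {
        $rowtitle = $matches[1]; // only read when preg_match() found something
        echo $rowtitle."<br>\n";
    }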
    PHP:
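    To capture only the link text, one option is to skip the rest of the opening tag with [^>]* and make the group lazy with (.*?), so it stops at the first </a>. A sketch under those assumptions; extractRowTitle is a hypothetical helper name, and the sample markup is copied from the question instead of being fetched.

```php
<?php
// Hypothetical helper: return the link text of <a id="rowTitleN">, or null if it is absent.
function extractRowTitle(string $html, int $row): ?string
{
    // [^>]* consumes the remaining attributes (href etc.); (.*?) is lazy, so it stops at the first </a>.
    $pattern = '/<a id="rowTitle' . $row . '"[^>]*>(.*?)<\/a>/i';
    if (preg_match($pattern, $html, $matches) === 1) {
        return $matches[1];
    }
    return null; // no such row, so $matches[1] is never read
}

// Sample markup taken from the question; in practice this would be the fetched page.
$html = <<<HTML
<span id="row3Time" class="zc-ssl-pg-time">11:00 AM</span>
<a id="rowTitle3" class="zc-ssl-pg-title" href='http://tvlistings.zap2it.com/tv/sportscenter/EP00019917'>SportsCenter</a>
HTML;

echo extractRowTitle($html, 3) . "<br>\n"; // SportsCenter<br>
```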
     
    Alex Roxon, Apr 15, 2013
  3. ds28

    #3
    Try the PHP class Snoopy, which simulates a web browser: it automates retrieving web page content and posting forms.
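    Snoopy is a third-party class, not part of PHP. One piece of what "simulating a browser" means, sending browser-like request headers, can be approximated with PHP's built-in stream contexts and file_get_contents(). The sketch below is that substitute, not Snoopy's API; the helper name browserContext and the User-Agent string are made up for illustration.

```php
<?php
// Hypothetical helper: a stream context that makes file_get_contents() send browser-like headers.
function browserContext(string $userAgent)
{
    return stream_context_create([
        'http' => [
            'method'     => 'GET',
            'user_agent' => $userAgent,              // sent as the User-Agent request header
            'header'     => "Accept: text/html\r\n", // any extra request headers
            'timeout'    => 10,                      // seconds before giving up on the server
        ],
    ]);
}

// Illustrative User-Agent string (an assumption, not something the site requires).
$context = browserContext('Mozilla/5.0 (compatible; ListingsScraper/0.1)');

// Usage against the page from the question (needs network access):
// $contents = file_get_contents('http://tvlistings.zap2it.com/tvlistings/ZCSGrid.do?stnNum=10179', false, $context);

// Show what the context will send.
$options = stream_context_get_options($context);
echo $options['http']['user_agent'] . "\n";
```

    Cookies, redirects between forms, and form posting are the parts a library like Snoopy adds on top of this.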
     
    ds28, Apr 16, 2013
  4. ThePHPMaster

    #4
    You do not need to escape the equals sign, and in almost all cases you can get away without escaping the dash as well (it only has special meaning inside a character class such as [a-z]).
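    A quick check of the claim above, as a sketch: the same pattern with and without backslashes before = and - matches identically, while inside a character class the dash forms a range unless escaped. preg_quote() is PHP's built-in for escaping literal text when building a pattern from it; the sample span is copied from the question.

```php
<?php
// Sample line copied from the question's HTML.
$html = '<span id="row3Time" class="zc-ssl-pg-time">11:00 AM</span>';

// Outside a character class, '=' and '-' are literals, so both patterns behave the same.
$unescaped = preg_match('/class="zc-ssl-pg-time">(.*?)<\/span>/', $html, $m1);
$escaped   = preg_match('/class\="zc\-ssl\-pg\-time">(.*?)<\/span>/', $html, $m2);
echo $m1[1] . ' | ' . $m2[1] . "\n"; // 11:00 AM | 11:00 AM

// Inside [...] the dash builds a range unless it is escaped.
$asRange   = preg_match('/^[a-z]$/', '-');  // 0: '-' is not in the range a..z
$asLiteral = preg_match('/^[a\-z]$/', '-'); // 1: the escaped dash is a literal member
echo "$asRange $asLiteral\n"; // 0 1

// preg_quote() escapes every potentially special character (including '=' and '-')
// when a pattern is built from arbitrary literal text.
$literal = preg_quote('zc-ssl-pg-time', '/');
$quoted  = preg_match('/class="' . $literal . '">(.*?)<\/span>/', $html, $m3);
echo $literal . "\n"; // zc\-ssl\-pg\-time
```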

    I am assuming you want all the times, statuses, and links on the page; if so, the following should do it:

    
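    <?php
    // $data must hold the page HTML; here it is fetched from the URL used in the question.
    $data = file_get_contents('http://tvlistings.zap2it.com/tvlistings/ZCSGrid.do?stnNum=10179');

    // Get times/status: tag-free text that ends with </span>
    preg_match_all('/>([^<]*)<\/span>/iU', $data, $matches);
    $times = array_chunk($matches[1], 2); // [time, status] pairs, assuming one status span per show
     
    // Get links: the quoted value of each href attribute
    preg_match_all('/href=.(.*).>/iU', $data, $matches);
    foreach ($times as $sub => $play) {
        list($time, $status) = $play;
        echo "Time: $time <br />";
        echo "Status: $status <br />";
        echo "Link: {$matches[1][$sub]} <br /><br />";
    }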
    
    PHP:
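    For comparison, a swapped-in technique that is not what this answer uses: PHP's DOM extension can parse the markup and query it with XPath, which keeps each show's time, title, link, and status together instead of pairing separate regex results by position with array_chunk(). A sketch, assuming the list structure and class names shown in the question (zc-ssl-pg, zc-ssl-pg-time, zc-ssl-pg-title, zc-ic-live) and a PHP build with the DOM extension; parseListings is a hypothetical name, and the two sample rows are copied from the question and wrapped in a <ul> so the fragment is complete.

```php
<?php
// Sketch: parse listing rows with DOMDocument/DOMXPath instead of regular expressions.
function parseListings(string $html): array
{
    $doc = new DOMDocument();
    // Real-world pages are rarely valid HTML; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $rows = [];
    // Each programme is an <li> whose class list contains exactly "zc-ssl-pg".
    foreach ($xpath->query('//li[contains(concat(" ", @class, " "), " zc-ssl-pg ")]') as $li) {
        $time   = $xpath->query('.//span[contains(@class, "zc-ssl-pg-time")]', $li)->item(0);
        $title  = $xpath->query('.//a[contains(@class, "zc-ssl-pg-title")]', $li)->item(0);
        $status = $xpath->query('.//span[contains(@class, "zc-ic-live")]', $li)->item(0);
        $rows[] = [
            'time'   => $time ? trim($time->textContent) : null,
            'title'  => $title ? trim($title->textContent) : null,
            'link'   => $title ? $title->getAttribute('href') : null,
            'status' => $status ? trim($status->textContent) : null, // null when there is no LIVE badge
        ];
    }
    return $rows;
}

// Two rows copied from the question, wrapped so they form complete elements.
$html = <<<HTML
<ul>
<li class="zc-ssl-pg" id="row1-4" style="">
<span id="row4Time" class="zc-ssl-pg-time">12:00 PM</span>
<a id="rowTitle4" class="zc-ssl-pg-title" href='http://tvlistings.zap2it.com/tv/sportscenter/EP00019917'>SportsCenter</a>
<ul class="zc-icons">
<li class="zc-ic zc-ic-span"><span class="zc-ic-live">LIVE</span></li></ul>
</li>
<li class="zc-ssl-pg" id="row1-5" style="">
<span id="row5Time" class="zc-ssl-pg-time">1:00 PM</span>
<a id="rowTitle5" class="zc-ssl-pg-title" href='http://tvlistings.zap2it.com/tv/sportscenter/EP00019917'>SportsCenter</a>
<ul class="zc-icons">
<li class="zc-ic zc-ic-span"><span class="zc-ic-live">LIVE</span></li></ul>
</li>
</ul>
HTML;

foreach (parseListings($html) as $row) {
    echo "Time: {$row['time']} <br />";
    echo "Status: {$row['status']} <br />";
    echo "Link: {$row['link']} <br /><br />";
}
```

    A show without a LIVE badge simply gets a null status here, whereas in the regex version a missing status span shifts every later time/status pair out of alignment.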
     
    ThePHPMaster, Apr 16, 2013