Scraping data from a website

Discussion in 'PHP' started by mark103, Apr 15, 2013.

  1. #1
    Hi,

    I am having a problem scraping data from a website. After I scrape the page, I can't output the data from my PHP script; it just shows an empty page.

    Here is the HTML source I want to scrape:

    Code (Text):
    <span id="row3Time" class="zc-ssl-pg-time">11:00 AM</span>
    <a id="rowTitle3" class="zc-ssl-pg-title" href='http://tvlistings.zap2it.com/tv/sportscenter/EP00019917'>SportsCenter</a>
    <ul class="zc-icons">
    <li class="zc-ic zc-ic-span"><span class="zc-ic-live">LIVE</span></li></ul>
    </li>
    <li class="zc-ssl-pg" id="row1-4" style="">

    <span id="row4Time" class="zc-ssl-pg-time">12:00 PM</span>
    <a id="rowTitle4" class="zc-ssl-pg-title" href='http://tvlistings.zap2it.com/tv/sportscenter/EP00019917'>SportsCenter</a>
    <ul class="zc-icons">
    <li class="zc-ic zc-ic-span"><span class="zc-ic-live">LIVE</span></li></ul>
    </li>
    <li class="zc-ssl-pg" id="row1-5" style="">

    <span id="row5Time" class="zc-ssl-pg-time">1:00 PM</span>
    <a id="rowTitle5" class="zc-ssl-pg-title" href='http://tvlistings.zap2it.com/tv/sportscenter/EP00019917'>SportsCenter</a>
    <ul class="zc-icons">
    <li class="zc-ic zc-ic-span"><span class="zc-ic-live">LIVE</span></li></ul>

    Here is the PHP source:

    PHP:
    <?php

    $contents = file_get_contents('http://tvlistings.zap2it.com/tvlistings/ZCSGrid.do?stnNum=10179');
    preg_match('/<a id="rowTitle3" class="zc-ssl-pg-title"[.*]<\/a>/i', $data, $matches);
    $rowtitle = $matches[1];
    echo $rowtitle."<br>\n";
    ?>

    And here is the PHP output:
    PHP:
    <br>
    Does anyone know how I can scrape the data from that website, from <a id="rowTitle3" to the end of the page?

    Any advice would be much appreciated.

    Thanks in advance.
    Last edited: Apr 15, 2013
    mark103, Apr 15, 2013
  2. Alex Roxon

    #2
    Lol. Just from looking at your code, I can see that:
    1. You're matching your pattern against the wrong string! (you save to $contents and try to match against $data)
    2. You're not escaping special characters in your pattern (=, - etc.)
    3. If you want to grab the part of the string matched by your pattern, enclose it in parentheses (), not square brackets [] (square brackets define a character class)
    Without testing, and provided the HTML your pattern was based on is accurate, the following should work:
    PHP:
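    <?php

    // Fetch the listings page
    $contents = file_get_contents('http://tvlistings.zap2it.com/tvlistings/ZCSGrid.do?stnNum=10179');
    // (.*) is greedy and does not cross newlines, so it captures everything
    // between the class attribute and the last </a> on that line
    preg_match('/<a id\="rowTitle3" class\="zc\-ssl\-pg\-title"(.*)<\/a>/i', $contents, $matches);
    $rowtitle = $matches[1];
    echo $rowtitle."<br>\n";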
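
    The three fixes listed above can be tried without hitting the live site. The following is only a sketch: the markup is the sample from the first post pasted in as a string, and the tightened pattern (`[^>]*` and `[^<]*`) is an assumption about what output is wanted, namely just the link text rather than the attributes as well.

```php
<?php
// Sample markup copied from the question, so the example runs without a network call.
$contents = <<<'HTML'
<span id="row3Time" class="zc-ssl-pg-time">11:00 AM</span>
<a id="rowTitle3" class="zc-ssl-pg-title" href='http://tvlistings.zap2it.com/tv/sportscenter/EP00019917'>SportsCenter</a>
HTML;

// [^>]* skips the remaining attributes; ([^<]*) captures only the link text.
$pattern = '/<a id="rowTitle3" class="zc-ssl-pg-title"[^>]*>([^<]*)<\/a>/i';

// preg_match() returns 1 on a match, 0 on no match, and false on error,
// so check the result before reading $matches[1].
if (preg_match($pattern, $contents, $matches) === 1) {
    $rowtitle = $matches[1];
} else {
    $rowtitle = '';
}

echo $rowtitle . "<br>\n";
```

    Checking the return value also makes an empty page easier to diagnose, because a failed match no longer looks the same as an empty title.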
    Alex Roxon, Apr 15, 2013
  3. ds28

    #3
    Try the PHP library Snoopy, which simulates a browser and automates retrieving web page content and posting forms.
    ds28, Apr 16, 2013
  4. ThePHPMaster

    #4
    You do not need to escape the equals sign, and you can get away with not escaping the dash as well in almost all cases.
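
    A quick way to check the escaping point is to run both forms of the pattern against the same string. This is a minimal sketch with a made-up one-line `$html` sample (the `href="#"` is a placeholder, not the real link). It also shows the one common place where the dash does matter, inside a character class, where it forms a range.

```php
<?php
$html = '<a id="rowTitle3" class="zc-ssl-pg-title" href="#">SportsCenter</a>';

// Outside a character class, "=" and "-" are ordinary characters,
// so the escaped and unescaped patterns match the same text.
$escaped   = '/class\="zc\-ssl\-pg\-title"/';
$unescaped = '/class="zc-ssl-pg-title"/';
$sameResult = preg_match($escaped, $html) === preg_match($unescaped, $html);

// Inside a character class, an unescaped "-" between two characters is a range.
$rangeMatchesB   = preg_match('/^[a-c]$/', 'b');  // range a..c, so "b" matches
$literalMatchesB = preg_match('/^[a\-c]$/', 'b'); // only "a", "-" or "c"

// preg_quote() escapes a literal string for safe use inside a pattern.
$quoted = preg_quote('zc-ssl-pg-title', '/');

var_dump($sameResult, $rangeMatchesB, $literalMatchesB, $quoted);
```

    When a literal string comes from somewhere else, passing it through `preg_quote()` avoids having to remember which characters are special.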

    Assuming you want all the times, statuses, and links on the page, the following should do it:

    PHP:
    // $data is assumed to hold the HTML of the page fetched earlier

    // Get times and statuses (two <span> values per row)
    preg_match_all('/>(.*)<\/span>/iU', $data, $matches);
    $times = array_chunk($matches[1], 2);

    // Get links
    preg_match_all('/href=.(.*).>/iU', $data, $matches);
    foreach ($times as $sub => $play) {
        list($time, $status) = $play;
        echo "Time: $time <br />";
        echo "Status: $status <br />";
        echo "Link: {$matches[1][$sub]} <br /><br />";
    }

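
    The same idea can be run against the sample markup from the first post. This is only a sketch, not a tested scraper for the live page: `$data` here is the pasted sample, the `$spans`, `$links` and `$rows` names are made up for the example, and the patterns are tightened with `[^<]*` so that each capture stops at the next tag.

```php
<?php
// Two rows of the sample markup from the question, used in place of the live page.
$data = <<<'HTML'
<span id="row3Time" class="zc-ssl-pg-time">11:00 AM</span>
<a id="rowTitle3" class="zc-ssl-pg-title" href='http://tvlistings.zap2it.com/tv/sportscenter/EP00019917'>SportsCenter</a>
<ul class="zc-icons">
<li class="zc-ic zc-ic-span"><span class="zc-ic-live">LIVE</span></li></ul>
</li>
<li class="zc-ssl-pg" id="row1-4" style="">
<span id="row4Time" class="zc-ssl-pg-time">12:00 PM</span>
<a id="rowTitle4" class="zc-ssl-pg-title" href='http://tvlistings.zap2it.com/tv/sportscenter/EP00019917'>SportsCenter</a>
<ul class="zc-icons">
<li class="zc-ic zc-ic-span"><span class="zc-ic-live">LIVE</span></li></ul>
HTML;

// Text of every <span>; [^<]* cannot run past the next tag.
preg_match_all('/<span[^>]*>([^<]*)<\/span>/i', $data, $spans);
$times = array_chunk($spans[1], 2); // [time, status] pairs

// href values, accepting either quote style.
preg_match_all('/href=["\']([^"\']*)["\']/i', $data, $links);

$rows = [];
foreach ($times as $i => $pair) {
    list($time, $status) = $pair;
    $rows[] = ['time' => $time, 'status' => $status, 'link' => $links[1][$i]];
}

foreach ($rows as $row) {
    echo "Time: {$row['time']} | Status: {$row['status']} | Link: {$row['link']}\n";
}
```

    Pairing spans with `array_chunk()` assumes every row has both a time and a status span. A row without the LIVE marker would shift the pairs, so on the real page it may be safer to match one row at a time.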
    ThePHPMaster, Apr 16, 2013