Page scraping question

Discussion in 'PHP' started by Egnited, Jun 11, 2008.

  1. #1
    I am working on a site that requires me to pull a small bit of information from another page of my site, and have figured it out somewhat. Here's my code so far. I set up a simple page as an example of the page I am pulling info from.

    <?php 
    $data = file_get_contents('http://www.egnited.net/site.php/');
    $regex = '/Star Wars (.+?) A New Hope/';
    preg_match($regex,$data,$match);
    echo $match[1];
    ?>
    Code (markup):
    This code pulls the text between "Star Wars" and "A New Hope" (here, "Episode IV:") and prints it on the page that contains the above code:

    http://egnited.net/test.php

    My question is, how would I need to alter my code to include:

    1. The entire line, "Star Wars Episode IV: A New Hope";
    and
    2. Multiple lines (i.e., from "Star Wars" down to "Jaws")?


    Thanks for any help. I'm sorry if this doesn't make sense.
     
    Egnited, Jun 11, 2008 IP
  2. Barti1987

    Barti1987 Well-Known Member

    Messages:
    2,703
    Likes Received:
    115
    Best Answers:
    0
    Trophy Points:
    185
    #2
    First Question:

    $regex = '/SOMETHINGBEFORE (.+) SOMETHINGAFTER/';

    Usually the text before and after is part of the page template (HTML code and such).
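
    As a rough illustration of the pattern above, here is a minimal sketch. The $data string and its <b> tags are made up for the example (the real markup of site.php is not shown in this thread). It shows how moving the surrounding text inside the parentheses captures the whole title, and how the greedy (.+) differs from the lazy (.+?) used in the first post.

```php
<?php
// Hypothetical sample markup; the <b> tags are an assumption for illustration.
$data = '<b>Star Wars Episode IV: A New Hope</b> <b>Jaws</b>';

// Anchors inside the parentheses: the capture is the entire title.
preg_match('/(Star Wars .+? A New Hope)/', $data, $match);
$whole = $match[1];

// Lazy quantifier: stops at the first closing tag.
preg_match('/<b>(.+?)<\/b>/', $data, $match);
$lazy = $match[1];

// Greedy quantifier: runs on to the last closing tag.
preg_match('/<b>(.+)<\/b>/', $data, $match);
$greedy = $match[1];

echo $whole, "\n", $lazy, "\n", $greedy, "\n";
```

    With a greedy pattern, the text after the capture needs to be something that appears only once on the page, otherwise the match swallows everything up to its last occurrence.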

    Question two:

    Multiple Lines:

    preg_match_all('/STRINGTOMATCH/iUs',$content,$result);

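    The trailing i, U, and s are called pattern modifiers: i makes the match case-insensitive, U makes quantifiers ungreedy by default, and s lets the dot match newlines. You can look them up at php.net.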
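
    To see the modifiers at work on the "Star Wars" down to "Jaws" case, here is a small sketch. The $content string is invented sample text (the middle title and the <br /> layout are assumptions), so treat it as a starting point rather than a drop-in answer.

```php
<?php
// Hypothetical sample; the real page content is not shown in the thread.
$content = "Star Wars Episode IV: A New Hope<br />\n"
         . "Raiders of the Lost Ark<br />\n"
         . "Jaws<br />\n";

// i: "star wars" also matches "Star Wars".
// U: .+ becomes lazy, so it stops at the first "jaws".
// s: the dot matches newlines, so the match can span several lines.
preg_match_all('/star wars.+jaws/iUs', $content, $result);

// $result[0] lists every full match; here there is exactly one.
print_r($result[0]);
```

    Without the s modifier the same pattern finds nothing here, because the dot would stop at the first line break.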

    Peace,
     
    Barti1987, Jun 11, 2008 IP
  3. Egnited

    Egnited Well-Known Member

    Messages:
    792
    Likes Received:
    5
    Best Answers:
    0
    Trophy Points:
    110
    #3
    Thanks for the response!

    Could you or someone else tell me exactly what I should replace my current code with to get the results described above?
     
    Egnited, Jun 11, 2008 IP
  4. Barti1987

    #4
    
    <?php
    $data = file_get_contents('http://www.egnited.net/site.php/');
    preg_match_all('/<br \/>(.+)/m',$data,$results);
    print_r($results[1]);
    ?>
    
    PHP:
    Tested.

    Peace,
     
    Barti1987, Jun 11, 2008 IP
  5. Egnited

    #5
    Thanks azizny.

    Here's another example of what I'm trying to do: grab multiple lines.

    I'm trying to scrape info from this webpage:
    http://mobile.weather.gov/port_mp_ns.php?select=3&CityName=Wichita&site=ICT&State=KS&warnzone=KSZ083

    I just want to pull the following lines:

    Last Update: 06/12/08, 09:53 AM CDT
    Weather: Mostly Cloudy
    Temperature: 79°F (26°C)
    Humidity: 67 %
    Wind Speed: S 9 MPH


    What should I change the code to?


    
    <?php
    $data = file_get_contents('http://mobile.weather.gov/port_mp_ns.php?select=3&CityName=Wichita&site=ICT&State=KS&warnzone=KSZ083');
    preg_match_all('/<br \/>(.+)/m',$data,$results);
    print_r($results[1]);
    ?>
    Code (markup):
    Any help would be GREATLY appreciated.... I've spent about ten hours now trying to figure it out.

    Thanks,
    Tom
     
    Egnited, Jun 12, 2008 IP
  6. Barti1987

    #6
    This should do:

    
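    <?php
    $data = file_get_contents('http://mobile.weather.gov/port_mp_ns.php?select=3&CityName=Wichita&site=ICT&State=KS&warnzone=KSZ083');
    // U makes the quantifiers lazy and s lets . match newlines; groups 1-5 hold the five values.
    preg_match_all("/Last Update:(.*)<br>.*Weather:(.*)<br>.*Temperature:(.*)<br>.*Humidity:(.*)<br>.*Wind Speed:(.*)<br>/Usm",$data,$results);
    // $results[1] alone holds only "Last Update"; drop the full match at index 0 and print all five groups.
    print_r(array_slice($results, 1));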
    
    PHP:
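
    As an alternative sketch, each field can be matched on its own so that one missing line does not make the whole pattern fail. The sample $data below reuses the values quoted earlier in the thread, but its <br> layout and the &deg; entities are assumptions about the page markup, and the $labels and $weather names are made up for this example.

```php
<?php
// Assumed markup: the values come from the post above, the <br> layout is a guess.
$data = "Last Update: 06/12/08, 09:53 AM CDT<br>\n"
      . "Weather: Mostly Cloudy<br>\n"
      . "Temperature: 79&deg;F (26&deg;C)<br>\n"
      . "Humidity: 67 %<br>\n"
      . "Wind Speed: S 9 MPH<br>\n";

$labels = array('Last Update', 'Weather', 'Temperature', 'Humidity', 'Wind Speed');
$weather = array();

foreach ($labels as $label) {
    // preg_quote() escapes any regex metacharacters in the label.
    // The lazy (.*?) stops at the first <br>, <br/> or <br /> after the label.
    $pattern = '/' . preg_quote($label, '/') . ':\s*(.*?)\s*<br\s*\/?>/is';
    if (preg_match($pattern, $data, $m)) {
        // Turn entities such as &deg; back into plain characters.
        $weather[$label] = html_entity_decode($m[1], ENT_QUOTES, 'UTF-8');
    }
}

print_r($weather);
```

    To try it against the live page, replace the sample string with the file_get_contents() call from the post above. If a label also appears elsewhere on the page (in the title, for instance), the pattern may need more surrounding context.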
    Peace,
     
    Barti1987, Jun 12, 2008 IP