Extract Data from Web Page

Discussion in 'PHP' started by kolucoms6, Sep 5, 2010.

  1. #1
    I am looking for a PHP application that will:

    Extract name, address, and phone data from a result HTML page, based on its HTML tags, and display it in a tabular format. The search uses 2 criteria, last name and county (I may create a MySQL table of a few last names and counties).

    Repeat the above logic 4 times, once for each page (pages 1, 2, 3, and 4) of the resulting HTML.

    Basically, I want to pull data from different white pages sites and display it in a tabular format.

    Is this possible?
     
    kolucoms6, Sep 5, 2010 IP
  2. Rainulf

    Rainulf Active Member

    Messages:
    373
    Likes Received:
    12
    Best Answers:
    0
    Trophy Points:
    85
    #2
    Yes, it's possible. :)

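    If you're grabbing data from a MySQL database, it should be something like this:
    
    // Connect and select the database in one call; 'db' is a placeholder name
    $sql = new mysqli('localhost', 'user', 'pass', 'db');
    if ($sql->connect_error) {
        die($sql->connect_error);
    }
    // Placeholder table name and row limit; adjust both to your schema
    $result = $sql->query("SELECT * FROM tablename LIMIT 0, 50") or die($sql->error);
    
    // Draw the table, one row per record
    echo "<table>";
    while ($row = $result->fetch_assoc()) {
       // 'whatever' is a placeholder column name
       echo "<tr><td>" . htmlspecialchars($row['whatever']) . "</td></tr>";
    }
    echo "</table>";
    
    $sql->close();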
    
    PHP:
     
    Rainulf, Sep 5, 2010 IP
  3. kolucoms6

    kolucoms6 Active Member

    Messages:
    1,198
    Likes Received:
    1
    Best Answers:
    0
    Trophy Points:
    75
    #3
    Here is what I am looking for:

    I have a link that displays some data in HTML format:

    http://www.118.com/people-search.mv...john&Location=london&pageSize=50&pageNumber=1

    The data comes in the format below:

    <div class="searchResult regular">
    <h2>Bird John</h2>
    <div class="address">
    56 Leathwaite Road<br />
    London<br />
    SW11 6RS
    </div>
    <div class="telephoneNumber">
    020 7228 5576
    </div>
    </div>

    I want my PHP page to fetch the above URL and extract the data from the result HTML page based on the tags above, mapped as:
    h2 = Name
    address = Address
    telephoneNumber = Phone Number

    and then display it in a tabular format.
     
    kolucoms6, Sep 5, 2010 IP
  4. kolucoms6

    #4
    I got this, which works to an extent, but it only shows the text of the HTML page:

     
    kolucoms6, Sep 6, 2010 IP
  5. MyVodaFone

    MyVodaFone Well-Known Member

    Messages:
    1,048
    Likes Received:
    42
    Best Answers:
    10
    Trophy Points:
    195
    #5
    EDIT: See below...
     
    Last edited: Sep 6, 2010
    MyVodaFone, Sep 6, 2010 IP
  6. kolucoms6

    #6
    Thanks a lot
     
    kolucoms6, Sep 6, 2010 IP
  7. kolucoms6

    #7
    I tried this to display the data in a tabular format:

    echo '<table border=1><tr><td>'.$name.'</td><td>'.$address.'</td><td>'.$telephoneNumber.'</td></tr></table>';

    but the formatting goes haywire.
     
    kolucoms6, Sep 6, 2010 IP
  8. MyVodaFone

    #8
    Sorry, my bad: there's a problem with my code. I'll post back here when I have it fixed, but if anyone else would like to join in, please do... you all know I'm crap at this regex stuff :)
     
    MyVodaFone, Sep 6, 2010 IP
  9. kolucoms6

    #9
    What I am looking for is:

    Display an option to select a last name and county from a dropdown.

    The user can select at most 25 last names at a time for each county.

    Once the user clicks Submit, the page loops over the selected last names in that county and displays the results in a tabular format.

    Is it possible?
     
    kolucoms6, Sep 6, 2010 IP
  10. MyVodaFone

    #10
    Ok try this instead and just replace the echo with your table. Note the changes below.

    
    <?php
    
    $url = get_data("http://www.118.com/people-search.mvc?Supplied=true&Name=john&Location=london&pageSize=50&pageNumber=1");
    
    function get_data($url)
    {
    $ch = curl_init();
    	curl_setopt($ch, CURLOPT_HEADER, 0);
    	curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); 
    	curl_setopt($ch, CURLOPT_URL, $url);
    	curl_setopt ($ch, CURLOPT_REFERER, 'http://www.mse360.com/about/bot.php');
     	curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
    	curl_setopt ($ch, CURLOPT_TIMEOUT, 60);
    	$data = curl_exec($ch);
    	curl_close($ch);
    	return $data;
    }
    
    $string = preg_match_all('#<h2>([^"]+)</h2>.+?<div class="address">([^"]+)</div>.+?<div class="telephoneNumber">([^"]+)</div>#is', $url, $matches, PREG_SET_ORDER);
    foreach ($matches as $item) {
    echo '<div class="searchResult regular">
                            <h2>'.$item[1].'</h2>
                            <div class="address">
                            '.$item[2].'
                            </div>
                            <div class="telephoneNumber">
                             '.$item[3].'
                            </div>
                        </div>
    ';
    }
    ?>
    
    PHP:
    As for your search criteria, I guess those will be part of your search URL.

    PS: the address output contains <br /> tags; to remove them, use strip_tags($item[2]).
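
    As a minimal sketch of the extraction described above, the following runs a regex over the sample markup posted earlier in the thread and cleans the address with strip_tags. The helper name extract_listings and the hard-coded $html sample are assumptions for illustration, and the pattern is a slightly tightened variant rather than the exact one posted above.

```php
<?php
// Hypothetical helper: pulls name, address and phone out of markup shaped
// like the sample result block shown earlier in the thread.
function extract_listings($html)
{
    $pattern = '#<h2>(.*?)</h2>.*?<div class="address">(.*?)</div>'
             . '.*?<div class="telephoneNumber">(.*?)</div>#is';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $rows = array();
    foreach ($matches as $item) {
        // strip_tags() drops the <br /> tags; the newlines between the
        // address parts remain, so collapse whitespace runs to one space.
        $address = preg_replace('/\s+/', ' ', trim(strip_tags($item[2])));
        $rows[] = array(
            'name'    => trim($item[1]),
            'address' => $address,
            'phone'   => trim($item[3]),
        );
    }
    return $rows;
}

// Sample markup copied from the result format quoted in the thread.
$html = '<div class="searchResult regular">
<h2>Bird John</h2>
<div class="address">
56 Leathwaite Road<br />
London<br />
SW11 6RS
</div>
<div class="telephoneNumber">
020 7228 5576
</div>
</div>';

$rows = extract_listings($html);
print_r($rows);
```

    Returning plain arrays keeps the parsing separate from the output, so the same rows can be echoed as divs or as table cells.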
     
    MyVodaFone, Sep 6, 2010 IP
  11. kolucoms6

    #11
    I am trying this:

    $arr = array(1, 2, 3, 4);
    foreach ($arr as &$value) {
    $url = get_data("http://www.118.com/people-search.mvc?Supplied=true&Name=john&Location=london&pageSize=50&pageNumber=" . $arr);

    But it looks like there is an error, as the page displays nothing!


    Also, echo '<table border=1><tr><td>'.$item[1].'</td><td>'.$item[2].'</td><td>'.$item[3].'</td></tr></table>';
     
    kolucoms6, Sep 6, 2010 IP
  12. MyVodaFone

    #12

    What's wrong with what I gave you...
    
    <?php
    
    $url = get_data("http://www.118.com/people-search.mvc?Supplied=true&Name=john&Location=london&pageSize=50&pageNumber=1");
    
    function get_data($url)
    {
    $ch = curl_init();
    	curl_setopt($ch, CURLOPT_HEADER, 0);
    	curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); //Set curl to return the data instead of printing it to the browser.
    	curl_setopt($ch, CURLOPT_URL, $url);
    	curl_setopt ($ch, CURLOPT_REFERER, 'http://www.mse360.com/about/bot.php');
     	curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
    	curl_setopt ($ch, CURLOPT_TIMEOUT, 60);
    	$data = curl_exec($ch);
    	curl_close($ch);
    	return $data;
    }
    
    $string = preg_match_all('#<h2>([^"]+)</h2>.+?<div class="address">([^"]+)</div>.+?<div class="telephoneNumber">([^"]+)</div>#is', $url, $matches, PREG_SET_ORDER);
    foreach ($matches as $item) {
    echo '<table border=1><tr><td>'.$item[1].'</td><td>'.$item[2].'</td><td>'.$item[3].'</td></tr></table>
    ';
    }
    ?>
    
    PHP:
    Your table just needs fixing up.
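
    To illustrate one way of fixing up the table, here is a sketch that opens the table once, emits one row per match, and closes it after the loop, instead of printing a separate table for every result. The $matches array is hard-coded sample data in the shape preg_match_all returns with PREG_SET_ORDER, and render_table is a hypothetical helper name.

```php
<?php
// Hypothetical helper: renders all matches as rows of a single table.
function render_table(array $matches)
{
    $html  = "<table border=\"1\">\n";
    $html .= "<tr><th>Name</th><th>Address</th><th>Phone</th></tr>\n";
    foreach ($matches as $item) {
        // Index 0 holds the full match; 1 to 3 are the captured groups.
        $html .= '<tr><td>' . htmlspecialchars(trim($item[1])) . '</td>'
               . '<td>' . htmlspecialchars(trim(strip_tags($item[2]))) . '</td>'
               . '<td>' . htmlspecialchars(trim($item[3])) . "</td></tr>\n";
    }
    return $html . "</table>\n";
}

// Sample data standing in for the result of preg_match_all().
$matches = array(
    array(
        '(full match)',
        'Bird John',
        "56 Leathwaite Road<br />\nLondon<br />\nSW11 6RS",
        '020 7228 5576',
    ),
);

$table = render_table($matches);
echo $table;
```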
     
    MyVodaFone, Sep 6, 2010 IP
  13. kolucoms6

    #13
    The table worked perfectly.

    Now, the URL above is for page 1. I want to loop through the other 3 pages as well, i.e. pages 1 to 4.
     
    kolucoms6, Sep 6, 2010 IP
  14. MyVodaFone

    #14
    You can create your own search form using the following variables:

    
    $name= $_POST['name'];
    $location=$_POST['location'];
    $pageSize=$_POST['pageSize'];
    $pageNumber=$_POST['pageNumber'];
    
    PHP:
    Then you build your $url from them: Name=$name&Location=$location&pageSize=$pageSize&pageNumber=$pageNumber
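
    One way to build the URL from those values and cover pages 1 to 4 is sketched below. It uses http_build_query so the name and location are URL-encoded. build_search_url is a hypothetical helper, the parameter names are taken from the URL used earlier in the thread, and the fetch itself is left as a comment because it depends on the get_data function from the earlier post.

```php
<?php
// Hypothetical helper: assembles the search URL for one results page.
function build_search_url($name, $location, $pageNumber, $pageSize = 50)
{
    $query = http_build_query(array(
        'Supplied'   => 'true',
        'Name'       => $name,
        'Location'   => $location,
        'pageSize'   => $pageSize,
        'pageNumber' => $pageNumber,
    ), '', '&');
    return 'http://www.118.com/people-search.mvc?' . $query;
}

$urls = array();
foreach (range(1, 4) as $page) {
    // Use the loop value ($page), not the array itself, as the page number.
    $urls[] = build_search_url('john', 'london', $page);
    // $html = get_data(end($urls)); // fetch and parse each page here
}

print_r($urls);
```

    Looping over the page numbers keeps a single copy of the fetch and parse code, rather than one copy per page.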
     
    MyVodaFone, Sep 6, 2010 IP
  15. kolucoms6

    #15
    kolucoms6, Sep 6, 2010 IP
  16. MyVodaFone

    #16
    As far as I can tell, you can set the page size to something like 5, 10, 15, and so on up to 50. You're probably best off fixing it at 50 and just varying the page number.
     
    MyVodaFone, Sep 6, 2010 IP
  17. kolucoms6

    #17
    What is the problem with the code below?


    
    
    <html>
    <body>
    
    <link rel=stylesheet type="text/css" href="CSS/default0.css">
    
    <form action="<?php echo $_SERVER["PHP_SELF"]; ?>" method="post">
    
    Last Name 1 : <input type=text name=name><br>
    
    Location : <input type=text name=location><br><br>
    
    <input type="submit" value="submit" name="submit">
    </form>
    </body>
    </html>
    
    <?php
    
    $name= $_POST['name'];
    $location=$_POST['location'];
    
    ?>
    
    <?php
    
    $url1 = get_data("http://www.118.com/people-search.mvc?Supplied=true&Name=$name&Location=$london&pageSize=50&pageNumber=1");
    $url2 = get_data("http://www.118.com/people-search.mvc?Supplied=true&Name=$name&Location=$london&pageSize=50&pageNumber=2");
    $url3 = get_data("http://www.118.com/people-search.mvc?Supplied=true&Name=$name&Location=$london&pageSize=50&pageNumber=3");
    $url4 = get_data("http://www.118.com/people-search.mvc?Supplied=true&Name=$name&Location=$london&pageSize=50&pageNumber=4");
    
    function get_data($url)
    {
    $ch = curl_init();
        curl_setopt($ch, CURLOPT_HEADER, 0);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); //Set curl to return the data instead of printing it to the browser.
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt ($ch, CURLOPT_REFERER, 'http://www.mse360.com/about/bot.php');
        curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
        curl_setopt ($ch, CURLOPT_TIMEOUT, 60);
        $data = curl_exec($ch);
        curl_close($ch);
        return $data;
    }
    
    $string1 = preg_match_all('#<h2>([^"]+)</h2>.+?<div class="address">([^"]+)</div>.+?<div class="telephoneNumber">([^"]+)</div>#is', $url1, $matches, PREG_SET_ORDER);
    foreach ($matches as $item) {
    
    echo '<table border=1><tr><td>'.$item[1].'</td><td>'.$item[2].'</td><td>'.$item[3].'</td></tr></table>
    ';
    }
    
    $string2= preg_match_all('#<h2>([^"]+)</h2>.+?<div class="address">([^"]+)</div>.+?<div class="telephoneNumber">([^"]+)</div>#is', $url2, $matches, PREG_SET_ORDER);
    
    foreach ($matches as $item) {
    
    echo '<table border=1><tr><td>'.$item[1].'</td><td>'.$item[2].'</td><td>'.$item[3].'</td></tr></table>
    ';
    }
    
    $string3= preg_match_all('#<h2>([^"]+)</h2>.+?<div class="address">([^"]+)</div>.+?<div class="telephoneNumber">([^"]+)</div>#is', $url3, $matches, PREG_SET_ORDER);
    
    foreach ($matches as $item) {
    
    echo '<table border=1><tr><td>'.$item[1].'</td><td>'.$item[2].'</td><td>'.$item[3].'</td></tr></table>
    ';
    
    }
    $string4= preg_match_all('#<h2>([^"]+)</h2>.+?<div class="address">([^"]+)</div>.+?<div class="telephoneNumber">([^"]+)</div>#is', $url4, $matches, PREG_SET_ORDER);
    
    foreach ($matches as $item) {
    
    echo '<table border=1><tr><td>'.$item[1].'</td><td>'.$item[2].'</td><td>'.$item[3].'</td></tr></table>
    ';
    }
    
    ?>
    
    
    Code (markup):

    When I click Submit, it takes me back to the normal page instead of showing the results.
     
    Last edited: Sep 6, 2010
    kolucoms6, Sep 6, 2010 IP
  18. MyVodaFone

    #18
    
    <html>
    <form action="<?php echo $_SERVER["PHP_SELF"]; ?>" method="post">
    
    Last Name 1 : <input type=text name=name><br>
    
    Location : <input type=text name=location><br><br>
    
    <input type="submit" value="submit" name="submit">
    </form>
    </html>
    
    <?php
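    // Read the submitted values; they are absent until the form is posted
    $name = isset($_POST['name']) ? $_POST['name'] : '';
    $location = isset($_POST['location']) ? $_POST['location'] : '';
    $url = "http://www.118.com/people-search.mvc?Supplied=true&Name=" . urlencode($name) . "&Location=" . urlencode($location) . "&pageSize=50&pageNumber=1";
    
    // Only fetch the results once the form has been submitted
    if (isset($_POST['submit'])) {
    $url = get_data($url);
    }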
    
    function get_data($url)
    {
    $ch = curl_init();
    	curl_setopt($ch, CURLOPT_HEADER, 0);
    	curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); //Set curl to return the data instead of printing it to the browser.
    	curl_setopt($ch, CURLOPT_URL, $url);
    	curl_setopt ($ch, CURLOPT_REFERER, 'http://www.mse360.com/about/bot.php');
     	curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
    	curl_setopt ($ch, CURLOPT_TIMEOUT, 60);
    	$data = curl_exec($ch);
    	curl_close($ch);
    	return $data;
    }
    
    $string = preg_match_all('#<h2>([^"]+)</h2>.+?<div class="address">([^"]+)</div>.+?<div class="telephoneNumber">([^"]+)</div>#is', $url, $matches, PREG_SET_ORDER);
    foreach ($matches as $item) {
    echo '<table border=1><tr><td>'.$item[1].'</td><td>'.$item[2].'</td><td>'.$item[3].'</td></tr></table>
    ';
    }
    ?>
    PHP:
    If you need anything more complex, try the programming section.
     
    MyVodaFone, Sep 6, 2010 IP
  19. kolucoms6

    #19
    Thanks for the code, but where did I make the mistake, so that I can learn from it?
     
    kolucoms6, Sep 6, 2010 IP
  20. kolucoms6

    #20
    The address has 3 lines:

    52 Earls Mill Road
    Plymouth
    PL7 2BX

    How do I split it into 3 different columns?
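
    One possible approach, sketched under the assumption that the address arrives as captured by the regex earlier in the thread, with its lines separated by <br /> tags: split on the tag and trim each part. split_address is a hypothetical helper name, and since real addresses may have more or fewer than 3 lines, the result is padded or truncated to exactly 3.

```php
<?php
// Hypothetical helper: splits a <br />-separated address into exactly 3 parts.
function split_address($address)
{
    // Split on <br>, <br/> or <br /> in any letter case.
    $parts = preg_split('#<br\s*/?>#i', $address);
    $parts = array_map('trim', $parts);
    // Drop empty pieces, then force exactly 3 columns.
    $parts = array_values(array_filter($parts, 'strlen'));
    return array_slice(array_pad($parts, 3, ''), 0, 3);
}

// Sample address in the shape the capture group would hold it.
$address = "\n52 Earls Mill Road<br />\nPlymouth<br />\nPL7 2BX\n";
list($street, $town, $postcode) = split_address($address);

echo '<tr><td>' . $street . '</td><td>' . $town . '</td><td>' . $postcode . "</td></tr>\n";
```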
     
    kolucoms6, Sep 6, 2010 IP