How can I extract all text in an HTML page between the <body> and </body> tags?

Discussion in 'PHP' started by ramysarwat, Nov 5, 2009.

  1. #1
    How can I extract all text in an HTML page between the <body> and </body> tags?
     
    ramysarwat, Nov 5, 2009 IP
  2. nico_swd

    nico_swd Prominent Member

    Messages:
    4,153
    Likes Received:
    344
    Best Answers:
    18
    Trophy Points:
    375
    #2
    
    if (preg_match('~<body[^>]*>(.*?)</body>~si', $text, $body))
    {
        echo $body[1];
    }
    
    PHP:
     
    nico_swd, Nov 5, 2009 IP
  3. ramysarwat

    ramysarwat Peon

    Messages:
    164
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #3
    Thank you, nico_swd. I tried this code, but it never gives any output. Any idea why?

    <?php
    $text = file_get_contents("http://www.google.com/");
    if (preg_match('~<body[^>]*>(.*?)</body>~si', $text, $body)){
    echo $body[1];
    }

    ?>
     
    ramysarwat, Nov 5, 2009 IP
  4. nico_swd

    #4
    Because Google will redirect you, and file_get_contents() doesn't follow redirects. Try another domain and it'll work.
     
    nico_swd, Nov 5, 2009 IP
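A note on the redirect theory in #4: PHP's HTTP stream wrapper has a follow_location context option that defaults to on, so a redirect alone may not explain the empty output. A minimal sketch that sets the option explicitly and reports a failed download instead of silently continuing; fetchHtml() and the user-agent string are hypothetical names chosen for illustration, not part of the thread.

```php
<?php
// Hypothetical helper: download a page with explicit HTTP stream options.
// Returns the response body, or null when the download fails.
function fetchHtml(string $url): ?string
{
    $context = stream_context_create([
        'http' => [
            'follow_location' => 1,   // follow redirects (already the default)
            'max_redirects'   => 5,   // stop after a handful of hops
            'timeout'         => 10,  // seconds
            // Assumed value: some sites answer differently without a user agent.
            'user_agent'      => 'Mozilla/5.0 (compatible; ExampleFetcher/1.0)',
        ],
    ]);

    // @ suppresses the warning; the false return value is checked instead.
    $html = @file_get_contents($url, false, $context);

    return $html === false ? null : $html;
}
```

Usage: `$text = fetchHtml('http://example.com/');` followed by a check for null before handing the string to preg_match().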
  5. ramysarwat

    #5
    I tried it on 3 other websites that have content, but nothing happens there either. Any other ideas?
     
    ramysarwat, Nov 5, 2009 IP
  6. nico_swd

    #6
    
    $ch = curl_init('http://nicoswd.com/');
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    
    $text = curl_exec($ch); // curl_exec() needs the handle returned by curl_init()
    
    if (preg_match('~<body[^>]*>(.*?)</body>~si', $text, $body))
    {
        echo $body[1];
    }
    
    PHP:
     
    nico_swd, Nov 5, 2009 IP
  7. ramysarwat

    #7
    I can't believe it: the same result with cURL too.

    When I print the output of cURL or file_get_contents() I get the page, but when I use preg_match() I get nothing.
     
    ramysarwat, Nov 5, 2009 IP
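One possible cause of the symptom in #7 (the page downloads, yet preg_match() prints nothing) is that preg_match() returned false rather than 0. On a large page the lazy (.*?) combined with the s modifier can exceed pcre.backtrack_limit, and the original `if` treats that error the same as "no match". A minimal sketch for telling the two cases apart; extractBody() is a hypothetical helper name, and whether the limit is actually the culprit here is an assumption.

```php
<?php
// Hypothetical helper: same pattern as in the thread, but a PCRE error
// (for example an exceeded backtrack limit) is reported instead of hidden.
function extractBody(string $html): ?string
{
    $result = preg_match('~<body[^>]*>(.*?)</body>~si', $html, $match);

    if ($result === false) {
        // preg_last_error() returns a PREG_*_ERROR constant,
        // e.g. PREG_BACKTRACK_LIMIT_ERROR.
        throw new RuntimeException('preg_match() failed, error code ' . preg_last_error());
    }

    // 1 means the pattern matched, 0 means no <body>...</body> pair was found.
    return $result === 1 ? $match[1] : null;
}
```

If the exception reports PREG_BACKTRACK_LIMIT_ERROR, raising the limit with `ini_set('pcre.backtrack_limit', ...)` or avoiding the lazy quantifier are the usual ways out.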
  8. nico_swd

    #8
    Which domains have you tried?
     
    nico_swd, Nov 5, 2009 IP
  9. ramysarwat

    #9
    ramysarwat, Nov 5, 2009 IP
  10. mony911

    mony911 Peon

    Messages:
    114
    Likes Received:
    1
    Best Answers:
    0
    Trophy Points:
    0
    #10
    Try this. It will work.


    This was written by Bony Yousuf. The original post is here:

    http://www.sitepoint.com/forums/showthread.php?t=643722
     
    mony911, Nov 5, 2009 IP
  11. unigogo

    unigogo Peon

    Messages:
    286
    Likes Received:
    8
    Best Answers:
    0
    Trophy Points:
    0
    #11
    Remove carriage returns:
    $str = preg_replace("/\r/", "", $html);

    Retrieve the HTML between the body tags:
    preg_match("/<\s*body[^>]*>(.*)<\/body>/is", $str, $body);

    Split the captured body on tags to get the text pieces:
    $result = preg_split("/<[^>]*>/", $body[1]);

    I tested these steps here:
    http://www.pagecolumn.com/tool/pregtest.htm
     
    Last edited: Nov 5, 2009
    unigogo, Nov 5, 2009 IP
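The steps in #11 (strip carriage returns, capture the body, split on tags) can be put together roughly as follows. This is a sketch under the assumption that the goal is the visible text fragments only; bodyTextPieces() is a hypothetical name, and a regex split like this does not understand comments, scripts, or a ">" inside an attribute value.

```php
<?php
// Hypothetical helper: return the non-empty text fragments found
// between the tags inside <body>...</body>.
function bodyTextPieces(string $html): array
{
    // Step 1: drop carriage returns so only "\n" line endings remain.
    $html = str_replace("\r", '', $html);

    // Step 2: capture everything between the opening and closing body tags.
    if (!preg_match('~<\s*body[^>]*>(.*)</body>~is', $html, $body)) {
        return [];
    }

    // Step 3: split on anything that looks like a tag.
    $pieces = preg_split('~<[^>]*>~', $body[1]);

    // Trim each fragment and discard the empty ones.
    $pieces = array_map('trim', $pieces);
    $pieces = array_filter($pieces, function ($piece) {
        return $piece !== '';
    });

    return array_values($pieces);
}
```

For "<body><h1>Hello</h1><p>big <b>world</b></p></body>" this yields the fragments "Hello", "big" and "world"; join them with implode(' ', ...) if a single string is wanted.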
  12. Izonedig

    Izonedig Member

    Messages:
    150
    Likes Received:
    1
    Best Answers:
    0
    Trophy Points:
    28
    #12
    Izonedig, Feb 17, 2010 IP
  13. danx10

    danx10 Peon

    Messages:
    1,179
    Likes Received:
    44
    Best Answers:
    2
    Trophy Points:
    0
    #13
    Make sure the actual site has a body tag.

    <?php
    
    $site = file_get_contents("http://en.wikipedia.org/wiki/Benchmark");
    
    if (preg_match("/<body[^>]*>(.*?)<\/body>/is", $site, $matches)) {
        // Only print when a <body> element was actually found.
        highlight_string($matches[1]);
    }
    
    ?>
    PHP:
    Another example....

    <?php
    
    $site = file_get_contents("http://www.google.com/codesearch");
    
    if (preg_match("/<body[^>]*>(.*?)<\/body>/is", $site, $matches)) {
        // Only print when a <body> element was actually found.
        highlight_string($matches[1]);
    }
    
    ?>
    PHP:
     
    danx10, Feb 17, 2010 IP
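Following the advice in #13 to make sure the page has a body tag, the same extraction can also be done with plain string functions, which avoids regex limits on large pages. A minimal sketch; sliceBody() is a hypothetical name, and it assumes the first "<body" and the last "</body>" in the document are the real tags.

```php
<?php
// Hypothetical helper: cut out the contents of <body>...</body>
// with case-insensitive string searches instead of a regex.
function sliceBody(string $html): ?string
{
    // Locate the opening tag, e.g. "<body" or "<BODY class=...".
    $open = stripos($html, '<body');
    if ($open === false) {
        return null; // the page has no body tag
    }

    // The content starts right after the ">" that closes the opening tag.
    $start = strpos($html, '>', $open);
    if ($start === false) {
        return null;
    }
    $start++;

    // Use the last closing tag in the document.
    $end = strripos($html, '</body>');
    if ($end === false || $end < $start) {
        return null;
    }

    return substr($html, $start, $end - $start);
}
```

A null result means the body tag is missing or unclosed, which is worth checking before looking for problems in the regex.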