Extract text from URL HTML

Discussion in 'PHP' started by Silver89, Mar 12, 2008.

  1. #1
    Hi,

    I want to extract text from a site between certain tags.

    The tags are in the following format:

    
    <a class="classname" target="_blank"> This Text</a><span class="classname">
    
    
    HTML:
    How can I use PHP to loop through and collect "This Text"?

    Thanks
     
    Silver89, Mar 12, 2008 IP
  2. nico_swd

    nico_swd Prominent Member

    Messages:
    4,153
    Likes Received:
    344
    Best Answers:
    18
    Trophy Points:
    375
    #2
    
    preg_match_all('~<a class="classname" target="_blank">([^<]+)</a>~i', $text, $matches);
    
    print_r($matches[1]);
    
    PHP:
     
    nico_swd, Mar 12, 2008 IP
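
    To make the `$matches[1]` index in the post above easier to follow, here is a minimal sketch of what preg_match_all() fills in. The sample string is copied from the question; `$count` is a variable name added for illustration only.

    ```php
    <?php
    // Sample markup copied from the question; in practice this would be the fetched page.
    $text = '<a class="classname" target="_blank"> This Text</a><span class="classname">';

    // "~" is the pattern delimiter, the trailing "i" makes the match case-insensitive,
    // and ([^<]+) captures one or more characters up to the next "<".
    $count = preg_match_all('~<a class="classname" target="_blank">([^<]+)</a>~i', $text, $matches);

    // $matches[0] holds the full matched strings (tags included),
    // $matches[1] holds only the text captured by the first pair of parentheses.
    print_r($matches);
    ```

    With the default flags the result is grouped by capture group, which is why the captured text is read from `$matches[1]` rather than `$matches[0]`.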
  3. Silver89

    Silver89 Notable Member

    Messages:
    2,243
    Likes Received:
    72
    Best Answers:
    0
    Trophy Points:
    205
    #3
    Thanks, however when I use that I get the following:

    Array ( [0] => Array ( ) [1] => Array ( ) )
     
    Silver89, Mar 12, 2008 IP
  4. nico_swd

    #4
    I tried my code with this as source text, and it worked for me:
    
    $text = '<a class="classname" target="_blank"> This Text</a><span class="classname">
    
    <a class="classname" target="_blank"> This Text</a><span class="classname">
    
    
    <a class="classname" target="_blank"> This Text</a><span class="classname">';
    
    preg_match_all('~<a class="classname" target="_blank">([^<]+)</a>~i', $text, $matches);
    
    print_r($matches[1]);
    
    PHP:
    Output:
    
    Array
    (
        [0] =>  This Text
        [1] =>  This Text
        [2] =>  This Text
    )
    
    Code (markup):
    Does $text contain the source code?
     
    nico_swd, Mar 12, 2008 IP
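
    One way to answer the question of whether `$text` really contains the page source is to check it before running the pattern. The sketch below is only an assumption about how that could be done: `loadHtml()` is a hypothetical helper that does not appear in the thread, and a temporary local file stands in for the real URL so the example runs offline.

    ```php
    <?php
    // Hypothetical helper, not from the thread: load a page and report problems
    // instead of silently running the regex on an empty or missing value.
    function loadHtml($source)
    {
        // "@" suppresses the warning; the false return value is checked instead.
        $html = @file_get_contents($source);
        if ($html === false) {
            throw new RuntimeException("Could not read $source");
        }
        if (trim($html) === '') {
            throw new RuntimeException("$source returned an empty document");
        }
        return $html;
    }

    // A temporary file stands in for the real page; a URL would go in the same place.
    $file = tempnam(sys_get_temp_dir(), 'html');
    file_put_contents($file, '<a class="classname" target="_blank"> This Text</a>');

    $text = loadHtml($file);
    $count = preg_match_all('~<a class="classname" target="_blank">([^<]+)</a>~i', $text, $matches);

    // Report how much was read and how many matches were found.
    echo strlen($text), " bytes read, $count match(es)\n";
    unlink($file);
    ```

    If the page loads but the count is still zero, the markup on the live page probably differs from the pattern, for example in attribute order or spacing.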
  5. Silver89

    #5
    I think so.

    If I wanted it to start with class="classname" and not <a class="classname", would this be correct?
    
    
    $html = file_get_contents('url here'); 
    
    preg_match_all('~class="classname" target="_blank">([^<]+)</a>~i', $html, $matches);
    
    print_r($matches[1]);
    
    
    Code (markup):
     
    Silver89, Mar 12, 2008 IP
  6. nico_swd

    #6
    Yeah... does it work now for you?
     
    nico_swd, Mar 12, 2008 IP
  7. Silver89

    #7
    Yes, thanks
     
    Silver89, Mar 12, 2008 IP
  8. Silver89

    #8
    How can I get it to echo just the values, so that instead of:

    Array
    (
    [0] => This Text
    [1] => This Text
    [2] => This Text
    )


    It would output:

    This Text
    This Text
    This Text
     
    Silver89, Mar 12, 2008 IP
  9. nico_swd

    #9
    
    echo $matches[1][0];
    echo $matches[1][1];
    echo $matches[1][2];
    // and so on.
    
    PHP:
    You can also make a shortcut:
    
    $matches = $matches[1];
    echo $matches[0];
    echo $matches[1];
    // ...
    
    PHP:
    ... and remove the print_r() line.
     
    nico_swd, Mar 12, 2008 IP
  10. Silver89

    #10
    If I don't know the number of matches, is there a simple way to echo all of them at once?
     
    Silver89, Mar 12, 2008 IP
  11. nico_swd

    #11
    Well you could loop through them:
    
    foreach ($matches[1] as $match)
    {
        echo $match, '<br />';
    }
    
    PHP:
    Or:
    
    echo implode(', ', $matches[1]);
    
    PHP:
    Depends on how you want to display them...
     
    nico_swd, Mar 12, 2008 IP
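
    The two approaches above can also be combined. The following is a rough sketch, assuming the output goes to an HTML page: the `$matches` array is sample data standing in for the preg_match_all() result, and the trim() and htmlspecialchars() calls are additions that were not part of the thread.

    ```php
    <?php
    // Sample data standing in for the result of preg_match_all().
    $matches = array(1 => array(' This Text', ' This Text', ' This Text'));

    $lines = array();
    foreach ($matches[1] as $match) {
        // trim() drops the leading space seen in the captured text;
        // htmlspecialchars() escapes the value before it is printed into HTML.
        $lines[] = htmlspecialchars(trim($match));
    }

    // Join the cleaned values with a line break between them.
    echo implode("<br />\n", $lines);
    ```

    Collecting the cleaned values first and joining them afterwards avoids a trailing separator after the last match.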