Extracting links from an array

Discussion in 'PHP' started by scriptreseller, Jan 5, 2009.

  1. #1
    Hi

    I have been pulling my hair out over this. I have spent 4 days on what I thought would take a few hours, lol.

    Right, I have a text file that contains the body of an email. The email has links in it, and I need to get those links out of the email.

    So far I have managed to put the email into an array using

    
    $url= explode('http://',$content);
    
    split also works
    
    $url= split('http://',$content);
    
    Code (markup):
    Then I output it to the screen like this

    
    
    $num = 0;
    foreach($url as $ScreenURL)
    {
        $num++;
        echo "<b>($num)</b> '", ($ScreenURL), "'\n<br /><br />\n\n";
    }
    
    Code (markup):
    which does give me the URLs, but it also gives me a lot of other stuff too. I need to know how to get just the URLs.

    I am just learning PHP and this is killing me.

    If anyone can help me I would be very grateful, and so will my hair, lol.

    Thanks
    Daz
     
    scriptreseller, Jan 5, 2009 IP
  2. cont911

    #2
    Use something like this:
    <?php
    $content = "lksdjf lskdjfsdf sdf http://www.php.net/ klsjdf skldfj http://abc.com skdlfjs";
    preg_match_all("/(http:\/\/[^ ]+)/i", $content, $matches);

    var_dump($matches);

    ?>
     
    cont911, Jan 5, 2009 IP
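
A note on the shape of the result, since `preg_match_all` returns more than a flat list: with the default `PREG_PATTERN_ORDER` flag, `$matches[0]` holds the full matches and `$matches[1]` holds whatever the first parenthesised group captured. The sketch below reuses the sample string from the post above. The pattern is a slight variation (an assumption, not the poster's original) that also accepts `https://` and stops at whitespace, quotes, or angle brackets.

```php
<?php
// Sample text taken from the post above.
$content = "lksdjf lskdjfsdf sdf http://www.php.net/ klsjdf skldfj http://abc.com skdlfjs";

// Assumed variation of the pattern: http or https, stopping at whitespace, quotes, < or >.
$pattern = '~(https?://[^\s"\'<>]+)~i';

// preg_match_all returns the number of full matches and fills $matches.
$count = preg_match_all($pattern, $content, $matches);

// $matches[0] = full matches, $matches[1] = text captured by group 1.
foreach ($matches[1] as $url) {
    echo $url, PHP_EOL;
}
```

Because the whole pattern sits inside one group here, `$matches[0]` and `$matches[1]` end up identical; they differ once the pattern has text outside the parentheses, as with an `<a href=...>` pattern.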
  3. scriptreseller

    #3
    Thanks, that really helped me out. I have changed it a bit, as it gave me a lump of data which I did not know what to do with, lol.

    Here is how it looks now:

    preg_match_all("/<a[\s]+[^>]*href\s*=\s*[\"\']?([^\'\" >]+)[\'\" >]/i", $content, $matches);
     
    
    
    echo '<pre>'; // This is for correct handling of newlines
    ob_start();
    var_dump($matches);
    $a=ob_get_contents();
    ob_end_clean();
    echo htmlspecialchars($a); // Escape every HTML special chars (especially > and < )
    echo '</pre>';
    Code (markup):
    which gives me

    array(2) {
      [0]=>
      array(4) {
        [0]=>
        string(40) "<a href="mailto:XXXXXXX@googlemail.com""
        [1]=>
        string(36) "<a href="mailto:XXXXXXX@XXXXXXX.co.uk""
        [2]=>
        string(35) "<a href="http://www.XXXXXXX.com""
        [3]=>
        string(31) "<a href="http://www.XXXXXXX.eu""
      }
      [1]=>
      array(4) {
        [0]=>
        string(30) "mailto:XXXXXXX@googlemail.com"
        [1]=>
        string(26) "mailto:XXXXXXX@XXXXX.co.uk"
        [2]=>
        string(25) "http://www.XXXXXXX.com"
        [3]=>
        string(21) "http://www.XXXXXXX.eu"
      }
    }
    Code (markup):

    How do I get just the URLs out of that now, so I can put them in a database or a text file?

    Thanks
     
    scriptreseller, Jan 5, 2009 IP
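
Since the pattern above is effectively parsing HTML, one alternative is to let PHP's DOM extension read the `href` attributes. This is a swapped-in technique, not what the thread uses. It assumes the `dom` extension is available, and the sample `$html` and the `extractHrefs` helper name are made up for illustration.

```php
<?php
// Hypothetical helper: return every href attribute found on <a> tags.
function extractHrefs(string $html): array
{
    // Email HTML is often malformed; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $hrefs = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        if ($anchor->hasAttribute('href')) {
            $hrefs[] = $anchor->getAttribute('href');
        }
    }
    return $hrefs;
}

// Made-up sample input.
$html = '<p><a href="mailto:someone@example.com">Mail</a> '
      . '<a href="http://www.example.com">Site</a></p>';

foreach (extractHrefs($html) as $href) {
    echo $href, PHP_EOL;
}
```

The result is already a flat array of strings, so it can be looped over or written out without going through `var_dump`.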
  4. scriptreseller

    #4
    OK, this is probably not the best way to do it, and there is probably a better way (feel free to let me know if there is), but I have got it so that I only have the URLs. I did it by duplicating the code and adding a second regex:

    
    preg_match_all("/<a[\s]+[^>]*href\s*=\s*[\"\']?([^\'\" >]+)[\'\" >]/i", $html, $matches);
     
    
    echo '<pre>'; // This is for correct handling of newlines
    ob_start();
    var_dump($matches);
    $a=ob_get_contents();
    ob_end_clean();
    preg_match_all("/(http:\/\/[^ ]+)/i", $a, $matches1);
    echo htmlspecialchars($a); // Escape every HTML special chars (especially > and < )
    echo '</pre>';
    
    
    
    echo '<pre>'; // This is for correct handling of newlines
    ob_start();
    var_dump($matches1);
    $a1=ob_get_contents();
    ob_end_clean();
    echo htmlspecialchars($a1);// Escape every HTML special chars (especially > and < )
    echo '</pre>';
    Code (markup):

    which now gives me

    array(2) {
      [0]=>
      array(4) {
        [0]=>
        string(28) "http://www.url1.com""
    "
        [1]=>
        string(24) "http://www.url2.eu""
    "
        [2]=>
        string(27) "http://www.url1.com"
    "
        [3]=>
        string(23) "http://www.url2.eu"
    "
      }
      [1]=>
      array(4) {
        [0]=>
        string(28) "http://www.url1.com""
    "
        [1]=>
        string(24) "http://www.url2.eu""
    "
        [2]=>
        string(27) "http://www.url1.com"
    "
        [3]=>
        string(23) "http://www.url2.eu"
    "
      }
    }
    
    Code (markup):

    How do I get the URL to put it in a database without the string(23) stuff, just the URL? lol

    Thanks
     
    scriptreseller, Jan 5, 2009 IP
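
The second `preg_match_all` in the post above runs over the text that `var_dump` printed, which is why quotes and newlines end up in the results. A possibly simpler route, sketched here, is to filter `$matches[1]` directly. The sample array mimics the capture group from the earlier output with placeholder addresses, and `$urls` is an illustrative name.

```php
<?php
// Placeholder data shaped like $matches[1] from the href pattern.
$matches = [1 => [
    'mailto:someone@example.com',
    'http://www.example.com',
    'http://www.example.eu',
    'http://www.example.com',
]];

// Keep only entries that start with http:// or https://.
$urls = array_filter($matches[1], function ($link) {
    return preg_match('~^https?://~i', $link) === 1;
});

// Drop duplicates and reindex from 0.
$urls = array_values(array_unique($urls));

print_r($urls);
```

`array_filter` preserves the original keys, so the `array_values` call is there to give a clean 0-based list.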
  5. Danltn

    #5
    Just loop through the array instead of var_dump'ing it...

    for($i = 0, $c = count($matches[1]); $i < $c; ++$i)
        echo $matches[1][$i] . PHP_EOL;
    PHP:
    Could use while/foreach, whatever you prefer.
     
    Danltn, Jan 5, 2009 IP
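
The `foreach` form mentioned above might look like this; it avoids the index bookkeeping. The sample array stands in for the real `$matches`, and `$output` is an illustrative name.

```php
<?php
// Stand-in for the result of preg_match_all.
$matches = [1 => ['http://www.example.com', 'http://www.example.eu']];

// Build the output in a variable so it can be echoed or reused.
$output = '';
foreach ($matches[1] as $url) {
    $output .= $url . PHP_EOL;
}

echo $output;
```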
  6. scriptreseller

    #6
    Thanks, that works well. The only problem now is that when I try to write it to a text file, it only writes the last URL.

    How can I get it to write them all?

    Sorry about this, it is just all new to me, lol.

    Thanks
     
    scriptreseller, Jan 5, 2009 IP
  7. Danltn

    #7
    file_put_contents('file.txt', implode(PHP_EOL, $matches[1]));
    PHP:
     
    Danltn, Jan 5, 2009 IP
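
A likely cause of the "only the last URL" symptom (an assumption, since the failing code is not shown in the thread) is calling `file_put_contents` inside the loop, where each call overwrites the file. The sketch below contrasts that with the `FILE_APPEND` flag; the one-shot `implode` call above remains the simpler option. The temp-file path and sample URLs are illustrative.

```php
<?php
$urls = ['http://www.example.com', 'http://www.example.eu'];
$file = tempnam(sys_get_temp_dir(), 'urls');

// Overwrites on every pass, so only the last URL survives.
foreach ($urls as $url) {
    file_put_contents($file, $url . PHP_EOL);
}
$overwritten = file($file, FILE_IGNORE_NEW_LINES);

// Start from an empty file, then append one line per URL.
file_put_contents($file, '');
foreach ($urls as $url) {
    file_put_contents($file, $url . PHP_EOL, FILE_APPEND);
}
$appended = file($file, FILE_IGNORE_NEW_LINES);

// Clean up the temporary file.
unlink($file);

print_r($overwritten);
print_r($appended);
```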