Extract links from a text file using PHP

Discussion in 'PHP' started by linkinpark2014, Jul 15, 2008.

  1. #1
    Guys, I just want to know how to capture the data in a text file and save it into an array with some filtering.

    Here is an example:
    I have a text file that contains links in this form:

    <a href="http://blablabal.com">blah</a>
    <a href="http://blablabal1.com">blah1</a>
    <a href="http://blablabal2.com">blah2</a>
    <a href="http://blablabal3.com">blah3</a>
    I want to read only the links, without the surrounding "<a href=>blah</a>" markup,
    and save them in an array... any ideas?
     
    linkinpark2014, Jul 15, 2008 IP
  2. Danltn

    #2
    A regex along the lines of...

    /<a href="(?<url>[^"]+)">(?<name>[^<]+)<\/a>/

    I think that's about right; it should at least give you the general gist.

    Dan
     
    Danltn, Jul 15, 2008 IP
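A minimal sketch of how a named-group pattern like the one suggested above might be used with preg_match_all to fill an array. The sample string stands in for the text file (in practice it could come from file_get_contents on a file of your choosing), the variable names are illustrative, and the character classes [^"]+ and [^<]+ are one assumed way to keep each capture inside a single tag.

```php
<?php
// Sample input in the same form as the first post; the real data would be
// read from the text file instead of being built inline.
$text = implode("\n", array(
    '<a href="http://blablabal.com">blah</a>',
    '<a href="http://blablabal1.com">blah1</a>',
    '<a href="http://blablabal2.com">blah2</a>',
));

// Named groups: "url" captures the href value, "name" captures the link text.
$pattern = '/<a href="(?<url>[^"]+)">(?<name>[^<]+)<\/a>/';

// With the default PREG_PATTERN_ORDER flag, $matches['url'] is an array
// holding the "url" capture of every match, in order of appearance.
preg_match_all($pattern, $text, $matches);
$urls = $matches['url'];

print_r($urls);
```

Because the default ordering groups results by capture, $matches['url'] is already the plain array of links, with no tag markup left in it.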
  3. myhart

    #3
    myhart, Jul 15, 2008 IP
  4. linkinpark2014

    #4
    Hi guys.. both ways give me the same result.
    I only want the links, without the <a href></a> tags:

    <a href="http://blablabal.com">blah</a>
    I want to get rid of the <a href=" and ">blah</a> parts and keep only the link
    "http://blablabal.com".
    I have tried every way I could think of and so far I haven't had any good result...

    I tried to extract the links this way:

    if(preg_match_all('/<a\s+.*?href=[\"\']?([^\"\' >]*)[\"\']?[^>]*>(Play)<\/a>/i',$result,$out, PREG_SET_ORDER))
    
    I only get 3 array elements, which are:

    1-<a href="http://blablabal.com">blah</a>
    2-http://blablabal.com
    3-blah
    So now I am getting http://blablabal.com in element no. 2, which looks good so far.
    The problem is that it only gives me the result for 1 link.
    I want to get results for all the links inside that text file... any ideas?
     
    linkinpark2014, Jul 16, 2008 IP
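For what it's worth, with PREG_SET_ORDER the result array holds one entry per link, and each entry has the three pieces listed above (full tag, URL, link text). Seeing only three elements usually means a single entry is being inspected rather than the whole array. A hedged sketch of looping over every entry follows; the sample data, the $urls name, and the generalized link-text group (.*?) in place of (Play) are assumptions for illustration.

```php
<?php
// Stand-in for the contents of the text file.
$result = implode("\n", array(
    '<a href="http://blablabal.com">blah</a>',
    '<a href="http://blablabal1.com">blah1</a>',
    '<a href="http://blablabal2.com">blah2</a>',
));

// Group 1 captures the href value, group 2 captures the link text.
$pattern = '/<a\s+[^>]*?href=["\']?([^"\' >]*)["\']?[^>]*>(.*?)<\/a>/i';

$urls = array();
if (preg_match_all($pattern, $result, $out, PREG_SET_ORDER)) {
    // $out[0] describes the first link, $out[1] the second, and so on.
    foreach ($out as $match) {
        // $match[0] is the whole tag, $match[1] the URL, $match[2] the text.
        $urls[] = $match[1];
    }
}

print_r($urls);
```

Dropping the PREG_SET_ORDER flag would be another option: the default ordering puts all URLs together in $out[1], so no loop is needed.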
  5. sastro

    #5
    <?php
    // Read the whole file into a string.
    $file = file_get_contents('http://localhost/tes/x.html');
    // Capture everything between the quotes of each href attribute.
    preg_match_all('/<a href="([^"]*)"/', $file, $a);

    $count = count($a[1]);
    echo "<b>Number of Urls</b> = " . $count . "<p>";
    for ($row = 0; $row < $count; $row++) {
        echo $a[1][$row] . "<br>";
    }

    ?>
     
    sastro, Jul 16, 2008 IP
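One thing to watch when capturing the href value: if several links share a line, a greedy capture such as (.*) can run on to the last quote on that line, while a negated character class stops at the first closing quote. A small sketch of the difference, using made-up sample data and illustrative variable names:

```php
<?php
// Two links on the same line.
$line = '<a href="http://blablabal.com">blah</a> <a href="http://blablabal1.com">blah1</a>';

// Greedy: (.*) extends to the last double quote on the line,
// so both links collapse into a single oversized match.
preg_match_all('/<a href="(.*)"/', $line, $greedy);

// Bounded: [^"]* cannot cross a double quote, so each href is captured separately.
preg_match_all('/<a href="([^"]*)"/', $line, $bounded);

print_r($greedy[1]);
print_r($bounded[1]);
```

A lazy quantifier, (.*?), would be another way to get the bounded behaviour here.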
  6. linkinpark2014

    #6
    Okay, thanks all for your help...
    I just changed the regex pattern and everything works great!

    (preg_match_all('/<a\s+.*?href=[\"\']?([^\"\' >]*)[\"\']?[^>]*>(Play)<\/a>/i',$file,$a))


    :)
     
    linkinpark2014, Jul 16, 2008 IP