file_get_contents and preg_match

Discussion in 'PHP' started by Scorpiono, Jul 25, 2008.

  1. #1
    Any tips on why it displays only one link instead of all of them?

    $content = file_get_contents("http://www.Scorpiono.com");
    
    preg_match("/<a href=(.*?)>(.*)<\/a>/",$content,$matches);
    echo $matches[0];
    PHP:

     
    Scorpiono, Jul 25, 2008 IP
  2. nfd2005

    nfd2005 Well-Known Member

    Messages:
    295
    Likes Received:
    20
    Best Answers:
    0
    Trophy Points:
    130
    #2
    use: preg_match_all
     
    nfd2005, Jul 25, 2008 IP
    Scorpiono likes this.
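
To illustrate the suggestion above, here is a minimal sketch of the difference between the two functions. The sample markup and the example.com URLs are assumptions that stand in for the fetched page, and the second group is written as a lazy `(.*?)` so each link on the same line is matched separately.

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$content = '<p><a href="http://example.com/one">One</a> and '
         . '<a href="http://example.com/two">Two</a></p>';

// preg_match() stops after the first match.
preg_match('/<a href=(.*?)>(.*?)<\/a>/', $content, $first);

// preg_match_all() keeps scanning and returns the number of full matches.
$count = preg_match_all('/<a href=(.*?)>(.*?)<\/a>/', $content, $all);

echo $first[0], "\n";               // only the first link
echo $count, "\n";                  // how many links were found
echo implode("\n", $all[0]), "\n";  // every full match
```

With preg_match_all(), the full matches are in $all[0], so echoing a single element prints one link and a loop is needed to print all of them.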
  3. Scorpiono

    Scorpiono Well-Known Member

    Messages:
    1,330
    Likes Received:
    35
    Best Answers:
    0
    Trophy Points:
    120
    #3
    The regex seems broken: I need each match to stop at the first </a>. Can anyone see the problem?

    PS: Thanks nfd2005, worked.
     
    Scorpiono, Jul 25, 2008 IP
  4. nfd2005

    #4
    Are you trying to get the anchor text and the url or just the URL?
     
    nfd2005, Jul 25, 2008 IP
  5. Scorpiono

    #5
    I'm trying to get every complete HTML anchor, <a href="etc">etc</a>, whose href contains a specific "etc".

    I thought I could do this by using 2 arrays, but I'm stuck here. Probably bad code anyway.

    Do you have a solution of your own? Thank you, green repped for the previous help!

     
    Scorpiono, Jul 25, 2008 IP
  6. Mozzart

    Mozzart Peon

    Messages:
    189
    Likes Received:
    3
    Best Answers:
    0
    Trophy Points:
    0
    #6
    Eep, I'm not at my desktop PC (where the local web server is) to test the code right now, but it would be something like:
    
    <?php
    
    $content = file_get_contents("http://manele.radioinferno.org");
    
    // Collect every anchor; the lazy .*? stops each match at the first </a>.
    preg_match_all("/<a href=.*?>.*?<\/a>/s", $content, $matches);
    $size = count($matches[0]);
    echo $size;
    
    // Keep only the anchors whose href contains "netdrive".
    // Array indexes start at 0, so loop from 0 to $size - 1.
    for ($i = 0; $i < $size; $i++) {
        if (preg_match("/<a href=\"([^\"]*netdrive[^\"]*)\"[^>]*>(.*?)<\/a>/s", $matches[0][$i], $ceva)) {
            echo $ceva[0];
        }
    }
    
    PHP:
    Okay, the point is that parentheses () form a capturing group: they "capture" (I don't know if this is the right word in terms of programming, but you get the idea) whatever text matches the rules you have set inside them.

    So you will get the href="" value and the >TEXT HERE</a> text separately. If I'm correct, the full matches will appear in one array, the URLs in a second one, and the link text in a third.

    Cheers, sorry if this sounds confusing. I'll give it a shot when I get back to my PC.
     
    Mozzart, Jul 25, 2008 IP
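
The point about parentheses can be sketched as follows. This assumes a hypothetical one-line sample with two links and double-quoted href values. It shows how a greedy `(.*)` swallows both links into one match, and how the groups end up in separate arrays under the default PREG_PATTERN_ORDER.

```php
<?php
// Hypothetical sample markup; both links sit on one line on purpose.
$content = '<a href="http://example.com/a">First</a> '
         . '<a href="http://example.com/b">Second</a>';

// Greedy (.*) runs to the LAST </a> on the line, so two links become one match.
preg_match_all('/<a href="(.*?)">(.*)<\/a>/', $content, $greedy);

// Lazy (.*?) stops at the FIRST </a>, giving one match per link.
preg_match_all('/<a href="(.*?)">(.*?)<\/a>/', $content, $lazy);

echo count($greedy[0]), "\n";  // number of greedy matches
echo count($lazy[0]), "\n";    // number of lazy matches

// $lazy[0] holds the full matches, $lazy[1] the first group (URLs),
// and $lazy[2] the second group (link text).
print_r($lazy[1]);
print_r($lazy[2]);
```

Since `.` does not match a newline by default, the greedy version only misbehaves when several links share a line, which may explain why the problem shows up on some pages and not others.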
  7. Scorpiono

    #7
    Scorpiono, Jul 25, 2008 IP
  8. Scorpiono

    #8
    Fixed, re-coded, works! ;) Thank you
     
    Scorpiono, Jul 25, 2008 IP
  9. Scorpiono

    #9
    Parse error: syntax error, unexpected T_BOOLEAN_AND in D:\public_html\manele.me\crawl.php on line 9

    if (preg_match("/netdrive.ws|dump.ro/i",$matches[2][$i]) > 0) && (!preg_match("/sex/i",$matches[0][$i])) {




    -------
    What am I missing?
     
    Scorpiono, Jul 25, 2008 IP
  10. nico_swd

    nico_swd Prominent Member

    Messages:
    4,153
    Likes Received:
    344
    Best Answers:
    18
    Trophy Points:
    375
    #10
    Remove the parenthesis right in front of the second preg_match().

    EDIT:

    Also, you don't need to check whether the returned value is greater than 0: PHP treats 0 as false and 1 as true. You also need to escape the dots in the domain names with a backslash. Otherwise they mean "any character".
     
    nico_swd, Jul 25, 2008 IP
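
A small sketch of both remarks, using made-up URLs (the look-alike host is purely hypothetical): an unescaped dot accepts any character in that position, and the return value of preg_match() can be used directly as a condition.

```php
<?php
// Hypothetical URLs used only for illustration.
$good = 'http://netdrive.ws/file';
$bad  = 'http://netdriveXws.example/file';

// Unescaped dot: "." matches any character, so the look-alike host matches too.
$loose = preg_match('/netdrive.ws|dump.ro/i', $bad);

// Escaped dot: only a literal "." is accepted.
$strict     = preg_match('/netdrive\.ws|dump\.ro/i', $bad);
$strictGood = preg_match('/netdrive\.ws|dump\.ro/i', $good);

var_dump($loose, $strict, $strictGood);

// preg_match() returns 1 or 0 (or false on error), so no "> 0" is needed.
if (preg_match('/netdrive\.ws|dump\.ro/i', $good)) {
    echo "match\n";
}
```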
  11. Scorpiono

    #11
    Oh, yeah true nico, ty
     
    Scorpiono, Jul 26, 2008 IP
  12. Scorpiono

    #12


    Tip please?
     
    Scorpiono, Jul 26, 2008 IP
  13. nico_swd

    #13
    Check the parentheses.

    You have to close every parenthesis you open. Try to figure out which parenthesis belongs to which and close them all.
     
    nico_swd, Jul 26, 2008 IP
  14. Scorpiono

    #14
    Honestly, I can't find them, and I'm feeling really dumb right now. Could I kindly ask you to bold it?
     
    Scorpiono, Jul 26, 2008 IP
  15. Scorpiono

    #15
    Fixed: the whole condition needed ( ) at the front and back, of course.

    Ty again
     
    Scorpiono, Jul 26, 2008 IP
  16. Mozzart

    #16
    Try

    
    if (preg_match("/netdrive\.ws|dump\.ro/i",$matches[2][$i]) && !preg_match("/sex/i",$matches[0][$i])) { 
    
    PHP:
     
    Mozzart, Jul 26, 2008 IP
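
For context, here is a sketch of how that condition might sit inside a complete loop. The sample markup is hypothetical, and the pattern used here captures the URL in group 1, so the index differs from the $matches[2] used in the thread.

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$content = '<a href="http://netdrive.ws/song.mp3">Song</a>'
         . '<a href="http://dump.ro/sex.avi">Other</a>'
         . '<a href="http://example.com/page">Page</a>';

// Group 1 is the URL, group 2 the link text (an assumption of this sketch).
preg_match_all('/<a href="(.*?)">(.*?)<\/a>/', $content, $matches);

$kept = [];
$size = count($matches[0]);
for ($i = 0; $i < $size; $i++) {
    // The whole condition sits inside ONE pair of parentheses after "if".
    if (preg_match('/netdrive\.ws|dump\.ro/i', $matches[1][$i])
        && !preg_match('/sex/i', $matches[0][$i])) {
        $kept[] = $matches[0][$i];
    }
}

print_r($kept);
```

The earlier parse error came from closing the if condition after the first test: PHP expects everything between `if (` and the final `)` to be a single expression.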
  17. Scorpiono

    #17
    $matches[0][$i] = preg_replace("/\n+/s", "", $matches[0][$i]);

    I'm trying to remove all the whitespace, but this regex doesn't seem to work. Any tips, please?
     
    Scorpiono, Jul 26, 2008 IP
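
One possible explanation, offered as a sketch and not a confirmed diagnosis: `/\n+/` matches line feeds only, so spaces, tabs, and carriage returns survive (the `s` modifier only changes what `.` matches and has no effect here). The sample link below is hypothetical.

```php
<?php
// Hypothetical matched link containing newlines, tabs, and spaces.
$link = "<a href=\"http://example.com/a\">\n\t  Some   text\r\n</a>";

// "\n+" removes line feeds only; tabs, spaces, and "\r" remain.
$onlyNewlines = preg_replace("/\n+/", "", $link);

// "\s" matches any whitespace character; collapsing each run to a single
// space keeps the words apart.
$collapsed = preg_replace('/\s+/', ' ', $link);

// Removing whitespace outright also deletes the space inside "<a href",
// which is rarely what is wanted for HTML.
$stripped = preg_replace('/\s+/', '', $link);

echo $collapsed, "\n";
```

Collapsing to one space is usually the safer choice for markup, because tag names and attributes stay separated.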