I'm trying to write a little script that will run through an HTML page and identify every e-mail address on the page (even if it is in a mailto: link). I want to store them all in an array so that I can echo them back one after another. Here's what I've tried so far. I'm not sure if I'm approaching this correctly and I can't seem to get it to work (I'm also totally new to regex, so I could have problems there):

```php
$url = "http://www.URL.com";
if (!($contents = file_get_contents($url))) {
    echo 'could not open url';
    exit;
}
$contents = htmlentities($contents);
$p = "/[\._a-zA-Z0-9-]+@[\._a-zA-Z0-9-]+/i";
preg_match_all($p, $contents, $out);
echo "<h1>Email: ".$out[1]."</h1>";
echo "<hr>$contents";
```

If the HTML code of www.URL.com had something like:

```html
blah blaeh email@email.com
<table><tr><td>bleh <a href="mailto:email2@email.com">me</a></td></tr></table>
and this email too: email3@email.com.
```

I would want my script to get all 3 emails and echo them back to me. Any help is much appreciated! To preempt the suspicions: no, I'm not using this for anything spammy. It's to identify e-mails on a bunch of old HTML pages from before we had a database set up correctly.
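A minimal sketch of the approach, assuming the page HTML is already in a string (in practice it would come from `file_get_contents($url)`). Because the pattern has no capturing group, `preg_match_all` puts the full matches in `$out[0]`; `$out[1]` does not exist. The pattern also requires a dot plus at least two letters at the end of the domain, so the trailing period in "email3@email.com." is not swallowed. It is a deliberately loose pattern, not a full RFC 5322 validator, and `extract_emails` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: return every unique e-mail-like string found in $html.
function extract_emails(string $html): array
{
    // Local part, "@", domain labels, then a dot and a TLD of 2+ letters.
    // Ending on letters keeps trailing sentence punctuation out of the match.
    $pattern = '/[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}/i';

    // No capturing groups, so the full matches are in $matches[0].
    preg_match_all($pattern, $html, $matches);

    // Drop duplicates and reindex so the array keys run 0..n-1.
    return array_values(array_unique($matches[0]));
}

$html = 'blah blaeh email@email.com <table><tr><td>bleh '
      . '<a href="mailto:email2@email.com">me</a></td></tr></table> '
      . 'and this email too: email3@email.com.';

foreach (extract_emails($html) as $email) {
    // Escape only at output time instead of running htmlentities() over the whole page.
    echo htmlspecialchars($email), "<br>\n";
}
```

Since ":" is not in the local-part character class, the address inside `mailto:` is picked up as plain "email2@email.com", so one pattern covers both clickable and plain-text addresses.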
This was meant to be an edit, but for some reason it made a new post; sorry about that. I got it working and just wanted to share the code:

```php
$url = "http://www.url.com";
if (!($contents = file_get_contents($url))) {
    echo 'could not open url';
    exit;
}
$contents = htmlentities($contents);
$p = "/[\._a-zA-Z0-9-]+@[\._a-zA-Z0-9-]+/i";
#$p = '/breeder/';
preg_match_all($p, $contents, $out);
$n = 0;
$nout = $out[0];
foreach ($nout as $t) {
    echo $nout[0];
    $n++;
}
```
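Or:

```php
<?php
$url = "http://pozter.info";
$contents = @file_get_contents($url);
// [^\"]* stops at the first closing quote; a greedy (.*) would run to the last quote on the line
@preg_match_all("/mailto:([^\"]*)\"/", $contents, $out);
echo "<pre>";
print_r($out[1]);
?>
```

PS: the foreach at the bottom of your code isn't doing anything useful.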
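The same mailto-only idea as a small function, assuming links may use either quote style and may carry a `?subject=...` query string; `mailto_addresses` is a hypothetical name. Matching with `.*` up to a quote is easy to get wrong because `.*` is greedy and runs to the last quote on the line, so this sketch uses a negated character class that stops at the first quote, whitespace, `>` or `?`.

```php
<?php
// Hypothetical helper: addresses that appear only inside mailto: links.
function mailto_addresses(string $html): array
{
    // Capture after "mailto:" up to a quote, whitespace, ">" or "?" (start of a query string).
    preg_match_all('/mailto:([^"\'?\s>]+)/i', $html, $out);
    return $out[1]; // $out[1] holds the first capture group
}

$html = '<a href="mailto:email2@email.com">me</a> '
      . '<a href="mailto:sales@example.com?subject=Hi" class="x">sales</a> '
      . 'plain: email3@email.com';

print_r(mailto_addresses($html));
```

By design this ignores plain-text addresses such as email3@email.com in the sample; only the two `mailto:` targets are returned.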
That's the part that echoes the email addresses. Also, that code would only match addresses inside mailto: links; I want all addresses, even those that aren't clickable.
```php
$nout = $out[0];
foreach ($nout as $t) {
    echo $nout[0];
    $n++;
}
```

You're echoing the same value every time you loop; the $n variable isn't used, and neither is $t, so the foreach loop isn't doing anything useful.

Re: "every e-mail address on the page (even if it is in a mailto: )": I didn't see the word "even", sorry...
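A corrected version of that loop echoes the loop variable instead of `$nout[0]` and actually uses the counter. This is a sketch, assuming `$out` came from `preg_match_all` with a pattern that has no capturing group; here it is filled with sample data so the snippet runs on its own.

```php
<?php
// Sample data standing in for the result of preg_match_all(); $out[0] holds the full matches.
$out = [['email@email.com', 'email2@email.com', 'email3@email.com']];

$n = 0;
foreach ($out[0] as $email) {
    $n++;
    // Echo the current match, not the first element on every pass.
    echo $n, ': ', htmlspecialchars($email), "<br>\n";
}
echo "Found $n addresses\n";
```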