Hey guys, let's say I have a string with a butt load of HTML in it. What would be the most efficient way to parse through it and find all the URLs? I currently have a function that pulls all the text from between, let's say, <a> and </a>, but after it does it once, it stops (what it searches between can literally be anything). Obviously I need a loop of some sort... but I'm kinda brain dead today and not seeing a good way to pull this off. Thanks.
I needed something similar for one of my projects. I just created an infinite loop and set a condition that breaks the loop when it's met.

    while (1) {
        // Code which pulls text between <a> and </a>
        if ($offset === false) {
            break;
        }
    }

Remember to store the offset, so that the loop doesn't go over the same <a></a> pair over and over, i.e.

    // Search from just past the previous match so the same </a> isn't found again
    $offset = strpos($text, "</a>", $offset + 1);

Hope this makes sense, I'm quite tired...
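For anyone following along, here is a minimal sketch of that offset loop written out in full. The function name extractAllBetween and the sample string are made up for illustration, and it assumes the opening and closing markers are plain strings that don't nest.

```php
<?php
// Hypothetical helper: collects the text between every $open ... $close pair.
function extractAllBetween(string $text, string $open, string $close): array
{
    $results = [];
    $offset = 0;

    while (true) {
        // Find the next opening marker at or after the current offset.
        $start = strpos($text, $open, $offset);
        if ($start === false) {
            break;
        }
        $start += strlen($open);

        // Find the closing marker that follows it.
        $end = strpos($text, $close, $start);
        if ($end === false) {
            break;
        }

        $results[] = substr($text, $start, $end - $start);

        // Move past the closing marker so the next pass finds a new pair.
        $offset = $end + strlen($close);
    }

    return $results;
}

// Sample input, invented for this sketch.
$html = '<a>one</a> filler <a>two</a>';
print_r(extractAllBetween($html, '<a>', '</a>'));
```

Because the markers are passed in as arguments, the same loop works for whatever the search happens to be between, not just <a> and </a>.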
Hey, check out the PHP function preg_match_all. You can use it to easily find all matches (with a clever regular expression), like:

    $strHTML = "<a href=\"asdf\">etc etc</a>"; // This would be the HTML you need parsed.
    preg_match_all('/<a href="(.+?)">(.+?)<\/a>/', $strHTML, $matches, PREG_SET_ORDER);
    print_r($matches);

(Untested of course, but I hope it helps =] )
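A slightly more forgiving take on the same preg_match_all idea, as a rough sketch: the pattern below is my own assumption about what real-world markup tends to look like (extra attributes before or after href, single or double quotes), and the findLinks name is hypothetical. It is still a regex, so unusual or broken HTML can trip it up.

```php
<?php
// Hypothetical helper: returns [['url' => ..., 'text' => ...], ...] for each link found.
function findLinks(string $html): array
{
    // <a, optional other attributes, href= quoted with " or ', rest of the tag,
    // then the link text up to the closing </a>. Flags: i = case-insensitive,
    // s = let . match newlines so multi-line link text is captured.
    $pattern = '/<a\s[^>]*?href=(["\'])(.*?)\1[^>]*>(.*?)<\/a>/is';

    $links = [];
    if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            // With PREG_SET_ORDER each $m is one match:
            // $m[0] is the whole tag, $m[2] the href value, $m[3] the link text.
            $links[] = ['url' => $m[2], 'text' => $m[3]];
        }
    }

    return $links;
}

// Sample input, invented for this sketch.
$strHTML = '<a href="asdf">etc etc</a> <a class=\'x\' href=\'http://example.com/\'>Home</a>';
print_r(findLinks($strHTML));
```

PREG_SET_ORDER groups the results per match rather than per capture group, which makes the foreach a bit easier to read.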
Thanks guys. I looked pretty heavily into preg_match_all but regular expressions were killing me. I went through the whole chapter about it in one of my books... it helped a little but there are a lot of nuances I guess I will have to learn over time. I'll try the preg_match_all method first, and then the latter. Thanks!
Ok, I stripped this from a piece of code I wrote for an affiliate script. It returns the number of URLs and prints each one out.

    <?php
    $file = file_get_contents('http://www.yourfile');
    // [^"]* stops at the closing quote, so each link is captured separately
    preg_match_all('/<a href="([^"]*)"/', $file, $a);
    $count = count($a[1]);
    echo "<b>Number of Urls</b> = " . $count . "<p>";
    for ($row = 0; $row < $count; $row++) {
        echo $a[1][$row] . "<br>";
    }
    ?>
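One thing worth knowing with patterns of this kind: a greedy group such as (.*) keeps going until the last "> on the line, so two links on the same line get swallowed into a single match. A small sketch of the difference, using a made-up sample string:

```php
<?php
// Sample input, invented for this sketch: two links on one line.
$html = '<a href="one.html">One</a> <a href="two.html">Two</a>';

// Greedy: (.*) runs to the last "> on the line, so both links collapse into one match.
preg_match_all('/<a href="(.*)">/', $html, $greedy);

// Negated class: ([^"]*) stops at the first closing quote, giving one match per link.
preg_match_all('/<a href="([^"]*)"/', $html, $perLink);

echo count($greedy[1]), "\n";   // number of matches with the greedy pattern
echo count($perLink[1]), "\n";  // number of matches with the negated class
print_r($perLink[1]);
```

A lazy quantifier, (.*?), would also work here; the negated character class just makes the "stop at the quote" intent explicit.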
This is almost exactly what I needed! Thanks! With just a couple teensy modifications, I should have this all ready to go in no time. Thanks!!! +rep!