I want to build a PHP crawler that extracts links from a website. I wrote code for this page:

https://www.tradebit.com/filesharing.php/1010-Documents-eBooks-Audio-Books-Teaching

and it returns links like:

https://www.tradebit.com/filedetail.php/276643585-the-ultimate-plr-firesale-oto

Now I want to collect links like this from all of https://www.tradebit.com. How should I change this code?

```php
<?php
// parser of website tradebit
$i = 1;
$base = "https://www.tradebit.com/filesharing.php/1010-Documents-eBooks-Audio-Books-Teaching";
$website = $base;
$filename = "w.txt";

while ($website) {
    //echo $website;
    $content = file_get_contents($website);
    if ($content === false) {
        break; // request failed, e.g. there are no more pages
    }
    $stripped_file = strip_tags($content, "<a>");
    //echo $stripped_file."<br>";
    //preg_match_all("/<a href=\"([^\"]*)\">(.*)<\/a>/iU",$content,$result);
    //print_r($result);
    //foreach ($result[1] as $line ){
    //echo $line . "<br />";
    //}
    preg_match_all(
        "/<a[\s]+[^>]*?href[\s]?=[\s\"\']+" . "(.*?)[\"\']+.*?>" . "([^<]+|.*?)?<\/a>/",
        $stripped_file,
        $matches,
        PREG_SET_ORDER
    );
    $found = 0;
    foreach ($matches as $match) {
        $href = $match[1];
        // strpos() returns false (not 0) when "filedetail" is absent
        if (strpos($href, "filedetail") !== false) {
            echo $href . "<br>";
            $found++;
        }
    }
    if ($found === 0) {
        break; // assume a listing page with no products means we ran past the last page
    }
    // next listing page: build it from the base URL, not from the previous page's URL
    $website = $base . "/" . $i++;
    sleep(5);
}
```
I'm not sure exactly what you are trying to do, but the easiest way to scrape links is with DOMDocument:

```php
<?php
$html = <<<EOF
this is a test <a title="search" href="http://www.google.com">Google</a> this is a test this is a test
EOF;

$dom = new DOMDocument();
$dom->loadHTML($html);
$links = $dom->getElementsByTagName("a");
foreach ($links as $link) {
    print $link->getAttribute("href") . "\n";
}
```
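Applied to the original question, the same DOMDocument approach can replace the regex: parse the page, read every `href`, and keep only the `filedetail` links. This is a minimal sketch; the function name `extractFiledetailLinks` is hypothetical, and the HTML is passed in as a string so it can be tried without hitting the site (in real use you would pass it the result of `file_get_contents($website)`).

```php
<?php
// Hypothetical helper: return every distinct href that contains "filedetail".
function extractFiledetailLinks(string $html): array
{
    if ($html === "") {
        return []; // loadHTML() rejects an empty string
    }
    $dom = new DOMDocument();
    // Real-world pages are rarely valid HTML: collect libxml warnings
    // instead of printing them, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($dom->getElementsByTagName("a") as $a) {
        $href = $a->getAttribute("href");
        // strpos() returns false when the needle is absent, so compare strictly.
        if (strpos($href, "filedetail") !== false) {
            $links[] = $href;
        }
    }
    // The same product is often linked more than once; keep each link once.
    return array_values(array_unique($links));
}

$html = '<p><a href="https://www.tradebit.com/filedetail.php/1-a">A</a>'
      . '<a href="/about.php">About</a>'
      . '<a href="https://www.tradebit.com/filedetail.php/1-a">A again</a></p>';

print_r(extractFiledetailLinks($html));
```

Using the parser instead of a regex means attribute order, quoting style and extra attributes on the `<a>` tag no longer matter.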
I would go with Netstar's example, but here is what you would need from your own code (although your preg_match pattern probably matches a bit more than it should):

```php
<?php
$website = "https://www.tradebit.com/filesharing.php/1010-Documents-eBooks-Audio-Books-Teaching";
$content = file_get_contents($website);
preg_match_all(
    "/<a[\s]+[^>]*?href[\s]?=[\s\"\']+" . "(.*?)[\"\']+.*?>" . "([^<]+|.*?)?<\/a>/",
    $content,
    $matches,
    PREG_SET_ORDER
);
foreach ($matches as $match) {
    $href = $match[1];
    // strpos() returns false (not 0) when "filedetail" is missing, so compare with false
    if (strpos($href, "filedetail") !== false) {
        echo $href . "<br>";
    }
}
```
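To go from one category page to the whole site, which is what the question asks, a common pattern is a breadth-first crawl: keep a queue of URLs to visit and a set of URLs already seen, fetch each page, extract its links, and enqueue only the ones on the same host. The sketch below rests on assumptions that are not from the thread: the function name `crawlSite`, the `$maxPages` cap, the choice to record `filedetail` pages without crawling them further, and a simplified URL resolution that only handles absolute `http(s)` links and root-relative `/path` links (other relative forms are skipped). The fetcher is passed in as a callable so it can be tested offline with a fake site (arrow functions need PHP 7.4+).

```php
<?php
// Hypothetical breadth-first crawler restricted to the start URL's host.
// $fetch: callable(string $url): string|false, e.g. a wrapper around file_get_contents().
function crawlSite(string $startUrl, callable $fetch, int $maxPages = 100): array
{
    $host   = parse_url($startUrl, PHP_URL_HOST);
    $scheme = parse_url($startUrl, PHP_URL_SCHEME) ?: "https";
    $queue  = [$startUrl];
    $seen   = [$startUrl => true];
    $found  = []; // filedetail links discovered, used as a set
    $pages  = 0;

    while ($queue && $pages < $maxPages) {
        $url  = array_shift($queue);
        $html = $fetch($url);
        $pages++;
        if ($html === false || $html === "") {
            continue; // fetch failed or page empty: skip it
        }

        $dom = new DOMDocument();
        $previous = libxml_use_internal_errors(true);
        $dom->loadHTML($html);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        foreach ($dom->getElementsByTagName("a") as $a) {
            $href = trim($a->getAttribute("href"));
            // Simplified resolution: "/path" gets the start URL's scheme and host.
            if (strpos($href, "/") === 0 && strpos($href, "//") !== 0) {
                $href = $scheme . "://" . $host . $href;
            }
            // Keep only absolute http(s) links on the same host.
            if (parse_url($href, PHP_URL_HOST) !== $host
                || !in_array(parse_url($href, PHP_URL_SCHEME), ["http", "https"], true)) {
                continue;
            }
            $href = explode("#", $href, 2)[0]; // ignore #fragments
            if (strpos($href, "filedetail") !== false) {
                $found[$href] = true; // record the product page, do not crawl it
                continue;
            }
            if (!isset($seen[$href])) {
                $seen[$href] = true;
                $queue[] = $href;
            }
        }
    }
    return array_keys($found);
}

// Offline demo with a fake two-page site (example.test is a placeholder host).
$pages = [
    "https://example.test/"      => '<a href="/cat/1">Cat</a><a href="https://other.test/x">Ext</a>',
    "https://example.test/cat/1" => '<a href="/filedetail.php/42-item">Item</a><a href="/">Home</a>',
];
$fakeFetch = fn(string $url) => $pages[$url] ?? false;
print_r(crawlSite("https://example.test/", $fakeFetch));
```

For the real site you would pass a fetcher that pauses between requests, as the original code did with `sleep(5)`, for example `function (string $url) { sleep(5); return @file_get_contents($url); }`, and keep `$maxPages` small while testing. The `$seen` set is what stops the crawl from looping forever on pages that link back to each other.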