How can I collect all links from a web page? I tried some code, but it gives me only the last part of the URL, not the full URL. So if the URL is http://www.example.com/test.html, it returns test.html.
I want to collect all URLs from any web page. For example, when I enter http://www.google.com, I want all the links from that page as full URLs. I tried this on other sites before using file_get_contents, and I only get the page's file name, like example.html, but I need the full URL.
I think you can use file_get_contents, then use preg_match_all to find all links that fit the pattern <a href="...". If you want to make the links absolute, just prepend the known domain (or extract it with another preg_match). If you give us some code, that might help too. A minimal sketch of the idea is below; the $base variable and the regex are just assumptions for illustration, and a real HTML parser like DOMDocument is more robust than a regex:
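<?php
// Minimal sketch: fetch a page and print its links as absolute URLs.
// $base is an assumption here -- change it to the site you are scraping.
$base = 'http://www.example.com/';
$html = file_get_contents($base);

// Capture the href value of every <a> tag (naive regex, fine for simple pages).
preg_match_all('/<a\s[^>]*href=["\']([^"\']+)["\']/i', $html, $matches);

foreach ($matches[1] as $url) {
    // Prepend the domain when the link has no scheme, i.e. it is relative.
    if (!preg_match('#^https?://#i', $url)) {
        $url = $base . ltrim($url, '/');
    }
    echo $url . "\n";
}
?>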
Here is some code I found that could possibly help with your problem:

<?php
// Store a discovered link in the database.
function storeLink($url, $gathered_from) {
    $query = "INSERT INTO links (url, gathered_from) VALUES ('$url', '$gathered_from')";
    mysql_query($query) or die('Error, insert query failed');
}

$target_url = "http://www.merchantos.com/";
$userAgent = 'Googlebot/2.1 (http://www.googlebot.com/bot.html)';

// make the cURL request to $target_url
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
curl_setopt($ch, CURLOPT_URL, $target_url);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$html = curl_exec($ch);

if (!$html) {
    echo "<br />cURL error number:" . curl_errno($ch);
    echo "<br />cURL error:" . curl_error($ch);
    exit;
}

// parse the html into a DOMDocument
$dom = new DOMDocument();
@$dom->loadHTML($html);

// grab all the links on the page
$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate("/html/body//a");

for ($i = 0; $i < $hrefs->length; $i++) {
    $href = $hrefs->item($i);
    $url = $href->getAttribute('href');
    storeLink($url, $target_url);
    echo "<br />Link stored: $url";
}
?>

Source: http://www.merchantos.com/makebeta/php/scraping-links-with-php/
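Note that $url in that loop may still be relative (e.g. test.html), which is exactly the problem you described. Something like the helper below could make it absolute before storing; the function name resolveUrl is my own for illustration, not from the article, and it only handles the simple case of joining onto the site root:

// Hypothetical helper (not from the article): prefix relative links
// with the page's base URL so "test.html" becomes a full URL.
function resolveUrl($href, $base) {
    // Already absolute? Return it unchanged.
    if (preg_match('#^https?://#i', $href)) {
        return $href;
    }
    // Otherwise join it onto the base URL.
    return rtrim($base, '/') . '/' . ltrim($href, '/');
}

// Usage inside the loop above:
// storeLink(resolveUrl($url, $target_url), $target_url);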
Take a look at the article I wrote a couple of hours ago: http://www.jaygilford.com/php/common-questions/how-to-get-all-links-from-a-web-page/