<?php
function only_links($var) {
    echo $var;
    $str = $var;
    $str = preg_match("/href='([^']*)'/", $str, $regs);
    $new_str = $regs[1];
    $var = substr($new_str, 7, strlen($new_str));
    echo "$var";
    return($var);
}
$fh = file("index.html");
array_filter($fh, "only_links");
print_r($fh);
?>

index.html is a web page that contains all kinds of data, including links to other pages. My task is to extract all the links from that page; with this code I am trying to collect them into an array. Is this a good way to do it? Even this code is not running, and I don't know why. I would appreciate your help. Thanks for reading this thread. Cheers.
<?php
// Coded by Daniel Clarke (Danstuts)
// Use cURL to open the website, with a 60-second timeout to avoid wasting resources.
function opens($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_HEADER, 0);
    // Return the data instead of printing it to the browser.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_REFERER, 'http://www.google.com');
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/6.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6");
    curl_setopt($ch, CURLOPT_TIMEOUT, 60);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}

// Set the page to be opened.
$url = "http://news.bbc.co.uk/1/hi/world/asia-pacific/8336564.stm";

// Use the cURL function above to grab the page contents.
$webpage = opens($url);

// Use a regex to grab all the URLs from the page we've opened.
// (No "&" before $urlmatch: preg_match_all already takes the matches array
// by reference, and call-time pass-by-reference is a fatal error in PHP 5.4+.)
preg_match_all("/a[\s]+[^>]*?href[\s]?=[\s\"\']+(.*?)[\"\']+.*?>([^<]+|.*?)?<\/a>/", $webpage, $urlmatch);

// $urlmatch[1] now contains all the URLs that were matched.
$urls = $urlmatch[1];

// For testing, echo each item of the array (the URLs in this case).
foreach ($urls as $var) {
    echo($var . "<br>");
}
?>
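As a side note, regex can trip over real-world HTML (single vs double quotes, attributes in odd orders). A minimal alternative sketch using PHP's built-in DOMDocument, assuming the page contents are already in a string (the helper name `extract_links` is my own, not from the code above):

```php
<?php
// Sketch: parse the HTML with DOMDocument and collect the href attributes
// of all <a> tags, instead of matching them with a regex.
function extract_links($html) {
    $doc = new DOMDocument();
    // Suppress warnings that malformed real-world HTML would otherwise emit.
    @$doc->loadHTML($html);
    $urls = array();
    foreach ($doc->getElementsByTagName('a') as $a) {
        if ($a->hasAttribute('href')) {
            $urls[] = $a->getAttribute('href');
        }
    }
    return $urls;
}

// Small inline sample standing in for a real fetched page.
$sample = '<p><a href="http://example.com/page.html">One</a> <a href=\'pic.jpg\'>Two</a></p>';
print_r(extract_links($sample));
?>
```

This also handles links whose href uses single quotes or no quotes at all, which the regex above can miss.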
Thank you so much, sir! I am just going to play with it. I will let you know if I need further help. Thanks again, you are the best! Cheers.
I get "Call to undefined function curl_init()". What should I do? How should I use your function in my code? Thanks
Your host does not have cURL installed. Try this instead:

<?php
// Coded by Daniel Clarke (Danstuts)
// No cURL available on this host, so use file_get_contents to fetch the page.
function opens($url) {
    return file_get_contents($url);
}

// Set the page to be opened.
$url = "http://news.bbc.co.uk/1/hi/world/asia-pacific/8336564.stm";

// Grab the page contents.
$webpage = opens($url);

// Use a regex to grab all the URLs from the page we've opened.
// (No "&" before $urlmatch: preg_match_all already takes the matches array
// by reference, and call-time pass-by-reference is a fatal error in PHP 5.4+.)
preg_match_all("/a[\s]+[^>]*?href[\s]?=[\s\"\']+(.*?)[\"\']+.*?>([^<]+|.*?)?<\/a>/", $webpage, $urlmatch);

// $urlmatch[1] now contains all the URLs that were matched.
$urls = $urlmatch[1];

// For testing, echo each item of the array (the URLs in this case).
foreach ($urls as $var) {
    echo($var . "<br>");
}
?>
Hello sir, it worked! But I have one more question: I want only HTML pages, and I want to drop all other files like .jpg, .pdf, and .doc. What should I do? Basically I want to filter the list so that only HTML pages remain. Thanks, Ved
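One way to do that filtering, sketched below: check each URL's file extension and reject the ones that look like binary files. The helper name `is_html_link` and the extension list are my own assumptions; extend the list as needed.

```php
<?php
// Sketch: keep only URLs that look like HTML pages, by rejecting
// known non-HTML extensions. The rejection list is an assumption.
function is_html_link($url) {
    // Strip any query string or fragment before checking the extension.
    $path = parse_url($url, PHP_URL_PATH);
    if ($path === null || $path === false) {
        return false;
    }
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    // A URL with no extension (e.g. "/news/") is usually an HTML page too.
    $binary = array('jpg', 'jpeg', 'png', 'gif', 'pdf', 'doc', 'zip');
    return !in_array($ext, $binary);
}

$urls = array(
    'http://example.com/story.stm',
    'http://example.com/photo.jpg',
    'http://example.com/report.pdf',
    'http://example.com/index.html',
);

// array_filter keeps only the entries for which the callback returns true.
print_r(array_filter($urls, 'is_html_link'));
?>
```

You would run this over the `$urls` array produced by the earlier script; `array_filter` preserves the original keys, so call `array_values()` on the result if you need them renumbered.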