I am looking for a very simple PHP script that crawls a site and grabs all of the links. I can't find any good scripts, and I've tried at least 10. If you could help me with this, or even point me in the right direction (which functions to use, etc.), I would appreciate it. Thanks.
You want simple? Okay, here's a really simple one that I just whipped up for ya.

<?php
$original_file = file_get_contents("http://www.domain.com");
$stripped_file = strip_tags($original_file, "<a>");
preg_match_all("/<a(?:[^>]*)href=\"([^\"]*)\"(?:[^>]*)>(?:[^<]*)<\/a>/is", $stripped_file, $matches);

//DEBUGGING
//$matches[0] now contains the complete A tags; ex: <a href="link">text</a>
//$matches[1] now contains only the HREFs in the A tags; ex: link
header("Content-type: text/plain"); //Set the content type to plain text so the print below is easy to read!
print_r($matches); //View the array to see if it worked
?>

You would remove everything after //DEBUGGING when actually using it, though. It's just there so you can see how it works if you put it in a PHP file by itself for testing. Only took 3 lines.. not counting debugging.
Well, I only made it to rip the links off of one page. You can set it up so those 3 lines are in a function that returns the $matches[1] array, and then you can loop through that array and call the function again for each value (each one is a link itself) so that it keeps crawling. If you'd like to see an example of what those 3 lines do, go to php.sitexero.net/?preview=link_crawler
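To make the suggestion above concrete, here is a minimal sketch of the three lines wrapped in a function, plus a loop that calls it again for each link it returns. The names get_links() and get_links_of_links() and the $fetch parameter are assumptions made up for illustration, not part of the original script. $fetch falls back to file_get_contents and exists only so a different fetcher can be swapped in. Only absolute http(s) links are followed here, since relative links would have to be resolved against the page URL first.

```php
<?php
// Hypothetical helper: the three lines from the one-page example in a function.
// $fetch is an assumed parameter; it defaults to file_get_contents.
function get_links(string $url, ?callable $fetch = null): array
{
    $fetch = $fetch ?? 'file_get_contents';
    $original_file = $fetch($url);
    if ($original_file === false) {
        return []; // Fetch failed, so there is nothing to extract
    }
    $stripped_file = strip_tags($original_file, "<a>");
    preg_match_all("/<a(?:[^>]*)href=\"([^\"]*)\"(?:[^>]*)>(?:[^<]*)<\/a>/is", $stripped_file, $matches);
    return $matches[1]; // Only the HREF values
}

// Hypothetical helper: gets the links on the start page, then calls
// get_links() again for each of them (one extra level only).
function get_links_of_links(string $start, ?callable $fetch = null): array
{
    $found = [];
    foreach (get_links($start, $fetch) as $link) {
        // Assumption: skip anything that is not an absolute http(s) URL
        if (preg_match('#^https?://#i', $link)) {
            $found[$link] = get_links($link, $fetch);
        }
    }
    return $found;
}
```

Calling print_r(get_links_of_links("http://www.domain.com")); would print each link found on the start page together with the links found on that linked page.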
Yeah.. I did the hard part. Looping through them should be a breeze. Anyway, I'm still pretty bored right now and don't want to start on any of my major projects, so I guess I can quickly edit the example to dig X times. EDIT: Never mind.. I was over halfway done when I decided that doing that was very unstable. It would take forever to load. I think my example of how to dig one page is enough. I made a few modifications to it, though. Check php.sitexero.net/?code=link_crawler
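Part of why digging several levels takes so long is likely that the number of pages grows with every level and the same page can get fetched more than once. A rough sketch of one way to keep that bounded is a depth limit, a cap on the number of fetches, and a list of URLs already seen. Everything named here (extract_hrefs(), crawl(), $max_depth, $max_pages, $fetch) is an assumption for illustration, not the code from the link above, and only absolute http(s) links are followed.

```php
<?php
// Same regex approach as the one-page example, applied to an HTML string.
function extract_hrefs(string $html): array
{
    $stripped = strip_tags($html, "<a>");
    preg_match_all("/<a(?:[^>]*)href=\"([^\"]*)\"(?:[^>]*)>(?:[^<]*)<\/a>/is", $stripped, $matches);
    return $matches[1];
}

// Hypothetical breadth-first crawl. Returns URL => depth for every URL seen.
// Pages at $max_depth are recorded but not fetched; at most $max_pages
// fetches are made. $fetch is an assumed parameter (default: file_get_contents).
function crawl(string $start, int $max_depth, int $max_pages, ?callable $fetch = null): array
{
    $fetch = $fetch ?? 'file_get_contents';
    $seen = [$start => 0]; // URL => depth at which it was first found
    $queue = [$start];     // URLs waiting to be looked at
    $fetched = 0;

    while ($queue && $fetched < $max_pages) {
        $url = array_shift($queue);
        if ($seen[$url] >= $max_depth) {
            continue; // At the depth limit: keep the URL but do not dig further
        }
        $html = $fetch($url);
        $fetched++;
        if ($html === false) {
            continue; // Fetch failed, move on to the next URL
        }
        foreach (extract_hrefs($html) as $link) {
            // Assumption: follow only absolute http(s) links not seen before
            if (!preg_match('#^https?://#i', $link) || isset($seen[$link])) {
                continue;
            }
            $seen[$link] = $seen[$url] + 1;
            $queue[] = $link;
        }
    }
    return $seen;
}
```

For example, crawl("http://www.domain.com", 2, 50) would fetch the start page and the pages it links to, stopping after 50 fetches at most.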
I think "PHP Crawler" would be useful. It's a simple PHP-based web crawler: sourceforge.net/projects/php-crawler/
Hello, here is a simple example:

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://www.urlyourstart.com");
curl_setopt($ch, CURLOPT_TIMEOUT, 30); // Timeout after 30 seconds
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // Return the page as a string instead of printing it
$result = curl_exec($ch);
curl_close($ch);

// Search the results from the starting site
if ($result) {
    // I only look for top-level domains; change this for your usage
    preg_match_all('/<a href="(http:\/\/www\.[^0-9].+?)"/', $result, $output, PREG_SET_ORDER);
    foreach ($output as $item) {
        // All links are displayed here
        print_r($item);
        // Now add them to your database and loop so the engine never stops
    }
}

Maybe this helps you.
Use the cURL functions to fetch the data, then manipulate it as you need to get the links. For more information: cattechnologies.com
We are developing a search engine. For that, we need code for a web crawler or something similar, so that websites are collected automatically without entering them into the database manually.
Avoid using simple_html_dom for crawling. It takes a lot of memory and the script crashes. A custom crawler using regex is the best.
Hi all, I need a simple crawl script in PHP to fetch the category, image, description, keywords, title, meta, price, and MRP of an e-commerce website and store them in a MySQL database. Please reply.