Hello everyone, I am in the process of making my project and am stuck at this point. I need to make a web crawler in PHP and link it to my site. It's really urgent for me. Can somebody please suggest a solution to this problem? I would be really grateful! Please help me out, friends. Waiting for a reply. Thanks. -- Shekhar
$output = file_get_contents('http://www.website.com'); assigns the HTML source of a web page to a variable (this requires allow_url_fopen to be enabled in php.ini). Then you can search it with string functions or regular expressions.
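As a rough illustration of the "search it with regular expressions" step, here is a minimal sketch that pulls the href values out of a page's HTML. The function name extract_hrefs is hypothetical, and regexes are fragile on real-world HTML, so treat this as a starting point only. A hardcoded HTML string stands in for the downloaded page so the snippet runs without network access.

```php
<?php
// Hypothetical helper: collect the href="..." / href='...' values of <a> tags
// from an HTML string using a regex. Fragile on messy HTML; illustration only.
function extract_hrefs(string $html): array
{
    // Case-insensitive match on <a ... href="...">, capturing the URL.
    preg_match_all('/<a\s[^>]*href\s*=\s*["\']([^"\']+)["\']/i', $html, $matches);
    // $matches[1] holds the captured URLs; drop duplicates and reindex.
    return array_values(array_unique($matches[1]));
}

// In a real crawler the HTML would come from the fetch step, e.g.
// $html = file_get_contents('http://www.website.com');
$html = '<p><a href="/about">About</a> <A HREF=\'http://example.com/\'>Ex</A> <a href="/about">Again</a></p>';
print_r(extract_hrefs($html));
```

Duplicates are removed because a crawler usually only needs to visit each URL once.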
Here is a good tutorial to get you started. It uses cURL to get the page. It then uses preg_match to get all the links on the page and follows each link. You can check it out at http://kevinmusselman.com/blog/2009/11/crawling-web-pages-for-sitemaps/
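The approach described there (fetch with cURL, extract links with a regex, follow each one) can be sketched roughly as below. This is not the tutorial's code. The names curl_fetch, resolve_link and crawl are hypothetical, and the sketch assumes you only want pages on the starting host, only resolves absolute http(s) and root-relative ("/path") links, and stops after $maxPages pages. The fetcher is passed in as a callable so it can be swapped out, for example for testing without network access.

```php
<?php
// Hypothetical sketch of a breadth-first crawl; not the tutorial's implementation.

// Fetch a URL with cURL; returns the response body, or null on failure.
function curl_fetch(string $url): ?string
{
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow HTTP redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // give up after 10 seconds
    $body = curl_exec($ch);                         // string on success, false on failure
    return is_string($body) ? $body : null;
}

// Turn an href into an absolute URL on the same host, or null to skip it.
// Assumption: plain relative paths like "page.html" are skipped for simplicity.
function resolve_link(string $href, string $scheme, string $host): ?string
{
    $href = explode('#', $href, 2)[0]; // drop any #fragment
    if ($href !== '' && $href[0] === '/' && substr($href, 0, 2) !== '//') {
        return "$scheme://$host$href"; // root-relative link
    }
    $parts = parse_url($href);
    if ($parts !== false && isset($parts['scheme'], $parts['host'])
        && in_array(strtolower($parts['scheme']), ['http', 'https'], true)
        && strcasecmp($parts['host'], $host) === 0) {
        return $href; // absolute link on the same host
    }
    return null; // other hosts, mailto:, relative paths, etc.
}

// Breadth-first crawl starting at $start; returns the URLs visited, in order.
function crawl(string $start, callable $fetch, int $maxPages = 50): array
{
    $parts = parse_url($start);
    $scheme = $parts['scheme'] ?? 'http';
    $host = $parts['host'] ?? '';
    $queue = [$start];
    $seen = [$start => true]; // every URL ever queued, so none is visited twice
    $visited = [];
    while ($queue && count($visited) < $maxPages) {
        $url = array_shift($queue);
        $visited[] = $url;
        $html = $fetch($url);
        if ($html === null) {
            continue; // fetch failed; carry on with the rest of the queue
        }
        preg_match_all('/<a\s[^>]*href\s*=\s*["\']([^"\']+)["\']/i', $html, $m);
        foreach ($m[1] as $href) {
            $link = resolve_link($href, $scheme, $host);
            if ($link !== null && !isset($seen[$link])) {
                $seen[$link] = true;
                $queue[] = $link;
            }
        }
    }
    return $visited;
}

// Usage (needs network access and the cURL extension):
// print_r(crawl('http://www.website.com/', 'curl_fetch', 20));
```

A queue gives breadth-first order, so pages close to the start page are crawled first, and the $seen set keeps the crawler from looping forever on pages that link to each other.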
First, though, you should study how crawlers work before building one in PHP. If you are trying to crawl other websites, you should first learn cURL, a PHP extension you can use to fetch other sites from PHP. Reading the links might be tricky, though, because you have to check whether each one is marked nofollow, but just take it one step at a time. If you are trying to crawl your own website, it is best to write a program that generates tags from your content. Or, even quicker, don't reinvent the wheel: use a framework. You might want to see this site: http://www.bitrepository.com/how-to-create-a-simple-web-data-extractor.html
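For the nofollow check, one option is to parse the page with PHP's DOMDocument instead of a regex and skip any link whose rel attribute contains "nofollow". This is a minimal sketch assuming the DOM extension (bundled with most PHP builds) is available; followable_links is a hypothetical name, and it only looks at per-link rel attributes, not page-wide robots meta tags.

```php
<?php
// Hypothetical helper: return the hrefs of <a> tags that are NOT marked rel="nofollow".
function followable_links(string $html): array
{
    if (trim($html) === '') {
        return []; // DOMDocument::loadHTML rejects empty input
    }
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = trim($a->getAttribute('href')); // '' when the attribute is missing
        if ($href === '') {
            continue; // e.g. <a name="..."> anchors
        }
        // rel may hold several space-separated tokens, e.g. "nofollow noopener".
        $rel = preg_split('/\s+/', strtolower($a->getAttribute('rel')), -1, PREG_SPLIT_NO_EMPTY);
        if (in_array('nofollow', $rel, true)) {
            continue; // the site asked crawlers not to follow this link
        }
        $links[] = $href;
    }
    return $links;
}

$html = '<a href="/a">A</a><a href="/b" rel="nofollow">B</a><a rel="NoFollow noopener" href="/c">C</a><a href="/d" rel="noopener">D</a>';
print_r(followable_links($html));
```

Splitting rel into tokens matters because "nofollow" often appears alongside other values, and attribute values are compared case-insensitively here.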