Hey guys, just wondering how search engine spiders work. Does it take a lot of effort to make one? Thanks!
For a really comprehensive read-up on spiders/crawlers, have a look at http://en.wikipedia.org/wiki/Web_spider. I found http://www.ibm.com/developerworks/linux/library/l-spider/?ca=dgr-lnxw01WebSpiderLinux really useful, as it has *nix, Ruby & a simple Python spidering script, plus links to source code in the "Resources" section, including some Java spiders, as Blue Star Ent points out. Again, it would be nice to know how you get on.
It's not too difficult to write one in PHP; you could do it in a couple of hours. However, for a proper spider you also need to handle robots.txt properly. All you need to write one is a database, cURL and preg_match_all.
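To illustrate the cURL + preg_match_all approach mentioned above, here's a minimal sketch of the two core pieces: fetching a page and pulling the links out of it. The function names and the user-agent string are my own for the example, and a real spider would still need robots.txt handling, relative-URL resolution, and a database on top of this.

```php
<?php
// Fetch a page body as a string with cURL. Assumes the curl extension
// is installed; error handling is kept minimal for the sketch.
function fetch_page($url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // return body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);   // follow redirects
    curl_setopt($ch, CURLOPT_USERAGENT, 'MySpider/0.1'); // identify your bot
    $html = curl_exec($ch);
    curl_close($ch);
    return $html;
}

// Pull the value of every href="..." attribute out of the HTML.
// A regex is crude compared to a real HTML parser, but it's the
// quick-and-dirty method the post is describing.
function extract_links($html) {
    preg_match_all('/href="([^"]+)"/i', $html, $matches);
    return array_unique($matches[1]);
}
```

From there it's a loop: fetch, extract, store the links in your database, and move on to the next URL.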
I looked up Sphider; it's here: http://www.sphider.eu/ If people use it and find it useful, don't forget to donate (http://www.sphider.eu/donate.php).
Hey, thanks for gathering that link.. I guess I forgot to do that. Anyway, hopefully that helps people.
No problem at all, always happy to help out. And thanks for pointing out Sphider, I'll have a proper look at that later myself.
Not a problem either. I was going to use an existing spider for a site, but it didn't do what I wanted, so I created my own. The spider I created only crawls one level deep: the links on the starting page. I was working on making it crawl the crawled links too, but anyway. Not bad for a first PHP spider attempt.