Hey everyone, I'm interested in building some crawlers and bots in PHP. I'm not sure where to start. Does anyone have any good resources or suggestions on how to approach a project like this for the first time?
Learn cURL and regular expressions; the rest is just the logic you need to build the bot. There is no exact science or fixed definition for creating a crawler or bot.
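To make the cURL and regular expression advice concrete, here is a minimal sketch of how the two could fit together. The function names fetch_page and extract_title, the user agent string, and the timeout and redirect limits are all made up for illustration; adjust them to your own bot.

```php
<?php
// Hypothetical helper: download a page body with cURL.
// Returns null if the request fails or the server answers with an error status.
function fetch_page(string $url, int $timeoutSeconds = 10): ?string
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,            // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,            // follow HTTP redirects
        CURLOPT_MAXREDIRS      => 5,               // assumed limit, not a required value
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds,
        CURLOPT_TIMEOUT        => $timeoutSeconds,
        CURLOPT_USERAGENT      => 'MyFirstBot/0.1', // placeholder name for your bot
    ]);
    $body   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    if ($body === false || $status >= 400) {
        return null;
    }
    return $body;
}

// Hypothetical helper: pull the text of the <title> tag out of an HTML string
// with a regular expression. Returns null when no title tag is found.
function extract_title(string $html): ?string
{
    if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $m)) {
        return trim(html_entity_decode($m[1]));
    }
    return null;
}
```

You would call it along the lines of $html = fetch_page("http://www.domain.com"); and then pass $html to extract_title() once you have checked it is not null. Regular expressions are fine for quick experiments like this, but they break easily on messy HTML, so treat the pattern as a starting point.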
Try this code first and play with it a little:
<?php
// Fetch the page (replace the URL with the site you want to read)
$original_file = file_get_contents("http://www.domain.com");
// Strip every tag except <a> so only the links remain as markup
$stripped_file = strip_tags($original_file, "<a>");
preg_match_all("/<a(?:[^>]*)href=\"([^\"]*)\"(?:[^>]*)>(?:[^<]*)<\/a>/is", $stripped_file, $matches);
// DEBUGGING
// $matches[0] now contains the complete A tags; ex: <a href="link">text</a>
// $matches[1] now contains only the HREFs in the A tags; ex: link
header("Content-type: text/plain"); // Set the content type to plain text so the print below is easy to read!
print_r($matches); // View the array to see if it worked
?>
Thanks for the help. What about targeting multiple websites and hopping from link to link, rather than only querying a domain or URL you already know?
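One possible way to approach link hopping is to keep a queue of URLs to visit and a list of URLs already seen, then feed every link you find back into the queue. The sketch below is only an illustration under some assumptions: extract_links and crawl are hypothetical names, only absolute http(s) links and root-relative links such as /about are followed, and the fetch step is passed in as a callable so you can plug in file_get_contents, cURL, or anything else.

```php
<?php
// Hypothetical helper: return the absolute http(s) links found in $html.
// Simplification: other relative links (e.g. "page.html", "//host/x") are skipped.
function extract_links(string $html, string $baseUrl): array
{
    $base = parse_url($baseUrl);
    if (!isset($base['scheme'], $base['host'])) {
        return [];
    }
    $origin = $base['scheme'] . '://' . $base['host'];
    if (isset($base['port'])) {
        $origin .= ':' . $base['port'];
    }

    // Capture the href value up to the closing quote or a #fragment.
    preg_match_all('/<a\s[^>]*?href="([^"#]+)/i', $html, $matches);

    $links = [];
    foreach ($matches[1] as $href) {
        if (preg_match('#^https?://#i', $href)) {
            $links[] = $href;                 // already absolute
        } elseif ($href[0] === '/' && strncmp($href, '//', 2) !== 0) {
            $links[] = $origin . $href;       // root-relative, same site
        }
    }
    return array_values(array_unique($links));
}

// Hypothetical crawler: breadth-first walk starting from the seed URLs.
// $fetch is any callable that takes a URL and returns the HTML as a string
// (or a non-string value on failure). $maxPages caps how many URLs are visited.
function crawl(array $seedUrls, callable $fetch, int $maxPages = 50): array
{
    $queue   = $seedUrls;
    $visited = [];

    while ($queue !== [] && count($visited) < $maxPages) {
        $url = array_shift($queue);
        if (isset($visited[$url])) {
            continue;                         // never fetch the same URL twice
        }
        $visited[$url] = true;

        $html = $fetch($url);
        if (!is_string($html)) {
            continue;                         // fetch failed, move on
        }
        foreach (extract_links($html, $url) as $link) {
            if (!isset($visited[$link])) {
                $queue[] = $link;             // hop to newly discovered pages
            }
        }
    }
    return array_keys($visited);
}
```

To start from several sites at once, put all of them in the seed list, for example crawl(["http://www.domain.com"], function ($url) { return @file_get_contents($url); }). The page limit is there so the bot cannot wander forever; on real sites you would also want a delay between requests and a check of robots.txt.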