Looking for basic information on building crawlers and bots in PHP

Discussion in 'PHP' started by srisen2, Feb 7, 2011.

  1. #1
    Hey everyone, I'm interested in building some crawlers and bots in PHP. I'm not sure where to start. Does anyone have good resources, or suggestions on how to approach a project like this for the first time?
     
    srisen2, Feb 7, 2011
  2. mastermunj

    #2
    Learn cURL and regular expressions; the rest is the logic you write for your bot. There is no fixed recipe or formal definition for building a crawler/bot.
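As a starting point for the cURL part, here is a minimal sketch of a page downloader. The function name `fetch_page()`, the timeout, and the user-agent string are hypothetical choices for illustration, not a standard API; it assumes the PHP cURL extension is installed.

```php
<?php
// Hypothetical helper: download a URL with cURL and return the body, or false on failure.
function fetch_page(string $url, int $timeout = 10)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,   // follow HTTP redirects
        CURLOPT_MAXREDIRS      => 5,      // but not forever
        CURLOPT_CONNECTTIMEOUT => $timeout,
        CURLOPT_TIMEOUT        => $timeout,
        // Identify the bot; the name and URL are placeholders
        CURLOPT_USERAGENT      => 'ExampleBot/0.1 (+http://www.domain.com/bot)',
    ]);
    $body   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_RESPONSE_CODE);
    // The handle is released when $ch goes out of scope at the end of the function

    // curl_exec() returns false on transport errors; treat HTTP 4xx/5xx as failures too
    if ($body === false || $status >= 400) {
        return false;
    }
    return $body;
}
```

For example, `$html = fetch_page("http://www.domain.com");` gives you the page body as a string, or `false` if the request failed or the server answered with an error status.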
     
    mastermunj, Feb 7, 2011
  3. clonepal

    #3
    Try this code first and play around with it a little:

    
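    <?php
      // Download the page; file_get_contents() returns false if the request fails
      $original_file = file_get_contents("http://www.domain.com");
      if ($original_file === false) {
          die("Could not download the page");
      }

      // Remove every tag except <a> so the regex below only has to deal with links
      $stripped_file = strip_tags($original_file, "<a>");

      // Capture the value of each double-quoted href attribute
      preg_match_all("/<a(?:[^>]*)href=\"([^\"]*)\"(?:[^>]*)>(?:[^<]*)<\/a>/is", $stripped_file, $matches);
    
      //DEBUGGING
    
      //$matches[0] now contains the complete A tags; ex: <a href="link">text</a>
      //$matches[1] now contains only the HREFs in the A tags; ex: link
    
      header("Content-type: text/plain"); //Set the content type to plain text so the print below is easy to read!
      print_r($matches); //View the array to see if it worked
    ?>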
    
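The regex in that snippet only catches `href` values in double quotes, so links written as `href='page.html'` or `href=page.html` slip through. A swapped-in approach, assuming PHP's bundled DOM extension is available, is to let `DOMDocument` parse the HTML and read the attributes directly. The helper name `extract_links()` is hypothetical.

```php
<?php
// Hypothetical helper: collect the href of every <a> tag using PHP's DOM extension.
function extract_links(string $html): array
{
    if ($html === '') {
        return [];                                // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);        // restore the caller's setting

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = trim($a->getAttribute('href'));   // '' when the tag has no href
        if ($href !== '') {
            $links[] = $href;
        }
    }
    return $links;
}
```

Calling `print_r(extract_links(file_get_contents("http://www.domain.com")));` prints the raw href values. Relative links such as `/about` still have to be turned into full URLs before you can follow them.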
     
    clonepal, Feb 7, 2011
  4. srisen2

    #4
    Thanks for the help.

    What about targeting multiple websites and hopping from link to link, rather than only querying a domain or URL I already know?
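One common way to "link hop" is a breadth-first crawl: start from one seed URL, keep a queue of URLs still to visit plus a set of URLs already seen, and push every new link you find onto the queue. The sketch below assumes you pass in your own download function and link extractor; `crawl()` and `resolve_url()` are hypothetical names, and the URL resolution is deliberately simplified (it ignores `../` segments and query-only links such as `?page=2`).

```php
<?php
// Hypothetical sketch of a breadth-first "link hopping" crawler.
// $fetch:   callable(string $url) returning the HTML string, or false on failure
// $extract: callable(string $html) returning an array of raw href strings
function crawl(string $startUrl, callable $fetch, callable $extract, int $maxPages = 20): array
{
    $queue   = [$startUrl];          // URLs waiting to be fetched (first in, first out)
    $seen    = [$startUrl => true];  // every URL ever queued, so none is queued twice
    $visited = [];                   // URLs actually fetched, in crawl order

    while ($queue && count($visited) < $maxPages) {
        $url       = array_shift($queue);
        $html      = $fetch($url);
        $visited[] = $url;
        if ($html === false) {
            continue;                // download failed: nothing to extract
        }
        foreach ($extract($html) as $href) {
            $next = resolve_url($url, $href);
            if ($next !== null && !isset($seen[$next])) {
                $seen[$next] = true;
                $queue[]     = $next;  // links to other hosts are queued too
            }
        }
    }
    return $visited;
}

// Simplified resolution of an href against the page it was found on.
// Assumption: "../" segments and query-only links ("?x=1") are not handled.
function resolve_url(string $base, string $href): ?string
{
    $href = preg_replace('/#.*$/', '', trim($href));   // drop any #fragment
    if ($href === '') {
        return null;                                    // pure "#anchor" link
    }
    if (preg_match('#^https?://#i', $href)) {
        return $href;                                   // already absolute
    }
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $href)) {
        return null;                                    // mailto:, javascript:, ftp:, ...
    }
    $p = parse_url($base);
    if (!isset($p['scheme'], $p['host'])) {
        return null;
    }
    if (strpos($href, '//') === 0) {
        return $p['scheme'] . ':' . $href;              // protocol-relative: //host/path
    }
    $root = $p['scheme'] . '://' . $p['host'] . (isset($p['port']) ? ':' . $p['port'] : '');
    if ($href[0] === '/') {
        return $root . $href;                           // root-relative: /path
    }
    // Path-relative: keep the base path up to and including its last "/"
    $dir = isset($p['path']) ? preg_replace('#/[^/]*$#', '/', $p['path']) : '/';
    return $root . $dir . $href;
}
```

Because links to other hosts are followed as well, the crawl spreads across websites, and `$maxPages` is what stops it. A real crawler would also want a pause between requests (for example `sleep(1)`) and a robots.txt check, which are left out of this sketch.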
     
    srisen2, Feb 7, 2011