How to Crawl URLs of a website using PHP?

Discussion in 'PHP' started by apsam29, Dec 26, 2008.

  1. #1
    I want to know how to crawl URLs from one of my websites using PHP functions.
    Are there any predefined functions or methods that can collect them for my links database?
    Or does anyone know of a freeware package built with PHP that can do this for me?

    Can anyone please help?
     
    apsam29, Dec 26, 2008 IP
  2. tamen

    #2
    Look into cURL. It can fetch web pages for you, and then you can use PHP to parse the HTML for any links in it.
    For the parsing you can use regular expressions.
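    A minimal sketch of the fetching step, assuming PHP's cURL extension is enabled. The function name fetch_page, the timeout, the redirect limit, and the user-agent string are illustrative choices, not something given in this thread.

```php
<?php
// Hypothetical helper (name and defaults are assumptions for illustration):
// download a page and return its body as a string, or null on failure.
function fetch_page(string $url, int $timeoutSeconds = 10): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow HTTP redirects
        CURLOPT_MAXREDIRS      => 5,     // but not forever
        CURLOPT_TIMEOUT        => $timeoutSeconds,
        CURLOPT_USERAGENT      => 'ExampleLinkCrawler/0.1', // placeholder value
    ]);

    $body   = curl_exec($ch);                           // string on success, false on error
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);    // e.g. 200, 404

    // The handle is released when $ch goes out of scope.
    if ($body === false || $status >= 400) {
        return null;
    }
    return $body;
}
```

    Calling fetch_page('http://www.example.com/') would return the HTML as a string that can then be searched for links, or null if the request failed or the server answered with an error status.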
     
    tamen, Dec 26, 2008 IP
  3. bradjmsu

    #3
    Sphider offers a search engine that you can easily set up to crawl a site and at least extract the keywords and URLs. You can download it at http://www.sphider.eu.
    There is also a version available for crawling a website with PHP behind a password (http://www.phpcodester.com/2011/04/using-sphider-to-crawl-a-password-protected-site) if you are interested in doing that.
     
    bradjmsu, Apr 29, 2011 IP
  4. lioncub5

    #4
    Use cURL to download the page and return it as a string (set CURLOPT_RETURNTRANSFER so the body is returned instead of printed).
    Then use something like preg_match_all('@href\s*=\s*"([^"]+)"@i', $html, $matches), where $html holds the downloaded page, to parse the page and return the links.
    (Inside a single-quoted PHP string the double quotes don't need escaping, and \s* allows for optional spaces around the equals sign.)
    But you get the idea.
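    The regex approach above could be sketched roughly like this. The helper name extract_links and the sample HTML are made up for illustration, and the pattern is only one reasonable variant: it accepts single or double quotes and only looks inside <a> tags. A regex like this will miss unquoted attributes and can be fooled by unusual markup, so treat it as a starting point.

```php
<?php
// Hypothetical helper (not from the thread): collect href values from <a> tags.
function extract_links(string $html): array
{
    // Match <a ... href = "value"> or 'value'.
    // \s* tolerates spaces around "="; \1 requires the same closing quote.
    $pattern = '@<a\s[^>]*?href\s*=\s*(["\'])(.*?)\1@is';
    preg_match_all($pattern, $html, $matches);

    // $matches[2] holds the captured URLs; trim them, drop empties and duplicates.
    $links = array_filter(array_map('trim', $matches[2]), 'strlen');
    return array_values(array_unique($links));
}

// Sample input, invented for this example.
$html = '<p><a href="/about.html">About</a> '
      . '<a class="x" href = \'http://example.com/\'>Example</a> '
      . '<a href="/about.html">About again</a></p>';

print_r(extract_links($html));
```

    For the sample string this yields two unique links, /about.html and http://example.com/. Relative links such as /about.html still need to be resolved against the page's own URL before they go into a links database.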
     
    lioncub5, Apr 29, 2011 IP