PHP spider

Discussion in 'PHP' started by iamsgf, Nov 13, 2008.

  1. #1
    Hi people,

    I am looking for some open-source PHP code that will act as a spider for a project I am working on.

    Basically, I want to enter a URL into my system and have it spider the site and collect all of the site's internal URLs (i.e. page links).

    I was going to just parse sitemaps, but there are several different types, and you also have to detect whether the site uses a multi-page sitemap, and so on. Spidering seems the easier route. Although I know it will be more process- and bandwidth-intensive, I believe it is the best option to make sure I get everything, and it needs a lot less coding.
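    A minimal sketch of the spider described above, written in Python purely for illustration (in PHP, `DOMDocument` and `parse_url` would play the same roles). It crawls breadth-first from a start URL and keeps only links on the same host. Assumptions: "internal" means the same host as the start URL, pages decode as UTF-8, and `crawl`, `fetch_html` and `max_pages` are hypothetical names. There is no robots.txt handling or rate limiting, which a real spider should add.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def fetch_html(url):
    # Assumption: pages are UTF-8; undecodable bytes are replaced, not fatal.
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch=fetch_html, max_pages=500):
    """Breadth-first crawl returning every same-host URL discovered.

    Hypothetical helper: no robots.txt support and no delay between requests.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except OSError:  # URLError, HTTPError and timeouts are OSError subclasses
            continue
        fetched += 1
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.hrefs:
            # Resolve relative links and drop "#fragment" so page.html#top == page.html.
            absolute, _fragment = urldefrag(urljoin(url, href))
            parts = urlparse(absolute)
            if parts.scheme in ("http", "https") and parts.netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return sorted(seen)
```

    Usage would be something like `crawl("http://example.com/")`, with example.com as a placeholder. The `fetch` parameter is there so a fake fetcher can stand in for real HTTP when testing. Note that `max_pages` caps the number of downloads, not the size of the returned list, which can include links that were found but never fetched.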
     
    iamsgf, Nov 13, 2008
  2. rene7705

    rene7705 Peon

    #2
    I'd use wget for the spidering itself, then grep to pull out all the hrefs.
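    One way to approximate this wget-then-grep approach: first mirror the site with something like `wget -r -np http://example.com/` (`-r` retrieves recursively, `-np` stops wget climbing above the start directory; example.com is a placeholder), then scan the saved files for hrefs. The sketch below does the grep step in Python. `extract_hrefs` and `hrefs_in_mirror` are hypothetical names, and the regex is a rough heuristic: like grep, it also picks up `<link href>` values and misses unquoted attributes.

```python
import os
import re

# Roughly what `grep -o 'href="[^"]*"'` would match, but also accepting single
# quotes, spaces around "=" and upper-case HREF. Like grep, it ignores HTML structure.
HREF_RE = re.compile(r"""href\s*=\s*["']([^"']+)["']""", re.IGNORECASE)


def extract_hrefs(text):
    """Return href values in the order they appear in text."""
    return HREF_RE.findall(text)


def hrefs_in_mirror(root_dir):
    """Collect unique hrefs from every .html/.htm file under a wget mirror directory."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            if name.lower().endswith((".html", ".htm")):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="replace") as fh:
                    found.update(extract_hrefs(fh.read()))
    return sorted(found)
```

    The values come back raw (often relative), so they still need resolving against their page URL and filtering by host before they count as internal links.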
     
    rene7705, Nov 13, 2008
  3. Barti1987

    Barti1987 Well-Known Member

  4. joebert

    joebert Well-Known Member

    #4
    Spidering the pages uses exactly the same concepts as spidering the sitemaps.
    You're making things harder than they need to be: sitemap syntax is much simpler than HTML and will be easier to parse in the long run.
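    To illustrate the point, here is a minimal sketch of reading a sitemap in the sitemaps.org XML format, including sitemap index files, which split a large site's sitemap across several files (the "multi-page sitemap" case). It is again in Python; in PHP, SimpleXML would do the same job. Assumptions: only the XML format is handled (not gzip-compressed, plain-text or RSS sitemaps), and `sitemap_urls` and `fetch_bytes` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

# Namespace from the sitemaps.org protocol; <loc> elements in other
# namespaces (e.g. image extensions) are deliberately ignored.
SM_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
LOC = "{%s}loc" % SM_NS
SITEMAP_INDEX = "{%s}sitemapindex" % SM_NS


def fetch_bytes(url):
    with urlopen(url, timeout=10) as resp:
        return resp.read()


def sitemap_urls(sitemap_url, fetch=fetch_bytes, seen=None):
    """Return page URLs from a sitemap, recursing into sitemap index files.

    Does not handle gzip-compressed, plain-text or RSS sitemaps.
    """
    seen = set() if seen is None else seen
    if sitemap_url in seen:  # guard against index files that loop back
        return []
    seen.add(sitemap_url)
    root = ET.fromstring(fetch(sitemap_url))
    locs = [el.text.strip() for el in root.iter(LOC) if el.text]
    if root.tag != SITEMAP_INDEX:
        return locs  # an ordinary <urlset>: these are the pages themselves
    pages = []
    for child in locs:  # each <loc> in an index points at another sitemap
        pages.extend(sitemap_urls(child, fetch, seen))
    return pages
```

    The index and the plain sitemap differ only in the root element, which is why one function can handle both. Finding the sitemap in the first place is a separate step; /sitemap.xml and the `Sitemap:` line in robots.txt are the usual places to look.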
     
    joebert, Nov 13, 2008
  5. sarahk

    sarahk iTamer Staff

    #5
    If phpDig is still around, it would do that easily.
     
    sarahk, Nov 14, 2008