Who says PHP cannot be a decent spider?

Discussion in 'PHP' started by drewbe121212, Nov 21, 2006.

  1. #1
    I wanted to try this, so I wrote a spider in PHP that is currently making its way through DP :p It has been running for about 15 minutes now, and to tell you the truth, I cannot believe that neither the script/server nor Firefox has timed out on it yet.

    I can tell it is going through a small phase of "duplicate links" right now, as it is sort of sitting idle. Give it 5 minutes or so though :p I cannot tell how many links it has indexed; however, I am echoing one line to the browser per unique link found, and the scrollbar is only about a cm long.
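
    For readers following along, the "one line per unique link" behaviour can be sketched with a set of already-seen URLs plus a queue of pages still to visit. This is only a rough illustration in Python, not the script from this thread; `crawl`, `fetch_links` and `max_pages` are hypothetical names, and `fetch_links` is assumed to return the absolute URLs found on a page.

```python
from collections import deque
from urllib.parse import urldefrag


def crawl(start_url, fetch_links, max_pages=1000):
    """Breadth-first crawl that visits each unique URL at most once.

    fetch_links is a hypothetical callable (an assumption, not from the
    thread): given a URL, it returns the absolute URLs found on that page.
    """
    seen = {start_url}          # every URL ever queued
    queue = deque([start_url])  # pages still waiting to be fetched
    visited = []                # unique URLs, in the order they were fetched

    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in fetch_links(url):
            link, _fragment = urldefrag(link)  # "page#top" and "page" are the same page
            if link not in seen:               # duplicates are skipped here
                seen.add(link)
                queue.append(link)
    return visited
```

    A stretch of pages that mostly link back to known URLs produces no new output, which would look like the spider "sitting idle" even though it is still fetching.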

    Hmm, I will let you know whether it actually finishes. I am going to let it run overnight, though I suspect my server may crash before that happens (not because of the script; the server seems to have some uptime issues, and I am going to contact support).

    If this really can index DP, then I have more than impressed myself, and it will be on to "fake multithreading". If I can index DP at a decent speed with 1 worker, imagine if I write the script to have 10, 20, or 50 workers :) What a server load!!
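
    One way to picture the multi-worker idea is a pool of workers that fetches the current batch of pages in parallel while a single coordinator keeps the list of seen URLs. The sketch below is in Python and uses real threads, so it is an analogy rather than the "fake multithreading" the post has in mind for PHP; `crawl_parallel`, `fetch_links`, `workers` and `max_pages` are hypothetical names, and `fetch_links` is assumed to return the absolute URLs found on a page.

```python
from concurrent.futures import ThreadPoolExecutor


def crawl_parallel(start_url, fetch_links, workers=10, max_pages=1000):
    """Crawl level by level, fetching each level's pages concurrently.

    fetch_links is a hypothetical callable (an assumption, not from the
    thread): given a URL, it returns the absolute URLs found on that page.
    """
    seen = {start_url}      # only touched by the coordinating thread
    frontier = [start_url]  # pages to fetch in the next round
    visited = []

    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier and len(visited) < max_pages:
            batch = frontier[: max_pages - len(visited)]
            # pool.map runs fetch_links on up to `workers` URLs at a time
            # and yields the results in the same order as `batch`.
            results = list(pool.map(fetch_links, batch))
            visited.extend(batch)
            frontier = []
            for links in results:
                for link in links:
                    if link not in seen:
                        seen.add(link)
                        frontier.append(link)
    return visited
```

    Because deduplication happens only in the coordinator, no locking is needed. The number of workers also multiplies the request rate against the target site, so it is worth keeping small.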

    Anyway, I will keep you guys posted. This just goes to show that anything can be accomplished if you put your mind to it :)
     
    drewbe121212, Nov 21, 2006
  2. #2
    The only strange thing I notice right now is that I am picking up a few links that look like this:

    http://forums.digitalpoint.com/showthread.php?amp;goto=newpost&t=178166

    with the stray amp; things going on. My script can also tell me which page the spider was on before it fetched the new one, but I turned that off, so I can't see where these links are coming from or tell whether it is a bug or something wacky with DP :p
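
    One possible cause (an assumption; nothing in this thread confirms it) is that the page's HTML writes the ampersands in its links as &amp;, and the spider uses the raw attribute text without decoding it. A minimal Python sketch of decoding an href before resolving it against the page URL follows; `normalize_href` is a hypothetical helper name. In PHP, the decoding step would correspond to html_entity_decode().

```python
from html import unescape
from urllib.parse import urljoin


def normalize_href(base_url, raw_href):
    """Decode HTML entities in an href, then resolve it against base_url.

    raw_href is the attribute text exactly as it appears in the page
    source, e.g. with "&amp;" standing in for "&".
    """
    decoded = unescape(raw_href.strip())  # "&amp;" becomes "&"
    return urljoin(base_url, decoded)     # relative links become absolute
```

    If the stray amp; still shows up after decoding, the page the link was found on would be the next thing to check.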

    If this thing ever finishes, I'll run it again with the output changed and see where it is coming from...
     
    drewbe121212, Nov 21, 2006