I wanted to try this, so I have written a spider in PHP that is currently making its way through DP. It has been running for about 15 minutes now, and to tell you the truth I cannot believe that neither the script/server nor Firefox has timed out on it yet. I can tell it is going through a small phase of "duplicate links" right now, as it is sort of sitting idle. Give it 5 minutes or so, though. I cannot tell how many links it has indexed; however, I am echoing one line to the browser per unique link found, and the scrollbar is about a centimetre tall.

I will let you know whether it actually finishes. I am going to let it run overnight, though I suspect my server may crash before that happens (not because of the script; the server seems to have some uptime issues, and I am going to contact support). If this really can index DP, then I will have more than impressed myself, and it will be on to "fake multithreading". If I can index DP at a decent speed with 1 worker, imagine if I write the script to have 10, 20, or 50 workers. What a server load!! Anyway, I will keep you guys posted. This just goes to show that anything can be accomplished if you put your mind to it.
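A minimal sketch of how the duplicate-link check and the "one line per unique link" output could work. This is not the actual spider: `extract_links`, `crawl`, and the tiny in-memory `$graph` are made up for illustration, and the graph stands in for real page fetches so the example runs without a network.

```php
<?php
// Hypothetical stand-in for "fetch a page and extract its links":
// a fixed in-memory link graph instead of real HTTP requests.
function extract_links(string $url, array $graph): array
{
    return $graph[$url] ?? [];
}

// Breadth-first crawl that records each unique URL once,
// together with the page it was first found on.
function crawl(string $start, array $graph): array
{
    $foundOn = [$start => null]; // url => referring page
    $queue = new SplQueue();
    $queue->enqueue($start);

    while (!$queue->isEmpty()) {
        $page = $queue->dequeue();
        foreach (extract_links($page, $graph) as $link) {
            // array_key_exists, not isset: the start URL maps to null.
            if (array_key_exists($link, $foundOn)) {
                continue; // duplicate link: nothing new to report
            }
            $foundOn[$link] = $page;
            $queue->enqueue($link);
        }
    }
    return $foundOn;
}

// Made-up link graph with plenty of duplicate links between pages.
$graph = [
    '/index'   => ['/forum', '/about', '/index'],
    '/forum'   => ['/thread1', '/index', '/about'],
    '/thread1' => ['/forum'],
];

// One line of output per unique link, plus where it was found.
foreach (crawl('/index', $graph) as $url => $referrer) {
    echo $url, ' (found on ', $referrer ?? 'start', ")\n";
}
```

Keeping the referring page next to each URL costs very little and makes it possible to trace an odd-looking link back to the page it came from.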
The only strange thing I notice right now is that I am picking up a few links that look like this: http://forums.digitalpoint.com/showthread.php?amp;goto=newpost&t=178166 with the multiple amp; things going on. Part of my script can also tell me which page the spider was on before fetching the new one, but I turned that off, so I can't see where these links are coming from or tell whether it is a bug in my script or something wacky with DP. If this thing ever finishes, I'll run it again with the output changed and see where they are coming from...
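One possible cause, not confirmed here: in HTML source, an ampersand inside an href is normally written as `&amp;`, so a spider that pulls the raw attribute text and then splits or rewrites the query string can leave an `amp;` fragment behind. A minimal sketch of decoding the href before using it, plus a check for links that are already mangled. The helper names `normalize_href` and `has_stray_amp` are hypothetical.

```php
<?php
// Decode HTML entities in a raw href attribute value so that
// "&amp;" becomes a plain "&" before the URL is parsed or queued.
function normalize_href(string $href): string
{
    return html_entity_decode(trim($href), ENT_QUOTES | ENT_HTML401, 'UTF-8');
}

// Flag URLs whose query string still contains a leftover "amp;"
// right after "?" or "&", which suggests a decoding or splitting bug.
function has_stray_amp(string $url): bool
{
    return preg_match('/[?&]amp;/', $url) === 1;
}

// Raw attribute text as it might appear in the page source (assumed example).
$raw = 'showthread.php?goto=newpost&amp;t=178166';
echo normalize_href($raw), "\n";

// The odd link from the crawl output is caught by the check.
$odd = 'http://forums.digitalpoint.com/showthread.php?amp;goto=newpost&t=178166';
var_dump(has_stray_amp($odd));
```

Logging the referring page whenever `has_stray_amp` returns true would show whether the fragment is already in DP's markup or is introduced by the spider.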