Adding a Spider to a directory?

Discussion in 'Directories' started by Ernster, Aug 10, 2005.

  1. #1
    I was thinking about adding a spider/bot to my directory that would spider the internal pages of the websites I have accepted into my directory.

    I have no clue how to do this. Can anyone help me? Thanks.
     
    Ernster, Aug 10, 2005 IP
  2. Nitin M (White/Gray/Black Hat)

    #2
    Do you have some programming skills? There may be some tools that work totally out of the box without programming but the ones we've been evaluating all require a fair amount of coding expertise to get working well.

    You'll need to integrate 2 different components: 1) a spider to collect and index the pages, and 2) a search interface to allow the end user to submit queries against the collection.

    Almost always the spider will come with the search, but just pointing out that it will likely take some effort in both areas to pull off what you're attempting.
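
    To make that split concrete, here's a rough Python sketch of the two pieces working against a toy in-memory index. None of this is how the commercial tools below work; the function names and the single test URL are just placeholders:

    Code:
    import re
    import urllib.request

    index = {}  # toy inverted index: word -> set of URLs containing it

    def spider(url):
        """Component 1: fetch a page and index its words."""
        html = urllib.request.urlopen(url).read().decode("utf-8", "ignore")
        text = re.sub(r"<[^>]+>", " ", html)  # crude tag stripping
        for word in set(text.lower().split()):
            index.setdefault(word, set()).add(url)

    def search(query):
        """Component 2: return URLs matching every query word."""
        hits = [index.get(w, set()) for w in query.lower().split()]
        return set.intersection(*hits) if hits else set()

    spider("http://www.example.com/")
    print(search("example domain"))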

    We've been working with many of these types of tools and weren't impressed with any of the freebies out there. On the real-money side of the house, here are three we really like:

    dtsearch: $999 per domain I think (www.dtsearch.com)

    ISYS: $10k per server (www.isys-search.com)

    DocLinx: $10k per CPU (www.doclinx.com)

    Spiders and indexing have been my life for several months now so feel free to follow up with me as you get more into it. Good luck.
     
    Nitin M, Aug 10, 2005 IP
  3. dvduval (Notable Member)

    #3
    My directory script has this built in: it spiders the links and validates them.
    Are you trying to actually capture a cache of the pages, or just validate?
     
    dvduval, Aug 10, 2005 IP
  4. Ernster (Peon)

    #4
    I have no programming skills at all and would not be able to pay a programmer to do this for me. I don't even understand what I'd do if I bought a script or how it works; I just really would like to make this happen with my directory. I am using WSN Links for my directory. I'd like a FREE or very cheap solution to this.

    Any additional help would be appreciated (I will also ask on the WSN Links forum).
     
    Ernster, Aug 10, 2005 IP
  5. dvduval (Notable Member)

    #5
    Well, we have converted several WSN Links sites over to php link directory. Take a look and let me know if it has the features you need. You never answered my question, though: are you just trying to validate the links to make sure they work? If so, this is built into the admin area, and you can validate the whole site or individual categories.
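
    At its simplest, "validating" a link just means requesting it and checking the HTTP status. A minimal sketch (the function name, user-agent string, and test URL are made up, and a real checker would fall back to GET for servers that reject HEAD):

    Code:
    import urllib.error
    import urllib.request

    def link_is_alive(url, timeout=10):
        """Return True if the URL answers with a non-error HTTP status."""
        req = urllib.request.Request(url, method="HEAD",
                                     headers={"User-Agent": "LinkChecker/0.1"})
        try:
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                return resp.status < 400
        except Exception:  # 4xx/5xx responses, timeouts, DNS failures
            return False

    print(link_is_alive("http://www.example.com/"))  # True if the site is up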
     
    dvduval, Aug 10, 2005 IP
  6. Ernster (Peon)

    #6
    Sorry, no, I don't think I am. I'll try to explain it better.

    When someone submits a site to my directory (just the main URL), I review it and accept it; no problems there...

    What I want to do is have a spider crawl every accepted site and then automatically submit the internal pages to my directory.
     
    Ernster, Aug 10, 2005 IP
  7. dvduval (Notable Member)

    #7
    Ok, I got you. That is no simple request. It would easily run into the hundreds of dollars, because there are so many considerations, including but not limited to:
    1) Session IDs in urls
    2) Depth of crawl
    3) domain name with and without www
    4) Bandwidth and care to only consume so much at a time
    5) Link Validation
    6) Respecting robots.txt
    7) Respecting the "nofollow" attribute
    Once you start running a crawler, it becomes a full-time job, and a crawl engineer could surely add more to my list. At this point, I am not aware of any "simple" spiders, except where you are pulling down just one page. If anyone knows of a "simple" spider that crawls a site, let me know.

    One final note: it might be possible to do something simple, like setting the limit to 10 pages and just grabbing the URL and page title if it exists, but even then there will be quality concerns.
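
    To show how a few of those considerations stack up, here's a bare-bones Python sketch of that 10-page idea: same-host crawl only (3), a hard page cap standing in for bandwidth care (4), dead links skipped (5), robots.txt checked (6), and nofollow respected (7). The bot name and page limit are made-up values, and regex HTML parsing is a shortcut no production crawler should rely on:

    Code:
    import re
    import urllib.request
    import urllib.robotparser
    from urllib.parse import urljoin, urlparse

    USER_AGENT = "DirectoryBot/0.1"  # hypothetical bot name
    PAGE_LIMIT = 10                  # cap how much we pull per site

    def crawl(start_url):
        """Fetch up to PAGE_LIMIT same-host pages and return {url: title}."""
        host = urlparse(start_url).netloc
        robots = urllib.robotparser.RobotFileParser(urljoin(start_url, "/robots.txt"))
        robots.read()
        seen, queue, titles = set(), [start_url], {}
        while queue and len(titles) < PAGE_LIMIT:
            url = queue.pop(0)
            if url in seen or not robots.can_fetch(USER_AGENT, url):
                continue  # already fetched, or robots.txt says no
            seen.add(url)
            try:
                req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
                html = urllib.request.urlopen(req).read().decode("utf-8", "ignore")
            except Exception:
                continue  # dead or unreadable link: skip it
            m = re.search(r"<title[^>]*>(.*?)</title>", html, re.I | re.S)
            titles[url] = m.group(1).strip() if m else url
            for a_tag, href in re.findall(
                    r"""(<a\s[^>]*href=["']([^"']+)["'][^>]*>)""", html, re.I):
                if "nofollow" in a_tag.lower():
                    continue  # respect rel="nofollow"
                link = urljoin(url, href)
                if urlparse(link).netloc == host:
                    queue.append(link)  # stay on the same host
        return titles

    for url, title in crawl("http://www.example.com/").items():
        print(url, "-", title)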
     
    dvduval, Aug 11, 2005 IP
  8. egdcltd (Peon)

    #8
    egdcltd, Aug 11, 2005 IP