Does anyone here have a powerful server that could test a search engine to see how many pages/terms/documents/results a search engine can handle? Some search engines that you can download claim to support 20,000 pages or 10,000 documents. I'm wondering if anyone wants to stress test a search engine, or if you know of anybody or any group that could host a stress test of one. (Note: I'm not trying to prove those numbers, but to test a search engine which has never been tested.)
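For what it's worth, a basic query stress test doesn't need special hosting. Here's a minimal sketch in Python that hammers a search endpoint with concurrent queries and reports throughput and latency. The endpoint URL, the "q" parameter, and the sample terms are all assumptions, so swap in whatever interface the engine you want to test actually exposes.

```python
# Minimal query stress-test sketch. SEARCH_URL, the "q" parameter, and TERMS
# are hypothetical placeholders -- adjust them to the engine under test.
import time
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor

SEARCH_URL = "http://localhost:8080/search"   # hypothetical search endpoint
TERMS = ["linux", "apache", "nutch", "lucene", "crawler"]  # sample query terms
CONCURRENCY = 10   # parallel clients
REQUESTS = 500     # total queries to fire

def one_query(term: str) -> float:
    """Fire one search request and return its latency in seconds."""
    url = SEARCH_URL + "?" + urllib.parse.urlencode({"q": term})
    start = time.time()
    with urllib.request.urlopen(url, timeout=30) as resp:
        resp.read()
    return time.time() - start

def main() -> None:
    start = time.time()
    with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
        jobs = [pool.submit(one_query, TERMS[i % len(TERMS)]) for i in range(REQUESTS)]
        latencies = sorted(job.result() for job in jobs)
    elapsed = time.time() - start
    print(f"{REQUESTS} queries in {elapsed:.1f}s ({REQUESTS / elapsed:.1f} qps), "
          f"median {latencies[len(latencies) // 2]:.3f}s, "
          f"p95 {latencies[int(len(latencies) * 0.95)]:.3f}s")

if __name__ == "__main__":
    main()
```

Ramp up CONCURRENCY until response times degrade and you'll have a rough idea where the engine falls over, regardless of what the vendor's page count claims say.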
No, you can't use mine. But 20K documents is chicken feed; your desktop computer and home internet connection should be able to handle that. For contrast, I'm running a search engine on a machine with dual Xeons, 8 GB of RAM, and 3x300 GB HDs in RAID 0. When I crawl using Nutch I have to cap the crawler so it doesn't exceed 20 Mbps - it would likely do 30 or 40 Mbps or more if left unfettered. The crawl itself doesn't take much computing power, and IIRC it downloads something like 100K documents or more per hour.

Once the documents are downloaded, you have to build an index. We build our index in chunks, or segments, of 400K documents. Each segment takes a few hours to index and spikes CPU usage to almost 100%. (Caveat: I'm not sure how accurate that figure is, since it's a dual-processor machine with hyperthreading, so it effectively has four CPUs, and I'm not sure whether my monitoring actually looks at all four. Certainly indexing doesn't seem to slow the machine down.)

I know that's not what you're looking for, but hopefully that'll serve as a benchmark.
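To make the segment idea concrete, here's a rough Python sketch of that workflow: stream crawled documents, batch them into fixed-size segments, and index each segment as a unit. It's only an illustration of the shape of the process, not Nutch's actual API; fetch_documents() and build_segment_index() are hypothetical placeholders for your crawler output and indexer.

```python
# Sketch of segment-based indexing: batch a document stream into fixed-size
# segments and index each segment separately. fetch_documents() and
# build_segment_index() are hypothetical placeholders, not Nutch APIs.
from itertools import islice
from typing import Iterable, Iterator, List

SEGMENT_SIZE = 400_000  # documents per segment, as described above

def fetch_documents() -> Iterator[dict]:
    """Placeholder: yield crawled documents as dicts (url, body, ...)."""
    for i in range(1_000_000):
        yield {"url": f"http://example.com/page{i}", "body": f"document {i}"}

def segments(docs: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Group a document stream into segments of at most `size` documents."""
    it = iter(docs)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

def build_segment_index(segment_id: int, docs: List[dict]) -> None:
    """Placeholder: hand one segment to the indexer (the CPU-heavy step)."""
    print(f"indexing segment {segment_id}: {len(docs)} documents")

def main() -> None:
    for seg_id, batch in enumerate(segments(fetch_documents(), SEGMENT_SIZE)):
        build_segment_index(seg_id, batch)

if __name__ == "__main__":
    main()
```

The point of segmenting is that indexing cost stays bounded per batch: each 400K-document chunk is a few hours of CPU, and segments can be built and merged independently instead of reindexing the whole crawl at once.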
I've got two. www.mozdex.com, which also provides a free OpenSearch feed (you can integrate the search feed into other apps, like the Google API, but free). And I've got a Canadian search engine at www.acrosscan.com.
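If you want to integrate an OpenSearch feed like that, the results come back as ordinary RSS, so consuming them is a few lines of parsing. A minimal sketch, assuming a hypothetical URL template (it is not mozdex's actual feed address; check the engine's OpenSearch description document for the real one):

```python
# Sketch of consuming an OpenSearch RSS feed and printing result titles.
# FEED_URL is a hypothetical template -- look up the engine's real OpenSearch
# description document for its actual search URL.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

FEED_URL = "http://example-search-engine.com/opensearch?q={terms}"  # hypothetical

def search(terms: str) -> None:
    url = FEED_URL.format(terms=urllib.parse.quote(terms))
    with urllib.request.urlopen(url, timeout=30) as resp:
        root = ET.fromstring(resp.read())
    # OpenSearch results are plain RSS items; pull out each title and link.
    for item in root.iter("item"):
        print(item.findtext("title", default=""), "-", item.findtext("link", default=""))

if __name__ == "__main__":
    search("open source search")
```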