Hi, I am interested in building a web spider, so I would like to know the best program (or language) I can use to build any kind of spider. Thanks.
There are a lot of pre-made spiders... no need to start a new one. Search Google and you will find some.
Thanks, commandos, but I still want to know what the best program is, because I want to offer a customizable spidering service on my site. Thanks.
C or C++ would be the most efficient in terms of CPU resources, but they're not programmer-friendly. Java should be fine, but I think the easiest approach is to hack it together in PHP.
Perl is ideal for an application like this. I wrote the feeds for my site from scratch in Perl, since I needed them to be flexible and to have strong regex capabilities.
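To make the regex point concrete, here is a minimal sketch of the kind of quick link extraction a spider does, written in Python rather than Perl (Python's `re` module uses Perl-style regular expressions). The function name `extract_links` and the pattern are illustrative assumptions, not the poster's feed code, and a regex like this is quick to write but is not a real HTML parser.

```python
import re
from urllib.parse import urljoin

# Matches href="..." or href='...' (case-insensitive). Illustrative only:
# it misses unquoted attributes and will happily match commented-out links.
HREF_RE = re.compile(r"""href\s*=\s*["']([^"']+)["']""", re.IGNORECASE)

def extract_links(html: str, base_url: str) -> list[str]:
    """Return absolute URLs for every quoted href found in the page."""
    links = []
    for raw in HREF_RE.findall(html):
        if raw.startswith(("javascript:", "mailto:", "#")):
            continue  # not something a spider can fetch
        # Resolve relative links against the page they were found on.
        links.append(urljoin(base_url, raw))
    return links

if __name__ == "__main__":
    page = '<a href="/about">About</a> <A HREF=\'news.html\'>News</A> <a href="mailto:x@y.z">Mail</a>'
    print(extract_links(page, "http://example.com/index.html"))
    # prints ['http://example.com/about', 'http://example.com/news.html']
```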
I really don't like Perl, but this is pretty much exactly the sort of thing it was designed to do. If you're down with OO programming and you're planning on doing a lot of customization (meaning you're going to be coming back to add extra features six months from now, which is Perl's weakness), Python would also be an excellent choice. Java, VB.NET, and C# are all reasonable alternatives. At least, those are the languages I'd consider. Probably more important than the language you use is having a good design: when it comes time to customize the spider, the more time you spent thinking about design up front, the easier it will be.
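As one illustration of designing for customization, here is a minimal Python sketch of a breadth-first crawler whose fetching, link extraction, and filtering are passed in as functions, so a customer-specific rule can be swapped in without touching the crawl loop. The names `crawl`, `fetch`, `extract_links`, and `should_follow` are hypothetical, and the in-memory `FAKE_WEB` dictionary stands in for real HTTP requests.

```python
from collections import deque
from typing import Callable, Iterable, Optional

def crawl(
    start_url: str,
    fetch: Callable[[str], Optional[str]],
    extract_links: Callable[[str, str], Iterable[str]],
    should_follow: Callable[[str], bool] = lambda url: True,
    max_pages: int = 100,
) -> list[str]:
    """Breadth-first crawl; returns URLs in the order they were fetched.

    Each customizable behaviour is a plain function argument (hypothetical
    hooks), so per-customer rules don't require editing this loop.
    """
    visited: list[str] = []
    seen = {start_url}              # everything ever queued, to avoid revisits
    frontier = deque([start_url])   # FIFO queue gives breadth-first order
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:            # fetch failed or page not found
            continue
        visited.append(url)
        for link in extract_links(html, url):
            if link not in seen and should_follow(link):
                seen.add(link)
                frontier.append(link)
    return visited

if __name__ == "__main__":
    # Toy "web": each page's body is just a space-separated list of links.
    FAKE_WEB = {
        "http://a/": "http://a/b http://a/c",
        "http://a/b": "http://a/c http://x/",
        "http://a/c": "http://a/",
    }
    same_site = lambda u: u.startswith("http://a/")  # example customer rule
    print(crawl("http://a/", FAKE_WEB.get, lambda html, url: html.split(), same_site))
    # prints ['http://a/', 'http://a/b', 'http://a/c']
```

Because the hooks are ordinary functions, supporting a new customer requirement (say, a different domain filter or a politeness delay inside `fetch`) means writing a new small function rather than changing the crawler.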
Stick with Java or C#; they're really the only languages you should even consider for something like this.
Hi, I created a web spider using VB and I'm running it on a live site too. But spidering takes too much CPU, so I'm not marketing it, because I need financing. The free spiders you find on the net are useless, except Zoom spider, which costs $49 or $99. Thanks.
Perl was designed (or maybe "evolved" would be a better word) as a single-shot hacker's tool. Its core theme, as far as I can tell, is to whip out text-manipulation programs in the fastest way possible, and it's great at that. A lot of its evolution also seems to have revolved around introducing redundancy: letting the developer do any given task in as many different ways as anyone can dream up. Which is really cool. Combine those two observations, though (along with a mindset in the community that leans toward meaningless, incomprehensible variable names), and you wind up with what is, in my mind, unmaintainable code. This could just be me. Somewhere along the line I may have picked up an anti-Perl mental block (it might even be genetic... maybe I was born with this flaw). Perl is obviously a great programming language, or it would not have enjoyed the success and widespread use that it has. So, instead of writing "which is Perl's weakness," I should have written "which I consider Perl's weakness."
I guess I'm at the opposite end of the spectrum from you. Perl is usually my first option for any task, since it's obviously my strongest programming language. That, and I've seen it do some really great things in my time (like run some of the largest and most trafficked websites in the world). Essentially, the points you touch on here are all user-driven weaknesses. If you adhere to a style guide, you can do pretty much anything with Perl, and it will be no more or less maintainable or extensible than any other programming language. So while it may have started life as a hacker's tool for getting jobs done quickly, it is now an enterprise-level language that can do just about anything.
Really? Are you saying that there's no point in writing a web spider in any language other than Java or C#? On what are you basing that assumption? I almost wrote the same thing when I was tossing in my two cents. Then I remembered that VB.NET compiles to essentially the same .NET intermediate code as C#, so you could use that instead. Most of the time is likely to be spent parsing the results you pull back. Perl is heavily optimized for that, so it seems like an extremely viable option for this project. Pretty much anything you want to do is going to be easier to write in Python than in C# or Java, so there's another option. If it turns out that performance is an issue, you can write an extension in C to handle that part (or maybe first try a DLL written in C#). It's more of a pain to keep your code hidden in Python (at least it used to be), but you could still use it just to wire things together, handle how things are customized, etc. Heck, Common Lisp might be a perfect fit for this. A very brief Google search even turned up a web spider written in Ruby. That one strikes me as an odd choice (at least until Ruby runs faster), but obviously someone believed it was worth considering. We don't know anywhere near enough about the details to be saying things like "the only languages you should even consider."
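If parsing really is where most of the time goes, it is worth isolating it in one function so it can be profiled and, if needed, replaced by a C extension later. Here is a minimal sketch of that parsing step using Python's standard-library `html.parser.HTMLParser` instead of regexes; the names `LinkCollector` and `parse_links` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collects absolute URLs from <a href=...> tags using the stdlib parser."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names; attrs is a list of
        # (name, value) pairs, with value None for a bare attribute like <a href>.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(urljoin(self.base_url, value))

def parse_links(html: str, base_url: str) -> list[str]:
    """Single entry point for the parsing step, easy to profile or swap out."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links

if __name__ == "__main__":
    page = '<!-- <a href="/old">old</a> --><A HREF=page2.html>next</A>'
    print(parse_links(page, "http://example.com/dir/"))
    # prints ['http://example.com/dir/page2.html']
```

A real parser skips links inside HTML comments and handles unquoted or upper-case attributes, which naive regexes often get wrong; the trade-off is that a pure-Python parser is usually slower, which is exactly the kind of hot spot the C-extension route is meant for.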
There's nothing odd about it. PHP is the most programmer-friendly language (at least for now): weak typing, a huge user base, and easy access to help from gurus (like your good self), without the stigma of a wimp language like VB. I think it's great for rapid prototyping and for trying out new concepts.