I need to build a web spider

Discussion in 'Programming' started by romioaa, Apr 19, 2007.

  1. #1
    Hi,

    I am interested in building a web spider, so I would like to know the best program I can use to build any kind of spider. Thanks.
     
    romioaa, Apr 19, 2007 IP
  2. commandos

    commandos Notable Member

    #2
    There are a lot of pre-made spiders ... no need to start a new one.

    Search Google and you will find some ;)
     
    commandos, Apr 19, 2007 IP
  3. romioaa

    romioaa Notable Member

    #3
    Thanks, commandos, but I still want to know what the best program is, because I want to offer a customizable spidering service on my site.

    Thanks
     
    romioaa, Apr 20, 2007 IP
  4. Weizheng

    Weizheng Peon

    #4
    C or C++ would be the most efficient in terms of CPU resources, but they're not programmer-friendly.

    Java should be fine but I think the easiest is to hack it using PHP.
     
    Weizheng, Apr 20, 2007 IP
  5. officialboss

    officialboss Peon

    #5
    You can also write in C#. I have worked on a console application that spiders the web.
     
    officialboss, Apr 21, 2007 IP
  6. Jack700

    Jack700 Guest

    #6
    Perl is ideal for an application like this.

    I wrote the feeds for my site from scratch using Perl since I needed them to be flexible and have strong regex capabilities.
     
    Jack700, Apr 21, 2007 IP
  7. krakjoe

    krakjoe Well-Known Member

    #7
    You say some really odd things ......
     
    krakjoe, Apr 21, 2007 IP
  8. Aragorn

    Aragorn Peon

    #8
    Though I don't have much experience programming with Perl, this is what I have to say too.
     
    Aragorn, Apr 21, 2007 IP
  9. jimrthy

    jimrthy Guest

    #9
    I really don't like Perl, but this is pretty much exactly the sort of thing it was designed to do.

    If you're down with OO programming, and you're planning on doing a lot of customization (meaning you're going to be coming back to do extra stuff 6 months from now, which is perl's weakness), Python would also be an excellent choice.

    Java, VB.NET, and C# are all reasonable alternative choices.

    At least, those are the languages I'd consider.

    Probably more important than the language you use is having a good design. When it comes time to customize it, the more time you spent thinking about design up front, the easier it will be.
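
    To make the design point a bit more concrete, here is a minimal sketch in Python (picked only because it is short). It is not taken from any existing spider. The names `crawl`, `fetch`, `extract_links`, `should_visit`, and `max_pages` are all made up for illustration. The idea is that the crawl loop knows nothing about HTTP or HTML, so each piece can be swapped when a customer wants something different.

```python
from collections import deque
from typing import Callable, Iterable, List


def crawl(
    start_url: str,
    fetch: Callable[[str], str],
    extract_links: Callable[[str, str], Iterable[str]],
    should_visit: Callable[[str], bool] = lambda url: True,
    max_pages: int = 100,
) -> List[str]:
    """Breadth-first crawl; returns URLs in the order they were fetched.

    fetch, extract_links and should_visit are hypothetical plug-in points:
    the loop itself never touches the network or parses anything.
    """
    queue = deque([start_url])
    seen = {start_url}          # URLs already queued, to avoid loops
    visited: List[str] = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        page = fetch(url)       # customization point: how to download
        visited.append(url)
        for link in extract_links(url, page):   # customization point: parsing
            if link not in seen and should_visit(link):  # customization point: scope
                seen.add(link)
                queue.append(link)
    return visited
```

    With this split, a "stay on one domain" option or a different parser becomes a new function passed in, not a change to the loop. A real spider would also need politeness delays, robots.txt handling, and error handling, which are left out here.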
     
    jimrthy, Apr 21, 2007 IP
  10. Jack700

    Jack700 Guest

    #10
    Could you expand on this a bit? I don't quite understand how this is Perl's weakness.
     
    Jack700, Apr 21, 2007 IP
  11. prophecy

    prophecy Guest

    #11
    Stick with Java or C#, they're really the only languages you should even consider for something like this.
     
    prophecy, Apr 22, 2007 IP
  12. johnwoof

    johnwoof Peon

    #12
    Hi,

    I created a web spider using VB, and I'm running a live site with it too.

    But spidering takes too much CPU, so I'm not marketing it because I need financing.

    The free spiders you get on the net are useless, except Zoom Spider, which costs $49 or $99.

    Thanks
     
    johnwoof, Apr 22, 2007 IP
  13. ottodo

    ottodo Guest

    #13
    What exactly do you want it for?
     
    ottodo, Apr 22, 2007 IP
  14. jimrthy

    jimrthy Guest

    #14
    Perl was designed (or maybe evolved would be a better phrase) as a single-shot hacker's tool. Its core theme (as far as I can tell) is to whip out text manipulation programs in the fastest way possible. And it's great at that.

    A lot of its evolution also seems to have revolved around introducing redundancy: letting the developer do any given task in as many different ways as anyone can dream up. Which is really cool.

    Combine those two observations, though, (along with a mindset in the community that leans toward meaningless, incomprehensible variable names), and you wind up with, in my mind, unmaintainable code.

    This could just be me. Somewhere along the line I may have just picked up an anti-Perl mental block (it might even be genetic...maybe I was born with this flaw). Perl is obviously a great programming language, or it would not have enjoyed the success and widespread use that it has.

    So, instead of writing "which is perl's weakness," I should have written "which I consider perl's weakness."
     
    jimrthy, Apr 22, 2007 IP
  15. Jack700

    Jack700 Guest

    #15
    I guess that I'm at the opposite end of the spectrum from you. Perl is usually my first option for any task since it's obviously my strongest programming language :) That, and I've seen it do some really great things in my time (like run some of the largest and most trafficked websites in the world).

    Essentially, the points that you touch on here are all user-driven weaknesses. If you adhere to a certain style guide, you can pretty much do anything with Perl and it will be no more or less maintainable/extendable than any other programming language.

    So while it may have started life as a quick hacker tool to get jobs done on the quick, it is now an enterprise-level language that can do just about anything.
     
    Jack700, Apr 22, 2007 IP
  16. jimrthy

    jimrthy Guest

    #16
    Really? Are you saying that there's no point in writing a web spider in a language other than either Java or C#? On what are you basing this assumption?

    I almost wrote the same thing when I was tossing in my 2 cents. Then I remembered that VB.NET creates basically the same binary as C#. So you could use that instead.

    Most of the time's likely to be spent parsing the results you pull back. Perl's heavily optimized to do that, so it seems like an extremely viable option for this project.
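
    As a rough illustration of what that parsing step involves, here is a sketch using only the Python standard library (`html.parser` and `urllib.parse`). The class `LinkExtractor` and the function `extract_links` are made-up names, and the approach (collect `href` values from `a` tags, resolve them against the page URL, drop fragments) is just one simple way to do it, not how any particular spider works.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects absolute link targets from <a href="..."> tags (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links, then strip any "#fragment" part.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.append(absolute)


def extract_links(base_url, html):
    """Return the links found in html, resolved against base_url."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links
```

    Regexes can do the same job on simple pages, but a tolerant parser tends to cope better with messy markup.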

    Pretty much anything you want to do is going to be easier to write in Python than C# or Java. So there's another option. If it turns out that performance is an issue, you can write an extension to do that part in C (or maybe first try a DLL written in C#). It's more of a pain to keep your code hidden (at least it used to be), but you could still just use it to wire things together, handle how the things are customized, etc.

    Heck, Common Lisp might be a perfect fit for this.

    A very brief Google search even brought me to a web spider written in Ruby. That one strikes me as an odd choice (at least until Ruby runs faster), but obviously someone believed it was worth considering.

    We don't know anywhere near enough about the details to be saying things like "the only languages you should even consider."
     
    jimrthy, Apr 22, 2007 IP
  17. Weizheng

    Weizheng Peon

    #17
    There's nothing odd. PHP is the most programmer-friendly language (at least for now): weak typing, a huge user base, and easy access to help from gurus (like your good self), without the stigma of a wimp language like VB.

    I think it's great for rapid prototyping, for trying out new concepts.
     
    Weizheng, Apr 23, 2007 IP