How to make your own search engine?

Discussion in 'All Other Search Engines' started by Coopzie, Sep 4, 2006.

  1. #1
    I have wondered about this for a while: does anyone know a good script or application that lets you make your own powerful custom search engine?

    Coopzie, Sep 4, 2006
  2. sarahk

    sarahk iTamer Staff

    #2
    Go to SourceForge and do a search there. I played around with phpDig for a bit; there's an English version of the site somewhere, but this is all I could quickly find: http://phpdig.de/

    sarahk, Sep 4, 2006
  3. wheel

    wheel Peon

    #3
    There are two classes of search engines: the tiny ones that don't index much, and the 'power' ones that will crawl millions of pages.

    For smaller stuff, phpDig, as noted above, is one. There's another older one whose name escapes me - aspsearch? Something like that. It was done by some Russians and is apparently languishing and unsupported, but I've spoken to folks who like it. These products will not index huge numbers of websites, so they're more suitable for a niche search engine IMO. If you're looking to index up to a few hundred thousand pages, these are the best products. Go beyond that and the software won't cope.

    For a full-scale search engine, the only OSS product I know of is Nutch. I use it in a few places and it works well. It's nowhere near as easy to set up and run as the PHP scripts, but it will crawl and index on a large scale. It will certainly handle at least 50 million pages.

    If you take the second route, your two biggest issues will be servers (you won't run this on shared hosting) and crawling the web. I throttle my crawls back, but I can and do open up a 40 Mbps connection and fill it, wide open, for days at a time when I'm crawling. You could get by on a 10 Mbps connection, but you'd better have a good host who'll let you download enormous volumes of data. Once the data's downloaded, it takes a lot of horsepower to index all of it. I've got a heavy-duty server that does that job, and indexing uses every bit of it.
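
    To put rough numbers on the bandwidth point, the sketch below estimates how long a crawl keeps a link busy. The 20 KB average page size and the `crawl_days` helper are assumptions for illustration, not figures from this thread; only the 50 million pages and the 40 and 10 Mbps link speeds come from the post above.

```python
def crawl_days(pages, avg_page_kb, link_mbps, utilization=1.0):
    """Estimate the days needed to download `pages` pages over one link.

    avg_page_kb is an assumed average page size in kilobytes (1 KB = 1000 bytes).
    utilization is the fraction of the link the crawler actually fills.
    """
    total_bits = pages * avg_page_kb * 1000 * 8
    bits_per_second = link_mbps * 1_000_000 * utilization
    seconds = total_bits / bits_per_second
    return seconds / 86_400  # seconds in a day


if __name__ == "__main__":
    ASSUMED_PAGE_KB = 20  # hypothetical average; measure your own crawl
    for mbps in (40, 10):
        days = crawl_days(50_000_000, ASSUMED_PAGE_KB, mbps)
        print(f"{mbps} Mbps: about {days:.1f} days")
```

    Under that assumed page size, 50 million pages is a little over two days on a saturated 40 Mbps link and roughly four times that at 10 Mbps, before any throttling.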
     
    wheel, Sep 9, 2006
  4. Blogspotter

    Blogspotter Notable Member

    #4
    If you know some programming, search for Lucene on Google. That should give you an idea.

    Blogspotter, Sep 9, 2006
  5. wheel

    wheel Peon

    #5
    Lucene is the search engine included in Nutch. Nutch adds the crawler and a few other things that take Lucene from straight search to what most would consider a full search engine.
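
    As a rough picture of what "straight search" means here, the toy inverted index below maps each word to the pages that contain it and answers AND queries. It is a plain-Python sketch, not Lucene's actual data structures or API; `build_index`, `search`, and the sample pages are made up for illustration. A crawler, which is the part Nutch adds, would be whatever fills `pages`.

```python
from collections import defaultdict


def build_index(pages):
    """Map each lowercased word to the set of page ids containing it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for term in text.lower().split():
            index[term].add(page_id)
    return index


def search(index, query):
    """Return the ids of pages that contain every word in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Copy the first posting set so the index itself is never modified.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    # Hypothetical pages; in a real setup a crawler would supply these.
    pages = {
        "page1": "open source search engine",
        "page2": "search box for your site",
    }
    index = build_index(pages)
    print(search(index, "search engine"))
```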
     
    wheel, Sep 9, 2006
  6. R-ampage

    R-ampage Banned

    #6
    English version - http://www.phpdig.net/
     
    R-ampage, Sep 10, 2006
  7. misohoni

    misohoni Notable Member

    #7
    I've seen the Russian search code too. It looked great, if a bit daunting... shame I lost the link, though.

    misohoni, Sep 10, 2006
  8. mahmood

    mahmood Guest

    #8
    If you don't need a full-blown search engine, you should be able to find a script out there, especially if you only have a few pages.

    A while ago I decided to put a search engine on my site but was disappointed, because I expected to find a script that offers suggestions rather than just a plain text search.

    For example, let's say somebody enters "choclate": it should suggest "chocolate".
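
    One way to get that kind of suggestion, assuming you have a list of the words that actually appear on your site, is fuzzy matching against that vocabulary. The sketch below uses Python's standard `difflib`; the `suggest` helper, the 0.8 cutoff, and the sample vocabulary are assumptions for illustration, and a production spell-checker would likely also weigh word frequency.

```python
import difflib


def suggest(query, vocabulary, cutoff=0.8):
    """Return the closest known word, or None if no suggestion is needed.

    vocabulary is assumed to hold the lowercased words indexed from the site.
    """
    word = query.lower()
    if word in vocabulary:
        return None  # spelled like a known word, nothing to suggest
    matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    # Hypothetical vocabulary; in practice, build it from your indexed pages.
    VOCABULARY = ["chocolate", "coffee", "cholesterol", "late"]
    print(suggest("choclate", VOCABULARY))  # chocolate
```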

    The best option I found was the Google Search API, but the problem is that it only returns results from pages Google has already indexed, so any page of yours that Google hasn't indexed won't show up.

    mahmood, Sep 10, 2006
  9. kingsley

    kingsley Member

    #9
    How much would it cost to run a search engine like this and grow it as big as Google? I want to have one for Africa.
    kingsley

    kingsley, Sep 10, 2006
  10. sanjeevan_a

    sanjeevan_a Peon

    #10
    There are some very good open-source search engines, for example Nutch: http://lucene.apache.org/nutch/

    sanjeevan_a, Sep 10, 2006
  11. brandnewx

    brandnewx Peon

    #11
    If you want to build an empire like Google, you should build from scratch, because sooner or later you'll stumble upon bugs that need an immediate fix. If you're using a third-party component, even an open-source one, you'll have a hard time fixing it or getting the vendor to fix it. To fix it yourself, you'll have to learn the framework; if you ask the vendor to fix it, you'll wait for an unspecified, unpromised length of time.

    I'm not saying third-party engines are bad, just that they're not suited to mass-crawling operations, especially for commercial use. If you just want to build a small search engine, they are probably fine.

    OK, building a search engine from scratch requires capital and skill. If you can't code the engine yourself, you'll have to spend at least $2,000 hiring freelancers to build it, and many thousands more on maintenance, not to mention thousands a month for testing and hosting.

    In brief, unless you have a very good reason to run a search engine, don't. If you just want search functionality added to your website, search for "google search box" or "google site search integration".

    brandnewx, Sep 10, 2006