Best Database for Billions of Records / Terabytes of Data? (Search Engine)

Discussion in 'MySQL' started by Nathan Malone, May 25, 2006.

  1. DanInManchester

    DanInManchester Active Member

    Messages:
    116
    Likes Received:
    1
    Best Answers:
    0
    Trophy Points:
    53
    #21
    From what I've read about existing search engines and massive databases like this, what you really need is some sort of custom distributed appliance rather than a database server.

    The demands on a database that large, with that many users, would be phenomenal. With a distributed appliance you can run lots of lower-end hardware and spread the load across it more easily.

    You will gain economies of scale this way, and the system will be highly scalable.

    *edit*
    I'd imagine you could use Google appliances in this way.
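
To make the "lots of lower-end hardware" idea a bit more concrete, here is a minimal sketch of hash-based partitioning, where each record key is routed to one of several machines. The node names and the `node_for` / `partition` helpers are made up for illustration; this is not how any particular search engine or appliance actually does it.

```python
import hashlib

# Hypothetical node names; a real deployment would use actual host addresses.
NODES = ["node-0", "node-1", "node-2", "node-3"]


def node_for(key, nodes=NODES):
    """Pick a node by hashing the key, so the same key always maps to the same node."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def partition(keys, nodes=NODES):
    """Group keys by the node responsible for them."""
    shards = {n: [] for n in nodes}
    for k in keys:
        shards[node_for(k, nodes)].append(k)
    return shards


if __name__ == "__main__":
    urls = [f"http://example.com/page/{i}" for i in range(10000)]
    for node, items in partition(urls).items():
        print(node, len(items))
```

Simple modulo hashing like this reshuffles most keys whenever a node is added or removed, so it is only a starting point for thinking about the problem.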
     
    DanInManchester, Jul 24, 2006
  2. dc dalton

    dc dalton Active Member

    Messages:
    521
    Likes Received:
    11
    Best Answers:
    0
    Trophy Points:
    58
    #22
    Yup, the last post is right .... does anyone actually think a site like Google or MSN could stay up using a single off-the-shelf db server? Hardly!

    Sites like these have multiple redundant distributed database solutions that cost millions of dollars (if not more).

    As far as the original poster's question goes, I would look into PostgreSQL ... it is a seriously powerful solution that can more than likely handle your situation. Now of course you are going to need someone who really knows their stuff when it comes to load balancing and "trimming the fat" when it comes to queries etc. The db is free but you are still gonna pay some big numbers for a serious db admin and developer .... if you cheap out there NO database will stand up to the torture!
     
    dc dalton, Jul 24, 2006
  3. catanich

    catanich Peon

    Messages:
    1,921
    Likes Received:
    40
    Best Answers:
    0
    Trophy Points:
    0
    #23
    How about MS Access? (Just kidding...)

    The choice is not the database, but the operating system. Therefore, there are only 2 real choices:

    Windows: use MS SQL Server
    UNIX (or UNIX-like): use Oracle

    But you will find that for the SE requirement, the UNIX system will be better because of the "indexing of large record count" and "number of concurrent users".

    Talk to your local Oracle rep for more details. SQL Server is good, but not for a SE application.

    Jim Catanich
     
    catanich, Jul 26, 2006
  4. abuzant

    abuzant Well-Known Member

    Messages:
    956
    Likes Received:
    45
    Best Answers:
    0
    Trophy Points:
    140
    #24
    If you're not looking to have something built specially for you, go Informix. If not, go Oracle. Else, forget it.

    And as mentioned above, do not even take chances trying MySQL or MS SQL for, as you said, "TB of data". They will die like a butterfly.

    Good luck.
     
    abuzant, Jul 27, 2006
  5. wwm

    wwm Peon

    Messages:
    308
    Likes Received:
    10
    Best Answers:
    0
    Trophy Points:
    0
    #25
    Look into MSSQL 2005, it's excellent ;)
     
    wwm, Jul 31, 2006
  6. djhall

    djhall Peon

    Messages:
    1
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #26
    If you are not tied to the idea of using a relational database to solve the problem, how about the possibility of an object-oriented database?

    The Objectivity/DB product has customers around the world that regularly build multi-terabyte repositories with many billions of objects. Also, the objects can be simple or highly complex. The product fully supports inheritance.

    An Objectivity database can be created on one machine or distributed across thousands of machines, and the API is such that the applications don't need any knowledge of the distribution.

    The product allows you to create your own persistent collections (maps, trees, ontologies, etc.) to organize the data in the way that makes the most sense for your environment.

    It also supports multithreading, so you can have thousands of threads doing writes and/or updates at the same time as reads are being performed.

    It has Java, C++, and Python APIs, and objects written using one language can be read by an application using any of the other APIs.

    Finally, it is fully ACID (Atomicity, Consistency, Isolation, Durability).
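
For anyone unsure what the "A" in ACID buys you, here is a small sketch of atomicity. It uses Python's built-in `sqlite3` module purely as a stand-in (it is not the Objectivity/DB API), and the `accounts` table and the `make_db` / `transfer` / `balances` helpers are invented for the example. The point is that a failed multi-step update leaves no partial changes behind.

```python
import sqlite3


def make_db():
    """Create an in-memory database with two example rows (illustrative data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn


def balances(conn):
    """Return the current balances as a dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            (bal,) = conn.execute("SELECT balance FROM accounts WHERE name = ?", (src,)).fetchone()
            if bal < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    conn = make_db()
    print(transfer(conn, "alice", "bob", 30), balances(conn))
    print(transfer(conn, "alice", "bob", 500), balances(conn))
```

In the second call both updates have already run when the check fails, but the rollback undoes them together, so the balances are unchanged.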

    Hope this gives you some ideas. Good Luck.
     
    djhall, Aug 5, 2006