Best Database for Billions of Records / Terabytes of Data? (Search Engine)

Discussion in 'MySQL' started by Nathan Malone, May 25, 2006.

  1. #1
    I need some sort of database that would be able to handle billions of records / terabytes of data, while being reasonably fast and spread over several machines.

    I was looking at MySQL Cluster, but if I understand it correctly, it just won't work in this situation: I think that only in-memory databases are "clusterable", and this database would require more than just a few gigs of memory.

    Does anyone have any suggestions on where to go? FYI, this will be for a search engine. Any other advice on writing a search engine is welcome.
     
    Nathan Malone, May 25, 2006 IP
  2. ServerUnion

    ServerUnion Peon

    #2
    MS SQL Server or Oracle would be able to handle that in a large cluster. Be warned, this will be far from a cheap setup.

    For a venture like this, I suggest getting your info away from the forums. Contact some hardware vendors, datacenters, or consultants who would have a better view into how this type of setup would need to be configured.
     
    ServerUnion, May 25, 2006 IP
  3. digitalpoint

    digitalpoint Overlord of no one Staff

    #3
    digitalpoint, May 25, 2006 IP
  4. Nathan Malone

    Nathan Malone Well-Known Member

    #4
    That's interesting, perhaps MySQL Cluster will work after all.

    I suppose you are correct, though, that going custom will probably be the best (only?) long-term solution. If you (or anyone else) have any advice on developing a custom database engine, I'd be interested in hearing it.
     
    Nathan Malone, May 25, 2006 IP
  5. Lisper

    Lisper Guest

    #5
    Do yourself a favor and do not use MySQL for this. It is a horrible database; for example, it cannot even guarantee your data won't be corrupted on a power loss. If you want a free / open source database system, PostgreSQL would be a better choice.

    Building a custom database engine isn't very realistic or necessary. If you have powerful hardware, a good database system such as SQL Server, Oracle or PostgreSQL will be able to handle your data.
     
    Lisper, May 25, 2006 IP
  6. pricethat

    pricethat Active Member

    #6
    I would have to disagree with the above comment about MySQL not being suitable. What you need more than anything is a database engineer. If you do not have enough understanding of databases that you have to ask people on a forum when you are talking terabytes of data, then I would not even begin the project.

    MySQL can handle terabytes of information; I know because I have used it many times. But equally, you do not have to use a single database. You can use many databases, sometimes on different machines, to form a cluster of sorts. It really does depend on what the task is and how you develop the application using the database to begin with.
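
    As a rough sketch of what "many databases on different machines" could look like from the application side, the snippet below routes each key to one of several databases with a stable hash. The shard names and the choice of MD5 are assumptions made up for illustration, not anything from a particular MySQL setup.

```python
import hashlib

# Hypothetical shard names; in practice each would be a separate database server.
SHARDS = ["db0", "db1", "db2", "db3"]


def shard_for(key, shards=SHARDS):
    """Pick a shard for a key using a stable hash.

    Python's built-in hash() is salted per process for strings,
    so a digest is used to keep the mapping stable across runs.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


def group_by_shard(keys, shards=SHARDS):
    """Group keys by the shard that would store them."""
    groups = {name: [] for name in shards}
    for key in keys:
        groups[shard_for(key, shards)].append(key)
    return groups
```

    Note that plain modulo hashing like this remaps most keys whenever the number of shards changes, so it only suits a fixed set of machines.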

    Each database has its benefits and pitfalls. If you take a cross sample of, let's say, 1000 projects, I would say that MySQL is more suited than any of the others for various reasons. I myself need to migrate what we do from MySQL to PostgreSQL because of other issues, but there is nothing essentially wrong with MySQL.

    Making the wrong decision now will cost you thousands and thousands later on; a few hundred spent on proper database architecture and design now will save you no end of problems.

    As for bespoke databases... don't even go there, unless you think you have the skills to reinvent the wheel.
     
    pricethat, May 29, 2006 IP
  7. Lisper

    Lisper Guest

    #7
    How about maintaining data integrity, for one? I think that's an essential requirement for a database. But there are also lots of other reasons why MySQL isn't very good.

    I'm sorry, but MySQL is a toy; the Visual Basic of databases. It was once the only open source / free database server, but there are more and better choices right now.
     
    Lisper, May 29, 2006 IP
  8. topsearch

    topsearch Peon

    #8
    Have you considered a database appliance such as Netezza? They're wicked fast (and wicked expensive).
     
    topsearch, Jun 13, 2006 IP
  9. sketch

    sketch Well-Known Member

    #9
    I'm pretty sure any system is prone to corruption if used hard enough. Lisper's example of corruption due to power failure is a hardware issue (the disk stopping in the middle of something), not because of MySQL.
     
    sketch, Jun 13, 2006 IP
  10. nevetS

    nevetS Evolving Dragon

    #10
    A high-speed solution involving terabytes of data is going to take a lot of work. It's not as simple a question as you may think. I've never used MySQL or Postgres at that scale, but I would quickly move away from those platforms before the discussion even began. MS SQL and Sybase would be knocked out early as well.

    Oracle is a solid platform, but I'd also look closely at DB2 and perhaps Informix. I'm sure there are a few others in that vertical space that deserve mention, but the end goals are very important in making a decision. Building a custom data environment isn't necessarily a bad idea, depending on the project. Also a possibility would be a custom OS, or a slimmed-down Linux platform geared towards your database performance.

    "Must handle terabytes of data" is certainly a starting point to slim down your list, but I'd say a better way of looking at it would be to have a solid understanding of your desired end solution. How many tables? How many records per table? How many columns per table? What about indexes? Is there a potential for table scans on tables containing tens of millions of records? Full-text indexing? What options are available for the platform you are intending to use?
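
    To put rough numbers on those questions, a back-of-envelope estimate along the following lines can help. The function name, the index overhead factor and the example figures are all assumptions for illustration; real row sizes and index costs vary by platform.

```python
def estimate_size_tb(records, avg_row_bytes, index_overhead=0.0):
    """Estimate storage in decimal terabytes (1 TB = 10**12 bytes).

    index_overhead is a fraction of the raw data size,
    e.g. 0.5 means indexes add another 50% on top of the rows.
    """
    raw_bytes = records * avg_row_bytes
    total_bytes = raw_bytes * (1.0 + index_overhead)
    return total_bytes / 1e12


# Example: 2 billion rows at an assumed 500 bytes each, with 50% index overhead.
if __name__ == "__main__":
    print(estimate_size_tb(2_000_000_000, 500, index_overhead=0.5))
```

    Running the same calculation per table makes it easier to see which tables actually drive the terabyte figure.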

    What about dedicated search indexing hardware/software? There are the Google appliances, and in that market space there are probably 10 or maybe 20 vendors that will hit various price points from a few grand to over a million bucks.

    The MySQL full-text search capabilities are questionable in my mind. I understand how it works, but the results are often less than optimal unless you plan extremely well.

    I think you are better off starting out with a free platform: MySQL if your needs are truly basic, or Postgres if you need a little bit more (does MySQL support database triggers yet? I think it handles transactions now). Put together a design that will handle the project on a smaller level. Test it and tweak it to the edge, and then make a switch to a more robust platform once you have a real understanding of your requirements.
     
    nevetS, Jun 13, 2006 IP
  11. RectangleMan

    RectangleMan Notable Member

    #11
    I agree that if the person is asking these questions in a forum, they have a LOT of problems to look forward to. Best to hire a seriously good company to handle these things. What you are talking about is a six-figure solution.
     
    RectangleMan, Jun 14, 2006 IP
  12. topsearch

    topsearch Peon

    #12
    The company I currently work for has the terabyte data problem, and the only viable solution was to write custom software to handle it. No relational database could keep up. We had to do some pretty low-level coding to get decent performance. If you find an RDBMS that can handle TB-sized datamarts with reasonable efficiency, let me know...
     
    topsearch, Jun 17, 2006 IP
  13. Lisper

    Lisper Guest

    #13
    No, not any system: a good database system can handle power failure. You may lose transactions which aren't committed yet, but your data should not be corrupted. Most database systems can guarantee data integrity; MySQL cannot.
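
    A small illustration of the "lose uncommitted work, keep committed data" behaviour, using the sqlite3 module only because it ships with Python. The table and URLs are made up, and an explicit rollback stands in for a crash, so this is a sketch of the transactional idea and not a power-failure test of any particular server.

```python
import sqlite3


def demo_atomic_commit():
    """Show that committed rows survive while uncommitted ones are discarded."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pages (url TEXT PRIMARY KEY)")

    # First insert is committed, so it must survive.
    conn.execute("INSERT INTO pages VALUES ('a.example')")
    conn.commit()

    # Second insert is never committed; the rollback simulates a failure
    # that happens before the transaction completes.
    conn.execute("INSERT INTO pages VALUES ('b.example')")
    conn.rollback()

    rows = [row[0] for row in conn.execute("SELECT url FROM pages ORDER BY url")]
    conn.close()
    return rows
```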
     
    Lisper, Jun 18, 2006 IP
  14. iconv

    iconv Well-Known Member

    #14
    For a search engine, don't even think about using a relational database, any of them for that matter; they are not designed for managing unstructured or semi-structured data (like web pages).
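
    For what it's worth, the structure search engines typically build for this kind of data is an inverted index (term to documents). A toy in-memory sketch follows; the whitespace tokenizer and the function names are simplifications assumed for illustration, and a real engine would add stemming, ranking and on-disk storage.

```python
from collections import defaultdict


def tokenize(text):
    """Very naive tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every term in the query (AND search)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```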
     
    iconv, Jun 25, 2006 IP
  15. muaythai

    muaythai Peon

    #15
    If money is not an issue, Oracle RAC would be a good solution :) I've used it with databases that were several terabytes large and spread over multiple servers :)
    But on the other hand, I've used Oracle databases of terabyte size on a standalone server as well, so it depends a lot on the database design and the application that is using it.
     
    muaythai, Jun 30, 2006 IP
  16. pstation

    pstation Active Member

    #16
    This seems like a job for a good ol' mainframe
     
    pstation, Jul 4, 2006 IP
  17. bpearson

    bpearson Peon

    #17

    How many rooms in size? I think my datacenter has rooms just for them. lol
     
    bpearson, Jul 11, 2006 IP
  18. ccoonen

    ccoonen Well-Known Member

    #18
    Definitely sounds like a job for Oracle or MS SQL. MS SQL 2005 is pretty friggin powerful now if you want to go that route. I would not leave it in the hands of MySQL (nothing against the database, but it's not built to be a solid large-scale database). If you are forced to use a MySQL-style DB (on the cheaper side of things), I advise PostgreSQL.
     
    ccoonen, Jul 11, 2006 IP
  19. timw

    timw Peon

    #19
    Google uses a custom-built DB system, right?
     
    timw, Jul 24, 2006 IP
  20. bpearson

    bpearson Peon
