I'm curious if anyone here has any thoughts as to the practical size for a given MySQL database. I'm looking at a very slim row size, but about 1 million records to start, potentially 3 million. Those kinds of numbers always make me start thinking about alternative methods for data storage - simply because non-indexed queries can take forever on any database platform - but in this instance I am controlling the queries, and the columns are few enough that I could potentially index all columns. I could re-think things at this point, but I really wouldn't be surprised if MySQL could deal with this situation pretty easily.
3 million rows isn't that many for any "real" database. I wouldn't create indexes on every column; use them more selectively. The data types matter, as does the nature of the data, when it comes to query speed.
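Something like this is what I mean; the table and column names are just made up for illustration, but the idea is to index the combinations you actually query on rather than every column:

    CREATE TABLE stats (
        id          INT UNSIGNED NOT NULL AUTO_INCREMENT,
        site_id     INT UNSIGNED NOT NULL,
        metric      TINYINT UNSIGNED NOT NULL,
        value       INT NOT NULL,
        recorded_at DATETIME NOT NULL,
        PRIMARY KEY (id),
        -- one composite index covering the common "per site over a date range" lookups
        KEY idx_site_date (site_id, recorded_at)
    ) ENGINE=MyISAM;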
Good point. The nature of the application - statistical data processing - really would necessitate indexes across all columns, but that's just because I'm looking at things from 24 different perspectives. Now that I've had some time to think about it, I'm very much leaning towards using the Berkeley DB engine. I know about all the complaints regarding that platform, but they are not much of a worry in this particular situation. My main concern is performance, and you can't beat Berkeley for performance if you have a solid understanding of your data and what you want to do with it.
I've got about 20 million rows in the AOL data search in my signature, and it usually returns results in under a second. A well-designed MySQL database can handle a lot of data. Look into the different methods of indexing the data depending on what you want to do with it. We were planning to use Oracle for an upcoming project, but our recent experience with large MySQL databases has changed our minds. If you have tons of data you can always look at clustering as well.
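If you want to check whether a query is actually hitting an index, EXPLAIN is your friend (the table and column names below are just placeholders):

    -- shows which index (if any) the optimizer picks and roughly how many rows it expects to scan
    EXPLAIN SELECT COUNT(*)
    FROM stats
    WHERE site_id = 42
      AND recorded_at BETWEEN '2006-01-01' AND '2006-01-31';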
I think MySQL is more than acceptable for stats, although I've never used it for such purposes. One thing to consider with respect to your indexes: if this is logging a lot of stats, and therefore doing lots of inserts and updates, I really would use indexes sparingly, as they have a cost associated with them when it comes to updates and inserts. If your primary function where you need speed and responsiveness is the logging and updating of stats (inserts/updates), keep indexes, and any triggers, to the appropriate minimum. I'm usually prepared to wait for a stats report, but users are not prepared to wait for slow pages. There is also a cost to not having indexes, especially if you are running large or complicated reports during peak times. This is why I try to schedule reports or run them against a secondary copy or data warehouse.
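One way to do the secondary-copy idea without a full warehouse is to roll the raw rows up into a summary table during off-peak hours and point the reports at that instead; a rough sketch with made-up table and column names:

    -- run overnight from cron; reports then query stats_daily instead of the live table
    CREATE TABLE IF NOT EXISTS stats_daily (
        site_id INT UNSIGNED NOT NULL,
        day     DATE NOT NULL,
        total   BIGINT NOT NULL,
        PRIMARY KEY (site_id, day)
    );
    REPLACE INTO stats_daily (site_id, day, total)
        SELECT site_id, DATE(recorded_at), SUM(value)
        FROM stats
        GROUP BY site_id, DATE(recorded_at);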
Agreed. Indexes are only going to help if you are pulling from the data frequently. If you are going to be doing more inserts and such, they can and will slow down the database slightly.
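If you ever have to do a big bulk load, the usual trick on MyISAM is to switch the secondary indexes off for the duration and rebuild them once at the end, roughly like this (file path and table name are just examples):

    -- skip per-row index maintenance during the load, rebuild the keys in one pass afterwards
    ALTER TABLE stats DISABLE KEYS;
    LOAD DATA INFILE '/tmp/stats.csv' INTO TABLE stats
        FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';
    ALTER TABLE stats ENABLE KEYS;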
I've been trying to find a breaking point for MySQL databases. I've pushed a few individual DBs to well beyond 1 GB with ~50M rows... with a few dozen DBs on the server, so far it still purrs on an old P4 1.2 GHz box with only 768 MB of RAM.
It can handle terabytes, but it will fall over long before then if you are trying to run it on a single box! As a developer, if I were responsible for such a VLDB I would be looking to employ a serious DBA who knows my database platform inside out and could anticipate and deal with every problem whilst squeezing the most out of the hardware available.
m/cluster is a great clustering solution for MySQL if you need to run across multiple machines. MySQL also has clustering built in. Older versions required MASSIVE amounts of RAM, but they are finally moving toward disk-based clustering.
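Once the cluster itself is set up, using it from the SQL side is mostly just a storage-engine choice on the table; a minimal sketch, assuming a running cluster and a made-up stats table:

    -- the rows live on the cluster's data nodes instead of the local server
    CREATE TABLE stats_clustered (
        id      INT UNSIGNED NOT NULL AUTO_INCREMENT,
        site_id INT UNSIGNED NOT NULL,
        value   INT NOT NULL,
        PRIMARY KEY (id)
    ) ENGINE=NDBCLUSTER;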
I wonder how many of you have actually used clustering yourselves. We have an 8 TB database running on 2 servers with the cluster method. The problem with MySQL is that after a certain point it becomes useless, so I started building a different queue method of my own, but it was hopeless within the deadline. Server clustering is not cheap, that is a fact, but it's worth it; you can really see the difference in query speed.
Depends on how you do it. We use many inexpensive servers rather than a couple of expensive workhorses. Cuts our costs by a bunch, and the speed is incredible.
My technique is a dedicated DB server that handles only database operations, connected to the main server, which handles everything else (we're a hosting co.). Main server: quad Xeon 3.2 GHz, 16 GB RAM. DB server: dual Xeon 3.2 GHz, 16 GB RAM. The main server, DB server and backup server are all connected to each other.