My site http://www.dealhandler.com basically gathers all the deals from Groupon, kgbdeals, and the like, checking every hour. So it's pretty obvious that my database is going to get huge; it already has over 100k deals. I'm not a MySQL expert and would like to see how others have handled large databases. Here's how I have my tables set up. I have a main "Deal" table where I store ALL the deals along with their info, such as start, end, price, description, link, and a few other fields. Then I have a "CurrentDeals" table that tracks active deals (those not yet expired). This table only holds a foreign key pointing to the Deal table. A script runs hourly to check for expired deals and remove them from CurrentDeals. I figure that as the Deal table gets large, it shouldn't affect load time as much, since all the ids I care about are in CurrentDeals. Anyone see problems with this implementation? Any suggestions for improvements?
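For what it's worth, the two-table idea can be sketched like this (using SQLite in Python as a stand-in for MySQL; the table and column names here are made up for illustration, not taken from your actual schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the real MySQL database
conn.executescript("""
    CREATE TABLE deals (
        id         INTEGER PRIMARY KEY,
        title      TEXT NOT NULL,
        price      REAL,
        link       TEXT,
        start_date TEXT,
        end_date   TEXT
    );
    -- thin table holding only the ids of not-yet-expired deals
    CREATE TABLE current_deals (
        deal_id INTEGER PRIMARY KEY REFERENCES deals(id)
    );
""")

# one expired deal and one still-active deal
conn.execute("INSERT INTO deals VALUES (1, 'Old deal', 9.99, "
             "'http://example.com/1', '2011-01-01', '2011-01-02')")
conn.execute("INSERT INTO deals VALUES (2, 'Live deal', 19.99, "
             "'http://example.com/2', '2011-01-01', '2099-01-01')")
conn.executemany("INSERT INTO current_deals VALUES (?)", [(1,), (2,)])

# the hourly cleanup: drop expired ids from the thin table
conn.execute("DELETE FROM current_deals WHERE deal_id IN "
             "(SELECT id FROM deals WHERE end_date < datetime('now'))")

# site pages only ever join through current_deals, so the big
# deals table is never scanned in full
active = [r[0] for r in conn.execute(
    "SELECT d.id FROM current_deals c JOIN deals d ON d.id = c.deal_id")]
print(active)  # [2]
```

The nice property is that page queries touch only the small table plus primary-key lookups, which is exactly why the growing history table shouldn't hurt load time much.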
As long as you design the database properly, it should be fine... We have 225,000,000+ records in a single table for this tool without any issues: http://tools.digitalpoint.com/tracker.php Also, Facebook uses MySQL for the bulk of their stuff. If you plan on getting huge, you should definitely look at memcache for caching stuff (also something Facebook uses extensively) as well as Sphinx for searching.
Thanks Shawn, that makes me feel a bit better. As for memcache, I did try to install it last week, but somehow it slowed all my sites down: pages took over a minute to load compared to a few seconds before. I'm on a VM host with around 512MB of RAM. I tried debugging it for 20 minutes without much improvement, but after disabling memcache and rebooting the server, the load time was back to normal. Maybe it's some setting I'm overlooking, but since I host multiple domains on this server, it's hard to experiment with memcache without affecting the live websites.
Memcache should only be used if you have multiple web servers serving a site. If you aren't at that level yet, there are local, server-specific caching mechanisms you can use (XCache's get/set functions, for example).
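Roughly, the get/set pattern those functions expose looks like this (a Python sketch of the idea only; the dict store and TTL handling are hypothetical stand-ins, not how XCache actually works internally):

```python
import time

# hypothetical in-process cache, standing in for a local cache
# extension's get/set API (e.g. xcache_get / xcache_set in PHP)
_cache = {}

def cache_set(key, value, ttl=3600):
    """Store a value along with its expiry timestamp."""
    _cache[key] = (value, time.time() + ttl)

def cache_get(key):
    """Return the cached value, or None if missing or expired."""
    entry = _cache.get(key)
    if entry is None:
        return None
    value, expires = entry
    if time.time() > expires:
        del _cache[key]  # lazily evict stale entries
        return None
    return value

# e.g. cache the list of active deal ids so each page view
# doesn't have to hit the database
cache_set("current_deals", [101, 102, 103], ttl=60)
print(cache_get("current_deals"))  # [101, 102, 103]
print(cache_get("missing_key"))   # None
```

Since the cache lives in local memory on the same box, there's no network hop and no separate daemon to tune, which is why it's the simpler choice on a single 512MB VM.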
Just make sure all your indices are set up properly, and use them properly. Sphinx has its limits for searching, which ultimately pushed me away from it. I'm using Solr now, and while it's quirky here and there, I love it. MySQL itself can handle far more than you're throwing at it; I would pretty much just worry about the hardware your database runs on.
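On the indices point: the one column the hourly cleanup filters on is the deal's end date, so that's the first thing worth indexing. A quick sketch (again SQLite in Python as a stand-in; MySQL's EXPLAIN output looks different, and the names here are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE deals (id INTEGER PRIMARY KEY, end_date TEXT)")
# index the column the hourly expiry check filters on
conn.execute("CREATE INDEX idx_deals_end_date ON deals (end_date)")

conn.executemany("INSERT INTO deals VALUES (?, ?)",
                 [(1, '2011-01-02'), (2, '2099-01-01')])

# with the index in place, the planner can seek straight to expired
# rows instead of scanning the whole table
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT id FROM deals WHERE end_date < datetime('now')").fetchall()
print(plan)  # the plan detail should mention idx_deals_end_date
```

The habit of checking the query plan (EXPLAIN in MySQL) is the "use them properly" part: an index you have but that your queries can't use is just write overhead.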