I am working on a REALLY large project using PHP/MySQL. Does anyone have tips on how to ensure that a database-intensive PHP application scales effectively? I notice that Facebook.com uses PHP on a massive scale. Cloud computing? Memcached? Apache? LiteSpeed? Thoughts?
There's no single way to scale every application, since each one presents its own challenges. Facebook and Twitter both have to scale to a huge user base, but the database structure of each is entirely different. I know Facebook's setup has always been a huge cluster of servers, each running memcached; not sure what Twitter does, though. The easiest thing to do is scale vertically before you scale horizontally. That is, buy a more powerful server before you buy two reasonably powerful servers and have to rewrite your application to scale across more than one machine. If you worry about horizontal scaling too early, it will hinder your development and prevent you from getting a working project out the door.
Don't hit the database if you don't really have to. This is the main part of optimization. That means writing data into a cache file with serialize() and reading it back with unserialize(file_get_contents()) as long as the data stays static for a while: user profiles, categories, etc. In most big projects, full-text search is the main problem, so it's essential to cache that data. Look at Google: they cache all their results. On the server itself you can look at xCache/APC, or put a Squid proxy in front to speed up delivery of PHP output.
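To make the file-cache idea above concrete, here's a minimal sketch. The function names, TTL, and cache directory are my own placeholders, not anything standard:

```php
<?php
// Minimal file-cache sketch using serialize()/unserialize(), as described
// above. cache_set/cache_get and the TTL are hypothetical names/values.

function cache_set(string $key, $data, string $dir = '/tmp'): void
{
    // md5() keeps the filename safe regardless of what's in the key.
    file_put_contents($dir . '/' . md5($key) . '.cache', serialize($data));
}

function cache_get(string $key, int $ttl = 3600, string $dir = '/tmp')
{
    $file = $dir . '/' . md5($key) . '.cache';
    // Serve from cache only while the file is fresher than the TTL.
    if (is_file($file) && (time() - filemtime($file)) < $ttl) {
        return unserialize(file_get_contents($file));
    }
    return false; // cache miss: rebuild from the database, then cache_set()
}

// Usage: cache a user profile that rarely changes.
cache_set('user_profile_42', ['name' => 'Alice', 'role' => 'admin']);
$profile = cache_get('user_profile_42');
```

One caveat with this pattern: a cached value of `false` is indistinguishable from a miss, so only cache data where that ambiguity doesn't matter (or wrap values in an array).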
There's a lot more to it than simply using a proxy or caching mechanism. Can you explain what type of usage you're anticipating, specifically web traffic and database usage (read- or write-intensive)? Are you running any huge reports or doing any major compilations, etc.? Are pages anticipated to be static enough that caching would even work? What sort of budget do you have? What bandwidth to the internet do you have available? Are you looking for shared hosting, a single-server setup, a cluster, etc.?
Thanks to all who have chimed in. I appreciate the feedback! This application is not graphics-intensive. We are on the second version and it runs very slowly. We have maxed out our dedicated server in terms of what the hosting provider offers, yet the provider says we use less than 1% of bandwidth, storage, memory, and CPU. The system makes extensive use of the database (both read- and write-intensive). Our main issue is how slowly the system currently runs, and we need to scale to at least 5 times the current capacity within 12 months. I have 5 PHP/MySQL programmers on staff full time, CS grads with 3 to 5 years of experience after college. Any help is much appreciated!!!
It seems like there'd be a better way to get this question answered than asking this forum, when no one here knows the details of your program.
I'm looking for suggestions to pass to my team. So far, from poking around, I've learned about memcached and database sharding, neither of which my programmers knew about. Anything would help to save time and money; my team doesn't need to reinvent the wheel. For example, what would be good options aside from Apache for a very large, database-intensive application with well over 250 simultaneous database accesses? + Reputation for good tips.
If your bottleneck is the database, replacing Apache with a lightweight web server won't solve any of your scalability issues. Denormalizing your database can make a huge impact if your queries do a lot of joins, but it's an ugly solution. Memcached is going to be your best bet.
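The usual memcached pattern here is cache-aside: check the cache first, and only hit the database on a miss. A rough sketch, with hypothetical names; `$cache` would be a `\Memcached` instance in practice, but anything exposing `get()`/`set()` works:

```php
<?php
// Cache-aside sketch (hypothetical function/key names): check the cache,
// fall back to the database on a miss, then populate the cache with a TTL.

function fetch_with_cache($cache, string $key, callable $loadFromDb, int $ttl = 300)
{
    $value = $cache->get($key);
    if ($value !== false) {
        return $value;            // cache hit: no database round trip
    }
    $value = $loadFromDb();       // cache miss: run the expensive query
    $cache->set($key, $value, $ttl);
    return $value;
}

// Usage with the real extension (assumes a memcached server on localhost
// and an existing $db connection; both are placeholders):
// $mc = new Memcached();
// $mc->addServer('127.0.0.1', 11211);
// $user = fetch_with_cache($mc, 'user:42', function () use ($db) {
//     return $db->query('SELECT * FROM users WHERE id = 42')->fetch_assoc();
// });
```

The TTL is your staleness budget: data can be up to that many seconds out of date, which is usually an easy trade for skipping the query entirely on hot keys.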
Based on what you have posted thus far, I would definitely lean towards a well-optimized InnoDB MySQL database. The last thing you want is table locking on a busy database, so I would stay away from MyISAM; I would also recommend InnoDB simply for data integrity reasons.

When you say "250 simultaneous database accesses", do you mean 250 unique users, or 250 transactions? Your solution may simply be hardware. 250 users with a few transactions each could easily be 2,000 per second, which is a formidable volume if it's sustained. If this is the case, there's going to be no choice but to use a very solid, well-designed, dedicated server with a lot of RAM. I don't think clustering would be necessary at this point, but table partitioning may help if you have some very large tables.

The next step would be multi-master replication, or a single-master/multi-slave replication setup. Clustering would be a last resort, as you are forced to use the NDB storage engine, which lacks some necessary features like foreign key support.

As far as memcached goes, that or another front-end caching system will greatly reduce load. However, if every page/query is unique, caching won't solve anything; it will just create more load. I personally like APC better than memcached because it is so much easier to install and configure, so this may end up being a developer's choice pick. A Squid reverse proxy would be another good front-end caching mechanism.
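If you do go the single-master/multi-slave route, the application has to split reads from writes: SELECTs can go to a slave, while everything else (and locking reads) must hit the master. A minimal routing sketch; the hostnames and helper name are placeholders:

```php
<?php
// Read/write splitting sketch for a single-master/multi-slave setup.
// route_query() is a hypothetical helper; the routing rule is the point.

function route_query(string $sql): string
{
    $sql = ltrim($sql);
    // Plain SELECTs can be served by a replica; writes and
    // SELECT ... FOR UPDATE (a locking read) need the master.
    if (preg_match('/^SELECT\b/i', $sql) && !preg_match('/\bFOR\s+UPDATE\b/i', $sql)) {
        return 'slave';
    }
    return 'master';
}

// Usage (hypothetical hosts and credentials):
// $master = new mysqli('master.db.internal', $user, $pass, $dbname);
// $slave  = new mysqli('slave1.db.internal', $user, $pass, $dbname);
// $conn   = route_query($sql) === 'slave' ? $slave : $master;
// $result = $conn->query($sql);
```

One thing to watch: replication lag means a read from a slave immediately after a write may return stale data, so read-your-own-writes paths should be pinned to the master.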