Hi, we have a website for which we want to set up load-balanced servers, so that the database is kept updated on all of the servers. I suppose we need to write our PHP code accordingly. Does anyone know about that?
What kind of servers are you running? You can do it through hardware if you have the money. It'd be faster than software, that's for sure. A lot more expensive though.
You have a few options, but exactly how to do it depends on exactly what you want to do. When you say 'database updated on all servers', do you mean that you are going to run a database server (like MySQL) on each web server and you would like them all to be synchronised? If that's the case then you need to look at whether the DB server software you are using supports replication.

Before you get to that point, though, you need to look at the actual network architecture. Is it going to have a high load? An example of a high load is a web counter service, where you can have millions of hits each day. Each hit will generate at least one database transaction, possibly more, so you need to consider that and design the load-balancing side of it to suit.

You can go with hardware, which can be useful if you want to maintain user sessions on specific servers. Or, if it's just a case of evenly distributing the load, you may be able to just set the DNS up in a round-robin fashion, where you have multiple 'A' records for the server address.

More information needed, I think. The best method to load balance your service depends on the nature of the service and the expected load (both web load and database load).

Gary
There are really two issues here that you may need to deal with, depending on what you are trying to balance. Are you trying to balance your HTTP traffic, your database traffic, or both? I'll use a three-node example that illustrates doing so for two front-end web servers and a dedicated database server: web-1, web-2, and db-1.

As Gary said, you first decide whether you want a hardware solution or a software-based solution. This really is a question of $$$. At my $dayJob we just switched to a hardware solution; we're using Barracuda 640s (two of them) and they run about 12K apiece. Prior to this we used something far simpler that isn't "true" load balancing: round-robin DNS. This works with BIND 8 and BIND 9 (BIND 9 even lets you control the response ordering with the rrset-order option), and other name servers support it too. There are also software packages like Pound, but I've never used them.

Assuming you'll use round-robin to distribute traffic evenly between web-1 and web-2, both of which communicate with db-1 for database traffic, you simply define two A records for the same host name:

hightraffic.mysite.com IN A 1.2.3.4
hightraffic.mysite.com IN A 1.2.3.5

If you did an nslookup on hightraffic.mysite.com you would see two results returned instead of one. The reason this works is that the order of the results is pseudo-random (usually pretty evenly distributed), so when a client's browser resolves the name, it gets a different IP each time.

This introduces a problem, however: you now need to maintain session persistence. After the user logs in to your site, the server they are talking to will store their session information. When they go to the next page, most browsers will perform the lookup AGAIN and may get the IP of your other server. They will then be logged out, because the second server doesn't know who they are; it doesn't have their session data. There are several approaches to solve this problem.
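For context, here's roughly what that fragment of the zone file might look like (the hostnames and IPs are the example values from above; the short TTL is my own assumption, added so clients re-resolve often and the load stays spread):

```
; example zone fragment -- round-robin via duplicate A records
$TTL 60                              ; short TTL (assumed) so clients re-resolve
hightraffic    IN  A   1.2.3.4      ; web-1
hightraffic    IN  A   1.2.3.5      ; web-2
```

Most resolvers will rotate or shuffle the order of the two A records between responses, which is what spreads the traffic.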
The way we did it was to set up a secondary, unique hostname for each node in the cluster:

hightraffic.mysite.com IN A 1.2.3.4
hightraffic.mysite.com IN A 1.2.3.5
hightraffic-1.mysite.com IN A 1.2.3.4
hightraffic-2.mysite.com IN A 1.2.3.5

Then in Apache on each node you set up two vhosts: one that answers to hightraffic and one that answers to hightraffic-x. The vhost for hightraffic does a redirect to that node's unique name (hightraffic-x), and the vhost for hightraffic-x serves your application. Now every request from that user after the initial one (and what is in their browser's address bar) is hightraffic-x. Basically you have locked them to one node, and they will work with that node for the duration of their entire visit. You could also perform this redirection in your application code if you wanted.

Another way, with our example, would be to store session data in the database. Because it's common to both nodes, your users could bounce between web nodes on every request and it wouldn't matter.

Hardware solutions support session persistence out of the box. Ours supports Layer 4 (locks to a node based on client IP) or Layer 7 (using cookies). This means you just add nodes behind it and it works... very, very easy.

If you are running a DB server on each of your web nodes, or you are trying to distribute load amongst your DB servers as well, then that is another problem to solve entirely. If you are using MySQL you could use replication, but there is a problem to overcome: multi-master replication in MySQL is pretty crappy and I've never seen it work. That means you have to do your inserts on a single node (the master) and replicate out to slaves. This is fine for a lot of websites because they are read-heavy on the DB side of things. It introduces problems, though, in that there can be delays in replication (causing temporary sync problems), and it introduces a single point of failure.
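A minimal sketch of what the two vhosts on web-1 might look like, using the example hostnames above (the DocumentRoot path is an assumption; Redirect is the standard Apache mod_alias directive):

```
# On web-1 (1.2.3.4): the shared name just bounces the browser
# to this node's unique name.
<VirtualHost *:80>
    ServerName hightraffic.mysite.com
    Redirect / http://hightraffic-1.mysite.com/
</VirtualHost>

# ...and the unique name actually serves the application.
<VirtualHost *:80>
    ServerName hightraffic-1.mysite.com
    DocumentRoot /var/www/hightraffic
</VirtualHost>
```

web-2 would be identical except it redirects to hightraffic-2.mysite.com. Because Redirect is a prefix match, the rest of the request path is preserved through the bounce.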
The other problem here is that unless you wrote the code that is running your website, chances are the application does not have an option to define a master database server that it will use for its inserts/updates/deletes and a separate list of slave (read-only) DB servers.

You could use MySQL Cluster, which will solve those problems because it's a black box: you send queries and it distributes the load on its own. Trouble is that most applications execute queries with multiple joins, which MySQL Cluster can't handle, and it needs a lot of hardware (think RAM).

There are ways around these issues too, but I've rambled long enough, especially for my first post. Hope that helps, or at least spawns more questions.
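To make the master/slave split concrete, here is a rough sketch of how the routing could look in application code. This is not from any particular package; the hostnames (db-master, db-slave-1, db-slave-2) are made up for illustration, and real code would also need to handle transactions and replication lag:

```php
<?php
// Sketch: route writes to the master, spread reads across the slaves.
// Hostnames below are illustrative assumptions, not real servers.

// Decide which pool a query belongs to by inspecting its first keyword.
// Only plain SELECTs are safe to send to a read-only slave.
function queryTarget(string $sql): string
{
    $firstWord = strtoupper(strtok(ltrim($sql), " \t\n"));
    return $firstWord === 'SELECT' ? 'slave' : 'master';
}

// Pick a host for the query: writes always hit the master, reads are
// rotated across the slaves (simple round-robin via $state['next']).
function pickHost(string $sql, array &$state): string
{
    $master = 'db-master.mysite.com';                       // assumed
    $slaves = ['db-slave-1.mysite.com', 'db-slave-2.mysite.com'];

    if (queryTarget($sql) === 'master') {
        return $master;
    }
    $host = $slaves[$state['next'] % count($slaves)];
    $state['next']++;
    return $host;
}
```

You would then connect with something like mysqli_connect(pickHost($sql, $state), ...) before running each query. The catch the post describes still applies: a SELECT issued right after an INSERT may hit a slave that hasn't replicated the row yet, so some applications pin a user to the master for a short window after any write.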