I'm an administrator on MPGH, a large forum running vBulletin + vBSEO. We have two quad-core dedicated servers: one for PHP (LiteSpeed) and one for MySQL. We have had 1600+ guests and members online in the last 30 minutes, and our server load averages are 57.99, 59.24, 58.09. Right now the bottleneck is the PHP server: the MySQL server is delivering content fast, but the PHP server's processes shoot up to 100% CPU usage, then die out. We are considering upgrading to a Xeon server for PHP, but I'm not sure the PHP server is behaving correctly. Is this normal? Is there anything we can do to reduce the load?
Make sure you have some sort of PHP caching system installed on the web server. There is definitely an issue with loads like that.
Do you have free memory, or is the server swapping? When you run top, do you see whether the load is from "waiting" or something else? Also try sar -u 2; it comes from the sysstat package.
top - 21:14:31 up 32 days, 23:39, 1 user, load average: 59.96, 57.14, 53.96
Tasks: 190 total, 61 running, 126 sleeping, 3 stopped, 0 zombie
Cpu(s): 94.2%us, 4.0%sy, 0.0%ni, 0.5%id, 0.0%wa, 0.3%hi, 1.0%si, 0.0%st
Mem: 3361384k total, 2190180k used, 1171204k free, 131016k buffers
Swap: 4875716k total, 46440k used, 4829276k free, 1114128k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
15698 nobody 20 0 156m 19m 11m R 14.1 0.6 0:05.27 lsphp5
15744 nobody 20 0 157m 20m 10m S 13.1 0.6 0:04.49 lsphp5
15700 nobody 20 0 159m 22m 10m R 10.1 0.7 0:05.09 lsphp5
15750 nobody 20 0 158m 20m 10m R 10.1 0.6 0:04.46 lsphp5
15499 nobody 20 0 160m 25m 12m S 9.1 0.8 0:07.87 lsphp5
15509 nobody 20 0 160m 22m 10m S 9.1 0.7 0:07.58 lsphp5
15799 nobody 20 0 160m 21m 9280 R 9.1 0.7 0:02.59 lsphp5
15491 nobody 20 0 160m 24m 12m S 8.1 0.8 0:07.89 lsphp5
15505 nobody 20 0 160m 23m 10m S 8.1 0.7 0:08.47 lsphp5
15516 nobody 20 0 161m 25m 11m R 8.1 0.8 0:08.28 lsphp5
15522 nobody 20 0 160m 23m 10m S 8.1 0.7 0:07.74 lsphp5
15551 nobody 20 0 157m 24m 13m R 8.1 0.7 0:07.97 lsphp5
15752 nobody 20 0 157m 20m 10m R 8.1 0.6 0:04.93 lsphp5
15791 nobody 20 0 156m 18m 10m R 8.1 0.6 0:04.33 lsphp5
15511 nobody 20 0 157m 20m 11m R 7.1 0.6 0:07.90 lsphp5
15515 nobody 20 0 160m 27m 14m S 7.1 0.8 0:08.21 lsphp5
15535 nobody 20 0 159m 22m 11m R 7.1 0.7 0:07.89 lsphp5
15557 nobody 20 0 158m 22m 11m R 7.1 0.7 0:07.56 lsphp5
15615 nobody 20 0 156m 21m 12m R 7.1 0.6 0:07.28 lsphp5
15749 nobody 20 0 157m 20m 10m R 7.1 0.6 0:04.46 lsphp5
15751 nobody 20 0 157m 21m 11m R 7.1 0.7 0:04.91 lsphp5
15798 nobody 20 0 159m 21m 9984 R 7.1 0.7 0:02.55 lsphp5
15809 nobody 20 0 156m 17m 8496 R 7.1 0.5 0:01.79 lsphp5
15870 nobody 20 0 159m 19m 8680 R 7.1 0.6 0:00.40 lsphp5
15521 nobody 20 0 160m 23m 10m S 6.0 0.7 0:08.18 lsphp5
15524 nobody 20 0 160m 22m 9.8m R 6.0 0.7 0:06.59 lsphp5
15525 nobody 20 0 161m 26m 12m S 6.0 0.8 0:08.06 lsphp5
15586 nobody 20 0 159m 24m 11m R 6.0 0.7 0:07.92 lsphp5
15616 nobody 20 0 160m 22m 9.9m R 6.0 0.7 0:06.74 lsphp5
15657 nobody 20 0 157m 21m 11m R 6.0 0.7 0:07.22 lsphp5
15738 nobody 20 0 157m 21m 11m R 6.0 0.7 0:04.67 lsphp5
15739 nobody 20 0 157m 18m 9344 R 6.0 0.6 0:04.62 lsphp5
15740 nobody 20 0 157m 20m 10m R 6.0 0.6 0:05.14 lsphp5
15743 nobody 20 0 157m 19m 9.9m R 6.0 0.6 0:04.57 lsphp5
15801 nobody 20 0 157m 19m 10m R 6.0 0.6 0:02.28 lsphp5
15518 nobody 20 0 161m 26m 12m S 5.0 0.8 0:07.72 lsphp5
15519 nobody 20 0 160m 25m 12m S 5.0 0.8 0:07.79 lsphp5
15520 nobody 20 0 159m 21m 10m R 5.0 0.7 0:07.90 lsphp5
15526 nobody 20 0 160m 23m 10m R 5.0 0.7 0:08.24 lsphp5
15528 nobody 20 0 159m 22m 10m R 5.0 0.7 0:07.81 lsphp5
15546 nobody 20 0 160m 23m 10m R 5.0 0.7 0:06.83 lsphp5
15699 nobody 20 0 160m 22m 10m R 5.0 0.7 0:04.51 lsphp5
15701 nobody 20 0 160m 22m 9.8m R 5.0 0.7 0:05.08 lsphp5
15755 nobody 20 0 157m 19m 10m R 5.0 0.6 0:04.54 lsphp5
15790 nobody 20 0 159m 21m 9.8m R 5.0 0.6 0:04.06 lsphp5
15792 nobody 20 0 160m 23m 10m R 5.0 0.7 0:04.00 lsphp5
15800 nobody 20 0 160m 21m 8832 R 5.0 0.7 0:02.16 lsphp5
15805 nobody 20 0 158m 20m 9488 R 5.0 0.6 0:01.67 lsphp5
Can you check either in the logs or with netstat -na | grep :80 | sort whether you have many connections from one IP? You might be a victim of a DoS attack. If this is the case, you can install mod_evasive to downplay the attack.
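If eyeballing the sorted netstat output gets tedious, the per-IP tally can be sketched in Python along these lines. This is only a rough sketch: the function name connections_per_ip is made up for illustration, and it assumes the usual Linux netstat -na column order (protocol, Recv-Q, Send-Q, local address, remote address, state) with IPv4 addresses.

```python
from collections import Counter


def connections_per_ip(netstat_output, local_port=80):
    """Count connections per remote IP in `netstat -na` style text.

    Assumes columns: proto, recv-q, send-q, local addr, remote addr, state.
    """
    counts = Counter()
    suffix = ":%d" % local_port
    for line in netstat_output.splitlines():
        parts = line.split()
        # Skip headers, blank lines and anything that is not a TCP row.
        if len(parts) < 6 or not parts[0].startswith("tcp"):
            continue
        local, remote, state = parts[3], parts[4], parts[5]
        # Only look at connections to the web port, and ignore the
        # listening socket itself (its remote address is a wildcard).
        if not local.endswith(suffix) or state == "LISTEN":
            continue
        ip = remote.rsplit(":", 1)[0]
        counts[ip] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: netstat -na | python this_script.py
    for ip, n in connections_per_ip(sys.stdin.read()).most_common(10):
        print(n, ip)
```

A single address holding far more connections than everyone else would be the thing to look for; a fairly even spread across many addresses looks more like ordinary traffic.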
We often get DDoSed, but I do not believe this to be one.

tcp 0 0 93.190.140.127:80 125.166.93.210:11308 TIME_WAIT
tcp 0 0 93.190.140.127:80 125.166.93.210:11309 TIME_WAIT
tcp 0 0 93.190.140.127:80 125.166.93.210:11310 TIME_WAIT
tcp 0 0 93.190.140.127:80 125.166.93.210:11311 TIME_WAIT
tcp 0 0 93.190.140.127:80 125.166.93.210:11321 TIME_WAIT
tcp 0 0 93.190.140.127:80 125.166.93.210:11322 TIME_WAIT
tcp 0 0 93.190.140.127:80 131.191.81.102:55166 TIME_WAIT
tcp 0 0 93.190.140.127:80 131.191.81.102:55604 TIME_WAIT
tcp 0 0 93.190.140.127:80 131.191.81.102:55609 TIME_WAIT
tcp 0 0 93.190.140.127:80 165.228.190.83:65127 FIN_WAIT2
tcp 0 0 93.190.140.127:80 173.174.51.254:4296 ESTABLISHED
tcp 0 0 93.190.140.127:80 173.174.51.254:4365 TIME_WAIT
tcp 0 0 93.190.140.127:80 173.174.51.254:4367 ESTABLISHED
tcp 0 0 93.190.140.127:80 173.174.51.254:4368 ESTABLISHED
tcp 0 0 93.190.140.127:80 173.174.51.254:4369 FIN_WAIT2
tcp 0 0 93.190.140.127:80 173.174.51.254:4370 ESTABLISHED
tcp 0 0 93.190.140.127:80 173.2.98.190:52436 TIME_WAIT
tcp 0 0 93.190.140.127:80 173.34.225.234:53613 TIME_WAIT
tcp 0 0 93.190.140.127:80 173.34.225.234:53675 ESTABLISHED
tcp 0 0 93.190.140.127:80 173.34.225.234:53697 TIME_WAIT
tcp 0 0 93.190.140.127:80 173.34.225.234:53709 ESTABLISHED
tcp 0 0 93.190.140.127:80 173.34.225.234:53710 TIME_WAIT
tcp 0 0 93.190.140.127:80 173.34.225.234:53712 ESTABLISHED
tcp 0 0 93.190.140.127:80 173.34.225.234:53756 ESTABLISHED
tcp 0 0 93.190.140.127:80 173.34.225.234:53762 ESTABLISHED
tcp 0 0 93.190.140.127:80 174.34.156.130:35711 ESTABLISHED
tcp 0 0 93.190.140.127:80 174.92.135.44:4398 ESTABLISHED
tcp 0 0 93.190.140.127:80 186.58.133.202:21587 TIME_WAIT
tcp 0 0 93.190.140.127:80 186.58.133.202:2628 ESTABLISHED
tcp 0 0 93.190.140.127:80 186.58.133.202:29230 TIME_WAIT
tcp 0 0 93.190.140.127:80 186.58.133.202:30769 FIN_WAIT2
tcp 0 0 93.190.140.127:80 186.58.133.202:35529 ESTABLISHED
tcp 0 0 93.190.140.127:80 186.58.133.202:37042 ESTABLISHED
tcp 0 0 93.190.140.127:80 186.58.133.202:41924 TIME_WAIT
tcp 0 0 93.190.140.127:80 186.58.133.202:4495 ESTABLISHED
tcp 0 0 93.190.140.127:80 186.58.133.202:50886 TIME_WAIT
tcp 0 0 93.190.140.127:80 186.58.133.202:51094 TIME_WAIT
tcp 0 0 93.190.140.127:80 186.58.133.202:5405 ESTABLISHED
tcp 0 0 93.190.140.127:80 186.58.133.202:55024 TIME_WAIT
tcp 0 0 93.190.140.127:80 186.58.133.202:55104 TIME_WAIT
tcp 0 0 93.190.140.127:80 186.58.133.202:57327 TIME_WAIT
tcp 0 0 93.190.140.127:80 186.58.133.202:60305 TIME_WAIT
tcp 0 0 93.190.140.127:80 186.58.133.202:61595 TIME_WAIT
tcp 0 0 93.190.140.127:80 186.58.133.202:62560 TIME_WAIT
tcp 0 0 93.190.140.127:80 186.58.133.202:64906 TIME_WAIT
tcp 0 0 93.190.140.127:80 186.58.133.202:6684 ESTABLISHED
tcp 0 0 93.190.140.127:80 186.58.133.202:8025 FIN_WAIT2
tcp 0 0 93.190.140.127:80 187.126.29.173:60165 TIME_WAIT
tcp 0 0 93.190.140.127:80 187.126.29.173:60226 TIME_WAIT
tcp 0 0 93.190.140.127:80 187.126.29.173:60236 TIME_WAIT
tcp 0 0 93.190.140.127:80 187.126.29.173:60237 TIME_WAIT
tcp 0 0 93.190.140.127:80 187.159.25.21:54699 FIN_WAIT2
tcp 0 0 93.190.140.127:80 187.35.236.226:57584 TIME_WAIT
tcp 0 0 93.190.140.127:80 189.110.244.127:11680 ESTABLISHED
tcp 0 0 93.190.140.127:80 189.175.3.10:1249 ESTABLISHED
tcp 0 0 93.190.140.127:80 189.182.151.180:50545 ESTABLISHED
tcp 0 0 93.190.140.127:80 189.46.4.141:51093 ESTABLISHED
tcp 0 0 93.190.140.127:80 189.46.4.141:51094 ESTABLISHED
tcp 0 0 93.190.140.127:80 189.69.151.219:2325 FIN_WAIT2
tcp 0 0 93.190.140.127:80 190.34.9.72:4659 TIME_WAIT
tcp 0 0 93.190.140.127:80 190.34.9.72:4662 TIME_WAIT
tcp 0 0 93.190.140.127:80 190.34.9.72:4663 ESTABLISHED
tcp 0 0 93.190.140.127:80 192.71.148.10:32498 TIME_WAIT
tcp 0 0 93.190.140.127:80 192.71.148.10:35757 TIME_WAIT
tcp 0 0 93.190.140.127:80 192.71.148.10:35763 TIME_WAIT
tcp 0 0 93.190.140.127:80 192.71.148.10:36740 TIME_WAIT
tcp 0 0 93.190.140.127:80 193.126.159.58:62701 FIN_WAIT2
tcp 0 0 93.190.140.127:80 193.126.159.58:62707 FIN_WAIT2
tcp 0 0 93.190.140.127:80 193.126.159.58:62708 FIN_WAIT2
tcp 0 0 93.190.140.127:80 193.126.159.58:62709 ESTABLISHED
tcp 0 0 93.190.140.127:80 193.126.159.58:62710 FIN_WAIT2
tcp 0 0 93.190.140.127:80 193.126.159.58:62711 ESTABLISHED
tcp 0 0 93.190.140.127:80 193.126.159.58:62720 ESTABLISHED
tcp 0 0 93.190.140.127:80 198.53.161.147:60253 TIME_WAIT
tcp 0 0 93.190.140.127:80 200.103.128.97:61147 TIME_WAIT

And:

[root@j5 ~]# sar -u 2
Linux 2.6.28.2 (j5) 05/08/2010
09:28:05 PM CPU %user %nice %system %iowait %steal %idle
09:28:07 PM all 93.89 0.00 5.86 0.00 0.00 0.25
Average: all 93.89 0.00 5.86 0.00 0.00 0.25
OK, I asked the other administrator: we do have a DDoS prevention system already installed. We have a firewall limit set at 50 connections per user. Do we need to lower it?
If you pasted all lines from netstat in the post above, then it definitely doesn't look like DoS, so there is no need to touch the limits.
OK, I'm assuming there is nothing that can be done, so I'll be upgrading. However, there is a disagreement between us about which option to upgrade to:
- Core i7 (8 Cores)
- Xeon (4 Cores)
- Dual Xeon (turn both servers into one, 2x4 Cores)
Which would be the logical choice?
IMO, upgrading will not help. A software-based DDoS deterrent system is only good for low-end attacks. You need a hardware-based system to mitigate large-scale attacks, assuming that is what this is. With the setup you currently have, you should be able to handle those loads with ease! Take a look through your log files, try tracing some of those processes, and figure out what they are doing. I would also try disabling your VB plugins for a bit and see if the load comes down to a normal level. If it does, start enabling them one by one to see which one is the culprit. I think upgrading right now is just throwing money away; you need to troubleshoot what the issue is.
It's not just the DDoS; the server lags on the weekend. Generally, when there are over 1200 people on during a 30-minute period, our server slows down, with a load average of about 10.
His problem doesn't seem to be related to the disks or RAM. His server doesn't swap and there is not much IO activity. It seems like his Apache / PHP configuration needs to be optimized.
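That reading of the numbers can be double-checked against the Cpu(s) summary line from the top output posted earlier in the thread. Here is a minimal Python sketch of the check. The function names and the thresholds are my own arbitrary choices, and the regex assumes the older top format shown in this thread ("94.2%us"), not the newer "%Cpu(s): 94.2 us" layout.

```python
import re


def parse_cpu_line(line):
    """Turn a top 'Cpu(s):' summary line into {field: percent}."""
    fields = {}
    # Matches pairs like "94.2%us" or "0.0%wa" (older procps top format).
    for value, name in re.findall(r"([\d.]+)%\s*([a-z]{2})", line):
        fields[name] = float(value)
    return fields


def classify_load(fields, busy_threshold=80.0, wait_threshold=20.0):
    """Very rough guess at where the load comes from.

    The thresholds are arbitrary illustration values, not tuned numbers.
    """
    if fields.get("wa", 0.0) >= wait_threshold:
        return "io-wait"  # CPUs mostly stalled waiting on disk
    if fields.get("us", 0.0) + fields.get("sy", 0.0) >= busy_threshold:
        return "cpu-bound"  # CPUs busy running user/kernel code
    return "unclear"


if __name__ == "__main__":
    # Summary line copied from the top output in this thread.
    line = ("Cpu(s): 94.2%us, 4.0%sy, 0.0%ni, 0.5%id, "
            "0.0%wa, 0.3%hi, 1.0%si, 0.0%st")
    print(classify_load(parse_cpu_line(line)))
```

With 0.0%wa and nearly all time in user space, the sketch lands on the same conclusion: the CPUs are busy executing PHP rather than waiting on disk or swap.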
We are upgrading to VB4 today, I think, without any extensions. This should help determine if the issue is related to plugins (99% sure it isn't, we tried this previously).