I'm trying to get this thing under control but so far am having no luck. I'm running a Joomla 1.5 site under Apache 2.2, PHP 5.3.3, eAccelerator, Zend, memcached/mod_memcache, and FastCGI with the worker MPM.
Server specs: Intel Core 2 Quad Q9400 2.66GHz, 8GB DDR2 RAM, 64-bit, 1TB HDD.
Everything will be running smoothly, with CPU load fluctuating between 0.5 and 3, then all of a sudden, out of nowhere, it spikes to 40, 50, or even 70. Looking at top, I don't see anything eating that much CPU other than php, and looking at the error logs I don't see any scripts throwing fatal errors, stuck in an endless loop, or anything like that. Spamd fails when php spikes the CPU load, and I get a system email notifying me. The load then seems to come back under control after a few minutes, and then the cycle starts all over again.
I'm running ClamAV, MailScanner, ConfigServer csf/lfd, and mod_security, and I run maldetect once a day and chkrootkit once in a while. We did have an issue a few months ago where I was playing with a new login component that turned out to have an exploit in it, and we got hammered. Everything seems to be cleaned out: nothing shows in any scans and I don't see any offending scripts when going through by hand, but there's always the possibility that something is still residing in there undetected.
I've gone through and turned off modules in Joomla trying to find the offending code, but it doesn't seem to matter. I've upgraded Joomla to the latest 1.5 release and it did seem to help a bit, but it still didn't stop the sudden spikes. I backed PHP down to 5.3.2 but I'm still getting high CPU spikes with php hogging the majority.
Here's what a snapshot from top looks like:
Tasks: 191 total, 2 running, 189 sleeping, 0 stopped, 0 zombie
Cpu(s): 34.4%us, 3.3%sy, 0.0%ni, 60.6%id, 0.9%wa, 0.1%hi, 0.8%si, 0.0%st
Mem: 8177492k total, 5500940k used, 2676552k free, 52472k buffers
Swap: 2096472k total, 49036k used, 2047436k free, 2965196k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
24164 oohyane   16   0  215m  63m  22m R 63.6  0.8   0:11.39 php
24150 oohyane   16   0  220m  68m  22m S 29.6  0.9   0:17.10 php
24170 oohyane   16   0  203m  51m  22m S 27.6  0.6   0:09.79 php
 5185 nobody    18   0  441m  42m 2692 S  9.0  0.5   4:49.70 httpd
 2718 mysql     15   0  611m 222m 4100 S  5.0  2.8 100:47.11 mysqld
 5310 nobody    18   0  436m  38m 2708 S  2.3  0.5   4:56.47 httpd
 5186 nobody    18   0 1249m  38m 2728 S  2.0  0.5   5:03.47 httpd
 5272 nobody    18   0  432m  37m 2692 S  2.0  0.5   5:21.10 httpd
17024 nobody    18   0  431m  31m 2644 S  2.0  0.4   1:08.55 httpd
19580 nobody    18   0  364m  27m 2624 S  1.3  0.3   0:37.67 httpd
 9978 nobody    18   0  436m  36m 2732 S  1.0  0.5   3:08.89 httpd
17059 nobody    18   0  363m  26m 2344 S  1.0  0.3   1:03.24 httpd
19841 nobody    18   0  358m  21m 2640 S  1.0  0.3   0:35.11 httpd
 2056 root      10  -5     0    0    0 S  0.7  0.0  35:31.76 kondemand/2
17135 root      18   0  370m  69m 9.9m S  0.7  0.9   8:00.47 java
  566 root      11  -5     0    0    0 S  0.3  0.0   3:08.29 kjournald
18221 root      20   0  118m  16m 1592 S  0.3  0.2   0:46.87 lfd
I really need to get this figured out. I can't afford to keep spending the majority of my day messing with this instead of actually developing sites.
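To compare a snapshot taken during a spike with one taken when the box is quiet, it can help to add up the %CPU column per command. What follows is only a minimal sketch. It assumes the rows come from top in batch mode (for example `top -b -n 1`) with the same twelve-column layout as the snapshot in this post, and `cpu_by_command` is a made-up helper name. The sample rows are copied from the snapshot.

```python
from collections import defaultdict


def cpu_by_command(top_lines):
    """Sum the %CPU column per command name from top process rows."""
    totals = defaultdict(float)
    for line in top_lines:
        fields = line.split()
        # Process rows have at least 12 fields and start with a numeric PID;
        # summary and header lines are skipped by this check.
        if len(fields) < 12 or not fields[0].isdigit():
            continue
        cpu = float(fields[8])    # %CPU column
        command = fields[11]      # COMMAND column
        totals[command] += cpu
    return dict(totals)


# Rows copied from the snapshot in the post.
sample = [
    "PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND",
    "24164 oohyane 16 0 215m 63m 22m R 63.6 0.8 0:11.39 php",
    "24150 oohyane 16 0 220m 68m 22m S 29.6 0.9 0:17.10 php",
    "24170 oohyane 16 0 203m 51m 22m S 27.6 0.6 0:09.79 php",
    "5185 nobody 18 0 441m 42m 2692 S 9.0 0.5 4:49.70 httpd",
    "2718 mysql 15 0 611m 222m 4100 S 5.0 2.8 100:47.11 mysqld",
]

totals = cpu_by_command(sample)
# Print the heaviest commands first.
for command, cpu in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{command:8s} {cpu:6.1f}")
```

Running this from cron every minute and keeping the output would give a record of which command dominates when the load alert fires, rather than relying on catching top at the right moment.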
Does this spike happen once a day, or often? If often, does it happen at the same time of day? Have you checked whether a cron job is causing the load spike?
Mmm... maybe a series of stupid questions, but: Is your dedi accessible via IP? Are you testing scripts and websites on this dedi? Have you had a look at Apache (or whichever HTTP server daemon you run) to see if you're getting any strange requests? Maybe it's a brute-force attempt to find a MySQL injection, or PHP XSS probing?
Have you set up a PHP cache like APC, eAccelerator, or XCache? It can help reduce the load on the server. It looks like a lot of php processes are running on the server because of the number of requests. A cache reduces how often the scripts have to be compiled.
1. Worker for Apache doesn't work very well at all.
2. I'd agree with anands, but would use XCache.
3. I'd turn off mod_sec and use Suhosin instead.
4. Turn off the FastCGI manager and run php-fpm instead.
5. Turn off Apache and use nginx instead.
To truly see what's causing the spikes, I would strace the PIDs when it spikes:
strace -p 24164
You won't really be able to read the strace live, as it will stream too fast; I would output the strace to a file and then read over it afterwards.
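Once the trace is in a file, a quick tally of which system calls dominate is often easier to read than the raw stream. Here is a rough sketch, assuming the trace was captured with something like `strace -f -tt -o /tmp/php.trace -p 24164` (the output path is just an example). The `count_syscalls` helper and the sample lines are made up for illustration and are not real output from this server.

```python
import re
from collections import Counter

# Matches the syscall name at the start of an strace line, allowing for an
# optional pid prefix (added by -f) and an optional timestamp (added by -t/-tt).
SYSCALL_RE = re.compile(
    r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?"
    r"(?:\d{2}:\d{2}:\d{2}(?:\.\d+)?\s+)?"
    r"(\w+)\("
)


def count_syscalls(lines):
    """Count how many times each system call appears in strace output lines."""
    counts = Counter()
    for line in lines:
        match = SYSCALL_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Made-up sample lines in strace's usual format (hypothetical path).
sample_trace = [
    'open("/home/example/public_html/index.php", O_RDONLY) = 5',
    'read(5, "<?php"..., 8192) = 1024',
    'read(5, "", 8192) = 0',
    "close(5) = 0",
    "--- SIGCHLD (Child exited) @ 0 (0) ---",
]

counts = count_syscalls(sample_trace)
for name, n in counts.most_common():
    print(name, n)
```

To use it on a real capture, pass the open file instead of the sample list, e.g. `count_syscalls(open("/tmp/php.trace"))`. A process that is mostly in `stat`/`open` points at file lookups, while one that shows few syscalls at all while burning CPU is more likely looping inside PHP itself.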
Thanks for the replies so far. As I mentioned in the original post, I am running eAccelerator as well as memcached/mod_memcache. I have Joomla set to use memcache as the cache option, and in php.ini I have memcached set to handle the PHP sessions. I've gone through and disabled extensions hoping to pinpoint which one could be causing the issue, but even with just the bare-bones essentials the CPU load will spike at times. I'm at the point of thinking the template I'm using may be the cause, as it was originally built for 1.0 and I tweaked it to work with 1.5 when I migrated. I'm going to do a complete rebuild using a new template as soon as I get the database copied over to a sandbox test account.
Csf is telling me that processes suddenly run wild, over 93 at times when the CPU load spikes, and I'm also concerned about the mail setup on the account, as exim, spamd, POP, and IMAP tend to fail when the CPU spikes. Here's some info from my csf; what do you guys think of what this is showing?
Sep 6 16:51:25 server lfd[23375]: *Email Queue* The exim delivery queue size is 196092
Sep 6 16:58:50 server lfd[24042]: Directory Watching terminated after 22 seconds
Sep 6 16:58:50 server lfd[24042]: LF_DIRWATCH taking 22 seconds, temporarily throttled to run every 360 seconds
Sep 6 17:01:59 server lfd[24153]: *LOAD* 5 minute load average is 17.57, threshold is 6 - email sent
Sep 6 17:04:29 server lfd[24235]: *Skipped File* /tmp/#sql_ab1_0.MYD - Too large to scan
Sep 6 17:06:31 server lfd[24272]: *Excessive Processes* User:oohyane Kill:0 Process Count:16
Sep 6 17:07:30 server lfd[24235]: Directory Watching terminated after 46 seconds
Sep 6 17:07:30 server lfd[24235]: LF_DIRWATCH taking 46 seconds, temporarily throttled to run every 1080 seconds
Sep 6 17:13:49 server lfd[25900]: 5 (sshd) login failures from 201.38.138.2 (BR/Brazil/-) in the last 300 secs - *Blocked in csf*
Sep 6 17:14:34 server lfd[25981]: *SSH login* from 216.51.193.200 into the root account using password authentication
Sep 6 17:51:39 server lfd[29370]: *Email Queue* The exim delivery queue size is 196099
Sep 6 18:02:04 server lfd[30624]: *LOAD* 5 minute load average is 13.37, threshold is 6 - email sent
Sep 7 00:37:59 server lfd[6525]: 5 (mod_security) rule triggers from 67.83.75.157 (US/United States/ool-43534b9d.dyn.optonline.net) in the last 300 secs - *Blocked in csf*
Sep 7 00:40:46 server lfd[6689]: *Email Queue* Unable to obtain exim_outgoing.conf queue length within 30 seconds - Timed out
Sep 7 00:42:16 server lfd[6707]: *Skipped File* /tmp/#sql_ab1_0.MYD - Too large to scan
Sep 7 00:46:19 server lfd[6881]: *Excessive Processes* User:oohyane Kill:0 Process Count:16
Sep 7 00:49:04 server lfd[7825]: *LOAD* 5 minute load average is 23.80, threshold is 6 - email sent
Sep 7 00:54:20 server lfd[8203]: *Skipped File* /tmp/#sql_ab1_0.MYD - Too large to scan
Sep 7 00:58:24 server lfd[8515]: 5 (sshd) login failures from 122.72.31.130 (CN/China/-) in the last 300 secs - *Blocked in csf*
Sep 7 01:49:33 server lfd[13940]: *LOAD* 5 minute load average is 11.07, threshold is 6 - email sent
Sep 7 02:00:08 server lfd[14733]: *System Integrity* has detected modified file(s): /usr/bin/pure-pw /usr/bin/pure-pwconvert /usr/bin/pure-statsdecode /usr/sbin/exim /usr/sbin/exim_dbmbuild /usr/sbin/exim_dumpdb /usr/sbin/exim_fixdb /usr/sbin/exim_lock /usr/sbin/exim_tidydb /usr/sbin/pure-authd /usr/sbin/pure-ftpd /usr/sbin/pure-ftpwho /usr/sbin/pure-mrtginfo /usr/sbin/pure-quotacheck /usr/sbin/pure-uploadscript /usr/sbin/runq /usr/sbin/sendmail
Sep 7 02:26:52 server lfd[16805]: *Excessive Processes* User:oohyane Kill:0 Process Count:16
Sep 7 02:40:14 server lfd[18280]: *WHM root access* from 216.51.193.200
Sep 7 03:24:26 server lfd[21998]: *LOAD* 5 minute load average is 7.65, threshold is 6 - email sent
Sep 7 03:51:26 server lfd[24104]: *Email Queue* Unable to obtain exim queue length within 30 seconds - Timed out
Sep 7 03:53:11 server lfd[24171]: *Excessive Processes* User:oohyane Kill:0 Process Count:93
Sep 7 03:54:11 server lfd[24188]: *User Processing* PID:23666 Kill:0 User:oohyane VM:219(MB) EXE:/usr/bin/php CMD:/usr/bin/php
Sep 7 03:54:40 server lfd[23996]: Directory Watching terminated after 46 seconds
Sep 7 03:54:40 server lfd[23996]: LF_DIRWATCH taking 46 seconds, temporarily throttled to run every 1080 seconds
Sep 7 03:55:11 server lfd[24361]: *User Processing* PID:22040 Kill:0 User:oohyane VM:221(MB) EXE:/usr/bin/php CMD:/usr/bin/php
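To see whether the load alerts line up with the process-count alerts, or cluster at particular times as a cron job would, the two event types can be pulled out of the lfd log and listed side by side. This is only a sketch, assuming the log text looks like the excerpt above. `extract_events` is a made-up helper, and the regexes search the whole text so they work whether or not each entry sits on its own line.

```python
import re

# Timestamp, then the lfd tag, then the event-specific part.
PREFIX = r"(\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ lfd\[\d+\]: "

LOAD_RE = re.compile(PREFIX + r"\*LOAD\* 5 minute load average is (\d+(?:\.\d+)?)")
PROC_RE = re.compile(PREFIX + r"\*Excessive Processes\* .*?Process Count:(\d+)")


def extract_events(log_text):
    """Return (load_alerts, process_alerts) as lists of (timestamp, value)."""
    loads = [(ts, float(value)) for ts, value in LOAD_RE.findall(log_text)]
    procs = [(ts, int(count)) for ts, count in PROC_RE.findall(log_text)]
    return loads, procs


# Three entries taken from the log excerpt in this post.
sample_log = (
    "Sep 7 03:24:26 server lfd[21998]: *LOAD* 5 minute load average is 7.65, "
    "threshold is 6 - email sent\n"
    "Sep 7 03:51:26 server lfd[24104]: *Email Queue* Unable to obtain exim "
    "queue length within 30 seconds - Timed out\n"
    "Sep 7 03:53:11 server lfd[24171]: *Excessive Processes* User:oohyane "
    "Kill:0 Process Count:93\n"
)

loads, procs = extract_events(sample_log)
for ts, value in loads:
    print("load alert   ", ts, value)
for ts, count in procs:
    print("process alert", ts, count)
```

Fed the full log, the same function gives every alert timestamp, which makes it easy to check by eye whether spikes recur at fixed times or follow the email-queue timeouts.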
First, I'd start by checking why this user is causing log entries like these:
*Excessive Processes* User:oohyane Kill:0 Process Count:93
Second, I'd look into why SQL is puking out, as well as creating tmp tables:
*Skipped File* /tmp/#sql_ab1_0.MYD - Too large to scan
There are so many parts here that are not helping you. A couple of suggestions: Your mail queue is at 190K, which could be the culprit. Every time the mail queue runs, the whole server will bog down. Sort out the MySQL issues; at least tune it and know exactly what it is doing. Once you get a handle on these two items, move on to looking at Apache.
I deleted all of the messages in the mail queue. MySQL seems to be running pretty well and not using up many resources; I think I've got that tuned down pretty tight at this point. The only thing I really need to tackle on the database side now is the indexes, and making sure the scripts are taking advantage of them. The reason you're seeing the MySQL stuff being written to the tmp folder is caching, the same reason csf gave the same notices about eaccelerator.so files being skipped for being too large before I set it to ignore them. I did just bump up my tmp table size in my.cnf to make sure tmp tables aren't getting written to disk (the phpMyAdmin status page showed 23k had been written to disk).
It seems the major issue is some bad PHP code somewhere causing php to eat very large portions of the CPU resources, but I'm not sure exactly what's causing it yet. I did narrow one issue down to sh404sef, the URL rewriting component I use, which causes the following 500 error in debug mode:
JDatabaseMySQL::query: 1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '' at line 2 SQL=SELECT username FROM jos_users WHERE id=
Call stack
# Function Location
1 JSite->render() /home/oohyane/public_html/index.php:79
2 JDocumentHTML->render() /home/oohyane/public_html/includes/application.php:168
3 JDocumentHTML->_parseTemplate() /home/oohyane/public_html/libraries/joomla/document/html/html.php:249
4 JDocumentHTML->getBuffer() /home/oohyane/public_html/libraries/joomla/document/html/html.php:386
5 JDocumentRendererModules->render() /home/oohyane/public_html/libraries/joomla/document/html/html.php:190
6 JDocumentRendererModule->render() /home/oohyane/public_html/libraries/joomla/document/html/renderer/modules.php:41
7 JModuleHelper->renderModule() /home/oohyane/public_html/libraries/joomla/document/html/renderer/module.php:84
8 require() /home/oohyane/public_html/libraries/joomla/application/module/helper.php:173
9 cbPluginHandler->trigger() /home/oohyane/public_html/modules/mod_cblogin/mod_cblogin.php:460
10 cbPluginHandler->call() /home/oohyane/public_html/administrator/components/com_comprofiler/plugin.class.php:509
11 call_user_func_array() /home/oohyane/public_html/administrator/components/com_comprofiler/plugin.class.php:551
12 getprofilebookTab->onAfterLogoutForm()
13 cbTabHandler->_getAbsURLwithParam() /home/oohyane/public_html/components/com_comprofiler/plugin/user/plug_cbprofilebook/cb.profilebook.php:169
14 cbSef() /home/oohyane/public_html/administrator/components/com_comprofiler/plugin.class.php:3072
15 CBframework->cbSef() /home/oohyane/public_html/administrator/components/com_comprofiler/plugin.foundation.php:2469
16 call_user_func_array() /home/oohyane/public_html/administrator/components/com_comprofiler/plugin.foundation.php:2121
17 JRoute::_()
18 shRouter->build() /home/oohyane/public_html/libraries/joomla/methods.php:54
19 JRouter->build() /home/oohyane/public_html/plugins/system/shsef.php:250
20 shRouter->_buildSefRoute() /home/oohyane/public_html/libraries/joomla/application/router.php:167
21 shSefRelToAbs() /home/oohyane/public_html/plugins/system/shsef.php:405
22 sef_404->create() /home/oohyane/public_html/administrator/components/com_sh404sef/sh404sef.class.php:1665
23 include() /home/oohyane/public_html/components/com_sh404sef/sef_ext.php:300
24 JDatabaseMySQL->loadResult() /home/oohyane/public_html/components/com_sh404sef/sef_ext/com_comprofiler.php:184
25 JDatabaseMySQL->query() /home/oohyane/public_html/libraries/joomla/database/database/mysql.php:355
26 JError->raiseError() /home/oohyane/public_html/libraries/joomla/database/database/mysql.php:231
27 JError->raise() /home/oohyane/public_html/libraries/joomla/error/error.php:171
28 JException->__construct() /home/oohyane/public_html/libraries/joomla/error/error.php:136
I upgraded sh404sef to the latest build and it seems to have helped the CPU load significantly; however, I'm still getting temporary high spikes. I'm going to try running with sh404sef completely disabled for a few hours and see what the CPU load looks like. I hate to do it, as running regular dynamic URLs instead of the SEF ones really messes with my search engine ranking (it looks like duplicate content to them), but hopefully it won't get picked up in just a few hours. After sh404sef, my next place to look is going to be Community Builder. I don't really see any real errors from it without sh404sef enabled, but it's the biggest component on the site, so I want to go through it completely.
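On the tmp-table point in this post: one rough way to judge whether the my.cnf bump is working is to compare the `Created_tmp_disk_tables` and `Created_tmp_tables` counters from `SHOW GLOBAL STATUS` before and after the change. The sketch below only does the arithmetic. `disk_tmp_table_ratio` is a made-up helper, and the total of 100000 is a placeholder, since the post only gives the on-disk figure (about 23k). Note also that MySQL uses the smaller of `tmp_table_size` and `max_heap_table_size` as the in-memory limit, so raising only one of them may not change anything.

```python
def disk_tmp_table_ratio(status):
    """Fraction of internal temporary tables that were created on disk.

    `status` maps SHOW GLOBAL STATUS variable names to their values.
    """
    created = int(status.get("Created_tmp_tables", 0))
    on_disk = int(status.get("Created_tmp_disk_tables", 0))
    if created == 0:
        return 0.0
    return on_disk / created


# The on-disk figure is the ~23k mentioned in the post; the total is a
# placeholder assumption, not a value from this server.
sample_status = {
    "Created_tmp_disk_tables": "23000",
    "Created_tmp_tables": "100000",
}

ratio = disk_tmp_table_ratio(sample_status)
print(f"{ratio:.1%} of temporary tables went to disk")
```

If the ratio stays about the same after raising the limits, the on-disk tables are more likely coming from queries that cannot use in-memory temporary tables at all (for example ones selecting TEXT or BLOB columns), which ties back to the indexing and query work mentioned above.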
I upgraded sh404sef as well as Community Builder, and even though the spikes aren't as high or as long-lasting, they're still occurring. I identified an integration issue between these two components and brought it to the sh404sef team's attention, and they're releasing a new version with a fix in the next few days. However, even with that fix, I don't think this is going to completely solve the CPU load issues. I still think something else is causing php to run off and eat resources.
I went back to prefork instead of worker and saw a tremendous increase in performance. I'm guessing some extension couldn't cope with multi-threading and ended up hurting performance. I also turned off gzip in the Joomla backend and switched to mod_deflate, which seemed to knock a couple of seconds off the page load time as well. Load has been running in the .45 range for the last couple of hours with no spikes, and the site just "feels" smoother and faster now. Not sure if this was the magic cure, but I think it's at least one more big step in the right direction. ** It's been a few more hours with no load spikes whatsoever; I just checked again and it was at .09! Lol, I didn't know that was even possible with this site with everything it has going on. I can't even begin to explain how happy I am now lol.
After several hours I did get a spike in load for a couple of minutes. After a bit of investigating, I noticed the /tmp and /var/tmp folders were 100% full. Since eAccelerator is set to write to the /tmp directory by default, it's clear I need to change its cache dir path to keep it from filling up /tmp (/tmp is symlinked to /var/tmp). The only problem is that I installed eAccelerator through EasyApache, and this is apparently quite different from installing it manually. It puts the eaccelerator.ini file in /home/cpeasyapache/src/eaccelerator/eaccelerator-0.9.6.1 instead of where most of the tutorials and manuals tell you it should be. When I change the values in that eaccelerator.ini, nothing happens; my phpinfo shows that they're not changing at all. I also have no values in the php.ini file for eAccelerator, so I can't change them there either. Anyone know how to change the eAccelerator values when it's installed through EasyApache?
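While the cache dir question is open, a small check that reports how full /tmp is and which files are largest can at least confirm what is filling it. A minimal sketch using only the standard library follows. `usage_percent` and `largest_files` are made-up helper names, and the demo runs against a throwaway directory, so on the server the path would be "/tmp" or "/var/tmp".

```python
import os
import shutil
import tempfile


def usage_percent(path):
    """Return how full the filesystem holding `path` is, as a percentage."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


def largest_files(path, limit=10):
    """Return up to `limit` (size_in_bytes, path) pairs, largest first."""
    sizes = []
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                sizes.append((os.lstat(full).st_size, full))
            except OSError:
                # Cache files can vanish between listing and stat; skip them.
                continue
    return sorted(sizes, reverse=True)[:limit]


# Demo against a temporary directory so the sketch is safe to run anywhere.
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "small.cache"), "wb") as fh:
        fh.write(b"x" * 10)
    with open(os.path.join(demo_dir, "big.cache"), "wb") as fh:
        fh.write(b"x" * 1000)
    print(f"{usage_percent(demo_dir):.1f}% used")
    for size, file_path in largest_files(demo_dir, limit=5):
        print(size, file_path)
```

If the biggest entries turn out to be eAccelerator cache files, that supports moving `eaccelerator.cache_dir` elsewhere; if they are `#sql_*` files, the MySQL temporary tables are the more likely cause.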