If anyone can lead me to solving this, I will PayPal $15 to the person who helps me kill this problem. I cannot give out root details, but I can follow directions and run commands. A few times daily, my server stops responding: pages don't load on ALL domains hosted on my dedicated server. I wait a few minutes and it's back up. During the downtime, the server pings fine. I have contacted my dedicated server company several times; they keep trying things that don't work, and they say it's my ISP. It's not my ISP, as I can surf all other sites fine when it happens. The host has already done a server swap, and memory tested fine (they say). Apache logs show nothing wrong, messages show nothing wrong, and the other logs are clean. If someone is willing, I can follow instructions and send them info such as phpinfo output and whatnot. Edit: forgot to mention, it's a Linux dedicated cPanel box. Thanks
Sounds like a load spike, maybe caused by a cron job running? I'd try to be logged in over SSH with top running when it happens, and see what the load average is at that moment.
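Since the outages come at random times, another option is to record the load average continuously and look back at it afterwards. A minimal Python 3 sketch, assuming a Linux box where /proc/loadavg is readable; the output path /root/loadavg.log and the 10-second interval are arbitrary example choices, not anything from this thread:

```python
import time

def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg text."""
    fields = text.split()
    return float(fields[0]), float(fields[1]), float(fields[2])

def format_sample(epoch_seconds, loads):
    """Build one log line: local date/time followed by the three load averages."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(epoch_seconds))
    return "%s %.2f %.2f %.2f" % (stamp, loads[0], loads[1], loads[2])

def main(out_path="/root/loadavg.log", interval_seconds=10):
    """Append one sample every interval_seconds, forever (stop it with kill or Ctrl+C).

    out_path is an arbitrary example location; pick any writable file.
    """
    while True:
        with open("/proc/loadavg") as f:
            loads = parse_loadavg(f.read())
        with open(out_path, "a") as out:
            out.write(format_sample(time.time(), loads) + "\n")
        time.sleep(interval_seconds)
```

If you save it as, say, loadavg_logger.py (a hypothetical name), something like nohup python3 -c 'import loadavg_logger; loadavg_logger.main()' & keeps it running after you log out; afterwards, grep the file for the minutes when the sites stopped responding.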
Log in directly using SSH!! Open five (5) SSH connections from your bash shell to your server, then on four of them run ONE of the following lines per connection:
tail -f /var/log/warn
tail -f /var/log/messages
tail -f /var/log/apache2/error_log
tail -f /var/log/apache2/access_log
The 5th connection is your "working" connection for entering commands. Then first check your Apache config. You did NOT mention which Apache version you run (1.3, 2.0 or 2.2). To check the Apache config, typically for 2.2 on SUSE 10.1 Linux, it would be:
rcapache2 configtest
If the reply is OK, run next:
rcapache2 extreme-configtest
If either of the above is NOT OK, follow the instructions and clean up your Apache config. If OK, then do:
rcapache2 reload
That is a soft restart of Apache and should give you lines similar to these in tail -f /var/log/apache2/error_log:
[Tue Nov 27 23:30:03 2007] [notice] Graceful restart requested, doing restart
[Tue Nov 27 23:30:07 2007] [notice] Apache configured -- resuming normal operations
As you can see, the "downtime" during a soft restart of Apache is in the range of 4 seconds. If your Apache does a correct, timely reload, then do a full restart as follows:
rcapache2 restart
Look again at your tail -f /var/log/apache2/error_log; the downtime is a few seconds longer. If either of the above two reload/restart commands causes a delayed restart of Apache like the one you experience in normal operation, then look at the details in the 3 OTHER tail -f windows: check the seconds during and after your tests for any feedback on possible problems. I once had a non-starting Apache, a year ago. The cause was an ln -s from the normal access_log file to an additional working copy used for live traffic monitoring by monitoring software via the browser. The "ln -s" link caused Apache to stay down after the daily access_log rotation.
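The reload "downtime" can also be read straight from error_log instead of by eyeballing timestamps. A Python 3 sketch, assuming the Apache 2.x error_log timestamp format quoted above; the marker phrases it looks for are taken from the log lines in this thread and may be worded differently on other versions:

```python
import re
from datetime import datetime

# Matches the "[Tue Nov 27 23:30:03 2007]" prefix of an Apache 2.x error_log line.
STAMP = re.compile(r"^\[\w{3} (\w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4})\]")

def error_log_time(line):
    """Return the line's timestamp as a datetime, or None if it has none."""
    m = STAMP.match(line)
    if m is None:
        return None
    return datetime.strptime(m.group(1), "%b %d %H:%M:%S %Y")

def restart_gap(lines):
    """Seconds from a restart request to the next 'resuming normal operations' notice.

    The marker phrases come from the graceful-reload and SIGHUP-restart lines
    quoted in this thread; returns None if no complete pair is found.
    """
    requested = None
    for line in lines:
        stamp = error_log_time(line)
        if stamp is None:
            continue
        if "restart requested" in line or "SIGHUP received" in line:
            requested = stamp
        elif requested is not None and "resuming normal operations" in line:
            return (stamp - requested).total_seconds()
    return None
```

For example, opening the error_log and passing the file object to restart_gap() gives the gap for the first complete restart pair in the file; for the two lines quoted above that is 4 seconds.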
Think about WHAT software you installed or what changes you made just before this problem started to occur. Another check you can do is to compare the above 4 log files at the exact time of your recurring problems and look at the warn/messages entries or Apache errors you find during the downtimes!! If the above gives you no clue and no help with your problem, then you may have to wait for someone HERE on DP with more experience with that kind of problem to help you further. Please note that an Apache reload/restart does NOT affect your SSH connection! If, however, you were logged in to cPanel or similar, your browser-based connection might get lost = bad situation!!
Thanks for the outline. Apache 1.3.39
tail -f /var/log/warn: no such file in that directory
tail -f /var/log/messages: got it
tail -f /var/log/apache2/error_log: the only path to the Apache error log is /usr/apache/logs/error_log
tail -f /var/log/apache2/access_log: the only path to the Apache access log is /usr/local/apache/logs/access_log
rcapache2 configtest: I can only use /usr/local/apache/bin/apachectl configtest
rcapache2 extreme-configtest: no extreme configtest available
rcapache2 reload: no reload option available
That's pretty much where I stand. Any advice?
What to do? Common sense! Since you gave minimal system info, search until you find it. Example: /var/log/warn. Every decent Linux system has that warn log file; find it and adapt the path in your tail command, just as you did with "Only path to apache error log is /usr/apache/logs/error_log", etc. Same for the OTHER missing commands: if you miss a particular command mentioned above, research your own equivalent; use Google to get instant replies. If Apache 2.2 is installed, then you most likely have similar or equal tools to those mentioned above at your disposal. So what's the output of /usr/local/apache/bin/apachectl configtest? And of /usr/local/apache/bin/apachectl restart? Since you updated from Apache 1.3 to 2.2, you may have a few minor problems, especially if you use mod_rewrite. We had such problems before (me too), and you can find solutions HERE in the DP forum, in the Apache (mod_rewrite) topic, for the 1.3 > 2.x upgrade.
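The "configtest first, then reload" routine can be wrapped so that a broken config is never loaded. A minimal Python 3.7+ sketch; the apachectl path is the one reported in this thread and may differ on your box, and the run parameter exists only so the logic can be tried without touching Apache:

```python
import subprocess

# Path as reported in this thread; adjust it for your own installation.
APACHECTL = "/usr/local/apache/bin/apachectl"

def safe_graceful(run=subprocess.run):
    """Run 'apachectl configtest' and, only if it passes, 'apachectl graceful'.

    Returns (reloaded, output). run defaults to subprocess.run and can be
    swapped for a stand-in to try the logic without touching Apache.
    """
    test = run([APACHECTL, "configtest"], capture_output=True, text=True)
    # "Syntax OK" or the error text may arrive on stdout or stderr; keep both.
    output = test.stdout + test.stderr
    if test.returncode != 0:
        return False, output
    result = run([APACHECTL, "graceful"], capture_output=True, text=True)
    return result.returncode == 0, output + result.stdout + result.stderr
```

Calling reloaded, output = safe_graceful() and printing output shows the configtest message plus whatever graceful printed; if the config is broken, Apache keeps running on the old one.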
Config test results in: Syntax OK
Apache restart:
[Tue Nov 27 16:38:55 2007] [notice] SIGHUP received. Attempting to restart
[Tue Nov 27 16:38:56 2007] [notice] Apache/2.2.6 (Unix) mod_ssl/2.2.6 OpenSSL/0.9.7a mod_auth_passthrough/2.1 FrontPage/5.0.2.2635 mod_bwlimited/1.4 configured -- resuming normal operations
It took a few seconds, no hang time.
1. I see you have upgraded to Apache 2.2. Fine.
2. Now observe and see if Apache still fails to restart properly in certain daily situations. Best is to observe live using the tail -f commands mentioned above.
3. As said above, use the 4 log files, especially the Apache error_log. Look at the precise times of your outages, then see what errors occurred during previous downtimes, exactly in the seconds around them. You may download all the log files mentioned above to your laptop for offline processing, for the periods when you HAD known downtimes.
4. Also, as said before, research when the first such occurrence appeared: what did you change in your system (config OR any link, symlink, new tool, new script installed, new site, etc.) immediately prior to it? As mentioned above, it happened to me once about a year ago. Unfortunately I don't recall the exact details, but it was something to the extent of writing a full access_log live into the user space of a domain account for real-time traffic processing by that user, and that kept my Apache 2.2 down until a manual restart.
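On a busy server, a stretch with no requests at all in the access log is a good hint of when an outage happened, which helps when you can't watch it live. A Python 3 sketch, assuming the common/combined log format shown in this thread; the 60-second threshold is an arbitrary starting point. On cPanel boxes the per-domain logs usually live under /usr/local/apache/domlogs/ rather than in the main access_log, so check which file actually carries your sites' traffic:

```python
import re
from datetime import datetime

# Timestamp of a common/combined log format line, e.g. [28/Nov/2007:09:45:55 -0500]
CLF_TIME = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]")

def access_time(line):
    """Return the request time as a timezone-aware datetime, or None if none is found."""
    m = CLF_TIME.search(line)
    if m is None:
        return None
    return datetime.strptime(m.group(1), "%d/%b/%Y:%H:%M:%S %z")

def find_gaps(lines, min_gap_seconds=60):
    """Yield (last_before, first_after, seconds) for each silence of at least min_gap_seconds.

    Lines are written when a request finishes, so they can be slightly out of
    order; the latest time seen so far is used as the reference point.
    """
    latest = None
    for line in lines:
        stamp = access_time(line)
        if stamp is None:
            continue
        if latest is not None:
            gap = (stamp - latest).total_seconds()
            if gap >= min_gap_seconds:
                yield latest, stamp, gap
        if latest is None or stamp > latest:
            latest = stamp
```

Looping over find_gaps() on an opened log file and printing each (start, end, seconds) tuple gives a list of candidate outage windows to compare against the other logs.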
Okie dokie... Thanks for the help, Hans, I will keep you posted. It's hard to catch it when it stops responding, as it happens a few times a day at random times.
Even if you don't catch it live when it happens: by looking at PREVIOUS incidents and searching ALL the log files mentioned above at the minutes/seconds before it happened, you will almost surely find the action triggering the Apache outage. But maybe now with Apache 2 it's all OK, who knows. While looking for the exact incident triggering your Apache downtime, also check crontab for a scheduled task, among other factors that might trigger the problem. You HAVE to find the problem, unless it's been solved by the upgrade; otherwise you risk downtimes = loss of $ for all sites.
However, since you upgraded from 1.3 to 2.2, make sure ALL your previous Apache features are fully working, since there were a few changes in the default configuration between 1.3 and 2.2. Not sure if you have OTHER sites on your server as well or only your own site(s), but test the mod_rewrite function in your new Apache. It either works or does NOT work at all; in my case the latter, after the same upgrade, until I changed the Apache config.
And now, as a final "homework", since you have another distribution still unknown to us, check all your Apache commands to make sure you know your own functionality: http://httpd.apache.org/docs/2.2/programs/apachectl.html
You'll see that my SUSE rcapache2 reload may find its equivalent in your apachectl graceful. Find out ALL the paths and names of the 4 (four) log files mentioned above, and know all the Apache commands of your current version/distribution. Remember: each time you change your Apache config, you have to reload the new config using apachectl graceful, and BEFORE reloading a possibly faulty modified config, always run apachectl configtest. If it's OK, do your graceful or restart; otherwise correct the config first. Changes in .htaccess take effect instantly; changes in the main Apache config files usually only after a config reload or an Apache restart.
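To check crontab against a known outage time, a small matcher can list which entries would have fired in that minute. A simplified Python 3 sketch: it understands *, numbers, ranges, lists and /steps in the standard five-field format, and it skips (rather than interprets) month/day names, @reboot-style shortcuts and VAR=value lines. The OR rule for day-of-month/day-of-week follows Vixie cron's documented behavior:

```python
from datetime import datetime

def _field_matches(field, value, low):
    """True if one crontab field such as '*', '*/15', '1-5' or '0,30' covers value."""
    for part in field.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/", 1)
            step = int(step_text)
        if part == "*":
            start, end = low, None  # no upper bound needed for matching
        elif "-" in part:
            first, last = part.split("-", 1)
            start, end = int(first), int(last)
        else:
            start = end = int(part)
        if value >= start and (end is None or value <= end) and (value - start) % step == 0:
            return True
    return False

def job_runs_at(schedule, when):
    """True if 'minute hour day-of-month month day-of-week' fires at datetime when."""
    minute, hour, dom, month, dow = schedule.split()
    cron_dow = (when.weekday() + 1) % 7  # cron counts Sunday as 0 (7 is accepted too)
    dom_ok = _field_matches(dom, when.day, 1)
    dow_ok = _field_matches(dow, cron_dow, 0) or (cron_dow == 0 and _field_matches(dow, 7, 0))
    if dom.startswith("*") or dow.startswith("*"):
        day_ok = dom_ok and dow_ok
    else:
        day_ok = dom_ok or dow_ok  # both day fields restricted: either one may match
    return (_field_matches(minute, when.minute, 0)
            and _field_matches(hour, when.hour, 0)
            and _field_matches(month, when.month, 1)
            and day_ok)

def jobs_at(crontab_text, when):
    """Return the command part of every crontab entry that fires at when."""
    hits = []
    for raw in crontab_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "@")) or "=" in line.split()[0]:
            continue  # blank lines, comments, @shortcuts and VAR=value lines
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue
        try:
            if job_runs_at(" ".join(fields[:5]), when):
                hits.append(fields[5])
        except ValueError:
            continue  # names like "mon" or "jan" are not supported by this sketch
    return hits
```

For example, save the output of crontab -l to a file and call jobs_at() on its text with datetime(2007, 11, 28, 9, 45) to see what ran at 09:45 on Nov 28. For /etc/crontab-style files, the returned text starts with the user column.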
For future questions, it would be much more helpful to give FULL details of your Linux distribution and main setup versions. Even a link to the site in question may help us help you more directly and efficiently.
OK, it seems to have just happened. messages is always filled with this, besides an occasional failed authentication from hack attempts:
Nov 28 09:10:59 jimbob kernel: ** RABHIT ** IN=eth0 OUT= MAC=00:10:dc:e2:cd:0f:00:18:19:cf:c1:f0:08:00 SRC=78.149.199.223 DST=69.72.214.58 LEN=40 TOS=0x00 PREC=0x00 TTL=53 ID=6071 PROTO=TCP SPT=113 DPT=59897 WINDOW=0 RES=0x00 ACK RST FIN URGP=0
Nov 28 09:25:42 jimbob kernel: ** RABHIT ** IN=eth0 OUT= MAC=00:10:dc:e2:cd:0f:00:18:19:cf:c1:f0:08:00 SRC=190.198.248.23 DST=69.72.214.58 LEN=40 TOS=0x00 PREC=0x00 TTL=56 ID=6016 PROTO=TCP SPT=113 DPT=41871 WINDOW=0 RES=0x00 ACK RST FIN URGP=0
Nov 28 09:41:35 jimbob kernel: ** RABHIT ** IN=eth0 OUT= MAC=00:10:dc:e2:cd:0f:00:18:19:cf:c1:f0:08:00 SRC=78.144.142.137 DST=69.72.214.58 LEN=40 TOS=0x00 PREC=0x00 TTL=52 ID=15585 PROTO=TCP SPT=113 DPT=53868 WINDOW=0 RES=0x00 ACK RST FIN URGP=0
Nov 28 09:44:10 jimbob kernel: ** RABHIT ** IN=eth0 OUT= MAC=00:10:dc:e2:cd:0f:00:18:19:cf:c1:f0:08:00 SRC=201.9.15.152 DST=69.72.214.58 LEN=40 TOS=0x00 PREC=0x00 TTL=56 ID=6267 PROTO=TCP SPT=113 DPT=33936 WINDOW=0 RES=0x00 ACK RST FIN URGP=0
Nov 28 09:44:29 jimbob kernel: ** RABHIT ** IN=eth0 OUT= MAC=00:10:dc:e2:cd:0f:00:18:19:cf:c1:f0:08:00 SRC=78.165.183.249 DST=69.72.214.58 LEN=40 TOS=0x00 PREC=0x00 TTL=53 ID=8621 PROTO=TCP SPT=113 DPT=39128 WINDOW=0 RES=0x00 ACK RST FIN URGP=0
The access log is always clean:
127.0.0.1 - - [28/Nov/2007:09:45:55 -0500] "GET / HTTP/1.0" 200 2860
127.0.0.1 - - [28/Nov/2007:09:45:56 -0500] "GET / HTTP/1.0" 200 2860
127.0.0.1 - - [28/Nov/2007:09:46:03 -0500] "GET / HTTP/1.0" 200 2860
127.0.0.1 - - [28/Nov/2007:09:46:13 -0500] "GET / HTTP/1.0" 200 2860
127.0.0.1 - - [28/Nov/2007:09:46:19 -0500] "GET / HTTP/1.0" 200 2860
127.0.0.1 - - [28/Nov/2007:09:46:53 -0500] "GET / HTTP/1.0" 200 2860
127.0.0.1 - - [28/Nov/2007:09:47:01 -0500] "GET / HTTP/1.0" 200 2860
127.0.0.1 - - [28/Nov/2007:09:48:08 -0500] "GET / HTTP/1.0" 200 2860
127.0.0.1 - - [28/Nov/2007:09:48:09 -0500] "GET / HTTP/1.0" 200 2860
127.0.0.1 - - [28/Nov/2007:09:48:10 -0500] "GET / HTTP/1.0" 200 2860
127.0.0.1 - - [28/Nov/2007:09:48:11 -0500] "GET / HTTP/1.0" 200 2860
127.0.0.1 - - [28/Nov/2007:09:48:12 -0500] "GET / HTTP/1.0" 200 2860
127.0.0.1 - - [28/Nov/2007:09:48:13 -0500] "GET / HTTP/1.0" 200 2860
127.0.0.1 - - [28/Nov/2007:09:48:14 -0500] "GET / HTTP/1.0" 200 2860
127.0.0.1 - - [28/Nov/2007:09:48:15 -0500] "GET / HTTP/1.0" 200 2860
127.0.0.1 - - [28/Nov/2007:09:48:16 -0500] "GET / HTTP/1.0" 200 2860
127.0.0.1 - - [28/Nov/2007:09:48:17 -0500] "GET / HTTP/1.0" 200 2860
The error logs just show bad requests or files that aren't there (I modified the paths as I pasted):
[Wed Nov 28 09:47:07 2007] [error] [client 82.195.137.125] File does not exist: /path/favicon.ico
[Wed Nov 28 09:47:07 2007] [error] [client 82.195.137.125] File does not exist: /path/favicon.ico/404.shtml
[Wed Nov 28 09:50:08 2007] [error] [client 74.224.203.239] File does not exist: /path/favicon.ico
[Wed Nov 28 09:50:08 2007] [error] [client 74.224.203.239] File does not exist: /path/favicon.ico/404.shtml
[Wed Nov 28 09:50:32 2007] [error] [client 74.224.203.239] File does not exist: /path/favicon.ico/favicon.ico
[Wed Nov 28 09:50:32 2007] [error] [client 74.224.203.239] File does not exist: /path/favicon.ico/404.shtml
[Wed Nov 28 09:53:18 2007] [error] [client 77.102.100.245] File does not exist: /path/favicon.ico/favicon.ico
[Wed Nov 28 09:53:18 2007] [error] [client 77.102.100.245] File does not exist: /path/favicon.ico/404.shtml
Still can't find the warn log. When I run locate warn, thousands of results come up.
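To see whether the RABHIT lines in messages bunch up around the outage times, they can be split into their KEY=VALUE fields and counted. A Python 3 sketch based only on the lines quoted above (note that every sample has SPT=113, the ident/auth port, and the flags ACK RST FIN); it only tallies fields and makes no assumption about what triggers a RABHIT entry:

```python
import re
from collections import Counter

# KEY=VALUE pairs in an iptables LOG line; a key such as OUT= may have an empty value.
FIELD = re.compile(r"\b([A-Z]+)=(\S*)")

def parse_rabhit(line):
    """Return the KEY=VALUE fields of a '** RABHIT **' line as a dict, else None."""
    if "** RABHIT **" not in line:
        return None
    return dict(FIELD.findall(line))

def summarize(lines):
    """Count RABHIT entries per (source IP, source port, destination IP)."""
    counts = Counter()
    for line in lines:
        fields = parse_rabhit(line)
        if fields:
            counts[(fields.get("SRC"), fields.get("SPT"), fields.get("DST"))] += 1
    return counts
```

Running summarize() over /var/log/messages (and most_common() on the result) shows whether a few sources or ports dominate, and the timestamps of the matching lines can then be compared with the outage times.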
1. You still never told me which Linux you use. How in heaven do you expect the most time- and resource-efficient help if I have to guess what you are using??
2. The warn log: search until you find it, whether it takes you 5 or 500 hours. It IS there in every reasonable Linux install, any distribution. Normal Linux distributions keep most logs in /var/log, normally in the same folder as messages or in subfolders of it.
3. You have to be much more precise with your quoted log entries. At what EXACT second did Apache stop? Then show, for ALL logs, the seconds from before until after the Apache stop. In the logs you quoted above, each quote starts in a different minute, hence none of the quoted log lines is of ANY value for analysis until ALL log lines come from the SAME period of time, starting BEFORE the Apache stop and ending AFTER the Apache restart!!
Find ALL 4 logs mentioned above, even if you have to go manually through 1000 warn files. There is only ONE warn log, and it is easy to find for your distribution if you know which distribution you have and if you Google or read the HOWTOs. Since Apache stops, you have to find entries in the log files about that Apache action! The chance that Apache stops without a log entry is near zero.
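Pulling ALL logs for the same time window is easier with a small script that understands the three timestamp formats seen in this thread (syslog in messages, Apache error_log and access_log) and merges the matching lines in time order. A Python 3 sketch under these assumptions: all logs are written in the server's local time zone (the access_log UTC offset is ignored), month names are English, and syslog lines, which carry no year, belong to the year of the center time you pass in:

```python
import re
from datetime import datetime, timedelta

# Apache error_log: [Wed Nov 28 09:47:07 2007] ...
ERROR_LOG = re.compile(r"^\[\w{3} (\w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4})\]")
# Apache access_log: ... [28/Nov/2007:09:45:55 -0500] ...  (offset ignored)
ACCESS_LOG = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2}) [+-]\d{4}\]")
# syslog (/var/log/messages): Nov 28 09:10:59 host ...
SYSLOG = re.compile(r"^(\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) ")

def line_time(line, year):
    """Return a naive local datetime for a line in any of the three formats, else None."""
    m = ERROR_LOG.match(line)
    if m:
        return datetime.strptime(m.group(1), "%b %d %H:%M:%S %Y")
    m = ACCESS_LOG.search(line)
    if m:
        return datetime.strptime(m.group(1), "%d/%b/%Y:%H:%M:%S")
    m = SYSLOG.match(line)
    if m:
        # syslog lines carry no year, so the caller supplies it.
        return datetime.strptime("%s %d" % (m.group(1), year), "%b %d %H:%M:%S %Y")
    return None

def window(paths, center, before_seconds=120, after_seconds=120):
    """Return (time, path, line) for lines around center from all files, sorted by time."""
    start = center - timedelta(seconds=before_seconds)
    end = center + timedelta(seconds=after_seconds)
    hits = []
    for path in paths:
        with open(path, errors="replace") as f:
            for line in f:
                stamp = line_time(line, center.year)
                if stamp is not None and start <= stamp <= end:
                    hits.append((stamp, path, line.rstrip("\n")))
    hits.sort(key=lambda hit: hit[0])
    return hits
```

For example, calling window() with the paths of messages, error_log and access_log and datetime(2007, 11, 28, 9, 45) returns every line from two minutes before to two minutes after 09:45 on Nov 28, interleaved in time order.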
Do you have APF and BFD installed? Is your firewall in demo mode? Are any of the IPs in the RABHIT lines yours?
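On the "demo mode" question: as far as I know, the APF setting for this is DEVEL_MODE in /etc/apf/conf.apf, where DEVEL_MODE="1" makes a cron job flush the firewall periodically (check the APF README for your version to confirm). A Python 3 sketch that reads the shell-style KEY="value" lines of such a file; it is a simplified parser that skips comment lines and ignores lines with trailing inline comments:

```python
import re

# Shell-style assignment such as DEVEL_MODE="1" or IFACE_IN="eth0".
ASSIGNMENT = re.compile(r'^\s*([A-Z_][A-Z0-9_]*)="?([^"]*)"?\s*$')

def read_conf(text):
    """Return a dict of the KEY="value" lines in conf.apf-style text."""
    values = {}
    for line in text.splitlines():
        if line.lstrip().startswith("#"):
            continue
        m = ASSIGNMENT.match(line)
        if m:
            values[m.group(1)] = m.group(2)
    return values

def devel_mode_on(values):
    """True if DEVEL_MODE is "1" (assumed here to be APF's development/test mode switch)."""
    return values.get("DEVEL_MODE") == "1"
```

Reading /etc/apf/conf.apf (path assumed from a default APF install) into read_conf() and passing the result to devel_mode_on() answers the question; the same dict also shows the current values of the other variables you may want to post.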
BFD = yes, APF = yes. Demo mode... don't think so. I am currently on a mobile device at work and can't check until later. RABHIT: the SRC IPs are not mine, but the DST IPs are mine. I can check more thoroughly tonight when I'm home.
I have seen these symptoms before... Check how many Apache processes are running at the time when it is not responding. There's a setting in your Apache .conf file, MaxClients, and if you reach that number of Apache processes, Apache stops accepting new requests (new connections wait in the queue until a process frees up). MaxClients is usually around 300-600. If a single user on a dial-up modem requested 500 files simultaneously, he could cause your entire site to stop working until he finished downloading some of them. If your number of Apache processes is down around a normal level, then this isn't your problem, but it's worth checking.
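The process count versus MaxClients check can be scripted too. A Python 3 sketch, assuming the prefork MPM (one request per child process), a process name of httpd, and that you feed it the text of httpd.conf and the output of ps -e -o comm; the parent httpd process is included in the count. Apache 2.2's prefork MPM also normally writes a line containing "server reached MaxClients setting" to error_log when it hits the cap, so the sketch checks for that as well:

```python
import re

def max_clients_values(conf_text):
    """All MaxClients values in httpd.conf text (several <IfModule> blocks may set one)."""
    pattern = r"^\s*MaxClients\s+(\d+)"
    return [int(v) for v in re.findall(pattern, conf_text, re.MULTILINE | re.IGNORECASE)]

def count_httpd(ps_output, name="httpd"):
    """Count lines of `ps -e -o comm` output naming the process exactly (parent included)."""
    return sum(1 for line in ps_output.splitlines() if line.strip() == name)

def hit_max_clients(error_log_lines):
    """True if error_log contains prefork's 'server reached MaxClients setting' warning."""
    return any("server reached MaxClients setting" in line for line in error_log_lines)

def check(conf_text, ps_output, error_log_lines=()):
    """Return (running httpd processes, MaxClients values, limit-reached flag)."""
    return (count_httpd(ps_output),
            max_clients_values(conf_text),
            hit_max_clients(error_log_lines))
```

If the first number sits at or near the MaxClients value for your MPM while the sites are unresponsive, or the flag is True, this is worth pursuing; if not, it can be ruled out.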
Webhostgear runs a server diagnostic and repair service. I emailed them; they think it is being caused by the APF firewall and have offered a price to investigate and repair it. I am in the process of gathering all the information on APF I can, and will try to investigate this myself before asking them to do it. I am going to start by following the manual http://rfxnetworks.com/appdocs/README.apf line by line. Anyone have any other APF hints or resources?
Because you are smart... I looked, and it wasn't in test mode. I also changed some variables today; I will have to monitor it and see if that did anything. BTW, the server is pretty active: it burns almost 100 GB of bandwidth a month and serves tens of thousands of pages daily. I'm thinking it might have had something to do with the standard flood control number that was preset. I can't remember what it was called, but I adjusted it to 52000 instead of the 37000 (estimate) that comes preset. I will post the exact details later, when I get off this crappy mobile device.