Server stops responding for minutes, but pings fine $$ reward

floodrod Well-Known Member

Messages:: 829

Likes Received:: 26

Best Answers:: 0

Trophy Points:: 135

#1

If anyone can lead me to solving this, I will paypal $15 to the person who can help me kill this problem.. I can not give out root details, but I can follow directions and commands..

A few times daily, my server stops responding.. Pages don't load on ALL domains hosted on my dedicated server..

I wait a few minutes and it's back up... During the down time, the server pings fine...

I contacted my dedicated server company several times and they keep trying things that don't work, and they say it's my ISP. It's not my ISP, as I can surf fine on all other sites when it happens..

Server Swap done by the host
Memory Tested fine (they say)

Apache logs show nothing wrong, messages show nothing wrong, and other logs are clean..

If someone is willing, I can follow instructions, and send them info such as php info, and whatnot..

edit.. forgot to mention, it's a Linux dedicated cPanel box

Thanks

floodrod, Nov 26, 2007 IP

InFloW Peon

Messages:: 1,488

Likes Received:: 39

Best Answers:: 0

Trophy Points:: 0

#2

Sounds like a load spike, maybe from a cron job running? I'd stay logged in over SSH with top open so you can see what the load average is when this happens.
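Since the outage comes at random times, one option is to leave a small logger running so the load average is already on record when it happens. A minimal sketch, assuming a Linux box with /proc/loadavg; the function name log_load and the log path are made up for illustration:

```shell
# Hypothetical helper: append one timestamped load-average sample to a file.
log_load() {
    logfile=$1
    if [ -r /proc/loadavg ]; then
        # First three fields are the 1, 5 and 15 minute load averages.
        load=$(cut -d' ' -f1-3 /proc/loadavg)
    else
        # Fallback for systems without /proc/loadavg.
        load=$(uptime)
    fi
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$load" >> "$logfile"
}

# Example (log path is an assumption): sample every 10 seconds until Ctrl-C.
# while true; do log_load /root/load.log; sleep 10; done
```

After an outage, the samples around that time show whether the load really spiked or stayed flat.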

InFloW, Nov 27, 2007 IP

hans Well-Known Member

Messages:: 2,923

Likes Received:: 126

Best Answers:: 1

Trophy Points:: 173

#3

login directly using SSH !!

open 5 (five) ssh connections from your bash/shell to your server

then run on 4 of the connections on your remote server the following - ONE line on each shell/connection:

tail -f /var/log/warn
tail -f /var/log/messages
tail -f /var/log/apache2/error_log
tail -f /var/log/apache2/access_log

then on your 5th connection - that's your "working" connection to enter bash commands.
then first check your apache config - u did NOT mention which apache version - 1.3, 2.0 or 2.2

to check apache config - typically for 2.2 on a suse 10.1 linux it would be:

rcapache2 configtest
if reply OK
then run next:
rcapache2 extreme-configtest

if any of above NOT OK - then follow instructions and clean up your apache config

if OK
then do

rcapache2 reload
that is a soft-restart of apache and should give you in the

tail -f /var/log/apache2/error_log

a line similar to:
[Tue Nov 27 23:30:03 2007] [notice] Graceful restart requested, doing restart
[Tue Nov 27 23:30:07 2007] [notice] Apache configured -- resuming normal operations

as you then can see the "downtime" during a soft restart of apache is in the range of 4 seconds.

if your apache does a correct / timely reload
then do a full restart as follows

rcapache2 restart

look again at your

tail -f /var/log/apache2/error_log

the downtime is a few seconds longer

if any of above 2 reload/restart commands create a delayed restart of your apache as you experience it in your normal operations - then look at the details of the 3 OTHER tail -f commands / windows - check for the seconds during and after your tests to get any possible feedback on possible problems

I once had a non-starting apache a year ago - the cause was a
ln -s
from the normal access_log file to an additional working copy for live traffic monitoring by a monitoring SW via browser. the "ln -s" link caused apache to stay down after the daily access_log rotation.

think about WHAT SW or changes you installed / made just before this problem started to occur

another check you can do is to compare the above 4 log files at the exact time of your reoccurring problems and look at the warn / messages or apache errors you find during the downtimes/problems you have!!

if above give you no clue and no help to your problem, then you may have to wait for someone HERE in DP with more experience in that kind of problem to help you further.

pls note that your apache reload/restart does NOT affect your SSH connection !
if however you would login to cpanel or similar - then your browser oriented connection might get lost = bad situation !!

hans, Nov 27, 2007 IP

floodrod Well-Known Member

#4

Thanks for the outline

Apache 1.3.39

tail -f /var/log/warn
No such file in that directory

tail -f /var/log/messages
Got It

tail -f /var/log/apache2/error_log
Only path to apache error log is /usr/apache/logs/error_log

tail -f /var/log/apache2/access_log
Only path to apache access log is /usr/local/apache/logs/access_log

rcapache2 configtest
I can only use /usr/local/apache/bin/apachectl configtest

rcapache2 extreme-configtest
No extreme configtest available

rcapache2 reload
no reload option available

thats pretty much where I stand.. Any advice?

floodrod, Nov 27, 2007 IP

floodrod Well-Known Member

#5

I'm upgrading to apache 2.2 with php 5 now.. We will see how it goes

floodrod, Nov 27, 2007 IP

hans Well-Known Member

#6

what to do ?
>> common sense!
since you gave minimal system info = search until you find
example:
/var/log/warn

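every decent linux system has a warn logfile of some kind - /var/log/warn is the suse name, other dists may name it differently or write warnings into messages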
find it and adapt path in your tail - command line
like your
Only path to apache error log is /usr/apache/logs/error_log
etc !!

same for OTHER missing commands - if you miss a particular command mentioned above then research for your own equivalent -> use Google to get instant replies.
if apache 2.2 is installed - then most likely you have similar or equal tools to those above at your disposal.

so what's the output of
/usr/local/apache/bin/apachectl configtest
??
and of your

/usr/local/apache/bin/apachectl restart

??

since u update from apache 1.3 to 2.2
you may have a few minor problems, especially if you are using mod_rewrite
we had ( me too ) such problems before and you find solutions for the 1.3>2.x upgrade HERE in DP forum, topic apache (mod_rewrite)

hans, Nov 27, 2007 IP

floodrod Well-Known Member

#7

config test results in

syntax OK

Apache restart

[Tue Nov 27 16:38:55 2007] [notice] SIGHUP received. Attempting to restart
[Tue Nov 27 16:38:56 2007] [notice] Apache/2.2.6 (Unix) mod_ssl/2.2.6 OpenSSL/0.9.7a mod_auth_passthrough/2.1 FrontPage/5.0.2.2635 mod_bwlimited/1.4 configured -- resuming normal operations

took a few seconds. no hangtime

floodrod, Nov 27, 2007 IP

hans Well-Known Member

#8

1.
i see u have upgraded to apache 2.2.
fine

2.
now observe and see if your apache still does NOT restart properly during certain daily situations
best is to observe live using the tail -f commands above

3.
as said above
use the 4 log files - especially the apache error_log
look at the precise times of your previous downtimes - then see what errors occurred in the seconds around each of them

you may download all the above log files to your laptop for offline processing - for the periods you HAD known downtimes.

4.
also as said b4
research when the first such occurrence appeared
what did u change in your system ( config OR any link, symlink, new tool, new script installed, new site, etc ) immediately prior to the first occurrence

as mentioned above
it happened to me once about a year ago, unfortunately i don't recall the exact details but it was something to the effect of writing a full access_log live into the user-space of a domain account for real time processing of traffic by the user
that kept my apache2.2 down until manual restart

hans, Nov 27, 2007 IP

floodrod Well-Known Member

#9

okie dokie... Thanks for the help hans, I will keep you posted..

It's hard to catch it when it stops responding as it happens a few times a day at random times..
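One way to pin down the exact seconds of an outage without watching by hand is a small probe that requests a page at a fixed interval and logs the result. A rough sketch, assuming curl is installed; probe_once, the URL and the log path are placeholders, and it is probably best run from a machine outside the server so it sees what visitors see:

```shell
# Hypothetical helper: request URL once and log UP or DOWN with a timestamp.
probe_once() {
    url=$1
    logfile=$2
    # --fail makes curl return non-zero on HTTP errors; --max-time bounds a hang.
    if curl --silent --fail --output /dev/null --max-time 10 "$url"; then
        state=UP
    else
        state=DOWN
    fi
    printf '%s %s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$state" "$url" >> "$logfile"
}

# Example (URL and log path are placeholders): probe every 15 seconds.
# while true; do probe_once http://www.example.com/ /root/probe.log; sleep 15; done
```

The DOWN lines then give the time window to look up in the server logs.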

floodrod, Nov 27, 2007 IP

hans Well-Known Member

#10

even if you don't catch it live when it happens
by looking at PREVIOUS incidents and searching ALL the above log files at the minutes / seconds before it happened, you almost surely find the action triggering the down of apache

but maybe now with apache2 it's all OK, who knows

if you find the exact incident triggering your apache down time - also check crontab if there is a scheduled task among other factors that might trigger the problem.

you HAVE to find the problem - unless its solved by upgrading now. else you risk downtimes=loss of $ for all sites.

however

since you upgraded from 1.3 to 2.2 - make sure ALL your previous apache features are fully working since there were a few changes in default configuration between 1.3 and 2.2.

not sure if u have OTHER sites on your server as well or only your own site(s) - but test the mod_rewrite function in your new apache. it either works or does NOT at all. in my case the latter after the same upgrade - until i changed the apache config.

and now as a final "homework" since you have another dist still unknown to us

check all your apache commands to make sure you know your own functionalities

http://httpd.apache.org/docs/2.2/programs/apachectl.html

u see that my suse
rcapache2 reload
may find its equivalent in your

apachectl graceful

find out ALL your paths and names of the 4 ( four ) log files above
and know all apache commands of your current version/distribution.

remember each time you change your apache config - you have to reload that new config by using your

apachectl graceful

and BEFORE reloading a possibly faulty modified apache config always run your

apachectl configtest
if OK then do your graceful or restart - else correct first.
changes in .htaccess are instantly active - changes in apache main config files usually only after a reload of config or restart of server.
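The "configtest first, then graceful" rule can be wrapped in one small function so the reload never runs on a broken config. A minimal sketch: the apachectl path is the one reported earlier in this thread, while the function name safe_graceful and the APACHECTL variable are made up for illustration.

```shell
# Path taken from this thread; override APACHECTL if yours differs.
APACHECTL=${APACHECTL:-/usr/local/apache/bin/apachectl}

# Hypothetical helper: reload Apache only if the config passes the syntax check.
safe_graceful() {
    if "$APACHECTL" configtest; then
        "$APACHECTL" graceful
    else
        echo "configtest failed - config NOT reloaded" >&2
        return 1
    fi
}

# Example: run after editing httpd.conf.
# safe_graceful
```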

for future questions it would be much more helpful to give FULL details of your linux dist and main setup versions - even a regular link to your site in question may help to help you more directly and more efficiently

hans, Nov 27, 2007 IP

floodrod Well-Known Member

#11

OK, it seems to just have happened..

messages are always filled with this, besides an occasional failed authentication try by hack attempts..

Nov 28 09:10:59 jimbob kernel: ** RABHIT ** IN=eth0 OUT= MAC=00:10:dc:e2:cd:0f:00:18:19:cf:c1:f0:08:00 SRC=78.149.199.223 DST=69.72.214.58 LEN=40 TOS=0x00 PREC=0x00 TTL=53 ID=6071 PROTO=TCP SPT=113 DPT=59897 WINDOW=0 RES=0x00 ACK RST FIN URGP=0
Nov 28 09:25:42 jimbob kernel: ** RABHIT ** IN=eth0 OUT= MAC=00:10:dc:e2:cd:0f:00:18:19:cf:c1:f0:08:00 SRC=190.198.248.23 DST=69.72.214.58 LEN=40 TOS=0x00 PREC=0x00 TTL=56 ID=6016 PROTO=TCP SPT=113 DPT=41871 WINDOW=0 RES=0x00 ACK RST FIN URGP=0
Nov 28 09:41:35 jimbob kernel: ** RABHIT ** IN=eth0 OUT= MAC=00:10:dc:e2:cd:0f:00:18:19:cf:c1:f0:08:00 SRC=78.144.142.137 DST=69.72.214.58 LEN=40 TOS=0x00 PREC=0x00 TTL=52 ID=15585 PROTO=TCP SPT=113 DPT=53868 WINDOW=0 RES=0x00 ACK RST FIN URGP=0
Nov 28 09:44:10 jimbob kernel: ** RABHIT ** IN=eth0 OUT= MAC=00:10:dc:e2:cd:0f:00:18:19:cf:c1:f0:08:00 SRC=201.9.15.152 DST=69.72.214.58 LEN=40 TOS=0x00 PREC=0x00 TTL=56 ID=6267 PROTO=TCP SPT=113 DPT=33936 WINDOW=0 RES=0x00 ACK RST FIN URGP=0
Nov 28 09:44:29 jimbob kernel: ** RABHIT ** IN=eth0 OUT= MAC=00:10:dc:e2:cd:0f:00:18:19:cf:c1:f0:08:00 SRC=78.165.183.249 DST=69.72.214.58 LEN=40 TOS=0x00 PREC=0x00 TTL=53 ID=8621 PROTO=TCP SPT=113 DPT=39128 WINDOW=0 RES=0x00 ACK RST FIN URGP=0

access log is always clean

127.0.0.1 - - [28/Nov/2007:09:45:55 -0500] "GET / HTTP/1.0" 200 2860
127.0.0.1 - - [28/Nov/2007:09:45:56 -0500] "GET / HTTP/1.0" 200 2860
127.0.0.1 - - [28/Nov/2007:09:46:03 -0500] "GET / HTTP/1.0" 200 2860
127.0.0.1 - - [28/Nov/2007:09:46:13 -0500] "GET / HTTP/1.0" 200 2860
127.0.0.1 - - [28/Nov/2007:09:46:19 -0500] "GET / HTTP/1.0" 200 2860
127.0.0.1 - - [28/Nov/2007:09:46:53 -0500] "GET / HTTP/1.0" 200 2860
127.0.0.1 - - [28/Nov/2007:09:47:01 -0500] "GET / HTTP/1.0" 200 2860
127.0.0.1 - - [28/Nov/2007:09:48:08 -0500] "GET / HTTP/1.0" 200 2860
127.0.0.1 - - [28/Nov/2007:09:48:09 -0500] "GET / HTTP/1.0" 200 2860
127.0.0.1 - - [28/Nov/2007:09:48:10 -0500] "GET / HTTP/1.0" 200 2860
127.0.0.1 - - [28/Nov/2007:09:48:11 -0500] "GET / HTTP/1.0" 200 2860
127.0.0.1 - - [28/Nov/2007:09:48:12 -0500] "GET / HTTP/1.0" 200 2860
127.0.0.1 - - [28/Nov/2007:09:48:13 -0500] "GET / HTTP/1.0" 200 2860
127.0.0.1 - - [28/Nov/2007:09:48:14 -0500] "GET / HTTP/1.0" 200 2860
127.0.0.1 - - [28/Nov/2007:09:48:15 -0500] "GET / HTTP/1.0" 200 2860
127.0.0.1 - - [28/Nov/2007:09:48:16 -0500] "GET / HTTP/1.0" 200 2860
127.0.0.1 - - [28/Nov/2007:09:48:17 -0500] "GET / HTTP/1.0" 200 2860

error logs just show bad requests or files that aren't there - I modified the paths as I pasted

[Wed Nov 28 09:47:07 2007] [error] [client 82.195.137.125] File does not exist: /path/favicon.ico
[Wed Nov 28 09:47:07 2007] [error] [client 82.195.137.125] File does not exist: /path/favicon.ico/404.shtml
[Wed Nov 28 09:50:08 2007] [error] [client 74.224.203.239] File does not exist: /path/favicon.ico
[Wed Nov 28 09:50:08 2007] [error] [client 74.224.203.239] File does not exist: /path/favicon.ico/404.shtml
[Wed Nov 28 09:50:32 2007] [error] [client 74.224.203.239] File does not exist: /path/favicon.ico/favicon.ico
[Wed Nov 28 09:50:32 2007] [error] [client 74.224.203.239] File does not exist: /path/favicon.ico/404.shtml
[Wed Nov 28 09:53:18 2007] [error] [client 77.102.100.245] File does not exist: /path/favicon.ico/favicon.ico
[Wed Nov 28 09:53:18 2007] [error] [client 77.102.100.245] File does not exist: /path/favicon.ico/404.shtml

still can't find warn log.. when I locate warn, thousands of results come up..

floodrod, Nov 28, 2007 IP

hans Well-Known Member

#12

1.
you still never told me what linux you use
how in heaven do u expect most time/resource-efficient help if i have to guess what u are using ??

2.
warn log
search until you find - if it takes you 5 or 500 hrs never mind it IS there - in every reasonable linux install - any dist
normal linux dists have most logs in
/var/log
normally in same folder as messages
or in subfolders of above

3.
you have to be much more precise with your quoted log entries

>>> what EXACT second did apache stop

then show for ALL logs the seconds before until after apache stop

in your above quoted logs you start each of your quotes in a different minute - hence none of the quoted log lines is of ANY value for analysis until ALL log lines are from SAME period of time starting BEFORE apache stop until AFTER apache restart !!

find ALL 4 logs above - even if you have to go manually through 1000 warn files
there is ONE warn log only - and it is easy to find for your dist if you know what dist you have and if you GOOGLE or read howtos

since apache stops - you have to find entries in log files about that apache action ! the chance that apache stops without log entry is near zero
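Pulling the same time window out of several logs at once could look roughly like the sketch below. The function name lines_around is invented for illustration, and the log paths in the usage comments are the ones mentioned in this thread. Note the timestamp formats differ: messages and error_log use "Nov 28 09:4..." while access_log uses "28/Nov/2007:09:4...".

```shell
# Hypothetical helper: print every line containing the timestamp fragment,
# prefixed with its file name so the logs can be compared side by side.
lines_around() {
    stamp=$1
    shift
    # -F: treat the fragment as a fixed string; -H: always show the file name.
    grep -H -F -- "$stamp" "$@"
}

# Examples for the ten minutes from 09:40 to 09:49 on Nov 28:
# lines_around 'Nov 28 09:4' /var/log/messages /usr/local/apache/logs/error_log
# lines_around '28/Nov/2007:09:4' /usr/local/apache/logs/access_log
```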

hans, Nov 28, 2007 IP

ray9 Guest

Messages:: 69

Likes Received:: 2

Best Answers:: 0

Trophy Points:: 0

#13

Do you have apf and bfd installed? Is your firewall in demo mode?
Are any of the IPs from the rabhit yours?
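To see quickly which source addresses show up in those firewall lines, something like the following sketch may help. It assumes the log lines carry a SRC=<ip> field as in the RABHIT entries quoted above; the function name count_src_ips is invented for illustration.

```shell
# Hypothetical helper: read firewall log lines on stdin and print a hit count
# per source address, most frequent first.
count_src_ips() {
    sed -n 's/.* SRC=\([0-9.]*\) .*/\1/p' | sort | uniq -c | sort -rn
}

# Example:
# grep 'RABHIT' /var/log/messages | count_src_ips
```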

ray9, Nov 28, 2007 IP

floodrod Well-Known Member

#14

bfd= yes
apf= yes
demo mode.. don't think so. I am currently on a mobile device at work and can't check till later..

rabhit- src ips not mine, but dst ips are mine.

I can check More thoroughly tonight when I'm home

floodrod, Nov 29, 2007 IP

Ladadadada Peon

Messages:: 382

Likes Received:: 36

Best Answers:: 0

Trophy Points:: 0

#15

I have seen these symptoms before... check how many Apache processes are running at the time when it is not responding. There's a setting in your Apache .conf file called MaxClients, and if you reach that number of Apache processes, Apache will stop accepting requests.

MaxClients is usually around 300 - 600. If you had a single user on a dial-up modem who requested 500 files simultaneously, he could cause your entire site to stop working until he finished downloading some of them.

If your number of Apache processes is down around a normal level then this isn't your problem but it's worth checking.
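A quick way to compare the two numbers is sketched below. The helper names are invented, and both the process name httpd and the config path in the usage comments are assumptions that may differ on your install.

```shell
# Hypothetical helper: print the MaxClients value(s) set in a config file.
# Commented-out lines are skipped because their first field starts with '#'.
max_clients() {
    awk 'tolower($1) == "maxclients" { print $2 }' "$1"
}

# Hypothetical helper: count running processes whose command name is exactly $1.
count_procs() {
    ps -e -o comm= | grep -c -x -- "$1"
}

# Example (path and process name are assumptions):
# max_clients /usr/local/apache/conf/httpd.conf
# count_procs httpd
```

If the process count sits at or near the MaxClients value while the sites are unreachable, that points at this limit.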

Ladadadada, Nov 29, 2007 IP

ray9 Guest

#16

floodrod, any news?

ray9, Nov 30, 2007 IP

floodrod Well-Known Member

#17

Webhostgear runs a server diagnostic and repair service.. I emailed them and they think it is being caused by APF firewall and have offered a price to investigate and repair.

I am in the process of gathering all the information on APF I can, and will try to investigate this before asking them to do it..

I am going to start by following the manual http://rfxnetworks.com/appdocs/README.apf line for line
Anyone have any other APF hints or resources?

floodrod, Dec 3, 2007 IP

ray9 Guest

#18

why do you think I asked for apf right away?

ray9, Dec 4, 2007 IP

floodrod Well-Known Member

#19

ray9 said: ↑

why do you think I asked for apf right away?

because you are smart..

I looked and it wasn't in test mode. I also changed some variables today. I will have to monitor it and see if it did anything.

btw, server is pretty active. burns almost 100 gigs of b/w a month and serves tens of thousands of pages daily. I'm thinking it might have had something to do with the standard flood control number that was preset. I can't remember what it was called, but I adjusted it to 52000 instead of the 37000 (estimate) that comes preset.

later I will post the exact details when I get off this crappy mobile device

floodrod, Dec 4, 2007 IP
