server hanging from too many httpd processes

Discussion in 'Apache' started by classifieds, Mar 24, 2005.

  1. #1
    Every few days my server will spawn 450 or so httpd daemons in a few minutes and effectively go offline. It requires a shutdown -r to resolve and that usually takes 45 minutes to complete.

    I'm trying to track down what's causing it. I suspect that either I've got a misbehaving bot or an obnoxious email address harvester hitting it.

    I've looked at the logfiles via Webalizer but don't see anything obvious, and the message log does not have any entries that look suspicious.

    Any advice on how to figure out what’s causing it?

    Are there apache configuration parameters that will help?

    Any suggestions would be appreciated.

    Regards,

    -jay
     
    classifieds, Mar 24, 2005 IP
  2. J.D.

    J.D. Peon

    Messages:
    1,198
    Likes Received:
    65
    Best Answers:
    0
    Trophy Points:
    0
    #2
    This kind of condition may also be triggered by a bug in the code that is being executed. For example, if you have an endless loop that ties up one of the worker threads, eventually all of them will end up hanging, locking up the server. Check whether traffic in the few minutes before the server hangs is higher than when it's working normally. Check CPU usage too: if you have an endless loop that does something like string comparison, your CPUs will go to 100%.
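
One rough way to compare traffic before a hang with normal traffic is to count requests per minute in the access log. The sketch below assumes the log uses the Common or Combined Log Format (timestamp in the fourth field) and that entries are in time order; `requests_per_minute` is a made-up helper name, and the log path in the usage comment is only an example.

```shell
#!/bin/sh
# Count requests per minute from an Apache access log read on stdin.
# Prints "<count> <dd/Mon/yyyy:HH:MM>" for each minute, in log order.
requests_per_minute() {
    # Field 4 looks like "[24/Mar/2005:13:45:02"; drop the "[" and the seconds.
    awk '{ print substr($4, 2, 17) }' |
        uniq -c |
        awk '{ print $1, $2 }'
}

# Example (the path is an assumption, adjust it to your layout):
#   requests_per_minute < /var/log/httpd/access_log | tail -n 30
```

A sudden jump in the per-minute count just before the hang would point at traffic; a flat count would point more towards code that is stuck.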

    J.D.
     
    J.D., Mar 24, 2005 IP
  3. nullbit

    nullbit Peon

    Messages:
    489
    Likes Received:
    19
    Best Answers:
    0
    Trophy Points:
    0
    #3
    Next time it happens, it would be quicker to restart the httpd daemon instead of doing a full reboot. How to do this depends on your distro; on Red Hat/Fedora, which are the most common, it would be:
    service httpd restart
    When it hangs, check the end of your access and error logs to see what was hitting the server before it went down:
    Access logs:
    
    tail -n 30 /var/log/httpd/access_log
    
    Error logs:
    
    tail -n 30 /var/log/httpd/error_log
    
    You might need to change the log paths to reflect your directory structure.

    If you're logged in while it happens, you can do this to see which clients are hitting your box:
    
    netstat -tpu
    
    If it's one host causing the problem you can block it with your firewall:
    
    iptables -I INPUT 1 -s xxx.xxx.xxx.xxx -j DROP
    
    xxx.xxx.xxx.xxx is the ip to block.

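    On systems that use TCP wrappers, this will also block a host, but only for services that consult /etc/hosts.deny (Apache itself normally does not):
    
    echo "ALL: xxx.xxx.xxx.xxx" >> /etc/hosts.deny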
    
    Otherwise you will probably have to make some changes to the Apache config file to limit the number of allowed processes.
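
To check whether a single host is behind the load, it can help to rank client addresses by request count instead of reading the raw tail of the log. This is only a sketch: it assumes the client address is the first field of each access log line, and `top_clients` is a hypothetical helper name.

```shell
#!/bin/sh
# Rank client addresses by number of requests.
# Reads access log lines on stdin; $1 = how many clients to show (default 10).
# Prints "<count> <address>", busiest client first.
top_clients() {
    awk '{ print $1 }' |
        sort |
        uniq -c |
        sort -rn |
        head -n "${1:-10}" |
        awk '{ print $1, $2 }'
}

# Example (the path is an assumption, adjust it to your layout):
#   tail -n 5000 /var/log/httpd/access_log | top_clients 5
```

If one address dominates the list, that is the candidate for the iptables rule above.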
     
    nullbit, Mar 24, 2005 IP
  4. digitalpoint

    digitalpoint Overlord of no one Staff

    Messages:
    38,333
    Likes Received:
    2,613
    Best Answers:
    462
    Trophy Points:
    710
    Digital Goods:
    29
    #4
    You should lower the allowed number of clients within your httpd.conf file.

    For example, this would allow a maximum of 50 httpd processes to be spawned:

    MaxClients 50
    Whatever it's set to now, it seems your server can't handle that number, so I would lower it. Even a very high traffic site usually would be fine at 100. I run digitalpoint.com with it set to 100 and never had an issue.
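
A common rule of thumb (not something stated in this thread) is to size MaxClients so that all httpd processes fit in physical memory: memory left over for Apache divided by the average size of one httpd process. The sketch below assumes that rule; `avg_rss_mb` and `estimate_max_clients` are made-up helper names, and the numbers in the usage comment are purely illustrative.

```shell
#!/bin/sh
# Average resident size in MB, given one RSS value in KB per line on stdin
# (the format printed by "ps -C httpd -o rss=").
avg_rss_mb() {
    awk '{ sum += $1; n++ } END { if (n) printf "%d\n", sum / n / 1024; else print 0 }'
}

# Rough MaxClients estimate.
# $1 = total RAM in MB
# $2 = RAM to keep free for the OS and other services in MB
# $3 = average size of one httpd process in MB
estimate_max_clients() {
    if [ "$3" -le 0 ]; then
        echo "per-process size must be positive" >&2
        return 1
    fi
    echo $(( ($1 - $2) / $3 ))
}

# Example with illustrative numbers:
#   per_proc=$(ps -C httpd -o rss= | avg_rss_mb)
#   estimate_max_clients 1024 224 "$per_proc"
```

RSS counts shared pages once per process, so this tends to overestimate per-process memory and gives a conservative limit.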
     
    digitalpoint, Mar 24, 2005 IP
  5. classifieds

    classifieds Sopchoppy Flash

    Messages:
    825
    Likes Received:
    51
    Best Answers:
    0
    Trophy Points:
    150
    #5
    Thanks for the suggestions.

    I'll have a window later tonight to make these changes and dig into exactly what's going on.

    Regards,

    -jay
     
    classifieds, Mar 24, 2005 IP
  6. classifieds

    #6
    Thanks for the help :)

    Here's what I discovered (and not).

    1. There were a lot of "file not found" errors in the error log - *fixed*

    2. My .htaccess "deny" is working well for the Nigerian 419 scammers :) I'm looking at hosts.deny and other IP-level blocks to improve efficiency - my .htaccess is at 16k (even with using CIDR for the addresses).

    3. There was no indication of sudden traffic from a single IP address or range of addresses - So at this point I'm going to assume that there's some buggy code somewhere. I've set up some traps to try to isolate it.

    4. MaxClients was set at 400. I changed it to 100 and will lower it further if the problem shows up again.

    5. Restarting the httpd daemon took 2 minutes instead of 45 minutes for a full reboot :)
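
For moving a large .htaccess deny list down to the firewall, one possible approach is to turn each "Deny from" address into an iptables rule like the one suggested earlier in the thread. The sketch below only prints the commands so they can be reviewed first. It assumes the old "Deny from <address>" syntax with full dotted addresses or CIDR ranges; partial addresses, hostnames and "all" are skipped. `htaccess_deny_to_iptables` is a hypothetical helper name.

```shell
#!/bin/sh
# Read an .htaccess file on stdin and print one iptables DROP command
# for every full IPv4 address or CIDR range found on a "Deny from" line.
# Nothing is applied; the output is meant to be reviewed before use.
htaccess_deny_to_iptables() {
    awk 'tolower($1) == "deny" && tolower($2) == "from" {
        for (i = 3; i <= NF; i++)
            if ($i ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+(\/[0-9]+)?$/)
                print "iptables -I INPUT -s " $i " -j DROP"
    }'
}

# Example (file names are assumptions):
#   htaccess_deny_to_iptables < .htaccess > block-rules.sh
```

Keep in mind that a firewall rule drops all traffic from the address, not just web requests, so it is a blunter tool than .htaccess.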

    Thanks again for the advice!

    Regards,

    -jay
     
    classifieds, Mar 25, 2005 IP
  7. J.D.

    #7
    Did you say 45 *minutes*?!
     
    J.D., Mar 25, 2005 IP
  8. classifieds

    #8
    YES I DID :eek:

    It was very frustrating.

    -jay
     
    classifieds, Mar 25, 2005 IP
  9. nullbit

    #9
    45 minutes is an extremely long reboot time. You probably need to look at your init scripts and check your logs; something's causing that delay.
     
    nullbit, Mar 25, 2005 IP
  10. J.D.

    #10
    If a restart/reboot takes longer than a few minutes, I usually kill the process that causes it.

    J.D.
     
    J.D., Mar 25, 2005 IP
  11. classifieds

    #11
    A normal reboot takes about 4-5 minutes.

    The 45-minute reboot happens when 300-400 httpd daemons are spawned within several minutes and overload the server.

    I'm still trying to determine the cause and I'm hoping that the recommendations made earlier will help mitigate the impact on the server (at least until I figure out what's causing it).

    -jay
     
    classifieds, Mar 25, 2005 IP
  12. nullbit

    #12
    OK, 4-5 minutes is OK for a normal reboot.

    Does the large spawning happen at a particular time of day, or is it totally random?
     
    nullbit, Mar 25, 2005 IP
  13. J.D.

    #13
    I understand. What I'm saying is that I cannot imagine a machine being out of circulation for more than 4-5 minutes, and when this kind of thing happens, I usually kill the process that causes the problem after a short timeout (typically a minute or so). The only thing to watch out for here is that if it's not httpd but something else (e.g. a DBMS), then killing the process may have repercussions for the integrity of the data it was working on at the time it was killed.
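
The "kill after a short timeout" idea can be sketched roughly as follows: ask the process to stop with SIGTERM, give it some time, and only then force it with SIGKILL. `stop_with_timeout` is a made-up helper name, and the 60-second default simply mirrors the "minute or so" mentioned above.

```shell
#!/bin/sh
# Ask a process to exit, and force-kill it if it is still there after a timeout.
# $1 = PID, $2 = seconds to wait after SIGTERM before sending SIGKILL (default 60)
stop_with_timeout() {
    pid=$1
    remaining=${2:-60}

    # Nothing to do if the signal cannot be delivered (e.g. the process is already gone).
    kill -TERM "$pid" 2>/dev/null || return 0

    while [ "$remaining" -gt 0 ]; do
        # kill -0 sends no signal; it only checks whether the PID still exists.
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        remaining=$((remaining - 1))
    done

    # Still around after the grace period: force it.
    kill -KILL "$pid" 2>/dev/null
    return 0
}

# Example: give PID 12345 one minute to shut down cleanly
#   stop_with_timeout 12345 60
```

SIGKILL gives the process no chance to flush or clean up, which is exactly the data integrity risk described above for something like a database.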

    J.D.
     
    J.D., Mar 25, 2005 IP
  14. classifieds

    #14
    When it gets into this vegetative state, the response time on the SSH/telnet session is so slow that it takes 5 minutes to enter one command, and by that point it's spawned so many processes that it's difficult and slow to dig through them looking for the culprit.

    As you can tell from my posts, I'm not a sys admin, nor am I a programmer (not since the early eighties anyway - I loved those old Sperry Univac DCPs!).

    I appreciate your experience, insights and recommendations so please keep them coming!

    -jay
     
    classifieds, Mar 25, 2005 IP
  15. nullbit

    #15
    The top program will print out CPU, memory and other usage figures for the most active processes.
     
    nullbit, Mar 25, 2005 IP
  16. J.D.

    #16
    That means your CPU is pegged at 100%. I think it is a bad loop somewhere in the code. Maybe not your code, but something is spinning its wheels when this happens.

    J.D.
     
    J.D., Mar 25, 2005 IP
  17. classifieds

    #17
    I'm running Linux Fedora Core 2.

    Is it "top" - no arguments?
     
    classifieds, Mar 25, 2005 IP
  18. nullbit

    #18
    Yes, just top. It can take arguments to customize the output (which might be useful in your case); run "man top" for more info.
     
    nullbit, Mar 25, 2005 IP
  19. classifieds

    #19
    This and the other suggestions should give me plenty to do this weekend!

    Thanks again for the help.

    -jay
     
    classifieds, Mar 25, 2005 IP
  20. J.D.

    #20
    You can record top's output every 60 seconds or so, in case you want to leave it running for some time:

    top -d 60 -b > top.txt

    Each snapshot will be 5-10K, so watch your drive usage if you shorten the interval or want to leave it running for a while.
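
If the full top output is more than you need, a lighter alternative might be to log just the number of httpd processes with a timestamp, which makes the moment the count starts climbing easy to spot. This is a sketch under a few assumptions: the process is named httpd (as on Fedora), and `count_procs` and `log_httpd_count` are hypothetical helper names.

```shell
#!/bin/sh
# Count processes with a given command name.
# Reads one command name per line on stdin (the format printed by "ps -eo comm=").
# $1 = name to count (default httpd)
count_procs() {
    awk -v name="${1:-httpd}" '$1 == name { n++ } END { print n + 0 }'
}

# Append "<timestamp> <httpd count>" to a file at a fixed interval, until interrupted.
# $1 = interval in seconds, $2 = log file
log_httpd_count() {
    while :; do
        printf '%s %s\n' \
            "$(date '+%Y-%m-%d %H:%M:%S')" \
            "$(ps -eo comm= | count_procs httpd)" >> "$2"
        sleep "$1"
    done
}

# Example (file name is an assumption); run it in the background:
#   log_httpd_count 60 httpd-count.log &
```

Each entry is a single short line, so disk usage stays far below what the top snapshots need.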

    J.D.
     
    J.D., Mar 25, 2005 IP