The price of success - too many users!


25 Oct 2007

Recently at work we had been having issues with spiking server load. One of the potential suspects was the Apache configuration, as it allowed up to 256 MaxClients. Combine that with Drupal eating RAM for breakfast (say a minimum of 12MB per page) and you have a recipe for disaster: too many visitors cause a RAM shortage, lots of swapping and eventually a server meltdown. After speaking to the Rackspace Technical Support Team, one of the guys there (Daniel) wrote a VERY useful script for us to run on the server to monitor Apache usage.
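To put rough numbers on that, the worst case is simply MaxClients multiplied by the memory each Apache process needs. A minimal sketch using the figures above (256 clients, at least 12MB each); the function name is made up for illustration:

```shell
#!/bin/bash
# Hypothetical helper: worst-case RAM (in MB) if every allowed Apache
# process is running at once and each uses the given amount of memory.
worst_case_mb() {
    local max_clients="$1"      # the MaxClients value from httpd.conf
    local mb_per_process="$2"   # rough memory per process (e.g. ~12MB for a Drupal page)
    echo $(( max_clients * mb_per_process ))
}

worst_case_mb 256 12   # 3072, i.e. roughly 3GB
```

Since 12MB is a lower bound per page, a server with less than about 3GB to spare for Apache can be pushed into swap once every client slot is in use.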

The basic principle is to use the ps command regularly to get a process list, filter it with grep, and then filter it with grep again to remove the grep process itself (the command 'grep http' contains the string 'http', so it shows up in its own results). Finally, use the wc command to count the remaining lines. Put together, that gives the following one-liner, which you can run on your system…

ps aux | grep http | grep -v "\(root\|grep\)" | wc -l

Here is an explanation…

ps aux

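Get a process list. The 'a' option lists processes belonging to all users, not just you; the 'u' option gives the user-oriented format (owner, CPU, memory and so on); and the 'x' option also includes processes that have no controlling terminal, which is where daemons such as Apache live.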

grep http

This goes through the list produced and keeps only the lines that contain the string 'http' (which also matches 'httpd').

grep -v "\(root\|grep\)"

Unfortunately, the grep process from the previous step contains the very word it is filtering for ('grep http'), so it appears in the list as well. Root also owns one of the Apache processes, typically the parent process that starts and manages the child processes that actually serve requests. We want to exclude both, so we use grep again, this time matching lines that contain 'grep' or 'root'. The '-v' option inverts the match, so those lines are dropped rather than kept.

wc -l

This is a really simple yet useful command. wc is a word counter, and the '-l' (that's a lowercase L) option tells it to count newlines instead, and a newline is what separates each process in the list!
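To see what each stage keeps and throws away without touching a live server, you can push some made-up `ps aux` lines through the same filters. This is a sketch: the sample lines (user names, PIDs, the /usr/sbin/httpd path) are invented, and it uses `grep -v -E 'root|grep'`, the extended-regex spelling of the original `grep -v "\(root\|grep\)"` (the `\|` form relies on GNU grep).

```shell
#!/bin/bash
# Same filter chain as the one-liner, reading ps-style lines from stdin:
#   1. keep lines mentioning http (this also catches httpd)
#   2. drop the root-owned parent and the grep process itself
#   3. count what is left, one line per process
count_apache_children() {
    grep http | grep -v -E 'root|grep' | wc -l
}

# Invented sample of `ps aux` output: one root parent, two children,
# the grep process from the pipeline, and an unrelated shell.
sample='root      1234  0.0  0.5  20000  5000 ?     Ss   10:00   0:00 /usr/sbin/httpd
apache    1240  0.0  1.2  30000 12000 ?     S    10:00   0:00 /usr/sbin/httpd
apache    1241  0.0  1.3  31000 13000 ?     S    10:00   0:00 /usr/sbin/httpd
nick      2000  0.0  0.0   4000   700 pts/0 S+   10:05   0:00 grep http
nick      1500  0.0  0.1   5000  1000 pts/0 Ss   10:01   0:00 -bash'

printf '%s\n' "$sample" | count_apache_children   # 2 child processes
```

On a real server you would feed it live data instead: `ps aux | count_apache_children`.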

Running the above command returns a single number in the terminal: the number of Apache processes currently running on your system (not counting the root-owned parent). A slight variation on the initial 'ps' command will give you some pretty useful information...

ps axo 'pid user size cmd' | grep http | grep -v "\(root\|grep\)"

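This version neatly lists a table of running Apache processes (excluding those owned by root and the grep itself) with only four columns: Process ID, Username (of the process owner), Size and the Command that was run. Size is reported in kilobytes, but ps describes it as a rough estimate of the swap space the process would need rather than the memory it is actually using; swapping 'size' for 'rss' (resident set size, also in kilobytes) gives a closer picture of how much RAM Apache is really using.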
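If you want a single total rather than a table, awk can add up a memory column. This sketch uses `rss` instead of `size`, which procps `ps` documents as resident (non-swapped) memory in kilobytes. The sample lines are invented, and bear in mind that pages shared between Apache processes are counted once per process, so the total overstates the real footprint.

```shell
#!/bin/bash
# Sum the RSS column (KB) of non-root httpd processes.
# Expects lines in the `ps axo 'user rss cmd'` layout on stdin.
sum_apache_rss_kb() {
    grep http | grep -v -E 'root|grep' | awk '{ total += $2 } END { print total + 0 }'
}

# On a live server:
#   ps axo 'user rss cmd' | sum_apache_rss_kb
# With invented sample lines (the root parent and the grep are excluded):
printf '%s\n' \
    'root     5000 /usr/sbin/httpd' \
    'apache  12000 /usr/sbin/httpd' \
    'apache  13000 /usr/sbin/httpd' \
    'nick      700 grep http' | sum_apache_rss_kb   # 25000
```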

Finally, you want to know how many processes are running over the period of a day or so. This is where Rackspace turned those commands into a VERY nice bash script!

#!/bin/bash

# Settings: alert threshold, alert recipient, lock file and log file.
THRESHOLD=100
ADDRTO="admin@mysite.com"
SUBJECT="Apache Process Check"
LOCKFILE="/tmp/apache_process_check.lock"
LOGFILE="/var/log/apache_processes.log"

# Count the Apache child processes and append a timestamped entry to the log.
NUMHTTPD=$(ps aux | grep http | grep -v "\(root\|grep\)" | wc -l)
echo "$(date +'%Y-%m-%d %H:%M:%S %Z') - ${NUMHTTPD}" >> "${LOGFILE}"

if [[ ${NUMHTTPD} -gt ${THRESHOLD} ]]; then
    # Above the threshold: alert once, then leave a lock file so we don't repeat.
    if [ ! -e "${LOCKFILE}" ]; then
        echo "The number of currently running httpd processes is ${NUMHTTPD}." | mail -s "${SUBJECT} - Above Threshold" "${ADDRTO}"
        touch "${LOCKFILE}"
    fi
else
    # Back at or below the threshold: clear the lock file and send an all-clear.
    if [ -e "${LOCKFILE}" ]; then
        rm -f "${LOCKFILE}"
        echo "The number of currently running httpd processes is ${NUMHTTPD}." | mail -s "${SUBJECT} - Below Threshold" "${ADDRTO}"
    fi
fi

Quite simply, this logs the Apache process count every time it runs: a timestamp and the count are appended to the log file as a new line. If the count goes over the threshold (100 in this case), it emails the address specified to let the admin know the server is getting close to its limit, and creates a lock file. While the lock file exists no further alerts are sent, so the admin isn't spammed while the server stays under load. When the count falls back to the threshold or below, the script deletes the lock file and sends a second email saying so. The log could easily be imported into something like Excel if you want to produce pretty graphs. If you're slightly more challenge-oriented, maybe you could write a script to feed it into RRDTool!
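For the Excel route, each log line (date, time, time zone, a dash, then the count) maps easily onto CSV. A minimal sketch, assuming the exact log format written by the script above; the function names are made up for illustration.

```shell
#!/bin/bash
# Turn "2007-10-25 14:05:01 BST - 42" into "2007-10-25 14:05:01,42".
# Lines with fewer than five fields are skipped.
log_to_csv() {
    awk 'NF >= 5 { print $1 " " $2 "," $5 }'
}

# Print the busiest sample in the log: sort numerically on the count
# column and keep the last (largest) line.
log_peak() {
    log_to_csv | sort -t, -k2,2 -n | tail -n 1
}

# Example:
#   log_to_csv < /var/log/apache_processes.log > apache_processes.csv
#   log_peak   < /var/log/apache_processes.log
```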

The next step is to make the script executable (chmod +x) and then set up a crontab entry to run it regularly (say, every 5 minutes).
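One way to do that from the shell, as a sketch: the script path is hypothetical (use wherever you saved it), and because the script appends to /var/log, the crontab should belong to a user that can write there, typically root. The function names are made up.

```shell
#!/bin/bash
# Build the crontab line that runs the checker every 5 minutes.
cron_entry() {
    printf '*/5 * * * * %s\n' "$1"
}

# Make the script executable and add the entry to the current user's
# crontab, unless an identical line is already there.
install_cron_entry() {
    local script="$1"
    local entry
    entry="$(cron_entry "$script")"
    chmod +x "$script" || return 1
    if crontab -l 2>/dev/null | grep -qxF "$entry"; then
        echo "already installed: $entry"
        return 0
    fi
    ( crontab -l 2>/dev/null; echo "$entry" ) | crontab -
}

# Usage (hypothetical path):
#   install_cron_entry /usr/local/bin/apache_process_check.sh
```

Checking for the line first means running the installer twice doesn't leave duplicate entries that would fire the check twice every five minutes.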

I'd like to thank Rackspace for their "fanatical support" - especially Daniel for going above and beyond the call of duty. This is a REALLY handy script!

Thanks for the script mate

Thanks for the script mate (Daniel) and massive thanks to Nick for helping me get this set up on my VPS.

Cheers and keep up the good work.

Monit

Interesting hack. An alternative, possibly somewhat simpler, would be to use Monit, which is designed for this kind of thing.

Monit looks good

That looks like a nice application; however, it's rather bloated if you just want to quickly make a note of roughly how many processes Apache is spawning.

Thanks for the link to it though! Always useful to have those kinds of tools in your arsenal just in case :-)

Tuning MaxClients

A while back, I wrote an article on how MaxClients is often misconfigured (set too high), and what you should set it to. See Tuning the Apache MaxClients parameter.
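A common rule of thumb for that (not necessarily the exact method in the linked article) is to divide the RAM you can spare for Apache by the typical size of one Apache process, rounding down. A minimal sketch; the function name and example figures are assumptions for illustration.

```shell
#!/bin/bash
# Rule-of-thumb MaxClients: RAM available to Apache divided by the
# typical size of one Apache process, rounded down (integer division).
suggest_max_clients() {
    local apache_ram_mb="$1"    # total RAM minus what MySQL, the OS, etc. need
    local mb_per_process="$2"   # e.g. average RSS from ps, converted to MB
    echo $(( apache_ram_mb / mb_per_process ))
}

suggest_max_clients 1024 12   # 85
```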

Also, if you use a tool like Munin, it already has the number of Apache processes in use in a nice graph.

munin even better

Munin would be a very nice alternative: apt-get install munin

Or Nagios.

Generally speaking, if you don't know anything about the most important work of an admin, performance monitoring and optimization, you should NOT try to be root. Leave it to people who know something about it and concentrate on the things that YOU are good at.

This is also better for security.

Unfortunately, nowadays every little boy and his sister thinks they need a server, eating up far too much energy. People, please think it over: do you really NEED your own server?!? In most cases the answer will be "NO". Just talk to your provider if you need a special environment; this will be no problem. Your own box will not only harm the planet, but also cost you a lot of time. Virtual servers are a good solution if you really need to be root for some reason, but nobody needs to host their own hardware in a gigantic datacenter that eats up energy like a small city while 99% of the machines sit idle... crazy pollution!!!

Nice article!

Thanks for the link to that article - very interesting.

I might look into Munin - seems to have a lot of high praise surrounding it!

The way grep works...

...is such that you can use this line instead of the additional pipe to a second grep (quoting the pattern stops the shell from treating it as a filename glob):

ps aux | grep '[h]ttp' | wc -l

Huh?

Matt - how does making the h into a Character Class help?

I've tried it and it does indeed remove the entry produced by grep, but it doesn't remove the parent "root" process, which I THINK is the one that controls all the Apache child processes.

I don't get why the pattern '[h]ttp' doesn't match 'grep http' but does match '/usr/sbin/httpd'.

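It doesn't match because...

In its output, ps shows the grep process as 'grep [h]ttp', not 'grep http'. The pattern [h]ttp matches the text 'http', and 'grep [h]ttp' doesn't contain 'http' (its 'h' is followed by ']'), so the grep line simply doesn't match.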
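A quick way to convince yourself is to feed both strings to grep directly. The sample lines below are invented; the commented-out last line is one sketch of also dropping the root-owned parent, anchoring on the user column so that lines merely mentioning "root" elsewhere are kept.

```shell
#!/bin/bash
# Two lines as they might appear in `ps aux`: an Apache child and the grep itself.
lines='apache 1240 /usr/sbin/httpd
nick   2000 grep [h]ttp'

# The bracket expression [h] matches a literal "h", so the pattern means "http".
# "/usr/sbin/httpd" contains "http"; "grep [h]ttp" does not, because its
# "h" is followed by "]".
printf '%s\n' "$lines" | grep -c '[h]ttp'   # 1

# On a live server, excluding the root-owned parent as well:
#   ps aux | grep '[h]ttp' | grep -v '^root' | wc -l
```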

It'd be nice if you asked before taking stuff from my site. Contact me at webmaster [at] thingy - ma - jig . co . uk

This site was based on the Cobalt 2.0 Theme for phpBB written by Jakob Persson

View Nicholas Thompson's profile on LinkedIn

IconBuffet

Twitter

bile-edge