The price of success - too many users!

Recently at work we had been having issues with spiking server load. One of the potential suspects was the Apache configuration, which allowed up to 256 MaxClients. Combine that with Drupal eating RAM for breakfast (say a minimum of 12 MB per page) and you have a recipe for disaster: too many visitors cause a RAM shortage, lots of swapping and, eventually, a server meltdown. After speaking to the Rackspace Technical Support Team, one of the guys there (Daniel) wrote a VERY useful script for us to run on the server to monitor Apache usage.
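To put rough numbers on that: if all 256 allowed clients are busy serving Drupal pages at the same time, that's 256 x 12 MB = 3072 MB, or about 3 GB, just for Apache. Here's a back-of-the-envelope sketch of the arithmetic. AVAILABLE_MB is a made-up figure for illustration, not the RAM on our server, and 12 MB per process is only the rough minimum quoted above.

```shell
#!/bin/bash
# Back-of-the-envelope RAM check using the figures from the paragraph above.
MAX_CLIENTS=256     # Apache MaxClients setting from our config
MB_PER_PROCESS=12   # rough minimum RAM per Drupal request, in MB

# Worst case: every allowed client is busy serving a Drupal page at once.
worst_case_mb=$(( MAX_CLIENTS * MB_PER_PROCESS ))
echo "Worst case: ${worst_case_mb} MB"

# Hypothetical: RAM you are willing to give Apache (not a figure from our server).
AVAILABLE_MB=2048
# Integer division rounds down, which errs on the safe side.
safe_max_clients=$(( AVAILABLE_MB / MB_PER_PROCESS ))
echo "MaxClients that fits in ${AVAILABLE_MB} MB: ${safe_max_clients}"
```

Real pages often use more than the minimum, so treat any MaxClients figure you get this way as an upper bound rather than a target.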

The basic principle is to regularly run the ps command to get a process list, filter it using grep, then filter it with grep again to remove the grep process itself from the list (the command 'grep http' contains the phrase 'http', so it gets caught by the first filter). Finally, use the wc command to count the lines that are left. That produces the following command, which you can run on your system…

ps aux | grep http | grep -v "\(root\|grep\)" | wc -l
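If you want to see what each stage does without a busy Apache box to hand, here's a sketch that feeds the same filter chain some made-up 'ps aux' output. The sample rows, user names and PIDs are invented for illustration; only the filter chain itself comes from the command above.

```shell
#!/bin/bash
# Invented sample of `ps aux` output, trimmed for illustration.
sample_ps() {
  cat <<'EOF'
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      1001  0.0  0.5  12000  5000 ?        Ss   10:00   0:00 /usr/sbin/httpd
apache    1002  0.1  1.2  14000 12000 ?        S    10:00   0:01 /usr/sbin/httpd
apache    1003  0.1  1.3  14200 12400 ?        S    10:00   0:01 /usr/sbin/httpd
mysql     1100  0.5  4.0  90000 40000 ?        Sl   10:00   0:10 /usr/sbin/mysqld
bob       2000  0.0  0.0   4000   700 pts/0    S+   10:05   0:00 grep http
EOF
}

# Same filter chain as the one-liner, reading from stdin instead of a live `ps aux`.
count_httpd() {
  grep http | grep -v "\(root\|grep\)" | wc -l
}

# Keeps the root httpd, both apache httpd rows and the grep row, then drops
# root and grep, leaving the two apache child processes.
sample_ps | count_httpd
```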

Here is an explanation…

ps aux

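Get a process list. The 'a' lists processes belonging to all users (not just you), the 'u' uses a user-oriented output format (showing each process's owner, CPU and memory usage) and the 'x' also includes processes that have no controlling terminal, such as daemons like Apache. Together they list every process on the system.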

grep http

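This goes through that list and keeps only the lines containing the text 'http', which also matches 'httpd', the name of the Apache binary on Red Hat-style systems. (On Debian and Ubuntu the binary is called 'apache2', so you would search for that instead.)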

grep -v "\(root\|grep\)"

Unfortunately, the grep process from the previous step also contains the word it's filtering for, so it shows up in the list as one of the matches. Root also owns one of the Apache processes: the parent process, which starts as root so that it can bind to port 80 and then spawns the child processes that actually serve requests under an unprivileged user. We want to leave out both of these, so we use grep as before, this time with the '-v' option, which inverts the match and keeps only the lines containing neither 'grep' nor 'root'.
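A common alternative (not part of the script Rackspace gave us) is to stop grep matching itself in the first place by searching for '[h]ttp' instead of 'http', and to drop root only when it's the process owner in the first column. Here's a sketch of that variant against some invented ps-style lines; the function names are made up for illustration.

```shell
#!/bin/bash
# Invented ps-style lines (USER PID COMMAND); the last one is what the grep
# process itself looks like when it searches for the bracket pattern.
sample_lines() {
  printf '%s\n' \
    'root      1001 /usr/sbin/httpd' \
    'apache    1002 /usr/sbin/httpd' \
    'apache    1003 /usr/sbin/httpd' \
    'bob       2000 grep [h]ttp'
}

# '[h]ttp' is a character class holding just "h", followed by "ttp": it matches
# the text "http" but not the literal "[h]ttp" on grep's own command line.
# '^root' only matches lines whose first column (the owner) starts with "root".
count_httpd_alt() {
  grep '[h]ttp' | grep -v '^root' | wc -l
}

sample_lines | count_httpd_alt
```

The anchored '^root' also avoids throwing away a line just because 'root' appears somewhere else in it, such as in a command-line argument.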

wc -l

This is a really simple and yet useful command. wc is a word counter, and the '-l' (that's a lowercase L) option tells it to count newlines instead. Since each process sits on its own line, that gives us the number of processes!
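A quick way to convince yourself that '-l' really counts newline characters rather than "lines" as such: a final line with no trailing newline isn't counted. That's harmless in our pipeline, because grep ends every line it prints with a newline.

```shell
#!/bin/bash
# wc -l counts newline characters.
printf 'one\ntwo\nthree\n' | wc -l   # three newlines
printf 'one\ntwo\nthree' | wc -l     # last line has no newline, so it isn't counted
```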

By running the above command you will get a number returned on the terminal: the number of Apache processes currently running on your system. A slight variation on the initial 'ps' command will give you some pretty useful information…

ps axo 'pid user size cmd' | grep http | grep -v "\(root\|grep\)"

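This version will very nicely list a table of running Apache processes (leaving out lines containing root or grep) with only 4 columns: Process ID, Username (of the process owner), Size (in kilobytes) and the Command that was run. This means you can quickly see how much memory your webserver is using for Apache! One caveat: ps's 'size' column is only a rough estimate of the memory a process could need, not the RAM it is actually occupying. If you want resident physical memory, ask for 'rss' in place of 'size'.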
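If you'd rather have one total than eyeball the column, here's a sketch that adds it up with awk. It assumes you ask ps for 'rss' (resident memory, in kilobytes) in place of 'size'; the total_mb function name and the sample rows are made up for illustration.

```shell
#!/bin/bash
# Adds up field 3 (memory in kilobytes) of "pid user mem cmd" lines read on
# stdin and prints the total in megabytes, to one decimal place.
total_mb() {
  awk '{ kb += $3 } END { printf "%.1f\n", kb / 1024 }'
}

# Invented rows in the shape printed by:
#   ps axo 'pid user rss cmd' | grep http | grep -v "\(root\|grep\)"
printf '%s\n' \
  '1002 apache 12288 /usr/sbin/httpd' \
  '1003 apache 13312 /usr/sbin/httpd' | total_mb
```

On a live server you would pipe the real ps output straight into total_mb. Bear in mind that Apache children share a fair amount of memory (shared libraries and the like) and RSS counts those shared pages in every process, so the total will overstate the real figure somewhat.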

Finally, you'll want to know how many processes are running over the period of a day, etc. This is where Rackspace turned those commands into a VERY nice bash script!

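#!/bin/bash

# Alert when more than this many Apache processes are running.
THRESHOLD=100
ADDRTO="admin@mysite.com"
SUBJECT="Apache Process Check"
# Exists while we are above the threshold, so only one alert is sent per spike.
LOCKFILE="/tmp/apache_process_check.lock"
LOGFILE="/var/log/apache_processes.log"

# Count Apache processes, excluding root's parent process and the grep itself.
NUMHTTPD=$(ps aux | grep http | grep -v "\(root\|grep\)" | wc -l)
# Append a timestamped count to the log.
echo "$(date +'%Y-%m-%d %H:%M:%S %Z') - ${NUMHTTPD}" >> "${LOGFILE}"

if [[ ${NUMHTTPD} -gt ${THRESHOLD} ]]; then
  # Above the threshold: email once, then create the lock file.
  if [ ! -e "${LOCKFILE}" ]; then
    echo "The number of currently running httpd processes is ${NUMHTTPD}." | mail -s "${SUBJECT} - Above Threshold" "${ADDRTO}"
    touch "${LOCKFILE}"
  fi
else
  # Back under the threshold: clear the lock file and send an all-clear email.
  if [ -e "${LOCKFILE}" ]; then
    rm -f "${LOCKFILE}"
    echo "The number of currently running httpd processes is ${NUMHTTPD}." | mail -s "${SUBJECT} - Below Threshold" "${ADDRTO}"
  fi
fi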

This, quite simply, logs the Apache process count: each run appends a timestamp and the current count to the log file, on a new line. If the count goes above the threshold (in this case, 100), the script emails the address specified, letting the admin know they're getting close to their limit, and creates a lock file. While the lock file exists no further alerts are sent, so this semaphore stops the script spamming the admin while the server is under load. When the count drops back to or below the threshold, the lock file is deleted and a "Below Threshold" email is sent. The log could easily be imported into something like Excel if you want to produce pretty graphs. If you're slightly more challenge-oriented, maybe you could write a script to parse it into RRDTool!
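As a starting point for the Excel route, here's a sketch that turns the log into CSV and picks out the busiest moment. It assumes the log lines look exactly like the ones the script writes ("2010-03-01 14:05:00 GMT - 57"); the function names and sample entries are made up for illustration.

```shell
#!/bin/bash
# Turns log lines like "2010-03-01 14:05:00 GMT - 57" into CSV rows such as
# "2010-03-01 14:05:00,57" (the time zone is dropped) under a header row.
log_to_csv() {
  echo "timestamp,httpd_processes"
  awk '{ print $1 " " $2 "," $NF }'
}

# Prints the log line with the highest process count (the last field).
peak_line() {
  awk 'NR == 1 || $NF + 0 > max { max = $NF + 0; line = $0 } END { print line }'
}

# Invented sample entries in the same format the script writes.
sample_log() {
  printf '%s\n' \
    '2010-03-01 14:00:00 GMT - 42' \
    '2010-03-01 14:05:00 GMT - 118' \
    '2010-03-01 14:10:00 GMT - 97'
}

sample_log | log_to_csv
sample_log | peak_line
```

Against the real file that would be something like 'log_to_csv < /var/log/apache_processes.log > apache_processes.csv', which Excel should open directly.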

The next step is to make that script executable and then set up a crontab entry to run it regularly (say, every 5 minutes).
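Something along these lines should do it. The script path is hypothetical (use wherever you saved it), and the crontab should belong to a user allowed to write to the log file in /var/log, which usually means root.

```shell
# Make the script executable (hypothetical path).
chmod +x /usr/local/bin/apache_process_check.sh

# Open the crontab for editing...
crontab -e
# ...and add this line, which runs the check every 5 minutes:
# */5 * * * * /usr/local/bin/apache_process_check.sh
```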

I'd like to thank Rackspace for their "fanatical support" - especially Daniel for going above and beyond the call of duty. This is a REALLY handy script!