Wednesday, September 11, 2013

Gunicorn - Workers that matter

One day I ran "htop" to monitor the server's processes and saw that memory usage was too high (69% - 70%) for a server with 1.5GB of RAM running just 2 small Django projects, each in its own isolated Python environment (virtualenv). I started analyzing the processes to figure out what was consuming the server's memory and how to reduce it.

I fixed celery beat, but memory usage was still high (64%). Then I looked at the gunicorn startup scripts of the 2 Django projects. I had configured each of them to run with 3 workers. I reduced that to 2 workers each, and memory usage dropped significantly (to 44%). The startup script for one of the projects looks like this:

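set -e
# LOGFILE, NUM_WORKERS, ADDRESS, USER and GROUP must be defined before this point (definitions not shown)
LOGDIR=$(dirname "$LOGFILE")
cd /home/projects/project1

# activate the project's virtualenv
source /home/.venv/django1_4/bin/activate
# create the log directory if it does not exist yet
test -d "$LOGDIR" || mkdir -p "$LOGDIR"
exec gunicorn_django -w "$NUM_WORKERS" --bind="$ADDRESS" --user="$USER" --group="$GROUP" --log-level=debug --log-file="$LOGFILE" 2>>"$LOGFILE"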

For a small server like mine, the number of Gunicorn workers should be kept low (2 or 3 workers is OK), because each worker consumes its own share of memory. On the other hand, more workers (up to a point) let Gunicorn serve requests faster. So it is a trade-off to decide how many workers to run. Several factors affect the calculation, such as how many users my Django projects will serve (which can be estimated), how much RAM the server has, and how many CPU cores it has.
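To put a number on the memory side of this trade-off, it helps to see how much memory the Gunicorn processes hold in total. Here is a minimal sketch. It assumes a "ps" that supports the "rss" and "args" output columns (Linux and the BSDs do). The helper name sum_rss_mb and the sample figures are made up for illustration. RSS counts pages shared between forked workers more than once, so treat the result as an upper bound.

```shell
#!/bin/sh
# sum_rss_mb PATTERN
# Reads lines of "<rss in KB> <command line>" on stdin and prints the total
# RSS, in whole MB, of the lines whose command contains PATTERN.
# The pattern is passed through the environment so that awk's own command
# line does not contain it (and so is not counted when reading live ps output).
sum_rss_mb() {
    PAT="$1" awk '
        { rss = $1; $1 = ""; if (index($0, ENVIRON["PAT"]) > 0) total += rss }
        END { printf "%d\n", total / 1024 }
    '
}

# Demo with made-up sample data (three processes, two of them gunicorn):
printf '102400 gunicorn: worker\n51200 gunicorn: master\n2048 sshd\n' | sum_rss_mb gunicorn

# Against the live system (not run here):
#   ps -A -o rss= -o args= | sum_rss_mb gunicorn
```

Running the live command before and after changing the worker count gives a rough per-worker cost, which you can then compare with the RAM you are willing to give to each project.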

The following lines from Gunicorn's documentation make some useful points:

How Many Workers?

DO NOT scale the number of workers to the number of clients you expect to have. Gunicorn should only need 4-12 worker processes to handle hundreds or thousands of requests per second.
Gunicorn relies on the operating system to provide all of the load balancing when handling requests. Generally we recommend 
(2 x $num_cores) + 1 

as the number of workers to start off with. While not overly scientific, the formula is based on the assumption that for a given core, one worker will be reading or writing from the socket while the other worker is processing a request. Obviously, your particular hardware and application are going to affect the optimal number of workers. Our recommendation is to start with the above guess and tune using TTIN and TTOU signals while the application is under load.
Always remember, there is such a thing as too many workers. After a point your worker processes will start thrashing system resources decreasing the throughput of the entire system.
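The formula quoted above is easy to compute in a startup script. Here is a small sketch, not an official recipe: the function name recommended_workers and the pid file path are my own assumptions, and the result is only a starting guess to be tuned under load, as the documentation says.

```shell
#!/bin/sh
# Starting-point worker count from the quoted formula: (2 x cores) + 1.
recommended_workers() {
    cores=$1
    echo $(( 2 * cores + 1 ))
}

# Detect the number of cores: nproc comes with GNU coreutils,
# getconf is the fallback, and 1 is used if both fail.
CORES=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null)
CORES=${CORES:-1}
NUM_WORKERS=$(recommended_workers "$CORES")
echo "cores=$CORES suggested workers=$NUM_WORKERS"

# Tuning while the application is under load (not run here). This assumes
# gunicorn was started with --pid /tmp/gunicorn.pid, a hypothetical path:
#   kill -TTIN "$(cat /tmp/gunicorn.pid)"   # add one worker
#   kill -TTOU "$(cat /tmp/gunicorn.pid)"   # remove one worker
```

On a small box with little RAM, the formula can suggest more workers than the memory can comfortably hold, so I treat it as an upper starting point and check memory usage after each change.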