About a year ago, during a time when I was barely aware about Cloud Computing or Amazon Web Services, I got assigned, along with a couple of colleagues from this consulting company I was working for, to bring an existing codebase from “alpha” to “production” and then ensure its smooth deployment to Amazon Cloud.
The customer wanted to go “live” in less than 3 months; they also wanted to be able to handle tens of thousands of visitors that would obviously click on banners and make them money. What it’s actually more probable is that they were hoping for a good exit, that is passing the hot potato to somebody else while walking out with a proft. On a side note, there is a term that could be used for these people, but this is not a meme-text so I won’t go more on that route.
Starting on a new project
With this project, things initially went to some direction: we had to incrementally deal with quite a few functionality issues and in the end we were able to put fixes for more than 100 bugs and glitches. Actually, this was all that we could do, along with the long hours required to get things done.
We could not be bothered by any setup issues with the “Cloud” configuration: we knew near to nothing on the topic and the customer fiercely guarded the “keys to the kingdom”; they would only agree on instance and resource set-ups on a case-by-case basis anyway. They were probably thinking they were paying way too much for those pesky Eastern European contractors (us), so I kind of get the “why” on keeping a close eye over the Amazon bill. It was fine by us; at the end of the day it was their home, with their needs and their rules.
From time to time one may receive a request from QA team in line of:
For testing purposes, I need that /opt/test/xxx directory be limited to 10 Megabytes. This directory is used by this zzz application ran as user tester.
How could the directory size be limited in Linux? Is it even possible? – these are fair questions and the answer is yes. One needs to:
-
Use the directory as a mount point for a size-limited storage device;
-
Use the proper mount options to allow full access to the non-root user specified;
-
Disable Selinux (easy) or allow that particular user to access data on mount points (complicated).
Let’s start with the beginning, the storage device. There are multiple options here:
-
A simple loop device (a regular file used as a file system);
-
A logical volume (LVM), assuming the disk setup is based on this technology and there is enough free space left to accomodate the new device;
-
Attaching a new storage device (e.g. in a Cloud environment like Amazon Web Services).
Note: While I found this issue on Apache httpd, it may apply to any http server out there.
HTTP KeepAlive
The “KeepAlive” concept is simple: the browser opens a connection to the server and sends out multiple requests (e.g. for the main page, for stylesheets, javascript includes and images) through a single connection. This effectively reduces the page load time, providing – or at least that’s what the theory says – a better customer experience. All good from this perspective.
Server Configuration
The Apache httpd server controls this feature through 3 configuration directives, all of them in the core module:
-
KeepAlive: on/off, default on;
-
KeepAliveTimeout: seconds, default 5;
-
MaxKeepAliveRequests: positive number, default 100.
There is no setting in Apache httpd to somehow allow KeepAlive and non-KeepAlive requests at the same time (e.g. to allow up to 100 KeepAlive requests, what comes above to be treated differently). One must choose the server behavior from the very beginning.
The Traffic
Now it’s math time. Starting from the MaxClients value (default 256), what is the request rate (new clients / second) that can be served without compromising the user experience? MaxClients you may say, but let’s not draw conclusions too fast. There are some issues to be considered:
-
The time between opening the connection and the KeepAliveTimeout expiration. On a default configuration, at the very least 5 seconds, but on a more typical side, maybe 7-10 seconds;
-
Traffic fluctuation (spikes);
-
Internal Apache httpd time (spawning new processes to handle new connections, etc).
It’s getting complicated. But assuming a typical 7 seconds time for every process that serves a client and an uniform behavior (all or almost all clients keep a server process busy or waiting for data for 7 seconds), on a default setting of 256 MaxClients, the uniform traffic rate that can be served is 256/7 = 36 (new) requests / second. Any spike larger than that will cause page loading delays and a poor user experience.
Better Planning
If disabling KeepAlive is not an option, all the planning should start from the worst expected spike that is to be handled (e.g. 2x or 3x the average during the busiest period). For a 50 req/s busy period average, the server should be able to allow 100 or even 150 req/s without compromising the user experience. If the configuration is already maxed out from the hardware point of view, then other solutions should be looked into (multiple servers, load balancers, I’m not dwelling into that).
Assuming a maximum rate of 150 req/s, with a KeepAliveTimeout at the 5 seconds default, one may need to adjust MaxClients (and ServerLimit if prefork mpm is used) to somewhere in the area of 1,000.
Conclusion
Don’t leave the defaults on; always start the parameter calculation from the desired outcome first. On suboptimal hardware (memory wise) disabling KeepAlive is the way to go to squeeze a bit more performance. Oh, and don’t assume the hardware is capable of keeping up with your calculations; turn any failure or mis-planning in a learning subject.
That’s it for today, have fun!