Note: While I found this issue on Apache httpd, it may apply to any HTTP server out there.
HTTP KeepAlive
The “KeepAlive” concept is simple: the browser opens a connection to the server and sends out multiple requests (e.g. for the main page, for stylesheets, javascript includes and images) through a single connection. This effectively reduces the page load time, providing – or at least that’s what the theory says – a better customer experience. All good from this perspective.
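To see connection reuse in action, here is a minimal sketch in Python using only the standard library. The tiny local server, the handler and the `fetch_over_one_connection` helper are illustrative assumptions and have nothing to do with Apache httpd itself; the server simply replies with the client's source port, which stays the same for as long as the same connection is reused.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class PortEchoHandler(BaseHTTPRequestHandler):
    # HTTP/1.1 keeps the connection open unless one side asks to close it.
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        # Reply with the TCP source port of the client connection.
        body = str(self.client_address[1]).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_over_one_connection(paths):
    """Request every path through a single connection; return the ports seen by the server."""
    server = HTTPServer(("127.0.0.1", 0), PortEchoHandler)  # port 0: pick any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
    ports = []
    try:
        for path in paths:
            conn.request("GET", path)
            response = conn.getresponse()
            ports.append(int(response.read()))
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
    return ports


if __name__ == "__main__":
    seen = fetch_over_one_connection(["/", "/style.css", "/script.js"])
    print("client ports seen by the server:", seen)
```

If all reported ports are identical, the three requests travelled over one TCP connection, which is exactly what KeepAlive allows.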
Server Configuration
The Apache httpd server controls this feature through 3 configuration directives, all of them in the core module:
- KeepAlive: on/off, default on;
- KeepAliveTimeout: seconds, default 5;
- MaxKeepAliveRequests: positive number, default 100.
There is no setting in Apache httpd that mixes KeepAlive and non-KeepAlive handling at the same time (e.g. to allow up to 100 KeepAlive requests and treat everything above that differently). One must choose the server behavior from the very beginning.
The Traffic
Now it’s math time. Starting from the MaxClients value (default 256), what is the request rate (new clients / second) that can be served without compromising the user experience? MaxClients you may say, but let’s not draw conclusions too fast. There are some issues to be considered:
- The time between opening the connection and the KeepAliveTimeout expiration. On a default configuration this is at the very least 5 seconds, but more typically 7-10 seconds;
- Traffic fluctuation (spikes);
- Internal Apache httpd time (spawning new processes to handle new connections, etc).
It’s getting complicated. But assuming a typical 7 seconds for every process that serves a client and a uniform behavior (all or almost all clients keep a server process busy or waiting for data for 7 seconds), on a default setting of 256 MaxClients, the uniform traffic rate that can be served is 256/7 ≈ 36 (new) requests / second. Any spike larger than that will cause page loading delays and a poor user experience.
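The arithmetic above can be sketched in a few lines of Python. The function name `sustainable_rate` is made up for illustration, and the model is deliberately simplistic: it assumes perfectly uniform traffic and that every client holds one worker slot for the same amount of time.

```python
def sustainable_rate(max_clients: int, hold_seconds: float) -> float:
    """New clients per second that can be served if each one
    occupies a worker slot for hold_seconds."""
    if hold_seconds <= 0:
        raise ValueError("hold_seconds must be positive")
    return max_clients / hold_seconds


if __name__ == "__main__":
    # 5 s is the bare KeepAliveTimeout; 7-10 s is the more typical hold time.
    for hold in (5, 7, 10):
        rate = sustainable_rate(256, hold)
        print(f"hold time {hold:>2} s -> {rate:.1f} new clients / second")
```

The longer each client keeps a slot, the lower the rate the same MaxClients value can sustain.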
Better Planning
If disabling KeepAlive is not an option, all the planning should start from the worst expected spike that is to be handled (e.g. 2x or 3x the average during the busiest period). For a 50 req/s busy period average, the server should be able to handle 100 or even 150 req/s without compromising the user experience. If the configuration is already maxed out from the hardware point of view, then other solutions should be looked into (multiple servers, load balancers; I won’t go into that here).
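Assuming a maximum rate of 150 req/s, with KeepAliveTimeout at the 5 seconds default and each client holding a process for roughly 7 seconds, about 150 x 7 = 1,050 slots are needed. One may therefore need to adjust MaxClients (and ServerLimit if the prefork MPM is used) to somewhere in the area of 1,000.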
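Working backwards from the desired peak rate is the same calculation turned around. A minimal sketch, with a hypothetical helper name and the same simplifying assumption that every client holds a slot for a fixed time:

```python
import math


def required_workers(peak_rate: float, hold_seconds: float) -> int:
    """Worker slots (MaxClients) needed to absorb peak_rate new clients
    per second when each one holds a slot for hold_seconds."""
    return math.ceil(peak_rate * hold_seconds)


if __name__ == "__main__":
    # Lower bound: clients hold a slot only for the 5 s KeepAliveTimeout.
    print("5 s hold:", required_workers(150, 5))
    # More typical case from the text: about 7 s per client.
    print("7 s hold:", required_workers(150, 7))
```

With a 7 second hold time the result lands close to the figure of 1,000 mentioned above; whether the hardware can actually run that many processes is a separate question.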
Conclusion
Don’t leave the defaults on; always start the parameter calculation from the desired outcome. On suboptimal hardware (memory wise), disabling KeepAlive is the way to go to squeeze out a bit more performance. Oh, and don’t assume the hardware is capable of keeping up with your calculations; turn any failure or mis-planning into a learning opportunity.
That’s it for today, have fun!
This is a quick’n’dirty tutorial on how to get e-mail working on a basic Linux (Redhat/Fedora/CentOS) installation like the one you may get if you deploy a node in AWS with a predefined AMI (Amazon Machine Image).
The Redhat Linux distributions come by default with postfix as the MTA (Mail Transfer Agent) and Cyrus IMAP as the MDA (Mail Delivery Agent), so this small tutorial is focused on these. With the packages installed, we can move on to the configuration files:
- /etc/postfix/main.cf: the postfix file, where we configure which e-mails we accept and what to do with them.
- /etc/imapd.conf: settings for Cyrus IMAP – where to look for the mailbox passwords, what authentication mechanisms should be supported for pop/imap.
- /etc/sasl2/smtpd.conf: for smtp authentication, if an e-mail relay is required (NB: this file can also be located in /usr/lib/sasl2/smtpd.conf).
There is also another file, /etc/cyrus.conf that usually contains the proper defaults upon installation so there may be no need to look into that. It contains settings like the supported protocols (pop3/pop3s/imap/imaps) and the lmtp socket location (the interface between postfix and Cyrus IMAP).
In the postfix configuration file, the following settings are essential:
# cat /etc/postfix/main.cf
...
mailbox_transport = lmtp:unix:/var/lib/imap/socket/lmtp
virtual_transport = $mailbox_transport
virtual_mailbox_domains = /etc/postfix/virtual_domains
virtual_mailbox_maps = hash:/etc/postfix/virtual_maps
...
Explanation: e-mails for the domains listed in virtual_mailbox_domains are first accepted for further processing. The final mailbox is then determined by a lookup in virtual_mailbox_maps (if no mailbox is found, an error is returned to the sender). The actual delivery is done through the socket specified in mailbox_transport.
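The decision flow described above can be modelled in a few lines of Python. This is only a sketch of the logic as explained in this tutorial, not of how postfix is implemented; the `route` function and its return values are made up for illustration, the brainware.ro entries are the sample data used in this tutorial, and example.org stands in for any domain that is not configured.

```python
def route(recipient, virtual_domains, virtual_maps):
    """Return (decision, mailbox) for a recipient address.

    virtual_domains: set of domains we accept mail for
    virtual_maps:    dict mapping full addresses to mailbox names
    """
    recipient = recipient.lower()
    _, _, domain = recipient.rpartition("@")
    if domain not in virtual_domains:
        return ("not_virtual", None)   # not one of our virtual domains
    mailbox = virtual_maps.get(recipient)
    if mailbox is None:
        return ("reject", None)        # domain is ours, mailbox unknown
    return ("deliver", mailbox)        # hand over to the lmtp socket


if __name__ == "__main__":
    domains = {"brainware.ro"}
    maps = {"john.doe@brainware.ro": "brainware.ro/john.doe"}
    for address in ("john.doe@brainware.ro",
                    "nobody@brainware.ro",
                    "someone@example.org"):
        print(address, "->", route(address, domains, maps))
```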
Some example file contents:
# cat /etc/postfix/virtual_domains
brainware.ro
# cat /etc/postfix/virtual_maps
john.doe@brainware.ro brainware.ro/john.doe
The “virtual_maps” file is to be “hashed” with postmap (have a hashmap generated out of it):
# postmap hash:/etc/postfix/virtual_maps
On the Cyrus IMAP side, we must first check that the daemon listens on the proper lmtp socket (by default it should):
# cat /etc/cyrus.conf | grep lmtp
lmtpunix cmd="lmtpd" listen="/var/lib/imap/socket/lmtp" prefork=1
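Since delivery only works when both sides point at the same socket, a quick consistency check may save some debugging. The sketch below is a hypothetical helper, not a tool shipped with postfix or Cyrus IMAP; it takes the contents of the two configuration files as strings and only understands the simple one-line forms shown in this tutorial.

```python
import re


def lmtp_socket_from_main_cf(text):
    """Socket path from a 'mailbox_transport = lmtp:unix:/path' line, or None."""
    match = re.search(r"^\s*mailbox_transport\s*=\s*lmtp:unix:(\S+)",
                      text, re.MULTILINE)
    return match.group(1) if match else None


def lmtp_socket_from_cyrus_conf(text):
    """Socket path from the listen="..." part of the lmtpd service line, or None."""
    for line in text.splitlines():
        if line.strip().startswith("#"):
            continue  # skip commented-out services
        if 'cmd="lmtpd' in line:
            match = re.search(r'listen="([^"]+)"', line)
            if match:
                return match.group(1)
    return None


def sockets_agree(main_cf_text, cyrus_conf_text):
    """True only if both files name the same lmtp socket."""
    postfix_side = lmtp_socket_from_main_cf(main_cf_text)
    cyrus_side = lmtp_socket_from_cyrus_conf(cyrus_conf_text)
    return postfix_side is not None and postfix_side == cyrus_side


if __name__ == "__main__":
    main_cf = "mailbox_transport = lmtp:unix:/var/lib/imap/socket/lmtp\n"
    cyrus_conf = 'lmtpunix cmd="lmtpd" listen="/var/lib/imap/socket/lmtp" prefork=1\n'
    print("sockets agree:", sockets_agree(main_cf, cyrus_conf))
```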
NB: at this point one may want to disable SELinux in order to allow for the socket communication between postfix and Cyrus IMAP.
One of the less known features of Gmail is the ability to receive e-mails sent to particular aliases of the main e-mail address, e.g. mails sent to john.doe+22@gmail.com will get to the main mailbox, john.doe@gmail.com. (NB: the address is made up, hopefully this is not someone’s real address).
How would you replicate such a feature on a local node you manage? I won’t cover all the details; at most I will just scratch the surface a bit. When using Postfix (the default MTA installed with Redhat/Fedora/CentOS), one must first look at the place where such functionality can be put in: the virtual_alias_maps option in the configuration file (main.cf).
The mapping concept in Postfix is common throughout its options. A “map” must be able to provide a translation between an input (e.g. an e-mail address) and some desired output (e.g. the mailbox location on the disk, the real destination e-mail). The “virtual” part is understood by Postfix as anything that is not tied to a real Unix account. So, in order to provide the Gmail-style aliasing, one must essentially create a (virtual) conversion map that relates an input (addresses with “+“) to the desired output (the main mailbox).
Looking back at the wildcard e-mail addresses that we may get e-mails sent to, the only way to get them matched to the real destination is by using some sort of regular expression. Simple map types accepted by Postfix (e.g. hash tables) perform exact matching, e.g. an entry that aliases john.doe+1 to john.doe will not cover john.doe+2 or john.doe+99 (well, we can cover them, but we must put in a line for every such alias). Postfix does support regular expression matching through a built-in module, so we may have in the configuration something like:
# cat /etc/postfix/main.cf | grep virtual_alias_maps
virtual_alias_maps = regexp:/etc/postfix/virtual_alias
And the “virtual_alias” file may contain something like:
# cat /etc/postfix/virtual_alias
/^sample\.name\+(.*)@brainware\.ro$/ sample.name@brainware.ro
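In a regular expression, “.” and “+” are metacharacters, so they need a backslash to be matched literally; without it, the pattern matches more addresses than intended. The following sketch illustrates the difference using Python's re module, which is an assumption made for convenience: Postfix uses its own regexp engine, but the two treat these particular characters the same way. The `resolve` helper and the second test address are made up.

```python
import re

# Unescaped: "." matches any character and "e+" means "one or more e".
LOOSE = re.compile(r"^sample.name+(.*)@brainware.ro$")
# Escaped: a literal dot and a literal plus sign are required.
STRICT = re.compile(r"^sample\.name\+(.*)@brainware\.ro$")

TARGET = "sample.name@brainware.ro"


def resolve(address, pattern=STRICT):
    """Return the main mailbox if the address is one of its aliases, else None."""
    return TARGET if pattern.match(address) else None


if __name__ == "__main__":
    for address in ("sample.name+22@brainware.ro",   # a real "+" alias
                    "sample.names@brainware.ro"):    # a different user
        print(address,
              "| loose:", resolve(address, LOOSE),
              "| strict:", resolve(address, STRICT))
```

The loose pattern also redirects mail for sample.names, a different mailbox altogether, while the strict one only accepts addresses with a literal “+” after the name.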
This solution just works, but there are some drawbacks to it:
- One must manually edit the “virtual_alias” file to add new aliases. This is not practical for large installations with e-mail accounts created and deleted from web interfaces.
- The regular expression matching is done within Postfix, theoretically slowing down the mail system throughput. On a busy server with thousands of such aliases this may become a noticeable issue (maybe not much of a serious problem with modern hardware, though).
For both problems, the solution is to offload the address matching to a database installation (e.g. MySQL, PostgreSQL) and do the regex matching from a stored procedure. I may come back to this at a later date, though. But for now, thank you for your time!
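Until then, here is a rough sketch of the idea, with several loud assumptions: it uses an in-memory SQLite database from the Python standard library as a stand-in for MySQL or PostgreSQL, a Python function as a stand-in for the stored procedure, and a made-up one-column `mailboxes` table. It is not wired into Postfix in any way. A single generic pattern strips the “+tag” part, so adding or removing a mailbox is just a row insert or delete.

```python
import re
import sqlite3

# local part, optional "+tag", then the domain
PLUS_ADDRESS = re.compile(r"^([^+@]+)(?:\+[^@]*)?@([^@]+)$")


def make_db(addresses):
    """Create the hypothetical mailboxes table and fill it."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE mailboxes (address TEXT PRIMARY KEY)")
    db.executemany("INSERT INTO mailboxes (address) VALUES (?)",
                   [(address,) for address in addresses])
    return db


def resolve_alias(db, recipient):
    """Strip any '+tag' and return the main mailbox if it exists, else None."""
    match = PLUS_ADDRESS.match(recipient.lower())
    if not match:
        return None
    base = f"{match.group(1)}@{match.group(2)}"
    row = db.execute("SELECT address FROM mailboxes WHERE address = ?",
                     (base,)).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    db = make_db(["john.doe@brainware.ro"])
    for address in ("john.doe+22@brainware.ro", "jane.roe+1@brainware.ro"):
        print(address, "->", resolve_alias(db, address))
```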