Did you think that list comprehensions were a complicated thing? I suspect you’ll think again about complicated things in Python while reading this text.
Lambdas
By definition, lambdas are short functions defined in place, used for small data processing tasks. There are two reasons why they should be used:
- Execution speed – in compiled languages they can be optimized, first by removing an actual function call, then by opening the door for further compiler-driven code re-arrangement (note that CPython performs no such inlining, so this benefit does not really apply to Python);
- Writing less code.
A Python example of a squaring lambda:
g = lambda x: x ** 2 # e.g. g(3) yields a value of 9
Lambdas are usually used in conjunction with data processing functions such as filter, map and reduce.
Filtering
If we want to select only a portion of some input, the filter function comes to its best use in combination with a lambda:
print filter(lambda x: x % 2 == 0, xrange(0, 11))
# [0, 2, 4, 6, 8, 10]
Transformation
Applying a transformation function to the entire input is a job for map:
print map(lambda x: x * 2, xrange(0, 11))
# [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
Summarizing
Processing the entire input in order to get a single final result is a job for reduce. Please note that the lambda used with reduce takes 2 arguments: the first two elements of the input are combined first, then the running result and the next element, until the input is exhausted.
print reduce(lambda x, y: x + y, xrange(0, 11))
# 55, i.e. 0 + 1 + 2 + ... + 10
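The examples above are Python 2. In Python 3, print is a function, xrange is gone, filter and map return lazy iterators, and reduce has moved to functools; a sketch of the same three examples:

```python
from functools import reduce

# filter and map return iterators in Python 3; wrap in list() to print
evens = list(filter(lambda x: x % 2 == 0, range(0, 11)))
print(evens)    # [0, 2, 4, 6, 8, 10]

doubled = list(map(lambda x: x * 2, range(0, 11)))
print(doubled)  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20]

total = reduce(lambda x, y: x + y, range(0, 11))
print(total)    # 55
```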
Now for some serious stuff:
Dictionaries are Python implementations of hash tables. They store arbitrary key-value associations, not restricted to a single data type. Dictionaries do not preserve order (no order between elements is defined), so there is no guarantee that any element enumeration will yield a certain order. (Since Python 3.7, however, dictionaries do preserve insertion order.)
NB: unless otherwise specified, everything is Python 2 syntax.
Defining a dictionary:
D = { }
Initializing a dictionary:
D = dict((i, False) for i in xrange(1, 11) if i % 2 == 0)  # Python 2
D = {i: False for i in range(1, 11) if i % 2 == 0}         # Python 3
Note: the second form is known as a dictionary comprehension (the first passes a generator expression to the dict constructor).
Enumerating all the items in a dictionary:
for i in D:
    print i, D[i]  # key, value
…or:
for k, v in D.iteritems():  # D.items() in Python 3
    print k, v  # key, value
Looking up an element:
if k in D: print k
Adding elements:
D[key] = value
Note: no error is thrown if the element already exists; an overwrite happens.
Removing elements:
del D[key]
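Put together, the operations above look like this in Python 3 syntax (the dictionary contents are just an example):

```python
# build the example dictionary from earlier (even keys mapped to False)
D = {i: False for i in range(1, 11) if i % 2 == 0}

# enumerating all the items (items() replaces Python 2's iteritems())
for k, v in D.items():
    print(k, v)

# looking up an element
if 4 in D:
    print(4, "is a key")

# adding an element; an existing key is silently overwritten
D[4] = True

# removing an element; del raises KeyError for a missing key,
# while D.pop(key, default) does not
del D[2]
print(sorted(D))  # [4, 6, 8, 10]
```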
Using a dictionary as a lookup table to remove duplicates in a list:
D = {}
unique = []
for i in L:
    if i not in D:   # first occurrence: keep it
        D[i] = True
        unique.append(i)
L[:] = unique        # rewrite L in place, duplicates removed, order preserved
Note: a set data type is better suited for such a task. There is also a simplified syntax (e.g. list(set(L))) that can be used if the element order within the input list does not need to be preserved.
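Since dictionaries preserve insertion order from Python 3.7 on, there is also an order-preserving one-liner; a quick comparison:

```python
L = [3, 1, 3, 2, 1]

# order is not guaranteed with set()
unique_unordered = list(set(L))

# dict.fromkeys keeps the first occurrence of each element, in order
# (guaranteed from Python 3.7 on)
unique_ordered = list(dict.fromkeys(L))
print(unique_ordered)  # [3, 1, 2]
```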
That’s it for today, thank you for your read!
Note: While I found this issue on Apache httpd, it may apply to any http server out there.
HTTP KeepAlive
The “KeepAlive” concept is simple: the browser opens a connection to the server and sends out multiple requests (e.g. for the main page, stylesheets, JavaScript includes and images) through a single connection. This effectively reduces the page load time, providing – or at least that’s what the theory says – a better customer experience. All good from this perspective.
Server Configuration
The Apache httpd server controls this feature through 3 configuration directives, all of them in the core module:
- KeepAlive: on/off, default on;
- KeepAliveTimeout: seconds, default 5;
- MaxKeepAliveRequests: positive number, default 100.
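In httpd.conf these look like the following (the values shown are the defaults mentioned above, not a recommendation):

```apache
KeepAlive On
KeepAliveTimeout 5
MaxKeepAliveRequests 100
```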
There is no setting in Apache httpd that allows KeepAlive and non-KeepAlive requests at the same time (e.g. to allow up to 100 KeepAlive connections and treat anything above that differently). One must choose the server behavior from the very beginning.
The Traffic
Now it’s math time. Starting from the MaxClients value (default 256), what is the request rate (new clients / second) that can be served without compromising the user experience? MaxClients you may say, but let’s not draw conclusions too fast. There are some issues to be considered:
- The time between opening the connection and the KeepAliveTimeout expiration: on a default configuration, at the very least 5 seconds, but on a more typical setup, maybe 7-10 seconds;
- Traffic fluctuation (spikes);
- Internal Apache httpd time (spawning new processes to handle new connections, etc.).
It’s getting complicated. But assuming a typical 7 seconds for every process that serves a client and a uniform behavior (all or almost all clients keep a server process busy or waiting for data for 7 seconds), on a default setting of 256 MaxClients, the uniform traffic rate that can be served is 256/7 ≈ 36 (new) requests / second. Any spike larger than that will cause page loading delays and a poor user experience.
Better Planning
If disabling KeepAlive is not an option, all the planning should start from the worst expected spike that is to be handled (e.g. 2x or 3x the average during the busiest period). For a 50 req/s busy period average, the server should be able to allow 100 or even 150 req/s without compromising the user experience. If the configuration is already maxed out from the hardware point of view, then other solutions should be looked into (multiple servers, load balancers – I won’t dwell on that here).
Assuming a maximum rate of 150 req/s, with a KeepAliveTimeout at the 5 seconds default, one may need to adjust MaxClients (and ServerLimit if prefork mpm is used) to somewhere in the area of 1,000.
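The back-of-the-envelope math above can be sketched in a few lines; the 7-second hold time and the 150 req/s target are the assumptions from the text, not measured values:

```python
# assumption from the text: each client occupies a server process for
# roughly (service time + KeepAliveTimeout) seconds
hold_time = 7          # seconds per client (assumed)
max_clients = 256      # Apache httpd default MaxClients

# sustainable uniform rate of new clients
rate = max_clients // hold_time
print(rate)            # 36 new requests / second

# inverse planning: required MaxClients for a target spike rate
target_rate = 150      # req/s, worst expected spike (assumed)
keepalive = 5          # KeepAliveTimeout default
service = 2            # assumed time actually serving the requests
needed = target_rate * (keepalive + service)
print(needed)          # 1050, i.e. "in the area of 1,000"
```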
Conclusion
Don’t leave the defaults on; always start the parameter calculation from the desired outcome. On suboptimal hardware (memory-wise), disabling KeepAlive is the way to go to squeeze out a bit more performance. Oh, and don’t assume the hardware is capable of keeping up with your calculations; turn any failure or mis-planning into a learning opportunity.
That’s it for today, have fun!