Monday, 21 August 2006
Caching Performance Notes
There have been some interesting developments in Web caching lately, from a performance perspective; event loops are becoming mainstream, and there are lots of new contenders on the scene.
As luck would have it, I’ve been benchmarking proxies at work with an eye towards the raw performance of their HTTP stacks (i.e., how well they handle connections and parse headers when serving everything from memory), so that we can select a platform for internal service caching.
In particular, I’ve been looking at the maximum response rate a device can deliver, how much latency it introduces, and what effect overload has on both of those metrics. How they handle large numbers of idle persistent connections is also very interesting, because keeping connections open avoids per-request setup costs and so improves perceived performance, especially in an AJAX world (see Caching Web 2.0 for more).
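To make the persistent-connection point concrete, here’s a small sketch (my own illustration, not part of the benchmark harness) that stands up a local HTTP/1.1 server and sends several requests over one reused connection; with keep-alive, only the first request pays the TCP setup cost:

```python
# A minimal demo of HTTP/1.1 persistent connections: one TCP connection
# serves several requests, so per-request setup cost disappears after
# the first. Uses only the standard library; the server is a stand-in.
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # enables keep-alive by default

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        # Content-Length is required for keep-alive to work here.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet

server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# One client connection, several requests: the socket is reused.
conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
for _ in range(3):
    conn.request("GET", "/")
    resp = conn.getresponse()
    assert resp.read() == b"hello"
conn.close()
server.shutdown()
```

A proxy holding thousands of such idle-but-open connections spends memory and file descriptors on them even when no requests are in flight, which is exactly what the idle-connection tests below exercise.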
Update: I asked for dual-CPU boxes, and was fooled by Xeon HT; all of these tests are on a single CPU. D’oh. Hyper-Threading usually gives multi-threaded user apps a 15%-20% boost; I’m going to re-test with truly multiple CPUs (or at least cores) soon. Mea culpa.
Squid has been around pretty much since the start of the Web, as an outgrowth of the Harvest project. While it’s probably also the most widely deployed caching proxy, it’s been criticised for not performing very well. In particular, it behaves very badly when overloaded: response times climb with the request rate, eventually stretching to multiple seconds, and lots of connections get dropped.
However, Squid 2.6 was recently released, with support for epoll and kqueue. My testing shows it as being much better-behaved under load; response times are perfectly flat at about 180ms during overload, no matter what the request rate. 2.6 was also able to hold 15,000 persistent connections open without any noticeable change in response rate or latency. Impressive.
Squid 2.6 doesn’t perform quite as well in terms of raw capacity (serving about 7,500 small responses per second, vs. 2.5’s 9,000), but hopefully it’ll get better as the 2.6 line matures.
One thing that still concerns me about Squid is that its capacity drops much more than other proxies do when response sizes get larger.
Overall, Squid is a good workhorse that’s somewhat limited by its age; since it’s single-process and single-threaded, it can’t take advantage of multiple cores, putting it at a severe disadvantage against threaded servers. Still, it’s very configurable, has good instrumentation, and is a known quantity. Not a bad option.
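For readers who haven’t met the single-threaded event-loop model, here’s a toy sketch (my own illustration, nothing to do with Squid’s actual code) of the idea behind epoll/kqueue: one thread asks the kernel which sockets are ready and services them all, so no thread-per-connection is needed — but by the same token, only one core is ever used:

```python
# A toy event loop: one thread multiplexes several sockets via the
# kernel's readiness API (selectors wraps epoll/kqueue/select).
import selectors
import socket

sel = selectors.DefaultSelector()

# A few connected socket pairs stand in for many client connections.
pairs = [socket.socketpair() for _ in range(3)]
for i, (client, server_side) in enumerate(pairs):
    server_side.setblocking(False)
    sel.register(server_side, selectors.EVENT_READ, data=i)
    client.sendall(b"request %d" % i)

received = {}
while len(received) < len(pairs):
    # Block until any registered socket is readable, then handle each
    # ready one in turn -- all in a single thread.
    for key, _ in sel.select():
        received[key.data] = key.fileobj.recv(1024)

for client, server_side in pairs:
    client.close()
    server_side.close()

print(sorted(received))  # → [0, 1, 2]
```

Scaling this design past one core means running multiple processes, which is exactly what Squid doesn’t do.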
Apache 2.2’s mod_cache along with mod_proxy make it possible to cobble together a caching intermediary from the venerable (and extremely popular) server. This was highlighted in a recent OSCON presentation, where I’ve heard it was touted as a serious competitor to Squid for gateway caching (a.k.a. reverse proxying).
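For reference, the sort of configuration under discussion looks roughly like this (a hedged sketch, not my exact test config; the backend hostname and cache path are placeholders, and mod_proxy, mod_cache, and mod_disk_cache need to be loaded):

```apache
# Act as a gateway (reverse proxy), not an open forward proxy.
ProxyRequests Off
ProxyPass        / http://backend.example.com/
ProxyPassReverse / http://backend.example.com/

# Cache cacheable responses under / on disk (mod_cache + mod_disk_cache).
CacheEnable disk /
CacheRoot /var/cache/apache2
```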
First, the worker MPM. While raw capacity was good at around 14,000 responses a second and overload behaviour was beautiful, this configuration utterly fell down when I tried holding any significant number of idle persistent connections open.
I think this is because the worker MPM uses a thread per connection, and when you run out of threads, you can’t accept any more connections. That can’t be the whole story, though, because even with Apache configured to have 8,000 threads (spread between a number of processes), it wasn’t happy when more than about 1,000 connections were open.
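The ceiling I’m describing is set by the worker MPM’s sizing directives; something like the following (illustrative numbers chosen to match the 8,000-thread test, not my exact config) caps concurrent connections at roughly ServerLimit × ThreadsPerChild, because each connection ties up a thread for its lifetime:

```apache
<IfModule mpm_worker_module>
    ServerLimit          125
    ThreadsPerChild       64
    # MaxClients must not exceed ServerLimit x ThreadsPerChild (8000 here).
    MaxClients          8000
    MaxRequestsPerChild    0
</IfModule>
```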
That’s a big problem, so I next tried the event MPM in hopes of avoiding it. It was better, but was still only able to hold about 11,000 connections open before giving up, and it introduced about 100ms of latency as well. Additionally, it had lower overall performance, topping out at about 12,000 responses a second, and was unstable under extreme load.
I was really hopeful about Apache, but until the event MPM comes out of experimental status, it doesn’t seem like a good idea.
Another contender is Varnish, a brand-new multi-threaded gateway project out of Norway. While it’s light on documentation, what’s on the site looks promising; the folks behind it seem to respect the protocols and have good intentions.
I only briefly tested it, and saw it go up to about 10,000 responses/second before taking a serious dive when overloaded, down to less than 1,000, while response times rocketed to more than a second.
Of course, it’s still an alpha project, so it’s definitely one to keep an eye on.
Lighttpd isn’t a caching proxy, it’s a high-performance Web server. However, it does have a proxy module (that’s being actively rewritten for version 1.5) and there has been some interest expressed in writing a caching module for it.
There’s a good reason for that; Lighty (as it’s called) is very, very fast — 19,000 responses/second kind of fast. It handles overload very gracefully, and it doesn’t blink when it has a large number of idle connections open.
In short, Lighty would be an excellent basis for a proxy or gateway cache, if we can get the caching part taken care of. Listen to an interview with Lighty’s primary developer for more.
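For the curious, pointing Lighty’s 1.4-era mod_proxy at a backend looks something like this (a sketch with a placeholder backend address; note there’s no caching happening here, which is exactly the missing piece):

```
server.modules += ( "mod_proxy" )

# Send all requests to a single origin server.
proxy.server = ( "" =>
  ( ( "host" => "127.0.0.1",
      "port" => 8080 ) )
)
```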