Thursday, 21 August 2003
I’ve heard several people in the industry assert that HTTP fundamentally limits the performance of applications that use it; in other words, that there’s a considerable disadvantage to using it, and that other protocols (usually proprietary or platform-specific systems that those same people happen to sell) are therefore needed to “unleash the power of Web services.”
Specifically, it’s common wisdom that HTTP maxes out at a few hundred requests per second per CPU, give or take, while the requirements for “enterprise-class” Web services are anywhere from one thousand to tens of thousands of messages per second.
Putting aside the issue of what reality these requirements are based upon, let’s examine what’s being said here. Many Web applications are indeed limited to a few hundred request/response pairs per second, but that isn’t necessarily caused by HTTP. If your code chews up considerable CPU time to process the payload of a message, it doesn’t matter if you put some high-performance, mega-bucks message transport in front of it; you’ll still have some performance limitations.
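To put rough numbers on that point, here is a back-of-the-envelope sketch. The per-message costs are made-up figures for illustration, not measurements, and the function name is my own.

```python
def max_requests_per_second(app_cpu_seconds: float, transport_cpu_seconds: float) -> float:
    """Upper bound on requests/second for one CPU, when every request
    costs some application CPU time plus some transport CPU time."""
    return 1.0 / (app_cpu_seconds + transport_cpu_seconds)


# Hypothetical figure: 3 ms of application work per message.
APP_COST = 0.003

# Hypothetical figure: 100 microseconds of HTTP handling per message.
with_http = max_requests_per_second(APP_COST, 0.0001)

# An imaginary transport that costs nothing at all.
with_free_transport = max_requests_per_second(APP_COST, 0.0)

print(f"with HTTP:           {with_http:.0f} requests/second")
print(f"with free transport: {with_free_transport:.0f} requests/second")
```

Under these assumptions, even a transport with zero cost only moves the ceiling from roughly 323 to roughly 333 requests per second; the application code is what sets the limit.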
Also, remember that HTTP as a protocol may not have any serious performance limitations in its design, but implementations of it may. Apache (most people’s yardstick) was explicitly designed for functionality, not speed.
Looking at more highly optimised implementations tells a different story; Web proxy/caches, for example. During The Bubble, when caching was cool, The Measurement Factory held regular Web caching Bake-offs that were widely attended in the industry. The last one was held in 2001 and there wasn’t broad participation, but the results are illuminating; the highest measurement is 2500 requests/second and you can get 1250 requests/second for less than $5,000. Based on talking to various people in the industry, I’m confident in saying that there are implementations not tested in the Bake-offs that can do at least 5,000 requests/second on commodity (and cheap) hardware.
Keep in mind that this isn’t a loopback test; there’s a Web cache behind the HTTP stack that gets about half of the responses out of memory, and the other half from disk. It’s also a fully functional HTTP stack that’s well-exercised by the workload.
Another way to look at it is to think of the fundamental limitations imposed by the protocol itself, rather than the application behind it. I had lunch today with John Dilley, an old colleague and fellow HTTP wonk. We pretty quickly agreed that the only fundamental performance limitations in HTTP are:
1) Headers. HTTP headers, along with the request and response lines, are textual, extensible, and pretty nasty to parse. However, there are steps you can take to minimise this overhead; homogeneous applications (like SOAP) can keep most of the headers in memory as an opaque block, and receivers can use tricks like fast paths for common cases to improve performance.
There also aren’t many headers that actually need to be sent; the minimum for requests is Content-Length, Content-Type and Host; for responses, it’s Content-Type and either Content-Length or Transfer-Encoding.
2) Request-Response Message Exchange Pattern. HTTP is fundamentally request/response; however, one-way messages in either direction are easily accommodated with only the overhead of a small one-packet message in the other direction. KnowNow and others have also found interesting ways around this limitation.
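The “opaque block” trick from the first point can be sketched as follows. This is a minimal illustration, not how any particular SOAP stack does it; the host, path and media type are placeholders, and make_request_builder is a name I made up.

```python
def make_request_builder(host: str, path: str, content_type: str):
    """Return a function that frames a body as an HTTP/1.1 POST request.

    For a homogeneous application, everything except the Content-Length
    value is identical on every message, so it is encoded exactly once
    here and reused as an opaque block of bytes.
    """
    prefix = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        f"Content-Type: {content_type}\r\n"
        "Content-Length: "
    ).encode("ascii")

    def build(body: bytes) -> bytes:
        # Per-message work: format one integer and concatenate.
        return prefix + str(len(body)).encode("ascii") + b"\r\n\r\n" + body

    return build


# Placeholder values, for illustration only.
build = make_request_builder("example.org", "/service", "text/xml")
message = build(b"<ping/>")
print(message.decode("ascii"))
```

The sender’s per-message cost is then a length calculation and a couple of byte copies. A receiver can apply the mirror image of the same idea, comparing the incoming bytes against the expected block before falling back to a general header parser.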
I suppose one could also consider HTTP’s use of TCP as a limitation, but I think that’s once again more to do with specific implementation problems (and there are a number of very common ones) than it is fundamental. Even if it were, HTTP isn’t limited to TCP; it is possible to use UDP, and I believe some have done it.
John and I reckoned that the fundamental throughput limitations of HTTP over TCP are somewhere in the tens of thousands of message pairs per second per CPU on commodity hardware. Of course, your application won’t be able to realise that performance because it has its own costs. Additionally, if your messages and the cost of processing them are small enough, HTTP will be relatively expensive as a transfer protocol.
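The last point, about small messages, is easy to see by comparing header bytes to payload bytes. The sketch below uses a request carrying only the minimum headers; the host, path, media type and body sizes are arbitrary placeholders.

```python
def request_head(body_len: int) -> bytes:
    """Request line plus the minimum request headers (placeholder values)."""
    return (
        "POST /service HTTP/1.1\r\n"
        "Host: example.org\r\n"
        "Content-Type: text/xml\r\n"
        f"Content-Length: {body_len}\r\n"
        "\r\n"
    ).encode("ascii")


def header_overhead(body_len: int) -> float:
    """Fraction of the bytes on the wire that are request line and headers."""
    head_len = len(request_head(body_len))
    return head_len / (head_len + body_len)


small = header_overhead(20)      # a 20-byte payload
large = header_overhead(20_000)  # a 20,000-byte payload

print(f"20-byte body:     {small:.1%} of the request is headers")
print(f"20,000-byte body: {large:.1%} of the request is headers")
```

With a tiny body, most of the request is headers; with a body of a few tens of kilobytes, the headers are well under one percent of it.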
Even so, this is a far cry from “HTTP makes my application slow, let’s replace it.” There are lots of places – both in your HTTP stack and elsewhere – to optimise first.