mark nottingham

What to Look For in an HTTP Proxy/Cache

Friday, 12 June 2009

HTTP Caching

Part of my job is maintaining Yahoo!’s build of Squid and supporting its users, who use it to serve everything from the internal Web services that make sites go, to Flickr’s images.

In that process, I’m often asked “what about X?”, where X is another caching or load balancing product (yes, Squid can be used as a load balancer); Varnish or lighttpd, for example.

Generally, these comparisons come down to three factors: performance, features and manageability. Almost invariably, Squid doesn’t do as well as newcomers on performance (although it’s generally faster than Apache), but wins on features and manageability — and that’s why it’s so widely used.

I’m not going to argue that Squid is best for every deployment, but I do think that it’s important to evaluate the whole picture, rather than just one metric. So, here are a few initial thoughts about what’s important when you’re evaluating a proxy/cache:


Performance can mean a lot of things. The least interesting but most widely cited benchmark for this kind of server is “how many 1k responses can it serve from memory per second?”, but that doesn’t tell you how it will do serving 200K (or 200M) responses from disk, which is a much more difficult thing to manage.

Try looking at:



How does the proxy handle multiple requests for the same URL? This is often critical in “reverse proxy” deployments, where a flood of requests can come in for the same thing if it gets suddenly popular, or when you first bring a cache online. If the response isn’t cached and fresh, that flood of requests can quickly overcome your back-end servers.

There are a few techniques for dealing with this. Collapsed forwarding allows only one request for a given URL to go forward at a time when there’s nothing in cache; if the response is cacheable, it’s sent to all waiting clients, saving those requests from going forward and swamping the origin server.
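In Squid, for example, this is a one-line configuration setting (a sketch; directive availability varies between Squid versions):

```
# squid.conf: allow only one request per URL to go to the origin
# at a time; a cacheable response is shared with all waiting clients
collapsed_forwarding on
```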

If something is cached but stale, stale-while-revalidate lets the cache serve the stale response while it refreshes what it has in the background. Not only does this save you from a flood of validation requests, but it also effectively hides the latency of refreshing your content from your clients, offering better quality of service.
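stale-while-revalidate is a Cache-Control extension (defined in RFC 5861) that the origin server sends on the response; a minimal example:

```
HTTP/1.1 200 OK
Cache-Control: max-age=600, stale-while-revalidate=30
```

Here the response is fresh for 600 seconds; for a further 30 seconds, a cache that supports the extension may serve it stale while it revalidates in the background.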


In my experience, one of the biggest things that gets a workout in a proxy/cache is the ACL system. Make sure you have maximum flexibility here; e.g., can you apply access control based on whether a request is a cache miss? Can an ACL select by request method, URL, headers, or client address? Can you combine ACL directives? Can you extend the ACL system?
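As a sketch of the kind of flexibility to look for, Squid’s ACL system combines named ACLs with access rules (the ACL names below are made up for illustration; the directives and ACL types are real):

```
# squid.conf: only internal clients may send PURGE requests
acl internal_nets src 10.0.0.0/8
acl purge_req     method PURGE
http_access allow purge_req internal_nets
http_access deny  purge_req

# ACLs can also select on the URL path, headers, and more
acl api_urls urlpath_regex ^/api/
```

The useful property is composability: each `http_access` line ANDs its ACLs together, and the rules are evaluated top-down.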

Streaming and Buffering

A good proxy will offer fine-grained control over how it buffers requests and responses. For example, if you’re deploying as a reverse proxy, you want to be able to buffer up the entire response, so that you can free up resources on the origin server as quickly as possible if the client is slow. Likewise, buffering the request before sending it to the origin server can help conserve resources in some deployments, increasing capacity.

Conversely, however, it’s not good if your proxy requires responses to be buffered before they’re sent; this consumes too many resources on the proxy if you’re sending large responses, and doesn’t work at all for streaming applications (e.g., video).

Cache Behaviour Tuning

Although HTTP has excellent controls to allow both the origin server and the client to say how caches should behave, inevitably there will be cases where you’ll need to… ahem… fine-tune them. This includes tuning the heuristic algorithm, which determines freshness when a response carries no explicit instructions.

It also includes overriding the specified behaviour. For example, a reverse proxy probably wants to ignore Cache-Control: no-cache, since the cache is under the control of the origin server.

All of these tuning knobs need to be applicable in a fine-grained way; Squid does it with regular expressions against the URL (in refresh_pattern).
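In Squid, these rules are matched top-down against the request URL; a sketch (times are in minutes, and the exact set of override options varies by version):

```
# refresh_pattern [-i] regex  min  percent  max
refresh_pattern -i \.(jpg|png|gif)$  1440  50%  10080
refresh_pattern .                    0     20%  4320
```

The first matching rule wins: here, images without explicit freshness information are considered fresh for at least a day, while everything else falls through to a conservative default.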

Cache Configuration

The cache as a whole needs to be configurable as well.

For example, when the set of cached objects gets larger than the allocated memory or disk space, the cache needs to evict some. As a mountain of research will attest, some replacement policies are more efficient than others, especially under different workloads.
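As a sketch of the simplest such policy, least-recently-used (LRU) evicts whatever was touched longest ago; production caches offer fancier variants (Squid, for instance, also supports heap-based GDSF and LFU-DA), but the shape is the same:

```python
from collections import OrderedDict

class LRUCache:
    """A minimal least-recently-used cache: when over capacity,
    evict the entry that was accessed longest ago."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the LRU entry
```

LRU behaves well when popularity is stable, but a single large scan (a crawler, say) can flush the hot set; that’s exactly why the choice of policy matters under different workloads.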

Resilience to Errors

Networked systems inevitably fail. Besides the obvious aspects of this (e.g., configurable timeouts), in a cache it’s also important to handle failures as gracefully as possible, to preserve both quality of service and cache efficiency.

Stale-If-Error helps to hide temporary back-end problems by allowing a cache to use a stale cached response (if available) when it can’t get a fresh one, or if the server returns an error code like 500 Internal Server Error. For situations where having something stale is better than nothing at all, this helps.
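Like stale-while-revalidate, stale-if-error is a Cache-Control extension from RFC 5861:

```
Cache-Control: max-age=600, stale-if-error=86400
```

Here the cache may keep serving the response for up to a day after it goes stale, but only when attempts to refresh it fail.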

Quick Abort works from the other side; when the client aborts (because of a network or software problem, or a simple timeout), the cache should be configurable to continue downloading the response from the server, so that the next client will have the benefit of having it in cache.
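Squid exposes this through the quick_abort_* directives; a sketch (the values here are arbitrary):

```
# squid.conf: decide whether to finish a download after the client leaves
quick_abort_min 16 KB    # finish if less than 16 KB remains
quick_abort_max 512 KB   # abort if more than 512 KB remains
quick_abort_pct 95       # finish if 95% has already been fetched
```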


Caches are often deployed in sets, both to increase capacity and also to assure availability. In these deployments, support for inter-cache protocols like ICP and HTCP means a better hit rate and, perhaps more importantly, the ability to bring a “cold” cache up-to-speed without overloading origin servers.

When evaluating support for peering, keep in mind that HTCP is more capable than ICP, because it takes into account the request headers, not just the URL. Also, HTCP CLR support means that something becoming invalid in one cache can trigger purges from neighbouring caches too (a pattern I’ll talk more about soon). Good implementations should also have a means of assuring that forwarding loops don’t happen.

Finally, Cache Digests are an interesting way to use a Bloom filter; by keeping a lossy digest of peers’ contents, it’s possible to predict whether a given request will be a hit. This is useful when the latency between peers makes “normal” inter-cache protocols too expensive (e.g., deployments between coasts or continents).
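A Bloom filter gives lossy set membership with no false negatives and a tunable false-positive rate, which is exactly the trade-off a digest needs. A minimal sketch in Python (the size and hash count here are arbitrary; real digests size the filter to the cache’s contents):

```python
import hashlib

class BloomFilter:
    """Lossy set membership: never says 'no' for something added,
    occasionally says 'yes' for something that isn't there."""

    def __init__(self, size_bits=1024, hashes=3):
        self.size = size_bits
        self.hashes = hashes
        self.bits = 0  # bit array packed into one big integer

    def _positions(self, item):
        # derive several independent bit positions from the item
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        return all(self.bits & (1 << pos) for pos in self._positions(item))
```

A peer publishes its digest; before forwarding, a cache checks `might_contain(url)` locally instead of paying a round trip, accepting that an occasional false positive means a wasted request to the peer.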


Proxies often get used as layer 7 routers; usually, to shift traffic around to the right server, for some value of “right.” A good proxy will have a number of tools to help you do this, including active and passive monitoring of peers and origin servers (to determine health and availability), flexible request rewriting (including both the request URI and response Location headers), and controls over how many connections can go to a particular server, as well as how many idle connections to keep open to each server.

Another form of routing is CARP, which routes based upon a consistent hashing algorithm — like DHTs. This allows you to build a massive array of caches to serve a very large working set (e.g., photos, a CDN).
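CARP’s exact hash combination is specified in its Internet-Draft; as a sketch of the underlying idea, rendezvous (highest-random-weight) hashing maps each URL consistently to one cache, and removing a cache only remaps the URLs that lived on it:

```python
import hashlib

def pick_cache(url, caches):
    """Rendezvous hashing: score every (cache, url) pair and pick
    the cache with the highest score. The winner is stable, and
    removing a losing cache never remaps this URL."""
    def score(cache):
        h = hashlib.sha256(f"{cache}|{url}".encode()).hexdigest()
        return int(h, 16)
    return max(caches, key=score)
```

This is why a CARP-style array scales to very large working sets: each object lives on exactly one member, so adding members grows total cache capacity rather than duplicating content.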

One thing that often goes hand in hand with routing is retries — i.e., being able to try a different origin server (or IP address, or peer) if you can’t get a successful answer on the first try (if allowed by the protocol; this makes sense for GET, not POST, obviously).

Getting the Standards Right

Really, this isn’t a feature, it’s a floor to entry. If you’re going to use a proxy/cache, you have to be sure that it’s going to behave in a predictable, interoperable way, and that means conforming to HTTP/1.1, SSL and all of the other applicable standards.

In the case of HTTP, this means not taking shortcuts; for example, variant caching is hard, but it’s necessary to have it for a cache to be useful. A great tool to help evaluate this is Co-Advisor.



A proxy is worthless if it goes down all of the time, or if you’re worried that it will. Part of this is how mature it is, and part is how well it’s been tested. One of the reasons I like Squid is that it’s used in thousands (if not tens of thousands) of applications around the world; it’s been around for more than a decade, so it’s been hammered on hard.

Because of this breadth of deployment, I can confidently use it in a new (to me) situation, knowing that it’s probably been used in that way before. Contrast this with software that’s been designed for a particular purpose and hasn’t been used outside that narrow profile very much.


Managing a cache means knowing what it’s doing, and what went wrong if you have a problem. A good implementation should have extremely extensive metrics available, ideally in many forms (e.g., over HTTP, SNMP, in logs), as well as easy-to-use debugging mechanisms, because at the end of the day all of these platforms are really complex beasts.

Ease of Use

Finally, caches have to be intuitive to use. Typically, they’re designed for a sysadmin or a netadmin, not a developer, and I think this is a shame, because these days developers should be a primary audience.


Max Robbins said:

I think you make an excellent point that feature set is what deploying a proxy/web accelerator is about. I would like to make you aware of what I believe to be a superior solution on performance, features and manageability. It is what you would get, in my opinion, if you created Squid from scratch today using the most modern architecture.

aiCache is commercial, so open source folks can close their eyes now and run away.

The feature set is here. The performance is 25k+ transactions a second, and the management features are superior in all three iterations: web, CLI, and SNMP.

Dynamic caching & sharing of web content, including both GET and POST requests

RAM-based response caching for instantaneous response times

Offloads TCP/IP, request and response processing from origin servers.

aiCache is a right-threaded, lightweight, fully pipelined, non-blocking application. It serves up to 25,000 req/sec and manages tens of thousands of simultaneous connections via a single instance

Site fallback feature keeps your site up even in case of catastrophic infrastructure failures

Flexible document freshness control, including cookie-driven control and on-demand header-driven cache expiration

Advanced CLI for in-depth management, monitoring and troubleshooting

Three different ways to monitor aiCache: CLI, Web and SNMP, with a rich set of statistics available.

Flexible regular expression or simpler string matching for cache config

Load balances requests across a number of origin web servers. Supports server pooling

Monitors and reports health of origin servers, including response content matching

Security features, including protection against DOS attacks

Selective log suppression. Time or size-based zero-impact instant log file rotation

Built-in smart, non-flooding alerts. Alert on req/sec, response time, number of client or server connections and much more

Configurable on-the-fly content compression

I look forward to discussing a thoughtful comparison.

Wednesday, July 1 2009 at 12:40 PM

John Moylan said:

@Max ..quite a sales pitch.

@Mark .. I’m a big squid user/fan too. Varnish seems to have a lot of buzz around it, have you had the chance to look at it? and if so, then do you think it might meet your requirements?

Saturday, July 18 2009 at 12:59 PM

Adrian Chadd said:

The problem with Varnish, which the authors are painfully aware of but don’t seem to want to talk about all that much, is that it is a very specific use-case proxy. If your clients are close, your hit rate is high, your active workload fits in memory and you don’t have a lot of clients doing POST, you’re fine and dandy. Things degrade ungracefully (or at least did the last time I checked) if this isn’t the case.

Squid doesn’t perform as well as Varnish in that specific case, but it degrades much, much better under a wide variety of workloads and abnormal traffic conditions.

Monday, July 27 2009 at 6:57 AM