mnot’s blog

“Design depends largely on constraints.” -- Charles Eames

Wednesday, 12 December 2007

Two HTTP Caching Extensions

We use caching extensively inside Yahoo! to improve scalability, latency and availability for back-end HTTP services, as I’ve discussed before.

However, there are a few situations where the plain vanilla HTTP caching model doesn’t quite do the trick. Rather than come up with one-off solutions to our problems, we tried going in the other direction: finding the most general solution that still met our needs, in the hope of meeting others’ as well. Here are two of them (with specs and implementation).

stale-while-revalidate

The first problem you’ve got when you rely on HTTP caching for performance is simple: what happens when the cache is stale? If fresh responses come back in a few milliseconds (as they usually do from a well-tuned cache), while stale ones take 200ms or more (as running code often does), users will notice (as will your execs).

The naïve solution is to pre-fetch things into cache before they become stale, but this leads to all sorts of problems: deciding when to pre-fetch is a major headache, and if you don’t get it right, you’ll overload your cache, the network or your back-end systems, if not all three.

A more elegant way to do this is to give the cache permission to serve slightly stale content, as long as it refreshes things in the background.

Consider four requests in sequence. Request #1 is served from a fresh cache, as per normal. When the cached response becomes stale and stale-while-revalidate is in effect, request #2 kicks off an asynchronous request back to the origin server, while still being served from cache as if it were fresh (as is #3, because it’s still inside the stale-while-revalidate “window”). Assuming that the cache is successfully updated, #4 gets served fresh from cache, because that’s what it is now.

So, in a nutshell, stale-while-revalidate hides back-end latency from your clients by taking some liberty with freshness (which you control). See the stale-while-revalidate Internet-Draft for more information.
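To make the window concrete, here is a minimal Python sketch of the decision a cache might make for each request. It assumes a response carrying something like Cache-Control: max-age=600, stale-while-revalidate=30. The function names and the returned labels are made up for illustration (they are not taken from Squid or from the draft), the parser is deliberately simplified, and treating the window boundary as exclusive is my own assumption.

```python
def parse_cache_control(header):
    """Parse a Cache-Control value into {directive: seconds or True}.

    Simplified: does not handle commas inside quoted strings.
    """
    directives = {}
    for part in header.split(","):
        name, sep, value = part.strip().partition("=")
        if not name:
            continue
        value = value.strip().strip('"')
        directives[name.strip().lower()] = int(value) if value.isdigit() else True
    return directives


def seconds(directives, name):
    """Return a directive's numeric value, or 0 if absent or valueless."""
    value = directives.get(name, 0)
    return value if type(value) is int else 0


def classify(age, cache_control):
    """Say how a cache could treat a stored response that is `age` seconds old."""
    cc = parse_cache_control(cache_control)
    max_age = seconds(cc, "max-age")
    window = seconds(cc, "stale-while-revalidate")
    if age < max_age:
        return "fresh"                   # serve from cache, no origin contact
    if age < max_age + window:
        return "stale-while-revalidate"  # serve from cache, refresh in background
    return "stale"                       # revalidate before responding


header = "max-age=600, stale-while-revalidate=30"
for age in (100, 615, 700):
    print(age, classify(age, header))
```

With that header, a response aged 100 seconds is fresh, one aged 615 seconds is served stale while a background refresh runs, and one aged 700 seconds has fallen outside the window and has to be revalidated before it is served.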

stale-if-error

The other issue we had was when services go down. In many cases, it’s preferable not to show users a “hard” error, but instead to use slightly stale content, if it’s available. Stale-if-error allows you to do this — again, in a way that’s controllable by you.

For example, Yahoo! Tech has a number of modules on its front page that are sourced from services. If a back-end service has a glitch, in many cases it’s better to show news (for example) that’s a few minutes old, rather than have a blank space on the page. Stale-if-error makes this possible.

Again, see the stale-if-error Internet-Draft for details.
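As a rough illustration, the fallback decision could look something like the Python sketch below. It assumes the cache has already tried the origin for a stale entry, and that a failed connection or any 5xx status counts as an error; the function names, the None convention for a failed connection and the returned labels are hypothetical, not part of the draft or of Squid.

```python
def is_error(status):
    """Assumption: a failed connection (None) or any 5xx status is an error."""
    return status is None or 500 <= status <= 599


def choose_response(age, max_age, stale_if_error, origin_status):
    """Pick what to send once a stale entry has been checked against the origin.

    Returns "origin" (pass the origin's response through), "cached" (serve the
    stale copy) or "error" (surface the failure to the client).
    """
    if not is_error(origin_status):
        return "origin"
    staleness = age - max_age  # seconds since the entry stopped being fresh
    if staleness < stale_if_error:
        return "cached"
    return "error"


# Values as in "Cache-Control: max-age=600, stale-if-error=1200"
print(choose_response(900, 600, 1200, 503))   # stale for 300s, inside the window
print(choose_response(2000, 600, 1200, 503))  # stale for 1400s, outside the window
print(choose_response(900, 600, 1200, 200))   # origin answered normally
```

The point of the sketch is that the origin stays in control: the stale copy is only used when the origin fails, and only for as long as the header allows.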

A Word About Cache-Control

People who have looked at these often comment on their requirement for a Cache-Control header; they often just want to be able to configure their cache manually, rather than go around modifying HTTP headers. In fact, we got this request from so many people that we did add this capability in the implementation (see below).

That said, my preference is for the Cache-Control extensions, and I always strongly encourage people to use them. Why? Because, while it’s easy for an admin to go into a cache and change things, you then have decoupled the URIs (services) from their metadata; if the services change, it isn’t obvious that some cache configuration somewhere may have to change as well. Additionally, if you have multiple clients caching your data, you then have to go out and remember where all of them are (chances are, you’ll miss one), and configure each. Not good practice.
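For what it's worth, keeping the metadata with the service can be as small as a helper that builds the header value at the origin. The Python sketch below is only an illustration; the helper name, its parameters and the example numbers are assumptions, and how the header actually gets attached depends on whatever framework the service uses.

```python
def cache_control(max_age, stale_while_revalidate=None, stale_if_error=None):
    """Build a Cache-Control value; the two extensions are optional."""
    parts = [f"max-age={max_age}"]
    if stale_while_revalidate is not None:
        parts.append(f"stale-while-revalidate={stale_while_revalidate}")
    if stale_if_error is not None:
        parts.append(f"stale-if-error={stale_if_error}")
    return ", ".join(parts)


# Response headers as the service itself might emit them.
headers = {
    "Content-Type": "application/json",
    "Cache-Control": cache_control(600, stale_while_revalidate=30, stale_if_error=1200),
}
print(headers["Cache-Control"])
```

Every cache that sees the response, including ones you don't administer, then gets the same policy without anyone touching its configuration.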

Status

Both of these extensions are documented and, in my mind, pretty stable; the I-Ds have expired, but AFAICT all I need to do is double-check things, re-submit them and request publication (as Informational RFCs). I’m going to wait a little while to see if anybody has some feedback that I can incorporate.

We also have an implementation of both in Squid, coded by Henrik. Currently, there’s a changeset sitting on 2.HEAD, but hopefully it’ll get incorporated in 2.7. Note that the changeset doesn’t support the Cache-Control extensions, only the squid.conf directives for controlling these mechanisms; when the drafts start progressing, that should change.

The intent here is to make these features available to anyone who wants them; we don’t want to maintain private Squid extensions, and Squid isn’t the only interesting cache in the world. Enjoy, thanks again to Henrik and Yahoo!, and again I’d love any feedback you have.


Filed under: Caching, HTTP, Protocol Design, Standards, Web, Web Services

16 Comments

Asbjørn Ulsberg said:

Fascinating! Are these RFC drafts presented to the IETF HTTP mailing list?

PS: Roundtripping my name wrongly converts it from UTF-8 to ISO-8859-1.

Wednesday, December 12 2007 at 9:06 PM +10:00

duryodhan said:

Or will it be put into the new rewritten HTTP spec? You said the rewriting involves writing down all the known stuff/accepted practices ...

Wednesday, December 12 2007 at 9:59 PM +10:00

Mark Nottingham said:

They won't become part of RFC2616bis, as they'd be considered 'new features', and are therefore out of scope.

Thursday, December 13 2007 at 6:23 AM +10:00

Ian Bicking said:

re: configuration vs. header -- I've been thinking that more HTTP libraries should emphasize the request object, and use the headers as a kind of API. So, setting stale-while-revalidate is a reasonable option for the HTTP library, but also fits into the HTTP message itself. The same applies to stale-if-error. And I think, though I haven't thought deeply about it, sending that cache control to the library and then possibly over the wire to whatever other intermediate caches exist seems ok and maybe good.

Thursday, December 13 2007 at 8:32 AM +10:00

Kevin Burton said:

Sweet..... this is exactly what I need for Tailrank.

Right now we prefetch but as you noted it's prone to problems

The one you DIDN'T mention is that if you have LOTs of URLs you might not KNOW what you need to revalidate.

.... so prefetch doesn't work here.

Thursday, December 13 2007 at 9:28 AM +10:00

Mark Nottingham said:

Ian - good point. I often have discomfort when a library tries to abstract away the message too much, with convenience functions for caching, etc. This just introduces a separate set of concepts from those used in HTTP.

Thursday, December 13 2007 at 10:31 AM +10:00

Henrik said:

The Cache-Control header changeset is now in Squid-2, and all of this will be in the upcoming 2.7 release.

Thursday, December 13 2007 at 11:57 AM +10:00

David Powell said:

IE has implemented a couple of relevant cache-control extensions since IE5, not sure how widely they are used:

http://msdn2.microsoft.com/en-us/library/ms533020.aspx#Use_Cache-Control_Extensions

Friday, December 14 2007 at 12:19 AM +10:00

l.m.orchard said:

One interesting thing that occurs to me is that if you don't prefetch at all for user-facing HTML pages, you could serve up a fake "loading / progress bar" page as the blank-slate result for the first unfortunate users to make the request.

Kick off the async refresh, and leave a sanely timed meta-refresh on the blank-slate page, and hopefully they're happy enough not to complain.

Friday, December 14 2007 at 6:06 AM +10:00

Mark Nottingham said:

WRT IE CC extensions -- I never did really like those; they're specific to browsers, not caches, and their definition is confusing (to put it mildly).

Friday, December 14 2007 at 8:16 AM +10:00

Ben Drees said:

This is good stuff. My colleagues and I have discussed implementing something similar to stale-if-error but have not yet gotten around to it. I look forward to trying out these features when Squid 2.7 becomes available.

As to whether the HTTP extension option or the cache configuration option eases maintenance the most in reverse proxy setups, I think it depends on local circumstances. But the HTTP extension option is the clear winner in performance terms, since it makes it possible for downstream caches (beyond one's administrative capacity to configure) to participate in these schemes.

The Internet-Drafts say that these extensions MAY be implemented by shared caches and that private caches MUST ignore them, but wouldn't it be semantically equivalent for private caches to implement them also (while further boosting the performance benefits)? I suppose that doing so could heighten the security concern with stale-while-revalidate.

Also - the Internet-Drafts don't say anything about cases where the RFC 2616 rules and these extensions come into conflict. For example, consider a request containing "Cache-Control: max-stale=600" that leads to a 500 error while refreshing a stale response (age=900) containing "Cache-Control: max-age=1, stale-if-error=1200". Presumably the user agent gets the 500, but there is room for concern about correct implementation.

Tuesday, December 18 2007 at 3:24 AM +10:00

Scott said:

A lot of proxies out there already execute the stale-while-revalidate functionality as a "value-add". Any content with explicit expire or max-age directives will continue to be served if caching of expired content is allowed, while the proxy asynchronously checks to see if there is a fresh object. For any objects with low or infrequent request volumes that change regularly, the user(s) caught in the middle get annoyed.

You can't win for losing.

Another option might be "prefetch-while-stale", although I can see that being rife with problems. This would allow a proxy to proactively refresh stale objects with this directive as soon as the object hits its expiration. A proxy admin could then choose to configure the proxy to enable this functionality with the default disabled.

Of course, I can see this directive used with a max-age of 0 or 1 or something silly like that.

Tuesday, July 22 2008 at 5:31 AM +10:00

Mark Nottingham said:

Lots of people have tried prefetching, and the result is usually not worth the effort; occasionally it causes big problems. Prefetching while stale generates extra traffic, whereas stale-while-revalidate doesn't.

Tuesday, July 22 2008 at 8:18 AM +10:00

Hoop Somuah said:

Any thoughts on a stale-valid-until header that would indicate how long the 2 headers above should be in effect? I'm thinking about the occasionally connected case where I might want to have stale items used by my app until my device can reconnect but there are limits to that and I'd like the server to be able to indicate that with something like stale-valid-until.

Wednesday, July 23 2008 at 3:13 AM +10:00

Hoop Somuah said:

To clarify my comment above, I'm wondering about the use of stale content in an offline case and wondering whether that should fall under the revalidate case or if it should be given special consideration.

Wednesday, July 23 2008 at 8:01 AM +10:00

Mark Nottingham said:

HTTP allows caches operating in an offline manner to behave differently; the control given currently is mostly through must-revalidate. It sounds like what you want is closest to stale-if-error which allows you to qualify how long stale content is returned if there is a connectivity problem (or a 5xx response).

Wednesday, July 23 2008 at 9:35 AM +10:00
