mark nottingham

Chrome and Stale-While-Revalidate

Sunday, 1 June 2014


Chrome is looking at adding support for RFC 5861’s stale-while-revalidate, which is really cool. I wrote about the details of SwR when it first became an RFC, but its application to browsers is something new. Seems like a good time to answer a few potential questions.

What’s stale-while-revalidate For?

I originally designed SwR when I was at Yahoo!, to help avoid latency when accessing our back-end HTTP APIs. Since assembling a page could mean accessing somewhere between 50 and 100 such APIs, a validation delay on any one of them could cause the whole page to be slow.

stale-while-revalidate addresses that by allowing a response to be used once it becomes stale, triggering a background refresh; the latency of the refresh is hidden from the user.

For example, we might have used something like:

Cache-Control: max-age=60, stale-while-revalidate=15

Here, the response (let’s say it’s a sports score) is fresh for sixty seconds, and for an additional 15 seconds, the stale response can be used. This worked really well, because we were putting all Yahoo! traffic for the site (which is a lot) through just a few caches, so the likelihood of another request during that fifteen second window was very high; in this manner, “hot” responses were almost always able to be served directly from the cache, whilst still being kept up-to-date.
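
To make the mechanics concrete, here’s a minimal Python sketch of how a cache might classify a response’s age against those two windows. It’s purely illustrative (no real cache is written quite like this); the values come straight from the header above.

MAX_AGE = 60                 # from max-age=60
STALE_WHILE_REVALIDATE = 15  # from stale-while-revalidate=15

def classify(age_seconds):
    """Say what a cache may do with a stored response of the given age."""
    if age_seconds <= MAX_AGE:
        return "fresh: serve from cache"
    if age_seconds <= MAX_AGE + STALE_WHILE_REVALIDATE:
        return "stale: serve from cache, revalidate in the background"
    return "too stale: revalidate before serving"

for age in (30, 70, 80):
    print(age, classify(age))
# 30 fresh: serve from cache
# 70 stale: serve from cache, revalidate in the background
# 80 too stale: revalidate before serving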

Why stale-while-revalidate in Browsers?

Browser caches are very different to this; rather than funnelling a lot of similar traffic through one place like the “reverse proxy” that we were using, or even a lot of potentially similar traffic, like a normal “shared” proxy cache, a browser cache is just for one user. As such, it’s much less likely that there will be a request in the “stale” period to trigger the revalidation when the window is so small.

However, resources cached by browsers tend to be different than those used by back-end APIs; things like JavaScript, HTML, images and CSS tend to have longer freshness lifetimes. For example, you might have an image “thing.jpg” referenced from your page that has:

Cache-Control: max-age=2592000

That’s a 30-day freshness period. Let’s imagine that the user comes to your site fairly often, so during that 30-day period they’ll get good page load performance. As soon as that 30-day window has passed, however, they’ll see a page load delay, because the browser needs to revalidate the image. If instead you use:

Cache-Control: max-age=864000, stale-while-revalidate=1728000

…then the image will be refreshed every ten days. Browsers that implement stale-while-revalidate will also hide the refresh latency on subsequent visits, provided that they’re within 30 days of the response being stored in the cache.

If you don’t want to create extra traffic on your site, on the other hand, you could do:

Cache-Control: max-age=2592000, stale-while-revalidate=864000

This means that browsers that don’t support stale-while-revalidate will behave just as they did before, while those that do will have a ten-day grace period after becoming stale, where refresh latency will be hidden from end users.
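
If you want to check the arithmetic on those two examples, here’s a tiny illustrative script (nothing more than unit conversion) that turns the header values into day-based windows:

DAY = 86400  # seconds in a day

examples = [
    ("max-age=864000, stale-while-revalidate=1728000", 864000, 1728000),
    ("max-age=2592000, stale-while-revalidate=864000", 2592000, 864000),
]

for header, max_age, swr in examples:
    print(header)
    print("  fresh for %d days, then usable while revalidating for %d more"
          % (max_age // DAY, swr // DAY))
# fresh for 10 days, then usable while revalidating for 20 more
# fresh for 30 days, then usable while revalidating for 10 more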

As such, SwR allows you to trade off three things: how fresh the content your users see is, how much latency they perceive, and how much traffic your servers have to handle, all in a much more fine-grained manner than vanilla HTTP caching.

There’s a great discussion on the chromium-dev thread started by Kenji Baheux with more use cases, along with suggested header values.

Why Not Just Use Long Freshness Lifetimes?

It’s become industry best practice to give static responses long freshness lifetimes, changing the URL when the payload changes. While this is a great technique for some kinds of content (e.g., JavaScript libraries), it doesn’t work so well for content that does change but whose URL can’t change. This is where SwR can help.

For example, HTML pages, third-party widgets, Web fonts and similar content all need regular updates, but can’t change their URLs. Using SwR well means that they’ll have a better chance of being both up-to-date and well-performing.
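
One way to put this into practice is to pick Cache-Control values based on whether a resource’s URL is content-hashed. This is an illustrative sketch only; the URL pattern and the lifetimes here are my own assumptions, not values from the post.

import re

# Assume versioned assets embed a content hash, e.g. "app.3f2a9c1b.js".
VERSIONED = re.compile(r"\.[0-9a-f]{6,}\.(js|css|png|jpg|woff2?)$")

def cache_control(path):
    if VERSIONED.search(path):
        # The URL changes whenever the payload does, so a long lifetime is safe.
        return "max-age=31536000"
    # The URL can't change, so keep it reasonably fresh and hide refresh latency.
    return "max-age=86400, stale-while-revalidate=604800"

print(cache_control("/static/app.3f2a9c1b.css"))  # max-age=31536000
print(cache_control("/index.html"))               # max-age=86400, stale-while-revalidate=604800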

Why Not Just Prefetch?

A common approach to the problem that stale-while-revalidate solves is to use a form of prefetching; i.e., proactively refreshing the response when it becomes stale (or right beforehand). The problem with prefetching is that it’s not tied to user behaviour; if you always refresh things once they become stale whether or not someone is actually looking at them, you’re creating needless network traffic and server load, and making it harder for the cache eviction algorithm to do its job well. SwR avoids this by explicitly giving permission for the response to be used stale while it’s being refreshed, but only if it’s accessed. In this manner, the refresh traffic is proportional to user demand.
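
To underline the difference, here’s a toy model of a cache entry whose refresh is only ever triggered by a real request landing in the stale window. It’s my own sketch, not how any browser or proxy actually implements this, and “fetch” stands in for a conditional request back to the origin.

import threading
import time

class Entry:
    def __init__(self, body, max_age, swr):
        self.body = body
        self.max_age = max_age
        self.swr = swr
        self.stored_at = time.time()
        self.revalidating = False

def revalidate(entry, fetch):
    # Background refresh: replace the body and reset the stored time.
    entry.body = fetch()
    entry.stored_at = time.time()
    entry.revalidating = False

def serve(entry, fetch):
    age = time.time() - entry.stored_at
    if age <= entry.max_age:
        return entry.body                       # fresh: no traffic at all
    if age <= entry.max_age + entry.swr:
        if not entry.revalidating:
            entry.revalidating = True           # refresh tied to this access
            threading.Thread(target=revalidate, args=(entry, fetch)).start()
        return entry.body                       # stale, but allowed
    revalidate(entry, fetch)                    # too stale: block on a refresh
    return entry.body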

Isn’t This Just IE’s post-check?

Internet Explorer has an oft-maligned feature called post-check, which is tied at the hip to its complement, pre-check.

Conceptually, it is indeed very similar to stale-while-revalidate, but most developers I talk to find its configuration quite confusing; in fact, it’s so confusing that Eric Lawrence wrote a whole blog entry about how they’re misconfigured. In my opinion, stale-while-revalidate is much simpler and less prone to misconfiguration. It’s also an RFC, of course.

What Else Can it be Used For?

When Chrome went public with this, my colleague Guy Podjarny asked whether a JavaScript API would be made available for notifying apps of background updates. This is really intriguing to me, since it’d enable applications using HTTP/2 to replace “long polling” with Server Push; the cache effectively becomes the browser-side buffer for events (I wrote more about this towards the end of this Internet-Draft).

At first glance, this seems like it’s a job for ServiceWorkers, since that’s effectively a built-in proxy/cache in the browser. After some thought, however, I agree with Guy; it’s more useful to define a separate API for interacting with the HTTP cache. After all, while ServiceWorker has some amazing and very attractive capabilities, HTTP caching is still going to be used by the bulk of the Web, and we shouldn’t require people to swallow all of ServiceWorker to get better HTTP caching.

More on this soon, I think.