mnot’s blog

Design depends largely on constraints.” — Charles Eames

Thursday, 11 May 2006

The State of Browser Caching

Updated 2006-06-03

One of the big problems that Web developers have with HTTP caching is that they don’t know how the caches behave; while the specs say one thing, the actual behaviour of the cache often significantly deviates — usually because the cache’s developer or operator thinks they can do better.

The easiest way to overcome this obstacle is to measure the behaviour of the caches. In particular, thanks to XmlHttpRequest, it’s fairly easy to test a browser’s cache by just hitting a Web site under controlled conditions, in the same manner as discussed earlier.
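As a rough illustration of that kind of measurement, here is a minimal sketch. It assumes a hypothetical test server that puts a unique token in every response body it generates, so two identical bodies suggest the second request never reached the server. The name probeCache, the callback shape and the injectable XHR constructor are all assumptions made for this sketch, not the actual test harness.

```javascript
// Request `url` twice in sequence and report whether the second response
// appears to have come from the browser cache.
// Assumption: the server embeds a unique token in each body it generates.
// `XHR` is injectable so the logic can be exercised outside a browser.
function probeCache(url, onResult, XHR = globalThis.XMLHttpRequest) {
  const bodies = [];

  function request() {
    const xhr = new XHR();
    xhr.open("GET", url, true);
    xhr.onreadystatechange = function () {
      if (xhr.readyState !== 4) return; // wait until the response is complete
      bodies.push(xhr.responseText);
      if (bodies.length < 2) {
        request(); // issue the second request only after the first finishes
      } else {
        onResult({ servedFromCache: bodies[0] === bodies[1], bodies });
      }
    };
    xhr.send(null);
  }

  request();
}
```

In a browser this might be called as probeCache("/test/max-age", function (r) { console.log(r.servedFromCache); }), where the URL is a placeholder for whatever controlled resource the server exposes.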

The only caveat to this approach is that it’s using XHR, not normal HTML. To check on that, I tested a number of features by hand with the browsers, and didn’t find any variance; it looks like all of the browsers put XHR requests through the same cache as “normal” requests. Of course, it’s also entirely possible I’ve made errors in my tests, and I’d be grateful for any corrections.

I tested the “big four” browsers: Internet Explorer, Firefox, Safari and Opera.

There are, of course, many more browsers out there, and many other versions of these. If you test another, please summarise the results in comments below; ideally, I’d like to get coverage of everything C-grade and higher in the YDN browser grades table.

The Good

The current crop of browser caches isn’t too bad, with a few notable exceptions.

None of the browsers sends a cache-busting Cache-Control request header, although Firefox does when you reload a page; that’s probably a reasonable thing to do (although not ideal).

All of the browsers properly handled validation based on If-Modified-Since, and interestingly all of them except Safari appear to support validation using ETags and If-None-Match.
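For readers unfamiliar with validation, the server side of the exchange can be sketched roughly as follows. The function name evaluateConditional and the shape of the request and resource objects are assumptions for illustration. Header names are assumed to be lowercased already, ETags are compared weakly, and If-None-Match is given precedence over If-Modified-Since, which is the rule in the later RFC 7232 rather than anything these tests measured.

```javascript
// Hypothetical helper: return 304 if the client's validator still matches
// the current resource, otherwise 200 (send the full response).
// `resource.lastModified` is a timestamp in milliseconds.
function evaluateConditional(request, resource) {
  const ifNoneMatch = request.headers["if-none-match"];
  const ifModifiedSince = request.headers["if-modified-since"];

  // When If-None-Match is present, If-Modified-Since is ignored.
  if (ifNoneMatch !== undefined) {
    const stripWeak = (tag) => tag.replace(/^W\//, "");
    const tags = ifNoneMatch.split(",").map((t) => t.trim());
    const matches =
      tags.includes("*") ||
      tags.some((t) => stripWeak(t) === stripWeak(resource.etag));
    return matches ? 304 : 200;
  }

  if (ifModifiedSince !== undefined) {
    const since = Date.parse(ifModifiedSince);
    // An unparseable date is ignored and the full response is sent.
    if (!Number.isNaN(since) && resource.lastModified <= since) return 304;
  }

  return 200;
}
```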

All of them will cache responses that have freshness information (i.e., Expires and Cache-Control: max-age headers). Likewise, all of them know that they’re private caches, and therefore will still cache something with Cache-Control: private. That’s very handy, because it allows you to target the browser cache separately from intermediary caches.
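The private/shared distinction can be sketched as a simple storage decision. This is a simplified model, not any browser's implementation: parseCacheControl and mayStore are hypothetical names, and the parser does not handle quoted directive values that contain commas.

```javascript
// Parse a Cache-Control value into a map of lowercase directive name to
// its value (or `true` for directives without one).
function parseCacheControl(value) {
  const directives = {};
  for (const part of (value || "").split(",")) {
    const [name, ...rest] = part.trim().split("=");
    if (!name) continue; // skip empty segments
    const raw = rest.join("=").trim().replace(/^"|"$/g, "");
    directives[name.toLowerCase()] = rest.length ? raw : true;
  }
  return directives;
}

// Decide whether a cache may store a response. A browser cache is private,
// so `private` does not stop it; a shared (intermediary) cache must not store.
function mayStore(cacheControl, isSharedCache) {
  const cc = parseCacheControl(cacheControl);
  if (cc["no-store"]) return false;
  if (cc["private"] && isSharedCache) return false;
  return true;
}
```

With this model, mayStore("private, max-age=60", false) is true for the browser while mayStore("private, max-age=60", true) is false for a proxy, which is the targeting effect described above.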

All of them will cache responses from URIs that contain question marks, as long as there’s freshness information present. IE is a little bit aggressive with them; it will cache even without freshness information. That isn’t necessarily bad, just something to be aware of.

All of the browsers appear to use a freshness heuristic based on Last-Modified if there isn't any explicit freshness information available. If you want to avoid this, they all pay attention to Cache-Control: no-cache.
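A Last-Modified heuristic generally looks something like the sketch below. The 10% fraction is the "typical setting" that RFC 2616 suggests; I did not measure which fraction each browser actually uses, so treat it as an assumption, along with the function name heuristicLifetime.

```javascript
// Estimate a freshness lifetime (in seconds) from how long the response had
// gone unmodified when it was generated. `fraction` defaults to the 10%
// suggested in RFC 2616; real browsers may use other values.
function heuristicLifetime(dateHeader, lastModifiedHeader, fraction = 0.1) {
  const date = Date.parse(dateHeader);
  const lastModified = Date.parse(lastModifiedHeader);
  // Without two usable dates, or with Last-Modified in the future relative
  // to Date, there is nothing sensible to base a heuristic on.
  if (Number.isNaN(date) || Number.isNaN(lastModified) || lastModified > date) {
    return 0;
  }
  return Math.floor(((date - lastModified) / 1000) * fraction);
}
```

Under these assumptions, a response that was last modified ten days before it was served would be treated as fresh for about one day.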

The Bad

None of them will cache POST responses for use with future GETs; this isn’t too surprising, as it’s a little-understood feature of the HTTP caching model, but it would be nice to have.

A much worse problem is their handling of side effect invalidation; as discussed before, only Safari correctly invalidates the cache upon a non-GET method. This severely limits the ability of a Web application to control the browser cache; once it’s in there, it only gets out when it becomes stale and gets validated. See the Mozilla bug for this.
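To show what correct side-effect invalidation amounts to, here is a toy model of a cache keyed by URL. It deliberately ignores freshness and validation, and the class name TinyCache and the fetchFromOrigin callback are hypothetical. The set of invalidating methods follows RFC 2616 section 13.10 (POST, PUT and DELETE).

```javascript
// Methods that RFC 2616 section 13.10 says must invalidate a cached entity.
const INVALIDATING_METHODS = new Set(["POST", "PUT", "DELETE"]);

// Toy in-memory cache: GETs are served from the cache when possible, and
// any invalidating method drops the entry for that URL.
class TinyCache {
  constructor() {
    this.entries = new Map();
  }

  handle(method, url, fetchFromOrigin) {
    const m = method.toUpperCase();
    if (m === "GET") {
      if (!this.entries.has(url)) {
        this.entries.set(url, fetchFromOrigin(m, url));
      }
      return this.entries.get(url);
    }
    // Everything else is passed through to the origin.
    const response = fetchFromOrigin(m, url);
    if (INVALIDATING_METHODS.has(m)) {
      this.entries.delete(url); // the next GET must go back to the origin
    }
    return response;
  }
}
```

A cache that skips the delete step keeps serving the old entry after a POST, which is the behaviour seen in every browser tested except Safari.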

Additionally, while Mozilla will cache content-negotiated responses, IE will not, except for those negotiated for Content-Encoding. However, both Safari and Opera will use a negotiated response even when the request headers don’t match those that are cached. Generally, this won’t be a problem unless you’re hand-negotiating responses in XmlHttpRequest. Still, it makes sense to avoid conneg on anything except Accept-Encoding, because of IE’s behaviour.

Note that if you’re trying to avoid this bug, the solution is not to remove the Vary header; that will break proxy caches, which will send the wrong content to other browsers.
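The check that a cache is supposed to make before reusing a negotiated response can be sketched like this. The function name varyMatches is an assumption, both header maps are assumed to use lowercase names, and values are compared as plain strings without the normalisation a real cache might apply.

```javascript
// Return true if a stored response may be reused for a new request, based
// on the request headers named in the stored response's Vary header.
function varyMatches(varyHeader, storedRequestHeaders, newRequestHeaders) {
  const names = (varyHeader || "")
    .split(",")
    .map((n) => n.trim().toLowerCase())
    .filter(Boolean);

  // "Vary: *" means the response varies on things a cache cannot see.
  if (names.includes("*")) return false;

  // Every listed header must have the same value in both requests
  // (a missing header is treated as an empty string).
  return names.every(
    (n) => (storedRequestHeaders[n] || "") === (newRequestHeaders[n] || "")
  );
}
```

In these terms, the Safari and Opera behaviour described above is equivalent to always returning true, and IE's is roughly equivalent to never storing the response in the first place.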

Finally, all browsers except Mozilla will favour Cache-Control: max-age if it conflicts with Expires, which is the right thing to do (so that it’s possible to tell less capable caches to not cache something when you’re doing something fancy with more advanced directives). See the Mozilla bug. Also, Opera will cache something with a fresh Expires header, even if it has a Cache-Control: no-store as well.
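The precedence rule can be written down as a short calculation. This is a sketch of the behaviour the specification asks for, not of any browser's code; freshnessLifetime is a hypothetical name and the headers object is assumed to use lowercase names.

```javascript
// Compute the explicit freshness lifetime of a response in seconds.
// max-age wins over Expires when both are present. Returns null when the
// response carries no explicit freshness information at all.
function freshnessLifetime(headers) {
  const cacheControl = headers["cache-control"] || "";
  const match = /(?:^|,)\s*max-age\s*=\s*"?(\d+)"?/i.exec(cacheControl);
  if (match) return Number(match[1]);

  if (headers["expires"] !== undefined) {
    const expires = Date.parse(headers["expires"]);
    const date = Date.parse(headers["date"]);
    // An unparseable Expires (or Date) is treated as already expired.
    if (Number.isNaN(expires) || Number.isNaN(date)) return 0;
    return Math.max(0, (expires - date) / 1000);
  }

  return null;
}
```

For example, a response with an Expires date in the past and Cache-Control: max-age=3600 gets a lifetime of 3600 seconds here, while a cache that only understands Expires would treat it as stale.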

The Buggy

There are a few other gotchas if you’re using XmlHttpRequest. None of the caches honours Cache-Control request headers set by the author, which makes controlling the cache very difficult. Opera surfaces a 304 Not Modified to the author, rather than silently replacing it from cache. There are also a number of non-cache-related XHR problems, covered in separate tests.

So there you have it. If there’s some other aspect of the caching model that you’re interested in, tell me and I’ll try to test for it. I’m also considering doing more fine-grained testing (e.g., for a variety of max-ages, varying across different headers, and using different filename extensions and media types, which often figure heavily into caching heuristics).

In the meantime, I hope these results give Web authors a little bit more confidence in browser caches.


Filed under: Caching Web

22 Comments

Dimitri Glazkov said:

Excellent article. Thanks for the research. Can you explain a little bit better what you mean by "request headers set by the author"? I lost you there.

Friday, May 12 2006 at 6:58 AM +10:00

Mark Nottingham said:

Dimitri,

For example, if you call setRequestHeader("Cache-Control", "no-cache") in your JavaScript, the browsers won't honour it; they'll still use the cached response if there is one.

Friday, May 12 2006 at 12:03 PM +10:00

James Antill said:

Additionally, while Mozilla will cache content-negotiated responses, IE will not (which is OK; it's just not taking full advantage). However, both Safari and Opera will use a negotiated response even when the request headers don't match those that are cached. As a result, it's best to avoid using conneg.


Did you test this with real HTML? I'm pretty shocked that IE won't cache anything with a Vary header (_maybe_ not as much if it worked for content-language and not for content-type, but I'd assume if you support one, doing both is easy). It's much more understandable if this is XHR only.

Safari and Opera having broken caching with conneg is sad, but more understandable ... 99% of the time the browsers send the same headers out, and it's pretty unlikely that a browser is going to request the same resource with different Accept headers (CSS, images and everything else).

Friday, May 12 2006 at 12:54 PM +10:00

Mark Nottingham said:

I tested with XHR, and double-checked by hand (varying on Accept-Language and tuning browser prefs). IE won't cache anything with a Vary header. I'm not surprised by this; supporting cached variants is a relatively new thing; a few years ago, almost no one (browser or intermediary) did it.

I agree that the common case is that the browser will be sending the same headers, but relying on that behaviour is short-sighted, as XHR proves.

Friday, May 12 2006 at 1:14 PM +10:00

Julian Reschke said:

Hi. AFAIK, the only thing IE accepts in "Vary" is "User-Agent".

BTW: it's even worse. While it may be acceptable that IE does not take advantage of the cached response, it utterly fails if the response uses a media type that requires an external application, such as PDF. In that case, IE decides not to store the content in the cache, but then invokes the external application with the file name of the cache object it didn't create.

See <http://support.microsoft.com/?kbid=824847>

That's my favorite one.

Friday, May 12 2006 at 4:29 PM +10:00

Henri Sivonen said:

It is certainly nice if Safari now works. A bit over a year ago on Panther Safari did some serious over-caching of XHR GETs. At that time, the only way I got around it was by appending a random query string to the URI each time to salt it. (The query string was ignored on the server.) And yes, my server-side Cache-Control was sane and worked fine with Firefox and IE.

Saturday, May 13 2006 at 12:57 AM +10:00

James Antill said:

but, but, but... not even "Vary: Accept-Encoding"? Please tell me it's not that bad.

Saturday, May 13 2006 at 1:48 PM +10:00

quai said:

Hi,

Does anybody have any idea how browsers behave when, let's say, page 1 is downloading a big piece of content (e.g., a very big image file) and, before it has downloaded successfully, the user browses to another page, say page 2? Will that big content continue to be downloaded and cached? Does this behaviour differ between browsers? Thanks in advance for any advice. :)

Sunday, May 21 2006 at 10:42 PM +10:00

Mark Nottingham said:

James -

Yep, it's that bad. I tested Accept-Encoding, and it doesn't cache. I've heard what Julian mentioned about UA from multiple sources, but haven't confirmed yet.

See also:
http://support.microsoft.com/default.aspx?scid=kb;en-us;327286

AFAICT, IE is using the cache as temp space, and when it thinks something is uncacheable, it doesn't have a temp file.

Monday, May 22 2006 at 3:59 AM +10:00

Mark Nottingham said:

I've just updated the entry. Changes include:

- I tested Varying across Accept-Encoding more closely, and it turns out recent versions (post-SP2?) of IE *do* seem to cache compressed content. This appears to be a hack they put in just for compression, because it's very specific to it; e.g., if you omit the Content-Encoding header, it won't cache, etc.

- Added information about Expires vs. Cache-Control precedence

- Refined the heuristic caching tests

I'm still looking at the variant caching issues, especially trying to find out when MSFT patched this, because it was so widely reported before.

Saturday, June 3 2006 at 1:09 PM +10:00

Andy Davies said:

You need to take a wider view than just browser caching...

Back in 2000, I was part of a team developing some web applications that made requests via HTTP. We kept having an odd problem on one corporate network, which was traced to the Netscape proxy server.

When the application requested something that didn't exist, the next request would also come back as a 404, instead of returning the expected content.

People tend to forget about proxies as lots of developers work outside environments where they're used.

Andy

Monday, June 5 2006 at 3:49 AM +10:00

Mark Nottingham said:

One step at a time. If you want to do intermediary testing, check out http://coad.measurement-factory.com/

Monday, June 5 2006 at 7:38 AM +10:00

EricLaw said:

IE6's implementation was a bit hacky. What happens is that if the Content-Encoding header is present, the file is stored in the cache and passed from WinINET up to URLMon, which decompresses it and puts it back in the cache. Hence, you get the file despite the Vary header.

In IE7, this no longer happens, and decompression happens inside WinINET. This fixes many bugs, but we've also made significant improvements to Vary support in IE7. The implementation still isn't perfect because the IE7 cache doesn't cache the request headers needed to fully implement Vary. With the improvements, you're much more likely to get a conditional HTTP request the second time instead of getting an unconditional one like you'd get in IE6.

I'm curious about one thing above: Does the RFC really permit a cached POST response to be returned for a different method (e.g. a subsequent GET)? I've never seen that happen; it seems like it might prove to be a dangerous optimization.

Thanks,
ericlaw@microsoft

Wednesday, August 23 2006 at 5:12 PM +10:00

Franklin PIAT said:

Regarding "cache POST responses", I think it wouldn't be a good idea to cache "post" for future "get" queries, because :

1. Posted pages often have a message to acknowledge that the post was successful. It wouldn't make sense to cache that.

2. How would the browser know that the same URI should be served? (You are likely to have more arguments in the POST than in the subsequent GET: different arguments mean a different URI.)

I often address the problem you describe by replying to a POST with a _short_ acknowledgement, then doing a "location replace" with a cacheable GET (unless the posted data is incorrect, in which case I simply display the form again to the user).

Thanks,

Franklin

Sunday, September 10 2006 at 4:05 AM +10:00

Franklin PIAT said:

.. Obviously i won't teach you much on this... that's how this blog system works!

Sunday, September 10 2006 at 4:15 AM +10:00

Steve Clay said:

Re: "All of the browsers properly handled validation based on If-Modified-Since", in what circumstances does Safari send If-Modified-Since? I'm trying to handle this manually testing one PHP page and I can't get Safari to cache anything. IE7 and FF send it with the refresh button, Opera sends it only when you enter in the address field...

Thursday, October 12 2006 at 10:22 PM +10:00

HR said:

Also, it is worth noting that Firefox will treat any document that has "Vary: Cookie" (or any combination of varies that includes "Cookie") as if it had "Cache-control: must-revalidate". (tested on Firefox 2.0)

Monday, November 13 2006 at 5:07 PM +10:00

joe said:

Hi, I've read all your articles about caching and implemented expires_module on Apache so I can cache images. But I have a stupid question: will this work with SSL and behind HTTP basic auth?
Thanks for your excellent work :)

Wednesday, February 28 2007 at 10:05 PM +10:00

Robert Siemer said:

You consider a cached POST for future GETs. - Why not a cached POST for future similar POSTs?

The standard does not generally say one method's response is valid for some other, as you will probably agree for POST after GET...

And in HTTP/1.1 there is no special mention of POST responses being considered for GET requests.

And what would a Cache-Control header mean in a POST response if the answer is not a 200 OK? Like redirects and "Created".

Another issue: on the test page you mark "not cached" as failed sometimes. In the HTTP sense, caching is not an obligation. - I understand failed as something that Firefox 2.0.0.3 does: delivering from the cache pages that say "Cache-Control: max-age=0".

Regards, Robert

Sunday, May 20 2007 at 11:41 PM +10:00

dave cheseldine said:

"I'm pretty shocked that IE won't cache anything with a Vary header"

IE6 seemed to cache a file with a
Vary: negotiate
response header. Have I missed something?

Monday, October 15 2007 at 5:41 AM +10:00

mmj said:

From my current testing it appears that Firefox doesn't handle the Vary: header correctly after a redirect.

It should be possible to use Vary: Cookie to indicate to the cache that if the cookies that would be sent change, it should revalidate.

However, upon landing at the destination page of a redirect, Firefox appears to be ignoring the previously received Vary: header of that page and displaying the wrong version of the page without contacting the server. Further attempts to 'reload' the page do not cause revalidation; it now continues to fetch from cache as if it has forgotten its old "Vary" header. A Ctrl-refresh fixes it.

Tuesday, November 18 2008 at 10:52 PM +10:00

mmj said:

Update to my above comment: this issue *may* only happen when accessing a site locally (ie, localhost).

Similarly, a lot of other caching behaviour breaks down in Firefox when accessing localhost, such as If-None-Match and ETag, which start 'forgetting' the ETag on every second request, or Last-Modified, which sometimes causes everything to be cached regardless of freshness.

It seems that debugging a caching issue is going to be a pain in the neck if behaviour on localhost is totally different to on any other host.

At any rate, this needs further testing.

Here's someone else who may have a similar problem and more description:
http://www.experts-exchange.com/Software/Server_Software/Web_Servers/Microsoft_IIS/Q_23851937.html

Wednesday, November 19 2008 at 12:42 AM +10:00

Creative Commons