Thursday, 11 May 2006
The State of Browser Caching
One of the big problems that Web developers have with HTTP caching is that they don’t know how the caches behave; while the specs say one thing, the actual behaviour of the cache often significantly deviates — usually because the cache’s developer or operator thinks they can do better.
The easiest way to overcome this obstacle is to measure the behaviour of the caches. In particular, thanks to XmlHttpRequest, it’s fairly easy to test a browser’s cache by just hitting a Web site under controlled conditions, in the same manner as discussed earlier.
The only caveat to this approach is that it’s using XHR, not normal HTML. To check on that, I tested a number of features by hand with the browsers, and didn’t find any variance; it looks like all of the browsers put XHR requests through the same cache as “normal” requests. Of course, it’s also entirely possible that I’ve made errors in my tests, and I’d be grateful for any corrections.
I tested the “big four” browsers:
- Firefox 1.5.2
- Internet Explorer 6.0 (XP SP2)
- Safari 2.0.3
- Opera 8.54
There are, of course, many more browsers out there, and many other versions of these. If you test another, please summarise the results in comments below; ideally, I’d like to get coverage of everything C-grade and higher in the YDN browser grades table.
The current crop of browser caches isn’t too bad, with a few notable exceptions.
None of the browsers sends a cache-busting Cache-Control request header, although Firefox does when you reload a page; that’s probably a reasonable thing to do (although not ideal).
All of the browsers properly handled validation based on If-Modified-Since, and interestingly all of them except Safari appear to support validation using ETags and If-None-Match.
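To make the two validators concrete, here is a minimal sketch of how a server might answer a conditional request. It is not the test harness used for these results; the function name `validate` and its arguments are hypothetical, the ETag list is split naively on commas, and it follows the later RFC 7232 rule that If-None-Match wins when both headers are sent.

```python
from email.utils import parsedate_to_datetime


def _opaque(tag):
    """Drop a weak-validator prefix so tags compare weakly."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def validate(request_headers, etag, last_modified):
    """Return 304 if the client's cached copy is still valid, else 200.

    request_headers: dict of request header name -> value (hypothetical shape)
    etag:            the resource's current ETag, e.g. '"abc"'
    last_modified:   the resource's Last-Modified as an HTTP-date string
    """
    inm = request_headers.get("If-None-Match")
    if inm is not None:
        # Assumption: If-None-Match takes precedence over If-Modified-Since.
        if inm.strip() == "*":
            return 304
        candidates = [_opaque(t) for t in inm.split(",")]
        return 304 if _opaque(etag) in candidates else 200

    ims = request_headers.get("If-Modified-Since")
    if ims is not None:
        try:
            since = parsedate_to_datetime(ims)
        except (TypeError, ValueError):
            return 200  # unparseable date: send the full response
        return 304 if parsedate_to_datetime(last_modified) <= since else 200

    return 200  # unconditional request
```

A browser that only sends If-Modified-Since, as Safari appears to, would take the second branch.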
All of them will cache responses that have freshness information (i.e., Cache-Control: max-age headers). Likewise, all of them know that they’re private caches, and therefore will still cache something with Cache-Control: private. That’s very handy, because it allows you to target the browser cache separately from intermediary caches.
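As a rough illustration of why Cache-Control: private separates the two kinds of cache, the sketch below decides whether a cache may store a response. The helper names `parse_cache_control` and `may_store` are hypothetical, and the parsing is deliberately naive (it ignores the field-name form of private).

```python
def parse_cache_control(value):
    """Split a Cache-Control header value into a {directive: argument} dict."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def may_store(cache_control, shared_cache):
    """Return True if a cache may store a response with this header value.

    shared_cache: True for an intermediary (proxy) cache,
                  False for a private cache such as a browser's.
    """
    directives = parse_cache_control(cache_control)
    if "no-store" in directives:
        return False  # nobody may store it
    if "private" in directives and shared_cache:
        return False  # only private caches may store it
    return True
```

With `Cache-Control: private, max-age=600`, `may_store(..., shared_cache=False)` is true for the browser while `may_store(..., shared_cache=True)` is false for a proxy.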
All of them will cache responses from URIs that contain question marks, as long as there’s freshness information present. IE is a little bit aggressive with them; it will cache even without freshness information. That isn’t necessarily bad, just something to be aware of.
All of the browsers appear to use a freshness heuristic based on Last-Modified if there isn’t any explicit freshness information available. If you want to avoid this, they all pay attention to
None of them will cache POST responses for use with future GETs; this isn’t too surprising, as it’s a little-understood feature of the HTTP caching model, but it would be nice to have.
A much worse problem is their handling of side effect invalidation; as discussed before, only Safari correctly invalidates the cache upon a non-GET method. This severely limits the ability of a Web application to control the browser cache; once it’s in there, it only gets out when it becomes stale and gets validated. See the Mozilla bug for this.
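For readers unfamiliar with side effect invalidation, the idea can be sketched with a toy in-memory cache. Everything here (`TinyCache`, `observe_request`) is hypothetical and ignores freshness entirely; the set of invalidating methods is the one listed in RFC 2616, section 13.10.

```python
# Methods that RFC 2616 section 13.10 says must invalidate a cached entity.
INVALIDATING_METHODS = {"POST", "PUT", "DELETE"}


class TinyCache:
    """A toy URL -> body cache, just enough to show invalidation."""

    def __init__(self):
        self._entries = {}

    def store(self, url, body):
        self._entries[url] = body

    def lookup(self, url):
        """Return the cached body, or None if nothing is stored."""
        return self._entries.get(url)

    def observe_request(self, method, url):
        """Call for every outgoing request; drops the entry on unsafe methods."""
        if method.upper() in INVALIDATING_METHODS:
            self._entries.pop(url, None)
```

A cache that skips the `observe_request` step behaves like the browsers that fail this test: a POST to a URL leaves the old GET response in place until it goes stale.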
Additionally, while Mozilla will cache content-negotiated responses, IE will not, except for those negotiated for Content-Encoding. However, both Safari and Opera will use a negotiated response even when the request headers don’t match those that are cached. Generally, this won’t be a problem unless you’re hand-negotiating responses in XmlHttpRequest. Still, it makes sense to avoid conneg on anything except Accept-Encoding, because of IE’s behaviour.
Note that if you’re trying to avoid this bug, the solution is not to remove the Vary header; that will break proxy caches, which will send the wrong content to other browsers.
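The check that a cache is supposed to make before reusing a negotiated response can be sketched as follows. The function name `vary_matches` and the dict-based header representation are assumptions for illustration, and values are compared as plain strings rather than normalised.

```python
def vary_matches(vary, stored_request, new_request):
    """Return True if a stored negotiated response may serve the new request.

    vary:           value of the stored response's Vary header
    stored_request: headers of the request that produced the stored response
    new_request:    headers of the request being served now
    """
    names = [n.strip().lower() for n in vary.split(",") if n.strip()]
    if "*" in names:
        return False  # Vary: * never matches a later request

    stored = {k.lower(): v.strip() for k, v in stored_request.items()}
    new = {k.lower(): v.strip() for k, v in new_request.items()}

    # Every header named in Vary must have the same value (or be absent) in both.
    return all(stored.get(n) == new.get(n) for n in names)
```

Reusing a response when this returns false is the behaviour described above for Safari and Opera.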
Finally, all browsers except Mozilla will favour Cache-Control: max-age if it conflicts with Expires, which is the right thing to do (so that it’s possible to tell less capable caches to not cache something when you’re doing something fancy with more advanced directives). See the Mozilla bug. Also, Opera will cache something with a fresh Expires header, even if it has a Cache-Control: no-store as well.
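The precedence described here, together with the Last-Modified heuristic mentioned earlier, can be sketched as a single freshness calculation. This is an illustration rather than any browser's actual algorithm: the function name is hypothetical, the 10% heuristic fraction is an assumption (it is the figure RFC 2616 suggests as typical, and browsers vary), and a Date header is assumed to be present.

```python
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers, heuristic_fraction=0.1):
    """Return the freshness lifetime in seconds, or None if none can be derived.

    headers: dict of response header name -> value; must include "Date".
    """
    # 1. Cache-Control: max-age wins over everything else.
    for part in headers.get("Cache-Control", "").split(","):
        name, _, arg = part.strip().partition("=")
        if name.strip().lower() == "max-age":
            try:
                return max(int(arg.strip().strip('"')), 0)
            except ValueError:
                return 0  # malformed max-age: treat as already stale

    date = parsedate_to_datetime(headers["Date"])

    # 2. Otherwise fall back to Expires, measured relative to Date.
    if "Expires" in headers:
        try:
            expires = parsedate_to_datetime(headers["Expires"])
        except (TypeError, ValueError):
            return 0  # invalid dates such as "0" mean already expired
        return max((expires - date).total_seconds(), 0)

    # 3. Otherwise guess from Last-Modified (assumed fraction of the age).
    if "Last-Modified" in headers:
        modified = parsedate_to_datetime(headers["Last-Modified"])
        return max((date - modified).total_seconds(), 0) * heuristic_fraction

    return None
```

A cache that checks Expires before max-age would get step 1 and step 2 the wrong way round, which is the conflict described above.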
There are a few other gotchas if you’re using XmlHttpRequest. None of the caches honours Cache-Control request headers set by the author, which makes controlling the cache very difficult. Opera surfaces a 304 Not Modified to the author, rather than silently replacing it from cache. There are also a number of non-cache-related XHR problems, covered in separate tests.
So there you have it. If there’s some other aspect of the caching model that you’re interested in, tell me and I’ll try to test for it. I’m also considering doing more fine-grained testing (e.g., for a variety of max-ages, varying across different headers, and using different filename extensions and media types, which often figure heavily into caching heuristics).
In the meantime, I hope these results give Web authors a little bit more confidence in browser caches.