Tuesday, 7 August 2007
ETags, ETags, ETags
I’ve been hoping to avoid this, but ETags seem to be popping up more and more often recently. For whatever reason, people latch onto them as a litmus test for RESTfulness, as the defining factor of HTTP’s caching model, and much more.
So, let me counter: they’re not all that. In fact, there are a number of pitfalls you need to be wary of if you use them.
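First, depending on how they’re generated, you might find different boxes in a farm producing different ETags for the same content (Apache’s default FileETag setting, for example, includes the file’s inode number, which is local to each machine). A cache then sees the validator change depending on which box answered, so its conditional requests miss and full responses get sent again; not a good result for caching.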
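To make the farm problem concrete, here’s a minimal Python sketch contrasting two ways of deriving a file’s ETag. `inode_etag` is a hypothetical helper that only approximates an inode-based scheme (inode, size and mtime, in the spirit of Apache’s default); `content_etag` hashes the bytes instead, so identical files on different boxes get identical ETags. Neither is any server’s actual code.

```python
import hashlib
import os


def inode_etag(path: str) -> str:
    """Hypothetical inode-based ETag (approximating schemes like Apache's default).

    The inode number is local to one filesystem, so the same file deployed
    to two servers in a farm will usually end up with two different ETags.
    """
    st = os.stat(path)
    return '"%x-%x-%x"' % (st.st_ino, st.st_size, int(st.st_mtime))


def content_etag(path: str) -> str:
    """ETag derived only from the bytes served, so every box agrees."""
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return '"%s"' % digest
```

Hashing costs a read of the whole file, which is why servers reach for cheap metadata in the first place; telling Apache to leave the inode out (`FileETag MTime Size`) is a cheaper middle ground, as long as your deployment process preserves modification times across boxes.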
Or, you might find that your implementation doesn’t really understand HTTP very well, so it gives the same ETag to two different representations of the same resource (say, the gzip-compressed and uncompressed versions), causing downstream caches to bend over backwards to accommodate your broken server. It can happen to the best of us.
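One way to keep variants apart, sketched in Python under the assumption that the server knows which content-coding it applied: derive the ETag from the unencoded body and append a marker for the coding. The `variant_etag` name and the `-gzip` suffix convention are illustrative choices, not a standard.

```python
import hashlib


def variant_etag(identity_body: bytes, content_coding: str = "identity") -> str:
    """Give each representation of a resource its own ETag.

    The hash covers the unencoded body and the content-coding is appended,
    so the gzip and identity variants never share a validator.
    """
    digest = hashlib.sha256(identity_body).hexdigest()[:16]
    if content_coding == "identity":
        return '"%s"' % digest
    return '"%s-%s"' % (digest, content_coding)
```

A strong ETag promises byte-for-byte identity, so this only holds for the compressed variant if your compressor is deterministic for a given server build; if it isn’t, marking that variant’s ETag weak is the honest choice.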
If you’re trying to be a good guy by both compressing your content and hashing it to calculate the ETag, beware: the gzip header contains a timestamp, so the ETag will change every time you re-compress the content. Oops.
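The effect is easy to reproduce with Python’s standard `gzip` module, whose `compress` function takes an `mtime` argument (Python 3.8 and later). This sketch hashes the compressed bytes; `gzip_etag` is a hypothetical name, and pinning `mtime` to a constant makes the result repeatable on a given Python/zlib build.

```python
import gzip
import hashlib


def gzip_etag(body: bytes, mtime=None) -> str:
    """Hash gzip output to produce an ETag.

    With mtime=None the current time (in whole seconds) is written into the
    gzip header, so the ETag changes whenever the body is recompressed in a
    different second. Passing a fixed mtime (e.g. 0) removes that variation.
    """
    compressed = gzip.compress(body, mtime=mtime)
    return '"%s"' % hashlib.sha256(compressed).hexdigest()[:16]
```

Pinning the timestamp still leaves you at the mercy of compressor versions and levels; hashing the uncompressed body and recording the coding separately avoids depending on compressor output at all.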
Even if you get the whole ETag thing right, there’s no guarantee that a cache will use it; although recent versions of Squid understand and use ETags, lots of older implementations don’t.
Another mistake is to think that ETags are only used for caching. If you hand out ETags and your resource supports methods like PUT or DELETE, you’d better be ready to properly handle conditional headers like If-Match on those requests; otherwise, a client that sends If-Match to avoid clobbering someone else’s changes will have its request applied anyway. I’m heartened somewhat that the Atom Publishing Protocol (APP) spec alludes to this in passing, but I do wonder how many people have got this wrong (thanks to Lisa for bringing this up in discussion). Then there’s the whole ETag-on-write issue, i.e., what (if anything) an ETag in the response to a PUT actually tells the client.
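A minimal sketch of the server side, in Python: before applying a PUT, evaluate If-Match against the resource’s current ETag and refuse with 412 Precondition Failed if it doesn’t match. `if_match_passes` and `handle_put` are hypothetical names, the store is a plain dict, and header parsing is simplified (a naive comma split that assumes no commas inside ETags).

```python
import hashlib
from typing import Dict, Optional, Tuple


def if_match_passes(if_match: Optional[str], current_etag: Optional[str]) -> bool:
    """Evaluate an If-Match header value against the current ETag.

    if_match is None when the header is absent; current_etag is None when the
    resource has no current representation. If-Match uses strong comparison,
    so weak ETags (W/"...") never match.
    """
    if if_match is None:
        return True
    if current_etag is None:
        return False
    candidates = [tag.strip() for tag in if_match.split(",")]
    if "*" in candidates:
        return True
    if current_etag.startswith("W/"):
        return False
    return current_etag in candidates


def handle_put(store: Dict[str, Tuple[bytes, str]], path: str, body: bytes,
               if_match: Optional[str] = None) -> Tuple[int, Optional[str]]:
    """Apply a PUT to an in-memory store, honouring If-Match."""
    current = store.get(path)
    current_etag = current[1] if current else None
    if not if_match_passes(if_match, current_etag):
        return 412, current_etag  # precondition failed; nothing written
    etag = '"%s"' % hashlib.sha256(body).hexdigest()[:16]
    store[path] = (body, etag)
    return (200 if current else 201), etag
```

The same check belongs in front of DELETE and any other state-changing method on a resource that hands out ETags.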
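Finally, there’s the whole mess of weak ETags (those prefixed with W/, which only promise that two representations are semantically equivalent, not byte-for-byte identical); although they’re potentially very powerful, they’re also very misunderstood.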
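Part of the confusion is that there are two ways to compare ETags. This Python sketch follows the strong and weak comparison functions as later spelled out in RFC 7232 (section 2.3.2): strong comparison requires both tags to be strong and identical, while weak comparison ignores the `W/` prefix. The function names are illustrative.

```python
def _split(etag: str):
    """Return (is_weak, opaque_tag) for an entity-tag like W/"abc" or "abc"."""
    if etag.startswith("W/"):
        return True, etag[2:]
    return False, etag


def strong_match(a: str, b: str) -> bool:
    """Both tags strong and their opaque parts identical."""
    weak_a, tag_a = _split(a)
    weak_b, tag_b = _split(b)
    return not weak_a and not weak_b and tag_a == tag_b


def weak_match(a: str, b: str) -> bool:
    """Opaque parts identical, regardless of either tag being weak."""
    return _split(a)[1] == _split(b)[1]
```

If-None-Match on a GET uses weak comparison, which is what lets a weak ETag validate a cached response; If-Match and If-Range need strong comparison, so a weak ETag won’t help you there.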
All of this is not to say that ETags are useless; far from it. However, I do get confused and concerned when people focus on just one feature of HTTP to the exclusion of others that are just as appropriate (or more so). Dare I say “cargo cult”?
While ETags are a fine validation mechanism, Last-Modified is also perfectly fine in many situations. Even better, avoid the round trip altogether and give your response some freshness information with Cache-Control: max-age.
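For comparison, a Last-Modified validator plus explicit freshness needs only the standard library. In this Python sketch, `respond` is a hypothetical handler that takes a timezone-aware UTC datetime; it sets `Cache-Control: max-age` so caches can reuse the response for an hour without contacting the server at all (3600 is an arbitrary example value), and answers If-Modified-Since with 304 Not Modified when nothing has changed.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def respond(last_modified: datetime, if_modified_since: Optional[str] = None):
    """Return (status, headers) for a GET using Last-Modified validation."""
    # HTTP dates have one-second resolution, so drop sub-second precision.
    last_modified = last_modified.replace(microsecond=0)
    headers = {
        "Last-Modified": format_datetime(last_modified, usegmt=True),
        "Cache-Control": "max-age=3600",  # fresh for an hour: no round trip
    }
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # an invalid date means the header is ignored
        if since is not None and since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        if since is not None and last_modified <= since:
            return 304, headers
    return 200, headers
```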
So What’s Right With ETags?
Now that I’ve had my rant, I’ll admit there are some good things about ETags. If you need a strong validator for a response that might change more than once a second (Last-Modified only has one-second resolution), they can’t be beat; and if you don’t like how Last-Modified is used as input to freshness heuristics, they’re a fine alternative, as long as you keep the caveats above in mind.
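The one-second limit is easy to see: two versions written within the same second share a Last-Modified value, while content-hash ETags still tell them apart. A small Python illustration; the timestamps and bodies are made up, and `last_modified` and `etag` are hypothetical helpers.

```python
import hashlib
from datetime import datetime, timezone
from email.utils import format_datetime


def last_modified(ts: datetime) -> str:
    # HTTP-date has no sub-second field, so fractions of a second are lost.
    return format_datetime(ts.replace(microsecond=0), usegmt=True)


def etag(body: bytes) -> str:
    return '"%s"' % hashlib.sha256(body).hexdigest()[:16]


v1 = (datetime(2007, 8, 7, 12, 0, 0, 100000, tzinfo=timezone.utc), b"version 1")
v2 = (datetime(2007, 8, 7, 12, 0, 0, 900000, tzinfo=timezone.utc), b"version 2")

# Same Last-Modified, so revalidating with If-Modified-Since could yield a 304
# and leave a cache serving version 1; the ETags differ, so If-None-Match won't.
print(last_modified(v1[0]) == last_modified(v2[0]))  # True
print(etag(v1[1]) == etag(v2[1]))  # False
```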
Of course, if you need to do optimistic concurrency, they’re a great option.
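Optimistic concurrency with ETags is a read-modify-write loop: GET the resource and remember its ETag, send the update with If-Match, and on 412 Precondition Failed re-read and try again. This Python sketch simulates the server with a hypothetical in-memory `Resource` class whose `put` plays the role of a conditional PUT; a real client would make HTTP requests instead.

```python
import hashlib
from typing import Tuple


class Resource:
    """Hypothetical in-memory stand-in for a server-side resource."""

    def __init__(self, body: bytes):
        self.body = body

    @property
    def etag(self) -> str:
        return '"%s"' % hashlib.sha256(self.body).hexdigest()[:16]

    def get(self) -> Tuple[bytes, str]:
        return self.body, self.etag

    def put(self, body: bytes, if_match: str) -> int:
        # Conditional PUT: only apply the write if nobody changed it meanwhile.
        if if_match != self.etag:
            return 412
        self.body = body
        return 200


def increment(res: Resource, max_attempts: int = 5) -> int:
    """Add one to a counter stored as ASCII digits, retrying on conflicts."""
    for _ in range(max_attempts):
        body, etag = res.get()
        new_value = int(body.decode()) + 1
        if res.put(str(new_value).encode(), if_match=etag) == 200:
            return new_value
    raise RuntimeError("gave up after repeated 412 responses")
```

No locks are held between the read and the write; a conflicting update just shows up as a 412, and the client decides whether to retry.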
IIRC Yves has done some very cool things with weak ETags in Jigsaw, so that small changes don’t upset caches.
Finally, Tim Bray also has some very intelligent things to say about them, pointing out that if you’re clever, you can use an ETag to avoid a bunch of work on the server side during validation. Unfortunately, I don’t see too many people doing this yet.
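Here’s a rough illustration of the idea in Python, assuming the application keeps a cheap version number for each resource (the `Page` class, its `version` field and `render` method, and the `get` handler are all hypothetical). Because the ETag is built from the version, an If-None-Match request can be answered with 304 before the expensive rendering step runs at all.

```python
from typing import Optional, Tuple


class Page:
    """Hypothetical resource with a cheap version number and costly output."""

    def __init__(self, version: int, content: str):
        self.version = version
        self.content = content
        self.render_count = 0

    def render(self) -> bytes:
        # Stands in for expensive work: templating, database queries, etc.
        self.render_count += 1
        return ("<html>%s</html>" % self.content).encode()


def get(page: Page, if_none_match: Optional[str] = None) -> Tuple[int, str, bytes]:
    etag = '"v%d"' % page.version  # cheap: no rendering needed
    if if_none_match is not None:
        tags = [t.strip() for t in if_none_match.split(",")]
        # If-None-Match uses weak comparison, so ignore any W/ prefix.
        opaque = [t[2:] if t.startswith("W/") else t for t in tags]
        if "*" in opaque or etag in opaque:
            return 304, etag, b""
    return 200, etag, page.render()
```

The catch is that the version has to change whenever anything that affects the output changes (templates included); otherwise you’ll hand out a stale ETag for new content.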