mark nottingham

Invalidating Caches with POST

Saturday, 18 February 2006

HTTP Caching

Have you ever posted a comment to a blog, found it missing, and re-posted it, only to find two entries? Annoying, huh?

Aaron pinged me the other day with this problem, and I responded that the Right way to do this is to POST to the same resource (i.e., the blog entry), so that the POST invalidates the cache.

HTTP (RFC 2616, section 13.10) has this to say about the matter:

   Some HTTP methods MUST cause a cache to invalidate an entity. This is
   either the entity referred to by the Request-URI, or by the Location
   or Content-Location headers (if present). These methods are:

      - PUT
      - DELETE
      - POST

   In order to prevent denial of service attacks, an invalidation based
   on the URI in a Location or Content-Location header MUST only be
   performed if the host part is the same as in the Request-URI.

   A cache that passes through requests for methods it does not
   understand SHOULD invalidate any entities referred to by the
   Request-URI.

I’d forgotten that it wasn’t just on the Request-URI, but this makes total sense; each of these situations makes anything that’s been cached invalid, and while you can’t guarantee that all caches around the world will invalidate it, implementations should do what they can (especially browser caches, because it’s likely the user will make more requests soon).
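To make the quoted rules concrete, here is a minimal sketch of how a cache might decide which URIs to invalidate after a response. The function name `uris_to_invalidate`, the set of "known safe" methods, and the plain dict of response headers are assumptions made for illustration; this is not how any particular browser or proxy implements it.

```python
from urllib.parse import urljoin, urlsplit

# Methods the quoted text says MUST cause invalidation.
INVALIDATING_METHODS = {"PUT", "DELETE", "POST"}
# Assumption: methods this hypothetical cache understands and treats as safe.
KNOWN_SAFE_METHODS = {"GET", "HEAD", "OPTIONS", "TRACE"}


def uris_to_invalidate(method, request_uri, response_headers):
    """Return the set of absolute URIs whose cached entities should be dropped.

    request_uri is assumed to be absolute; response_headers is assumed to be
    a dict keyed by the exact header names "Location" and "Content-Location".
    """
    if method in KNOWN_SAFE_METHODS:
        return set()

    targets = {request_uri}
    if method not in INVALIDATING_METHODS:
        # Unknown method passed through: invalidate the Request-URI only.
        return targets

    request_host = urlsplit(request_uri).hostname
    for name in ("Location", "Content-Location"):
        value = response_headers.get(name)
        if not value:
            continue
        candidate = urljoin(request_uri, value)  # resolve relative references
        # Only honour the header if the host matches the Request-URI's host,
        # which is the denial-of-service guard in the quoted text.
        if urlsplit(candidate).hostname == request_host:
            targets.add(candidate)
    return targets
```

For the blog example, a POST to the entry that answers with a same-host `Location` would invalidate both the entry and the new comment's URI, while a `Location` pointing at another host would be ignored.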

As is hopefully obvious from our blog example, this isn’t an uncommon situation; it’s a very useful pattern of use for HTTP.

That’s fine in theory…

As we discussed this, I realised that this was how it’s supposed to work, but considering how legendarily bad some browser cache implementations are, it might not be how it actually works.

Having done some automated browser testing recently, it was easy to whip up a couple of tests for these requirements. I’ve moved all of the caching-related testing into one page; while it uses XMLHttpRequest, these results should be valid for most any implementation, because XMLHttpRequest should use the same cache as normal browsing.

What are the results so far? Safari seems OK for these purposes (even unknown methods), while Firefox gloriously fails all of the invalidation tests. Unsurprisingly, neither actually caches fresh POST responses, which would be useful in some situations. I’ve filed a bug.

I don’t have IE handy; can someone test it and tell us the results in comments?


12 Comments

Phil Ringnalda said:

For me at least, IE6 hangs hard, about 80% through the progress indicator. It passes the very first test, but scrolling to see more would involve, well, not being hung. I’ll see what IE7 thinks about it when I get home.

Saturday, February 18 2006 at 4:38 AM

Phil Ringnalda said:

IE7b2-preview passes down through “Are fresh GET responses served from cache?” and then fails everything from “Are fresh POST responses served from cache?” to the end.

And while I don’t know about any other browsers, I would lay the odds that HTML caching behavior is exactly the same as XMLHttpRequest caching behavior in Gecko at 50:50. It’s fairly easy to blow caching, and not very obvious when you have, for things outside the main flow, and so we have, often enough.

Saturday, February 18 2006 at 6:09 AM

Phil Ringnalda said:

(One reason I don’t like the party line about testcases always showing failure in red and success in green is that it makes it so easy to not notice useful things about the failure: for PUT, DELETE, and WHATEVER, IE7 is failing with “Error: [object Error]” rather than failing to properly cache-manage.)

Saturday, February 18 2006 at 6:12 AM

mnot said:

The IE6 hang seems to be something specific to the Location-related invalidation cases; I’ve disabled them for the moment, until I can do further testing.

Sunday, February 19 2006 at 4:28 AM

Dan Kubb said:

Hi Mark,

Thanks for your reply. The reason I mentioned this is I came across the following thread in the HTTP Working Group where they talk about varying on the method name:

http://ftp.ics.uci.edu/pub/ietf/http/hypermail/1998q4/0179.html

It seems to say that everyone agrees that the method should be varied upon, but that due to time constraints RFC 2616 wasn’t going to be updated and that it was “just clarifying what the implementer should be able to figure out on their own” (Roy Fielding).

Based on this thread and everything I’ve read in RFC 2616, requests for POST, PUT, and DELETE should write-through to the origin server by default, but this behaviour can be overridden if the origin server returns a Cache-Control header specifying that intermediaries can cache the response and return it if the method/request URI/variant match in future requests.

It seems that most implementations of caches don’t take into account more than the request URI and variants when caching, but I’m not sure if that’s just because of inertia or if it was the original intention of the HTTP WG.

Sunday, February 19 2006 at 4:30 AM

mnot said:

Hi Dan,

I thought about this a bit more after our last exchange, and I think we read each others’ minds :)

This issue has come up before in different contexts, and many people (including some of the authors of HTTP, IIRC) vehemently state that you don’t treat different methods as variants.

HOWEVER, I think you could use the extensibility of HTTP to do this, either by coming up with a cache-control extension method that overrides the HTTP caching model, or (perhaps more cleanly) defines a new, method-specific cache with its own model.

E.g., say that Cache-Control: options-max-age=300 allows you to cache OPTIONS responses in their own cache for 300 seconds, if you know about this extension (implementations that don’t will still do the right thing).
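As a rough sketch of what honouring such an extension could look like, the snippet below pulls a hypothetical `options-max-age` directive out of a Cache-Control value. The directive is not part of HTTP, the function names are invented for illustration, and the parser is deliberately naive (it does not handle commas inside quoted strings).

```python
def parse_cache_control(value):
    """Split a Cache-Control header value into a {directive: argument} dict.

    Directives without an argument map to None. This naive split on commas
    is an assumption for brevity; it mishandles quoted strings with commas.
    """
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def options_freshness_lifetime(cache_control):
    """Return the hypothetical options-max-age in seconds, or None.

    A cache that does not know the extension would never call this and
    would simply keep following the standard directives.
    """
    raw = parse_cache_control(cache_control).get("options-max-age")
    if raw is None or not raw.isdigit():
        return None
    return int(raw)
```

A cache aware of the extension could keep OPTIONS responses in a separate store for the returned number of seconds, while one that is unaware ignores the unknown directive.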

So, I think we’re both right; within the confines of HTTP/1.1, as defined by RFC2616, the caching model doesn’t allow you to vary on the request method. However, you could extend it to do so (IMO) — if you can get anybody to implement such a thing.

The real question is whether you should cache anything except GET. This part of that thread was illuminating:

[[[ Anyway, the reason why I don’t think this is a good idea is that if a response is fully cachable regardless of whether the method is understood or not, then it smell, feels, and looks like a GET request. There is absolutely no reason to have several GET-alike methods. GET is special because it is special because it is special.]]]

I tend to agree with this view; you can give anything a URI, and then you can GET it. The only cases where I waver are a) site-wide metadata (which seems well-suited for OPTIONS) and b) properties on a filestore (while you can say that property URIs can be made by appending a string onto normal URIs, etc, managing collisions becomes problematic, so we have PROPFIND).

If you’d like to pursue an avenue like this, I’d suggest talking to the WebDAV folks; there’s been a long-standing wishlist item to get PROPFIND caching specified and implemented there, which is very much along these lines.

Cheers,

Sunday, February 19 2006 at 5:21 AM

mnot said:

Phil -

WRT HTML vs. XMLHttpRequest, it would be interesting to see. I did do some testing on Firefox regarding these issues manually (i.e., with HTML and CGI, not XMLHttpRequest), and got the same results.

Cheers,

Sunday, February 19 2006 at 5:24 AM

Dan Kubb said:

I had a question about something you mention on your caching related testing page: Are you sure that POST responses that include Cache-Control or Expires can be used to satisfy future GET requests?

I thought caching was method specific for a given request URI. (while also taking into account the matching headers in Vary)

By that I mean that when responses to various methods are cached, they are used to satisfy future requests for the same method only.

Take for example an OPTIONS response. If the response to it includes a Cache-Control or Expires header, the intermediary can cache it and return it for future OPTIONS requests without forwarding the request on to the server. Granted, most OPTIONS responses do not include an entity body, but they can according to RFC 2616, sec 9.2.

It wouldn’t make sense to return the cached response from the OPTIONS request to satisfy future GET requests, so why should it be different in the case of POST, PUT, DELETE, etc requests?
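The two readings being contrasted here come down to what goes into the cache key. The sketch below is purely illustrative: `cache_key` and its `include_method` flag are made-up names, and request headers are assumed to be a dict with lowercase keys.

```python
def cache_key(method, uri, request_headers, vary=(), include_method=True):
    """Build a lookup key for a cached response.

    vary lists the header names from the stored response's Vary header.
    With include_method=False the key depends on the URI and varied
    headers only; with include_method=True it is method-specific.
    """
    varied = tuple(
        (name.lower(), request_headers.get(name.lower(), ""))
        for name in sorted(vary, key=str.lower)
    )
    base = (uri, varied)
    return (method,) + base if include_method else base
```

With `include_method=False`, an OPTIONS response and a GET response for the same URI collide on one key; with `include_method=True` they are stored and served separately.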

Sunday, February 19 2006 at 7:50 AM

Thomas Broyer said:

FYI, Opera 8.5 and 8.52 fail on every test from -and including- “Are 304 Not Modified responses handled by the implementation?”.

(Nota: it seems MT doesn’t handle UTF-8 correctly when posting comments; or is it Firefox? Well, I’ve replaced the garbage with us-ascii only characters…)

Monday, February 20 2006 at 5:43 AM

Jim said:

the Right way to do this is to POST to the same resource

Surely the right way to do this is to set must-revalidate for the cached resource?

Wednesday, February 22 2006 at 4:03 AM

mnot said:

Jim,

Not necessarily; if you do that, the cache will need to do a validation check on every request. While that kind of tight model is needed for some apps (when ALL caches must stay consistent), sometimes (e.g., when a particular person is editing a resource), it’s OK just to selectively invalidate some caches – e.g., the ones near that user, so that they don’t get confused.

Wednesday, February 22 2006 at 4:10 AM