mnot’s blog

“Design depends largely on constraints.” — Charles Eames

Friday, 27 May 2011

Caching HTTP Web

Linked Cache Invalidation

After designing and deploying Cache Channels, it quickly became apparent that one Web cache invalidation mechanism wasn’t able to cover the breadth of use cases.

In a nutshell, Cache Channels trades off immediacy for reliability; that is, while cache invalidations don’t take place right away (there’s a 10-30 second window), you know that they’ll be respected, because of how the protocol is designed.

That’s great if you need to (for example) invalidate news articles with months of TTL because the legal department is freaking out, but not so good if you want your users’ changes to be reflected on the pages they’re looking at, while still keeping your cache efficient.

Linked Cache Invalidation is another approach, with different properties. Briefly, it allows you to declare the relationships between resources, so that when one changes (because of a person POSTing a blog comment, for example), the cache knows enough to invalidate the related resources. Not perfectly reliable, but great when you’re working with certain kinds of content.
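To make the idea concrete, here is a minimal sketch of the bookkeeping a cache might do. It is not the Squid implementation and not what the draft specifies in detail; the class name, method names and URLs are all hypothetical, and it assumes that a cached response can name URLs it is “invalidated-by”, that the response to a POST/PUT/DELETE can name URLs it “invalidates”, and that invalidation only goes one level deep.

```python
from collections import defaultdict


class LinkedCache:
    """Toy in-memory cache tracking 'invalidates' / 'invalidated-by' links.

    Hypothetical illustration only; a real cache also handles freshness,
    Vary, storage limits and so on.
    """

    def __init__(self):
        self.store = {}                     # URL -> cached response body
        self.dependents = defaultdict(set)  # URL -> URLs that said they are invalidated by it

    def put(self, url, body, invalidated_by=()):
        """Cache a response and record its 'invalidated-by' links."""
        self.store[url] = body
        for target in invalidated_by:
            self.dependents[target].add(url)

    def get(self, url):
        """Return the cached body, or None if it is missing or was invalidated."""
        return self.store.get(url)

    def invalidate(self, url):
        """Drop a URL plus every cached response that declared itself invalidated by it."""
        self.store.pop(url, None)
        for dependent in self.dependents.pop(url, set()):
            self.store.pop(dependent, None)

    def on_unsafe_response(self, url, invalidates=()):
        """Handle a successful POST/PUT/DELETE that passed through the cache."""
        self.invalidate(url)
        for target in invalidates:
            self.invalidate(target)


# A blog post page declares that changes to its comment collection invalidate it.
cache = LinkedCache()
cache.put("/posts/42", "post with 3 comments", invalidated_by=["/posts/42/comments"])
cache.put("/feed", "front page feed")

# Someone POSTs a comment; the response also says it invalidates the feed.
cache.on_unsafe_response("/posts/42/comments", invalidates=["/feed"])

print(cache.get("/posts/42"))
print(cache.get("/feed"))
```

Both lookups at the end miss, so the next request for either page goes back to the origin. The “not perfectly reliable” part shows up here too: the cache only learns about a change if the POST actually passes through it.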

So, a couple of years ago I coded this up at Yahoo! and eventually got our Squid implementation Open Sourced, both as a few patches (now sitting on Squid 2.7 HEAD) and as a “helper process” to do the behind-the-scenes accounting and invalidation.

Then, a funny thing happened. I was on the programme committee of WS-REST 2010, where Mike Kelly and Michael Hausenblas had submitted a paper called “Using HTTP Link: Header for Gateway Cache Invalidation.” Needless to say, I had a bit of a chuckle and started talking to Mike K. Essentially, they’d created the same system; great minds think alike.

I’m blogging about this now because, finally, Mike and I have submitted an Internet-Draft to register the link relations and explain how it works. I would note that the inv-maxage Cache-Control header is not implemented in Squid yet, so that implementation will only work with gateway caches; it can’t be relied on to work with proxy caches.

If you caught my “Stupid Caching Tricks” at Velocity last year, I mentioned LCI near the end. It has now been in production in a few parts of Yahoo! for a while, and the feedback is pretty positive. While it isn’t the last word in invalidation systems, I think it’s a really nice balance. Check out the draft for more details.


Pete Johanson said:

Using this approach, how does one invalidate responses to resources that vary by query parameters?

Assume we have a /dogs resource, which possibly is filter-able using a URI Template like “/dogs{?breed}”. An example fully formed URL would then be “/dogs?breed=lab”.

Imagine the following scenario:

POST /dogs name=Bob&breed=lab

GET /dogs?breed=lab Accept: application/json

PUT /dogs/123 name=Bob&breed=bulldog

; rel=”invalidates”, ???

The last part is the one that’s stumping me… How do we specify that the resource w/ that particular query parameter is now invalid? For a single query parameter, this may be feasible, but for multiple parameters, the combinations lead to this being impossible to handle reasonably.

What is the “Right Way (tm)” to handle this scenario?

Wednesday, May 16 2012 at 1:27 AM

Pete Johanson said:


Thanks for the response. Your link URIs got eaten, but the idea of synthetic URIs linked with rel=”invalidated-by” seems like a good solution.

My primary client for the service is a thick client w/ a UI to allow filtering the resource on several axes, and I’ve not found a better solution than the one of several query parameters.

Instead of using a purely synthetic URI for the ‘invalidated-by’, is there any problem with this:

GET /dogs?breed=lab

OK Link: </dogs>; rel=”invalidated-by”

And then I can simply invalidate “/dogs” in my responses? That would invalidate the whole collection resource “/dogs”, as well as all filterings of it, AFAICT.

Wednesday, May 16 2012 at 10:34 AM