Linked Cache Invalidation

Friday, 27 May 2011

After designing and deploying Cache Channels, it quickly became apparent that one Web cache invalidation mechanism wasn’t able to cover the breadth of use cases.

In a nutshell, Cache Channels trades off immediacy for reliability; that is, while cache invalidations don’t take place right away (there’s a 10-30 second window), you know that they’ll be respected, because of how the protocol is designed.

That’s great if you need to (for example) invalidate news articles with months of TTL because the legal department is freaking out, but not so good if you want your users’ changes to be reflected on the pages they’re looking at, while still keeping your cache efficient.

Linked Cache Invalidation is another approach, with different properties. Briefly, it allows you to declare the relationships between resources, so that when one changes (because of a person POSTing a blog comment, for example), the cache knows enough to invalidate the related resources. It's not perfectly reliable, but it's great when you’re working with certain kinds of content.
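To make the idea concrete, here is a minimal sketch of the bookkeeping a cache might do, written in Python. It is an illustration only, not the Squid implementation: the `LinkedCache` class, its method names and the example URIs are all made up, and it assumes invalidation is followed one level deep rather than chased through chains of links.

```python
from collections import defaultdict


class LinkedCache:
    """Toy in-memory cache tracking 'invalidated-by' and 'invalidates' links.

    Hypothetical sketch; names and behaviour are assumptions, not a real API.
    """

    def __init__(self):
        self.store = {}                     # URI -> cached body
        self.dependents = defaultdict(set)  # URI -> URIs to drop when it changes

    def store_response(self, uri, body, invalidated_by=()):
        """Cache a response and record the URIs it declared it depends on."""
        self.store[uri] = body
        for target in invalidated_by:
            self.dependents[target].add(uri)

    def invalidate(self, uri):
        """Drop a URI plus anything that declared itself invalidated by it.

        Only one level of dependents is followed (an assumption of this sketch).
        """
        self.store.pop(uri, None)
        for dependent in self.dependents.pop(uri, set()):
            self.store.pop(dependent, None)

    def on_unsafe_request(self, uri, invalidates=()):
        """Handle a successful POST/PUT/DELETE response seen by the cache."""
        self.invalidate(uri)
        for target in invalidates:
            self.invalidate(target)


cache = LinkedCache()
# The blog entry says: "invalidate me when my comments resource changes".
cache.store_response("/blog/1", "entry", invalidated_by=["/blog/1/comments"])
cache.store_response("/blog/", "index")
cache.store_response("/about", "about page")

# Someone POSTs a comment; the response says it also invalidates the index.
cache.on_unsafe_request("/blog/1/comments", invalidates=["/blog/"])

print(sorted(cache.store))
```

After the POST, the entry and the index are gone from the cache while the unrelated `/about` page stays put, which is the behaviour the two link relations are meant to give you.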

So, a couple of years ago I coded this up at Yahoo! and eventually got our Squid implementation open sourced, both as a few patches (now sitting on Squid 2.7 HEAD) and as a “helper process” to do the behind-the-scenes accounting and invalidation.

Then, a funny thing happened. I was on the programme committee of WS-REST 2010, where Mike Kelly and Michael Hausenblas had submitted a paper called “Using HTTP Link: Header for Gateway Cache Invalidation.” Needless to say, I had a bit of a chuckle and started talking to Mike K. Essentially, they’d created the same system; great minds think alike.

I’m blogging about this now because, finally, Mike and I have submitted an Internet-Draft to register the link relations and explain how it works. I would note that the inv-maxage Cache-Control header is not implemented in Squid yet, so that implementation will only work with gateway caches; it can’t be relied upon to work with proxy caches.

If you caught my “Stupid Caching Tricks” at Velocity last year, I mentioned LCI near the end. It has now been in production in a few parts of Yahoo! for a while, and the feedback is pretty positive. While it isn’t the last word in invalidation systems, I think it’s a really nice balance. Check out the draft for more details.

4 Comments

Pete Johanson said:

Using this approach, how does one invalidate responses to resources that vary by query parameters?

Assume we have a /dogs resource, which possibly is filter-able using a URI Template like “/dogs{?breed}”. An example fully formed URL would then be “/dogs?breed=lab”.

Imagine the following scenario:

POST /dogs name=Bob&breed=lab

GET /dogs?breed=lab Accept: application/json

PUT /dogs/123 name=Bob&breed=bulldog

; rel=”invalidates”, ???

The last part is the one that’s stumping me… How do we specify that the resource w/ that particular query parameter is now invalid? For a single query parameter, this may be feasible, but for multiple parameters, the combinations lead to this being impossible to handle reasonably.

What is the “Right Way (tm)” to handle this scenario?

Wednesday, May 16 2012 at 1:27 AM

Mark Nottingham said:

Hi Pete,

You can reduce the combinations that you have to enumerate by canonicalising the URIs beforehand; e.g., see https://github.com/mnot/squid-director
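As a rough illustration of one simple form of canonicalisation, the Python sketch below sorts the query parameters so that reordered but equivalent URIs collapse to a single cache key. It is not the squid-director code; the `canonicalise` function and the `size` parameter are made up for the example, and it assumes parameter order carries no meaning for the application.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonicalise(uri):
    """Return uri with its query parameters sorted and any fragment dropped.

    Hypothetical helper; assumes parameter order is not significant.
    """
    parts = urlsplit(uri)
    params = sorted(parse_qsl(parts.query, keep_blank_values=True))
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(params), "")
    )


# Two orderings of the same filter end up as one URI.
a = canonicalise("/dogs?size=large&breed=lab")
b = canonicalise("/dogs?breed=lab&size=large")
print(a == b)
```

With that in place, only one URI per distinct combination of filters needs to be named in a Link header, rather than one per ordering.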

Having said that, using query parameters to filter responses is going to lead to a large number of combinations, which not only makes this hard, but also lessens cache efficiency.

Do you really need to allow so many slices into the data set?

One way to manage this is to use “synthetic” URIs; e.g., to put on the “breeds” responses something like:

Link: ; rel=”invalidated-by”

to give it another name, effectively, and then invalidating that URI in the response:

Link: ; rel=”invalidates”

Make sense?

Wednesday, May 16 2012 at 1:45 AM

Pete Johanson said:

Mark,

Thanks for the response. Your link URIs got eaten, but the idea of synthetic URIs linked with rel=”invalidated-by” seems like a good solution.

My primary client for the service is a thick client w/ a UI to allow filtering the resource on several axes; I’ve not found a better solution than the one of several query parameters.

Instead of using a purely synthetic URI for the ‘invalidated-by’, is there any problem with this:

GET /dogs?breed=lab

OK Link: </dogs>; rel=”invalidated-by”

And then I can simply invalidate “/dogs” in my responses? That would invalidate the whole collection resource “/dogs”, as well as all filterings of it, AFAICT.

Wednesday, May 16 2012 at 10:34 AM

Mark Nottingham said:

Yes. I’d suggested /dogs?breed=lab, but you can slice it up how you like.

Thursday, May 17 2012 at 10:34 AM

mark nottingham
