mnot’s blog

“Design depends largely on constraints.” - Charles Eames

Saturday, 28 June 2003

Caching HTTP Web

Caching is often enough

I feel compelled to respond to Norm Walsh’s thoughts on caching.

It’s important to distinguish between the capabilities of a specific product (such as WWWoffle) and the technology that it implements (caching). I would agree that the general state of cache implementation leaves much to be desired, both in clients and in proxies. However, I would not write off the technology wholesale.

For example, Norm claims that you can’t populate a cache. Untrue; you can populate any cache simply by driving a workload through it. If that’s too simplistic, I know of at least one commercial proxy cache that allows you to push content into it in an efficient and secure manner, using nothing but HTTP (I know this because I forced them to do it, when I had the weight of a Fortune 25 company behind me; handy thing, that ;)
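To make "driving a workload through it" concrete, here is a minimal sketch in Python. It assumes a plain HTTP proxy cache; the function names and the proxy address in the usage note are placeholders of mine, not anything a particular product defines. It covers only the simple "request everything once" approach, not the vendor-specific push mechanism mentioned above.

```python
import urllib.request
from typing import Callable, Dict, Iterable


def make_proxy_fetcher(proxy_url: str) -> Callable[[str], int]:
    """Return a function that GETs a URL through the given HTTP proxy.

    proxy_url is an assumption, e.g. the address of your local proxy cache.
    """
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy_url})
    )

    def fetch(url: str) -> int:
        with opener.open(url, timeout=10) as response:
            response.read()  # drain the body so the proxy sees the whole response
            return response.status

    return fetch


def warm_cache(urls: Iterable[str], fetch: Callable[[str], int]) -> Dict[str, object]:
    """Request each URL once; whether the proxy stores it is up to the proxy.

    Returns a map of URL to HTTP status, or to the exception that was raised.
    """
    results: Dict[str, object] = {}
    for url in urls:
        try:
            results[url] = fetch(url)
        except OSError as exc:  # urllib's URLError and HTTPError subclass OSError
            results[url] = exc
    return results
```

Usage would look something like `warm_cache(urls, make_proxy_fetcher("http://localhost:3128"))`, where the port is just a guess at a typical local proxy. Whether a given response actually ends up in the cache still depends on its freshness metadata (Cache-Control, Expires and friends).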

A stickier problem is “pinning” representations in the cache, so that they don’t get evicted by others. This is less common (most vendors will tell you to just buy more disk, which I don’t consider acceptable in some situations), but I know of one implementation that does allow this.
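For illustration, pinning can be sketched as a small tweak to an LRU replacement policy: pinned keys are simply skipped when choosing a victim. This is a toy in-memory model with hypothetical names (`PinnableLRUCache`, `pin`, `unpin`), not a description of how any shipping cache does it.

```python
from collections import OrderedDict


class PinnableLRUCache:
    """Toy LRU cache in which pinned entries are never chosen for eviction."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._entries = OrderedDict()  # oldest entry first
        self._pinned = set()

    def get(self, key):
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, key, value, pin=False):
        self._entries[key] = value
        self._entries.move_to_end(key)
        if pin:
            self._pinned.add(key)
        self._evict()

    def unpin(self, key):
        self._pinned.discard(key)

    def _evict(self):
        while len(self._entries) > self.capacity:
            # Least recently used entry that is not pinned, if there is one.
            victim = next((k for k in self._entries if k not in self._pinned), None)
            if victim is None:
                break  # everything is pinned; tolerate the overflow
            del self._entries[victim]
```

Note the design choice when everything is pinned: this sketch lets the cache overflow rather than refuse the write, which is exactly the kind of policy question a real implementation would have to answer.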

Norm would also like the ability to override the resource’s choice of representation in a private or local scope. I think this is really just a facet of his first problem; find the cache at the appropriate scope (e.g., on the local machine, in the local network, etc.) and push a representation into it, with the appropriate metadata.

BTW, many caches allow you to modify their content through the local filesystem, if you know the proper incantations (and there are tools to help). How RESTful.

Finally, Norm would like a way to map arbitrarily specified URIs to other URIs, based on a ruleset. There’s actually been a lot of work in this area; a few IETF efforts around this space include the Contextualisation of Localisation BoF (what a mouthful), OPES and CDI; my own work in this area led me to produce URISpace. This, by the way, isn’t really a function of caching, but rather one of an intermediary (where caching often happens).
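As a rough idea of what such a ruleset might look like at its simplest, here is a longest-prefix rewrite in Python. To be clear, this is not URISpace or any of the IETF proposals; the rule format and the `map_uri` name are made up for the example.

```python
from typing import List, Tuple

# Each rule is (URI prefix to match, replacement prefix); a hypothetical format.
Rule = Tuple[str, str]


def map_uri(uri: str, rules: List[Rule]) -> str:
    """Rewrite uri using the longest matching prefix rule, if any."""
    best = None
    for prefix, replacement in rules:
        if uri.startswith(prefix) and (best is None or len(prefix) > len(best[0])):
            best = (prefix, replacement)
    if best is None:
        return uri  # no rule applies; leave the URI alone
    return best[1] + uri[len(best[0]):]
```

An intermediary could consult a table like this before deciding where to send a request, which is why it works for any kind of content and not only for XML.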

I’m not convinced that there’s a need for a specialized piece of software called an “XML Catalog,” because all of the functions that Norm describes are generic and desirable for all kinds of content, not just XML.

I would not debate that the functionality of current implementations is seriously impaired, however. The caching community has focused entirely on performance, to the detriment of the functionality and granularity of capability in their products. I think there are some really interesting opportunities in this space, and would very much like to hear people’s thoughts.

Alas, I’m no longer paid to think about all things caching and intermediary, so my time for doing so is limited. That said, it looks like what I am paid for - Web services - is veering in the direction where intermediaries are in the spotlight, so I suspect interesting times may be ahead.