mark nottingham

Google's Cache-Control Extensions

Thursday, 12 May 2005

HTTP Caching

I happened to look at the HTTP headers returned from Google News just now (what can I say, I’m an HTTP geek), and I noticed something unusual:

Last login: Thu May 12 16:52:59 on console
Welcome to Darwin!
mnot-laptop:~> telnet news.google.com 80
Trying 64.233.161.147...
Connected to news.l.google.com.
Escape character is '^]'.
HEAD / HTTP/1.1
Host: news.google.com

HTTP/1.1 200 OK
Set-Cookie: PREF=ID=d33570687b199641:TM=1115954884:LM=1115954884:S=EKAAPop2tSe_wM0T;
  expires=Sun, 17-Jan-2038 19:14:07 GMT; path=/; domain=.google.com
Content-Type: text/html; charset=ISO-8859-1
Server: NFE/0.8
Cache-Control: private, x-gzip-ok=""
Content-Length: 0
Date: Fri, 13 May 2005 03:28:04 GMT

What’s up with that cache-extension directive? It’s perfectly legal; there isn’t a cache extension registry (yet… hmm ;). It also doesn’t appear to modify the semantics of the other directives, because the response is private too, which is already fairly restrictive for caches that aren’t aware of the extension. (BTW, the extension model as specified is something that XML folks should look at; it’s a nice example of good extensibility.)
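To make the extensibility point concrete, here is a minimal sketch of how a cache might parse that header and set aside directives it doesn't recognise. The function names (parse_cache_control, split_extensions) are made up for illustration, the list of known directives is the response set from RFC 2616 section 14.9, and the parser is deliberately simplified; it isn't anybody's actual implementation.

```python
import re

# Response directives defined in RFC 2616 section 14.9; anything else is
# treated as a cache-extension that an unaware cache simply ignores.
KNOWN_RESPONSE_DIRECTIVES = {
    "public", "private", "no-cache", "no-store", "no-transform",
    "must-revalidate", "proxy-revalidate", "max-age", "s-maxage",
}

# Simplified grammar: name [ "=" ( quoted-string | token ) ]
_DIRECTIVE = re.compile(r'''
    ([^\s=,"]+)                   # directive name
    (?:\s*=\s*
        (?: "((?:[^"\\]|\\.)*)"   # quoted-string value
          | ([^\s,"]+)            # or bare token value
        )
    )?
''', re.VERBOSE)


def parse_cache_control(value):
    """Return {directive-name: argument-or-None} for a Cache-Control value."""
    directives = {}
    for match in _DIRECTIVE.finditer(value):
        name = match.group(1).lower()
        if match.group(2) is not None:
            # Quoted-string: undo backslash escapes.
            arg = re.sub(r'\\(.)', r'\1', match.group(2))
        else:
            arg = match.group(3)  # bare token, or None if no "=" was given
        directives[name] = arg
    return directives


def split_extensions(directives):
    """Separate directives a plain RFC 2616 cache knows from extensions."""
    known = {k: v for k, v in directives.items()
             if k in KNOWN_RESPONSE_DIRECTIVES}
    extensions = {k: v for k, v in directives.items()
                  if k not in KNOWN_RESPONSE_DIRECTIVES}
    return known, extensions


known, extensions = split_extensions(
    parse_cache_control('private, x-gzip-ok=""'))
```

For the Google News header, known ends up holding only private and extensions holds x-gzip-ok with an empty argument, so a cache that has never heard of the extension still behaves as if it had seen plain Cache-Control: private.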

So, what does it do? Any Googlers out there care to speak up? My theory is that they’re using caches in an outer layer of surrogates (a.k.a. gateways) to serve content, and this is a bit of their control metadata (for when to do gzip) leaking out. If so, that wouldn’t be uncommon; NetApp does it, and we did the same thing at Akamai with ESI. The only difference is that in both of those instances, a separate header was used, so it wouldn’t leak out so easily.
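If the surrogate theory is right, the leak is the sort of thing a gateway could avoid by dropping its private control directives before the response leaves the outer layer. The following is only a sketch under that assumption: strip_internal_directives is a hypothetical helper, treating x-gzip-ok as internal is my guess rather than anything Google has confirmed, and the comma split is naive (it would mishandle quoted arguments that contain commas).

```python
def strip_internal_directives(cache_control, internal=("x-gzip-ok",)):
    """Remove surrogate-only directives from a Cache-Control value.

    Returns (cleaned_value, removed_directives). The `internal` names are
    an assumption for illustration, not a documented list.
    """
    kept, removed = [], []
    # Naive split: assumes no directive argument contains a comma.
    for part in cache_control.split(","):
        part = part.strip()
        if not part:
            continue
        name = part.split("=", 1)[0].strip().lower()
        if name in internal:
            removed.append(part)  # consumed by the surrogate, not forwarded
        else:
            kept.append(part)     # passed through to downstream caches
    return ", ".join(kept), removed


cleaned, removed = strip_internal_directives('private, x-gzip-ok=""')
```

Here cleaned is what the client would see (just private), while removed keeps the hint for the surrogate's own use. Carrying the hint in a separate header, as in the NetApp and Akamai cases, makes the same filtering simpler because the whole header can be dropped.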

The other possibility is that they’re doing something on the client side, which raises all sorts of interesting possibilities. One of the things I’ve been wondering about recently is the interesting uses of getResponseHeader() in XMLHttpRequest; e.g., you could build an entire client-side cache in JavaScript, bypassing the crappy cache that’s in most user agents. Or you could build your own protocol extensions by manipulating both the client and server headers. Or you could implement a POE client entirely in JavaScript (hint, hint ;).

Anybody? My money’s on the surrogate scenario, because ‘x-gzip-ok’ seems like an optimisation hint to downstream caches.


5 Comments

Ian Bicking said:

Incidentally, a client-side Javascript cache:

http://adamv.com/dev/javascript/ajax

Friday, May 13 2005 at 7:21 AM

Jim said:

What would it do that can’t be determined from existence/lack of Accept-Encoding and Cache-Control: no-transform?

Saturday, May 14 2005 at 9:49 AM

John Rewrite said:

They talk about it at http://www.askapache.com/2006/htaccess/speed-up-sites-with-htaccess-caching.

Where basically they say Google is using a cache-control extension directive called “x-gzip-ok” for its border proxies.

Any cache that doesn’t understand that cache-control extension defaults to the “cache-control:private” setting.

Tuesday, January 23 2007 at 4:50 AM