Thursday, 12 May 2005
Google's Cache-Control Extensions
I happened to look at the HTTP headers returned from Google News just now (what can I say, I’m an HTTP geek), and I noticed something unusual:
    Last login: Thu May 12 16:52:59 on console
    Welcome to Darwin!
    mnot-laptop:~> telnet news.google.com 80
    Trying 18.104.22.168...
    Connected to news.l.google.com.
    Escape character is '^]'.
    HEAD / HTTP/1.1
    Host: news.google.com

    HTTP/1.1 200 OK
    Set-Cookie: PREF=ID=d33570687b199641:TM=1115954884:LM=1115954884:S=EKAAPop2tSe_wM0T; expires=Sun, 17-Jan-2038 19:14:07 GMT; path=/; domain=.google.com
    Content-Type: text/html; charset=ISO-8859-1
    Server: NFE/0.8
    Cache-Control: private, x-gzip-ok=""
    Content-Length: 0
    Date: Fri, 13 May 2005 03:28:04 GMT
What’s up with that cache-extension directive? It’s perfectly legal: there isn’t a cache extension registry (yet… hmm ;), and a cache that doesn’t recognise an extension just ignores it. It also doesn’t appear to modify the semantics of the other directives, because the response is private too, which is already fairly restrictive for non-extension-aware caches. (BTW, the extension model as specified is something that XML folks should look at; it’s a nice example of good extensibility.)
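To make the "ignore what you don't understand" model concrete, here is a minimal Python sketch of how a cache might split a Cache-Control value into the directives it knows and the extensions it doesn't. The function name `parse_cache_control` and the `KNOWN` set are my own illustration, not anything Google or a particular cache implementation uses, and the parsing is simplified.

```python
import re

# One directive: a name, optionally followed by =token or ="quoted string",
# terminated by a comma or the end of the header value.
_DIRECTIVE = re.compile(
    r'\s*([^\s,="]+)(?:=(?:"((?:[^"\\]|\\.)*)"|([^\s,"]+)))?\s*(?:,|$)'
)

# Assumption: the response directives this hypothetical cache understands.
KNOWN = {
    "public", "private", "no-cache", "no-store", "no-transform",
    "must-revalidate", "proxy-revalidate", "max-age", "s-maxage",
}


def parse_cache_control(value):
    """Return (known, extensions): dicts of directive name -> value or None."""
    known, extensions = {}, {}
    pos = 0
    while pos < len(value):
        m = _DIRECTIVE.match(value, pos)
        if m is None:
            raise ValueError("malformed Cache-Control value: %r" % value)
        name = m.group(1).lower()
        if m.group(2) is not None:
            # Quoted string: undo backslash escapes.
            arg = re.sub(r"\\(.)", r"\1", m.group(2))
        else:
            # Bare token argument, or None when the directive has no argument.
            arg = m.group(3)
        (known if name in KNOWN else extensions)[name] = arg
        pos = m.end()
    return known, extensions


known, extensions = parse_cache_control('private, x-gzip-ok=""')
# A non-extension-aware cache acts only on `known` and drops `extensions`.
```

For the header above, `known` holds only `private` and `extensions` holds `x-gzip-ok` with an empty string as its argument, so a cache that has never heard of the extension still behaves correctly.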
So, what does it do? Any Googlers out there care to speak up? My theory is that they’re using caches in an outer layer of surrogates (a.k.a. gateways) to serve content, and this is a bit of their control metadata (for when to do gzip) leaking out. If so, that wouldn’t be uncommon; NetApp does it, and we did the same thing at Akamai with ESI. The only difference is that in both of those instances, a separate header was used, so it wouldn’t leak out so easily.
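If the surrogate theory is right, plugging the leak would mean having the outermost layer strip its private directives before the response leaves the building. Here is a rough Python sketch of that idea; it is purely hypothetical. `strip_internal` and `INTERNAL_DIRECTIVES` are names I made up, and I have no knowledge of how Google's front ends actually handle this.

```python
import re

# One directive (name plus optional =token or ="quoted string"), terminated
# by a comma or the end of the value. Group 1 is the directive as written,
# group 2 is just its name.
_DIRECTIVE = re.compile(
    r'\s*(([^\s,="]+)(?:=(?:"(?:[^"\\]|\\.)*"|[^\s,"]+))?)\s*(?:,|$)'
)

# Assumption: directives meant only for the surrogate layer.
INTERNAL_DIRECTIVES = {"x-gzip-ok"}


def strip_internal(value, internal=INTERNAL_DIRECTIVES):
    """Drop surrogate-only directives from a Cache-Control header value."""
    kept = []
    pos = 0
    while pos < len(value):
        m = _DIRECTIVE.match(value, pos)
        if m is None:
            # Leave anything we cannot parse untouched rather than mangle it.
            return value
        if m.group(2).lower() not in internal:
            kept.append(m.group(1))
        pos = m.end()
    return ", ".join(kept)


outgoing = strip_internal('private, x-gzip-ok=""')
```

With the header from Google News, `outgoing` comes back as just `private`. Using a separate header for this kind of metadata sidesteps the parsing altogether, since the gateway can drop the whole header on the way out.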
Anybody? My money’s on the surrogate scenario, because ‘x-gzip-ok’ seems like an optimisation hint to downstream caches.