mnot’s Web log

“Design depends largely on constraints.” - Charles Eames

Thursday, 12 May 2005

Google's Cache-Control Extensions

I happened to look at the HTTP headers returned from Google News just now (what can I say, I’m an HTTP geek), and I noticed something unusual:

Last login: Thu May 12 16:52:59 on console
Welcome to Darwin!
mnot-laptop:~> telnet news.google.com 80
Trying 64.233.161.147...
Connected to news.l.google.com.
Escape character is '^]'.
HEAD / HTTP/1.1
Host: news.google.com

HTTP/1.1 200 OK
Set-Cookie: PREF=ID=d33570687b199641:TM=1115954884:LM=1115954884:S=EKAAPop2tSe_wM0T; 
  expires=Sun, 17-Jan-2038 19:14:07 GMT; path=/; domain=.google.com
Content-Type: text/html; charset=ISO-8859-1
Server: NFE/0.8
Cache-Control: private, x-gzip-ok=""
Content-Length: 0
Date: Fri, 13 May 2005 03:28:04 GMT

What’s up with that cache-extension directive? It’s perfectly legal; there isn’t a cache-extension registry (yet… hmm ;), and it doesn’t appear to modify the semantics of the other directives, which the spec allows extensions to do (BTW, this model is something XML folks should look at; it’s a nice example of good extensibility). The response is also marked private, which is fairly restrictive on caches that don’t understand the extension.
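To make the extensibility point concrete, here’s a minimal Python sketch of how a cache that has never heard of x-gzip-ok might read that header. The parser only loosely follows the RFC 2616 cache-directive grammar (it doesn’t handle escaped quotes), and KNOWN and shared_cache_decision are hypothetical names for illustration. The unknown directive is simply ignored, and private alone is enough to keep the response out of a shared cache.

```python
import re
from typing import Dict, List, Optional, Tuple

# One cache-directive: token [ "=" ( token | quoted-string ) ], loosely following
# the RFC 2616 Cache-Control grammar (escaped quotes are not handled).
DIRECTIVE_RE = re.compile(
    r'([A-Za-z0-9!#$%&\'*+.^_`|~-]+)(?:\s*=\s*("[^"]*"|[^,\s"]*))?'
)

# Directives this hypothetical shared cache knows about; anything else is ignored.
KNOWN = {"public", "private", "no-cache", "no-store", "max-age", "s-maxage",
         "must-revalidate", "proxy-revalidate", "no-transform"}


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Map lower-cased directive names to their argument (None if bare)."""
    directives: Dict[str, Optional[str]] = {}
    for match in DIRECTIVE_RE.finditer(value):
        name, arg = match.group(1).lower(), match.group(2)
        if arg is not None and arg.startswith('"'):
            arg = arg[1:-1]  # unquote quoted-string arguments
        directives[name] = arg
    return directives


def shared_cache_decision(cache_control: str) -> Tuple[bool, List[str]]:
    """Return (may a shared cache store this?, directives it ignored)."""
    directives = parse_cache_control(cache_control)
    ignored = sorted(name for name in directives if name not in KNOWN)
    # Unknown extensions are skipped; only the standard directives decide.
    storable = "private" not in directives and "no-store" not in directives
    return storable, ignored


print(shared_cache_decision('private, x-gzip-ok=""'))  # → (False, ['x-gzip-ok'])
```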

So, what does it do? Any Googlers out there care to speak up? My theory is that they’re using caches in an outer layer of surrogates (a.k.a. gateways) to serve content, and this is a bit of their control metadata (telling the surrogates when to gzip) leaking out. If so, that wouldn’t be uncommon; NetApp does it, and we did the same thing at Akamai with ESI. The only difference is that in both of those instances a separate header was used, so the metadata wouldn’t leak out so easily.
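If the surrogate theory is right, the leak is just a missing cleanup step. Here’s a minimal sketch, assuming (hypothetically) that the surrogate consumes x-gzip-ok itself and should forward every other directive untouched; strip_directive and surrogate_forward are made-up names, not anything Google has documented.

```python
import re
from typing import Dict, Optional, Tuple

# Loose cache-directive pattern: token [ "=" ( token | quoted-string ) ].
DIRECTIVE_RE = re.compile(
    r'([A-Za-z0-9!#$%&\'*+.^_`|~-]+)(?:\s*=\s*("[^"]*"|[^,\s"]*))?'
)


def strip_directive(cache_control: str, name: str) -> Tuple[Optional[str], str]:
    """Remove directive `name`; return (its argument or None if absent, the rest)."""
    found: Optional[str] = None
    kept = []
    for match in DIRECTIVE_RE.finditer(cache_control):
        if match.group(1).lower() == name:
            arg = match.group(2) or ""
            found = arg[1:-1] if arg.startswith('"') else arg
        else:
            kept.append(match.group(0))  # forward other directives verbatim
    return found, ", ".join(kept)


def surrogate_forward(headers: Dict[str, str]) -> Tuple[bool, Dict[str, str]]:
    """Hypothetical surrogate step: consume x-gzip-ok, pass everything else on.

    Header names are matched case-sensitively to keep the sketch short.
    """
    hint, rest = strip_directive(headers.get("Cache-Control", ""), "x-gzip-ok")
    downstream = dict(headers)
    if rest:
        downstream["Cache-Control"] = rest
    else:
        downstream.pop("Cache-Control", None)
    return hint is not None, downstream


gzip_ok, out = surrogate_forward(
    {"Cache-Control": 'private, x-gzip-ok=""', "Content-Type": "text/html"}
)
print(gzip_ok, out["Cache-Control"])  # → True private
```

A separate header, like the Surrogate-Control header ESI uses, sidesteps the editing altogether: the surrogate can drop the whole header rather than rewriting Cache-Control.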

The other possibility is that they’re doing something on the client side, which opens up all sorts of interesting avenues. One thing I’ve been wondering about recently is what interesting uses there are for getResponseHeader() in XMLHttpRequest; e.g., you could build an entire client-side cache in JavaScript, bypassing the crappy cache that’s in most user agents. Or you could build your own protocol extensions by manipulating both the request and response headers. Or you could implement a POE (POST Once Exactly) client entirely in JavaScript (hint, hint ;).
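The freshness logic such a client-side cache needs is fairly small. Here’s a minimal sketch, written in Python rather than JavaScript; the fetch callable and its header lookup are hypothetical stand-ins for an XMLHttpRequest and its getResponseHeader() method, and only max-age, no-cache and no-store are honoured (no validation, no Expires, no Vary).

```python
import re
import time
from typing import Callable, Dict, Optional, Tuple

# A header lookup, standing in for XMLHttpRequest's getResponseHeader().
HeaderLookup = Callable[[str], Optional[str]]
# A fetch returns (body, header lookup); hypothetical, for illustration.
Fetch = Callable[[str], Tuple[str, HeaderLookup]]

# max-age as a whole directive name (so e.g. "x-max-age" doesn't match).
MAX_AGE_RE = re.compile(r'(?:^|,)\s*max-age\s*=\s*"?(\d+)"?', re.IGNORECASE)


def freshness_lifetime(cache_control: str) -> int:
    """Seconds the response may be reused without asking the server again."""
    names = {d.split("=", 1)[0].strip().lower() for d in cache_control.split(",")}
    if "no-store" in names or "no-cache" in names:
        return 0
    match = MAX_AGE_RE.search(cache_control)
    return int(match.group(1)) if match else 0


class ClientCache:
    """Minimal in-memory client-side cache keyed by URL.

    "private" doesn't stop it from storing a response, since this is the
    user's own (non-shared) cache.
    """

    def __init__(self, fetch: Fetch, clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # url -> (body, expires_at)

    def get(self, url: str) -> str:
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now < entry[1]:
            return entry[0]  # still fresh: no network request
        body, get_header = self._fetch(url)
        lifetime = freshness_lifetime(get_header("Cache-Control") or "")
        if lifetime > 0:
            self._entries[url] = (body, now + lifetime)
        else:
            self._entries.pop(url, None)
        return body
```

Injecting the clock keeps the expiry logic testable; in a browser the same shape would wrap a real request and read Cache-Control through getResponseHeader().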

Anybody? My money’s on the surrogate scenario, because ‘x-gzip-ok’ seems like an optimisation hint to downstream caches.
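If x-gzip-ok really is a hint to the surrogates, acting on it still means respecting the standard signals: the client’s Accept-Encoding and any Cache-Control: no-transform from the origin. A minimal sketch under those assumptions; maybe_gzip and gzip_q are hypothetical names, and the q-value handling is simplified to one parameter per coding.

```python
import gzip
from typing import Dict, Optional, Tuple


def gzip_q(accept_encoding: str) -> float:
    """q-value the client gives gzip; an explicit gzip entry beats '*'.

    Simplified: assumes at most one parameter (q) per coding.
    """
    explicit: Optional[float] = None
    wildcard: Optional[float] = None
    for item in accept_encoding.split(","):
        coding, _, param = item.partition(";")
        coding, param = coding.strip().lower(), param.strip().lower()
        q = 1.0
        if param.startswith("q="):
            try:
                q = float(param[2:])
            except ValueError:
                q = 0.0
        if coding in ("gzip", "x-gzip"):
            explicit = q
        elif coding == "*":
            wildcard = q
    if explicit is not None:
        return explicit
    return wildcard if wildcard is not None else 0.0


def maybe_gzip(request_headers: Dict[str, str], response_headers: Dict[str, str],
               body: bytes) -> Tuple[Dict[str, str], bytes]:
    """Hypothetical surrogate: gzip only if the hint, client and origin all allow it."""
    names = {d.split("=", 1)[0].strip().lower()
             for d in response_headers.get("Cache-Control", "").split(",")}
    if "x-gzip-ok" not in names or "no-transform" in names:
        return response_headers, body  # no hint, or origin forbids transformation
    if gzip_q(request_headers.get("Accept-Encoding", "")) <= 0:
        return response_headers, body  # client didn't ask for gzip
    compressed = gzip.compress(body)
    out = dict(response_headers)
    out["Content-Encoding"] = "gzip"
    out["Content-Length"] = str(len(compressed))
    # The response now depends on the request's Accept-Encoding.
    out["Vary"] = ", ".join(v for v in (out.get("Vary"), "Accept-Encoding") if v)
    return out, compressed
```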


Filed under: Caching, Web

discussion of this entry

Ian Bicking said…

Incidentally, a client-side Javascript cache:

http://adamv.com/dev/javascript/ajax

Friday, May 13 2005 at 7:21 AM +10:00

Jim said…

What would it do that can't be determined from existence/lack of Accept-Encoding and Cache-Control: no-transform?

Saturday, May 14 2005 at 9:49 PM +10:00

Mark Nottingham said…

Jim —

Perhaps they didn’t know about it, or they didn’t want to preclude other transformations, either on their box, or downstream. Just guessing.

Sunday, May 15 2005 at 9:20 AM +10:00

John Rewrite said…

They talk about it at http://www.askapache.com/2006/htaccess/speed-up-sites-with-htaccess-caching.

Basically, they say Google is using a Cache-Control extension called "x-gzip-ok" for its border proxies.

Any cache that doesn't understand that Cache-Control extension defaults to the "Cache-Control: private" setting.

Tuesday, January 23 2007 at 4:50 AM +10:00

Mark Nottingham said…

Who's "they"? I don't see any explanation of the extension, just someone else saying that they say it...

Tuesday, January 23 2007 at 10:01 AM +10:00


Creative Commons License