mnot’s blog

“Design depends largely on constraints.” — Charles Eames

Thursday, 18 February 2010

Are Resource Packages a Good Idea?

Filed under: HTTP Protocol Design Web

Resource Packages is an interesting proposal from Mozilla folks for binding together bunches of related data (e.g., CSS files, JavaScript and images) and sending it in one HTTP response, rather than many, as browsers typically do.

Intuitively, this seems to make sense; fewer HTTP requests is good, right?

Maybe, maybe not. AFAICT, there aren’t any metrics comparing RP vs. traditional sites (has anyone done this?). In any case, a few concerns come to mind about this approach to making the Web faster.

Packaging and the Web

RP doesn’t have any generic metadata mechanism. The files in an RP are just that — bags of bits, whereas on the Web, we work with representations that include metadata.

So, clients will have to sniff the media type of each individual package member — something we’re trying to get away from on the Web. And forget about using other types of header-based metadata as well.

For example, the draft points out that you can “even use ETags to invalidate the zip file when needed”. However, if a cache already has entries for the individual resources, the only thing linking them to the package members is the URL; since zip files don’t carry much more metadata than a modification time, a cache has no way of knowing the members’ ETags.
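You can see how little survives in a zip entry with a few lines of Python (the standard zipfile module; the member names here are made up for illustration):

```python
import io
import zipfile

# Build a small RP-style package in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("style.css", "body { color: #333; }")
    zf.writestr("app.js", "console.log('hi');")

# Now look at what per-member metadata the format actually keeps:
# a name, a modification time, and sizes. No Content-Type, no ETag,
# no Cache-Control -- none of the headers a cache works with.
with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
    for info in zf.infolist():
        print(info.filename, info.date_time, info.file_size)
```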

Much better would be a generic, Web-centric packaging format, like MIME Multipart or Atom. It’s true that ZIP tools are more prevalent, but I’d be surprised if that remained a barrier once browsers deployed another format; when that happens, developers tend to fill the gaps quickly.
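As a rough sketch of the alternative (my own illustration, not anything specified by the RP draft), here is what per-member metadata looks like in a MIME Multipart package, built with Python’s email module; the file names and ETag values are invented:

```python
from email.message import Message

# A hypothetical Web-centric package: each member keeps its own
# headers, so caches and clients can see a Content-Type and ETag
# per resource instead of sniffing bags of bits.
pkg = Message()
pkg["Content-Type"] = 'multipart/mixed; boundary="pkg"'

css = Message()
css["Content-Type"] = "text/css"
css["Content-Location"] = "/style.css"
css["ETag"] = '"abc123"'
css.set_payload("body { color: #333; }")
pkg.attach(css)

js = Message()
js["Content-Type"] = "application/javascript"
js["Content-Location"] = "/app.js"
js["ETag"] = '"def456"'
js.set_payload("console.log('hi');")
pkg.attach(js)

print(pkg.as_string())
```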

Getting Granularity and Ordering Right

Another concern I have is that Web sites are complex, and it’s difficult to choose exactly what to package up and what to leave separate.

The effects of packaging up too much could be profound; for example, a site that doesn’t use every bit of JavaScript and CSS on every page, but puts them all in a package, will cause a client to download more than it needs to start working on a given page.

While that isn’t a big deal if you’re sitting on a fat connection with a fast computer in your office, it matters when you’re across the world, or just browsing across a mobile network.

It’ll also create a lot of duplication in proxy and accelerator caches, since clients that don’t use RP will request the same things separately.

Likewise, if you don’t order the items in your package as the browser needs them, it will have a negative impact on performance, because the rendering engine will end up sitting around waiting for a required asset to come down the pipe. In effect, it’s enforcing head-of-line blocking on every response contained in the package.

I strongly suspect that choosing the right package granularity and ordering is going to be a very difficult and performance-sensitive task, and for many sites the interdependencies between JS, CSS and images will burn a lot of developer time tweaking packages.

The worst case is that some RP-enabled sites will resemble a Flash site from a UX perspective; one big serial download with a “waiting” graphic, followed by snappy performance. I don’t know about you, but I hate that.

Working with TCP

Finally, RP seems to be built on the argument that using fewer TCP connections is better. While it’s true that browsers currently limit a page to six or eight connections, and any requests over that limit queue up, this is a) changing (see this issue), and b) not necessarily a bad thing.

It’s not (necessarily) bad because of TCP slow-start. As Google and many others have pointed out, a brand-new TCP connection’s throughput is fairly restricted until it has been open for a number of round trips, and congestion (e.g., buffers in intervening routers filling up) can slow it right back down again.

In other words, downloading 20 10k assets across eight parallel connections is often faster and more reliable than downloading one 200k asset over one connection. Browsers — intentionally or not — exploit this by using multiple parallel connections.
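The arithmetic behind that claim looks roughly like this, under a deliberately simplified model of slow-start (an initial congestion window of about 4k that doubles each round trip; real stacks and real networks differ):

```python
def rtts_to_download(size_kb, init_cwnd_kb=4.0):
    """Round trips to move size_kb under idealized slow-start:
    the congestion window starts small and doubles every RTT."""
    sent, cwnd, rtts = 0.0, init_cwnd_kb, 0
    while sent < size_kb:
        sent += cwnd
        cwnd *= 2
        rtts += 1
    return rtts

# One 200k package on a single fresh connection:
print(rtts_to_download(200))   # 6 round trips in this model

# 20 x 10k assets over eight parallel connections: each connection
# carries at most three assets, i.e. about 30k:
print(rtts_to_download(30))    # 4 round trips in this model
```

In this toy model the parallel connections finish in two-thirds the round trips, and each is also less exposed to a single congestion event.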

As such, putting all of your data eggs in one basket (as it were) can actually slow you down, never mind the ordering and granularity issues discussed above.

So, is RP a good idea?

Well, it’s certainly an interesting one, and in some cases — e.g., when you have a lot of very small assets that you know you’re going to use — it makes a lot of sense.

However, as it sits I don’t see any numbers quantifying a benefit (again, please correct me if I’m wrong!), and the existing recommendations (“serve all the resources… required by a page in a single HTTP request”) are a bit worrisome.

Putting that aside, this doesn’t feel like a long-term solution; it’s more of a band-aid over one set of specific problems in the 2010 Web.

I’m pretty biased towards a long-term solution here, because the cost of deploying clients is so high. While it’s true that more aggressive solutions like SPDY require both client and server support, server support isn’t hard to get once it’s in clients, and RP requires client support anyway.

So, I’d put forth that if we’re going to go to the effort to change clients, we should get the most bang for our buck, and make sure it lasts. Just my .02.


18 Comments

Anne van Kesteren said:

Agreed that SPDY looks like a much nicer solution than this. It is also a lot more complex though.

Thursday, February 18 2010 at 10:34 AM

kinkie said:

It would seem to me that this is just a matter of reinventing the wheel. Persistent connections will do just that, with all the metadata and so on.

The only thing I’d consider interesting is to use and/or define some non-JavaScript method of hinting to the user-agent, early in the main HTML data stream, that some extra resources are going to be used, e.g. by using some form of the HTML tag and/or the Link: HTTP header.

This would have many advantages:

  • improve cacheability
  • preserve metadata
  • let user-agents optimize the network flows as they see fit
  • let page authors define dynamically what they need (it is an optimization)
  • avoid extra decoding steps in the data-fetch phase (a .zip has to be unpacked)

Thursday, February 18 2010 at 11:57 AM

Jos Hirth said:

It’ll also create a lot of duplication in proxy and accelerator caches, since clients that don’t use RP will request the same things separately.

Clients which don’t know what RPs are won’t download them.

[…] one big serial download with a “waiting” graphic, followed by snappy performance.

Using an archive doesn’t mean you have to wait for the whole thing. If the archive isn’t solid, each file is compressed individually and then the whole thing is glued together.

Also, Deflate (zip, gzip, swf, png, etc) uses a sliding 32kb window, which means you can start decompressing as soon as you’ve got at least 32kb. After that you can continue decompressing with every byte you get.
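That streaming property is easy to demonstrate with raw Deflate via Python’s zlib (a sketch of the compression format’s behaviour, not of the RP format itself):

```python
import zlib

data = b"resource package demo " * 5000
# A raw Deflate stream, like a zip member (negative wbits = no zlib header).
comp = zlib.compressobj(wbits=-15)
stream = comp.compress(data) + comp.flush()

# Feed only the first half of the stream to the decompressor: because
# Deflate decodes symbol by symbol, output is available long before
# the last byte arrives.
decomp = zlib.decompressobj(wbits=-15)
partial = decomp.decompress(stream[:len(stream) // 2])
print(len(partial), "bytes recovered from half the stream")

# Feeding the rest completes the data.
rest = decomp.decompress(stream[len(stream) // 2:])
assert partial + rest == data
```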

The real problem with using Zip is that it’s a rather unsuitable format for this stuff. Using a default format is of course a really nice thing, but the features that are very important in this context are generally very poorly supported by all the tools I know.

E.g. UTF-8 file names are possible (according to the specs) but there is very little support for this feature. The trailing index is very handy when it comes to updating huge archives, but this otherwise sensible design decision is completely counterproductive in the RP context. Also, changing the order isn’t supported in general, since it conflicts with the considerations which led to the trailing index.

The goal to use a default format is noble, but it’s misguided. What we actually need is a format which can be easily written with existing standard libraries. A custom format which uses a leading header and Deflate for compression would fit the bill just fine. Writing such a file wouldn’t be much harder than writing a Zip and all those silly limitations would be gone.

Putting that aside, this doesn’t feel like a long-term solution; it’s more of a band-aid over one set of specific problems in the 2010 Web.

I also wondered about this. We’ve already got CSS, JS, and the document itself covered. They can be merged (CSS/JS, that is), minified, and gzipped. Three connections and one for the favicon; I’m fine with that.

What’s missing now are images. There are content images, which should be kept separate either way and loaded on demand. And then there are all those tiny layout images. Using sprites is a major pain in the rear. RPs or something similar would be awesome for that.

However, CSS is getting more and more options for procedural graphics: multiple borders, rounded corners, text-shadow, box-shadow, gradients, and whatever. Which leaves us with maybe one image for the logo and one very simple sprite sheet with a few icons.

I’d really like to use RPs right now. But in a couple of years? I’m not sure I’ll need them as badly then as I do now.

SPDY on the other hand has a lot of potential to improve, well, everything. CSS3+ and RPs will/would only affect the front-end performance of a few sites, whereas some under-the-hood thing like SPDY will find its way to about any site sooner or later.

Friday, February 19 2010 at 1:01 AM

Steve Souders said:

The general problem being addressed is the overhead of multiple HTTP requests. “Overhead” includes TCP slow-start, (repeated, uncompressed) HTTP headers, and delays from handling request/response sequentially (and more). There are multiple solutions being proposed: resource packages, SPDY, and pipelining are the most discussed. Each of these viewed in isolation have benefits and I’m in favor of them. But what we really need to do is to evaluate them together. If we had SPDY, the need for resource packages is less. If pipelining worked, that would mitigate the impact of SPDY. You get what I’m saying. I’d enjoy a blog post from you Mark (or Arvind - hint hint) comparing at least these three alternatives and identifying the tradeoffs and feasibility of implementation and adoption.

Friday, February 19 2010 at 2:50 AM

Brian Smith said:

A “solid” archive is one in which the deflate window is not reset for each file in the archive (e.g. tar.gz, tar.bz). A zip file is not solid; each file is compressed independently. A “solid” archive will almost always be compressed more compactly.
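The difference is easy to measure at the zlib level (Deflate standing in for zip-style vs. tar.gz-style compression; the shared blob below is an invented stand-in for a common license header or library preamble):

```python
import hashlib
import zlib

# ~2KB of effectively incompressible bytes shared by all three "files".
shared = b"".join(hashlib.sha256(bytes([i])).digest() for i in range(64))
files = [shared + b"function f() { return 1; }\n",
         shared + b"function g() { return 2; }\n",
         shared + b"function h() { return 3; }\n"]

# Zip-style (not solid): each member is compressed independently.
independent = sum(len(zlib.compress(f)) for f in files)

# Solid (tar.gz-style): one Deflate stream over the concatenation, so
# the second and third copies of the shared blob become back-references
# within the 32KB window.
solid = len(zlib.compress(b"".join(files)))

print("independent:", independent, "solid:", solid)
```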

Friday, February 19 2010 at 6:21 AM

Dorian Taylor said:

Do resource packages consider content negotiation?

Friday, February 19 2010 at 7:05 AM

Kris Zyp said:

I agree; it seems like SPDY has the potential to solve numerous other problems (effective pushing) and preserve metadata/caching in a high-performance, low-latency package.

Friday, February 19 2010 at 7:09 AM

Alexander Klimetschek said:

Regarding SPDY [1]:

Interesting approach but at a glance I see two problems:

  • the speedup measurements are in the range of 20-40%, good, but not really worth the complexity IMO, especially when improvements to HTTP might come

  • it breaks the fundamental stateless constraint of HTTP (at least based on the points that it allows multiple parallel requests per connection and caches “static” headers); even if it is only a library in front of the server that opens normal HTTP requests internally, to support existing servers, I see potential for congestion here; it probably can be solved, but their custom in-memory server used for the performance tests is far from reality

All in all I see no easy migration path to SPDY (maybe only for httpd and other “static” webservers), and in the meantime things might improve otherwise. (Though this is far from expert knowledge, just a gut feeling.)

[1] http://www.chromium.org/spdy/spdy-whitepaper

Friday, February 19 2010 at 8:21 AM


Martin Lierschof said:

There are a lot of questions to consider on the server and the client side, which I would file under “system specific”, and all of them would need to be measured somewhere.

For example, have a look at the server part that serves static files: in Linux there are a few different methods to serve static files, especially on the read side, which bugs every httpd developer (a good comparison for lighty using writev, linux-sendfile, gthread-aio, posix-aio and linux-aio-sendfile is here: http://www.lighttpd.net/benchmark/). My concern is what adding a complex layer like pipelining, checksum generation, compression and decompression costs on the server and client side, given that there are n different system combinations out there consuming this information.

Is there a formula yet which includes this? A formula for clients like: (“take an iPhone” + “take an Ubuntu workstation (2.x kernel) with an i5 and ISDN and use Konqueror” + “take a Windows XP box with an AMD XP 500 and use Firefox 2 with 1.5 Mbit DSL” + “take an ARM-based smartphone with Symbian and HSDPA and Opera version x”) / n consumers * MB consumed, squared by pipes used / current connections, and so on. I guess not! For sure there is no formula for all this. But the fact is that this huge amount of data evaluation is missing for proposals and for the people building them. I would love to see Google, Mozilla or any working group start serious public projects to evaluate more details about these facts.

Sure, I’ve left new features out of the picture.

So who’s gonna start this?

Friday, February 19 2010 at 9:06 AM

rob yates said:

There’s something to the proposal, given how prevalent CSS Sprites (http://www.alistapart.com/articles/sprites) usage is. We use CSS Sprites and have seen real improvements in response time. This proposal would appear to be much easier to use.

Friday, February 19 2010 at 11:26 AM

Jos Hirth said:

@Mark

Then intermediary caches will have both the RP version and the “normal” version in cache.

Yes, indeed. I just didn’t understand what you meant right away. My bad.

[…] if the contents of the package aren’t in the right order, it may block rendering.

Well, it’s possible to add them in any order you like. However, usually it’s effectively random: you traverse the directories recursively and add files in the order you get them (typically the order they were created in, which means the order changes if you do a rollback).

If you write some library which helps you prioritize those files, you could just as well write a lib to write a custom (Deflate-based) format. There really isn’t anything to gain by using Zip files.

Using archives can be pretty neat though. But it should be a format which was created with this very specific use-case in mind. (Leading header, UTF-8 file names, mime types, progressive loading should work, etc.)

Also, Zip is a rather messy format with many edge cases. Half of the spec would be completely pointless, but should browsers support that stuff anyway? Do we really need Deflate64, BZip2, LZMA, and PPMd? The Zip format supports all of those. There might be lots of submarine patents lurking in the dark. Deflate isn’t all that good by today’s standards, but we can be sure that it’s 100% patent-free. There are also at least two ways to encrypt files, and you can split archives across several volumes. Doesn’t really make much sense in this context, does it?

Faster Web sites shouldn’t be just for those folks with the time and resources to obsessively tweak.

I agree. Even if I sorta prefer it if my pages perform quite a bit better than those of the competition. ;)

Saturday, February 20 2010 at 1:07 AM

Leen Besselink said:

“I also wondered about this. We already got CSS, JS, and the document itself covered. They can be merged (CSS/JS that is), minified, and gzipped. 3 connections and one for the favicon, I’m fine with that.

What’s missing now are images.”

For images we also already have the data-urls.

Sunday, February 21 2010 at 12:06 PM

Bill de hOra said:

@leen

For images we also already have the data-urls.

33% blowup, sucks for the network. Helps make the page uncacheable.
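The 33% figure follows directly from how base64 works: every 3 input bytes become 4 output characters. A quick check (plain Python, nothing specific to data URLs):

```python
import base64

raw = bytes(range(256)) * 12           # 3072 bytes of binary "image" data
encoded = base64.b64encode(raw)

# base64 maps every 3 input bytes to 4 output characters: 4/3 ~= 1.33,
# hence the 33% blowup (before any gzip of the page claws some back).
print(len(raw), len(encoded), len(encoded) / len(raw))
```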

@mnot

Coming at this from a mobile network angle: phones can’t always spin up multiple connections without blocking the UI. I hear of regular issues with BREW, for example, and of J2ME being fairly limited.

In effect, it’s enforcing head-of-line blocking on every response contained in the package.

Right, but so many handsets are limited in the number of connections they can spin up that it sometimes doesn’t matter; they’re HOL-blocked on the client anyway. The goal is to avoid TCP/IP connections (as they get sequenced), reduce payload size, and if you can avoid compression/base64 that’s got some awesome in it for the battery gain. It’s like targeting 2010 browser payloads against 1998-capable clients and networks.

Much better would be a generic, Web-centric packaging format, like MIME Multipart

Right, that avoids the decompression overhead on the client for clients that want to avoid burning their battery. But multipart mime is so easy to get wrong.

or Atom

12 years of SOAP should tell us that XML sucks at packaging media. Plus Atom has too much blogging baggage for this use case (“title only css” makes no sense)

Maybe what we need is an updated multipart structure, one that supports paging or URL referencing, so the server can decide to inline based on network quality, or the client can ask for what it wants based on its capabilities. And one that excludes the “Transfer-Encoding: chunked” gorp (again, something that causes plenty of field issues for handsets that aren’t expecting it).

@martin

Is there a formula yet which includes this? A formula for clients like: (“take an iPhone” + “take an Ubuntu workstation (2.x kernel) with an i5 and ISDN and use Konqueror” + “take a Windows XP box with an AMD XP 500 and use Firefox 2 with 1.5 Mbit DSL” + “take an ARM-based smartphone with Symbian and HSDPA and Opera version x”) / n consumers * MB consumed, squared by pipes used / current connections, and so on. I guess not! For sure there is no formula for all this. But the fact is that this huge amount of data evaluation is missing for proposals and for the people building them. I would love to see Google, Mozilla or any working group start serious public projects to evaluate more details about these facts.

You nailed it. No-one has much of a clue for what’s optimal across client/network/server combos. And there’s much more to the mobile web than what browsers need to render.

Tuesday, February 23 2010 at 6:02 AM

joey said:

Resource packages are an excellent way to distribute web components. I mean, an MP3 player in a single SWF file is really easy to use, but having to insert multiple JS/CSS files, copy them to the server in the right place, correct some paths if needed, and eventually write some more JS is just a pain.

With a resource package, you can embed all the JS, CSS and images in a single file, and even fire an onload to search the document for a specific tag and replace it with the component. That’s even better than a SWF file.

SPDY doesn’t address this kind of issue; it’s “just” a technical thing. Resource Packages are more than that, so they shouldn’t be judged only on the performance side.

Saturday, February 27 2010 at 2:48 AM
