mnot’s Weblog

“Design depends largely on constraints.” (Charles Eames)

Wednesday, 10 March 2010

Caching-Tutorial für Webautoren und Webmaster

Thomas Hühn has graciously translated the caching tutorial into German. Thanks!

See also the Chinese, Czech and French translations. To help the translators keep up with changes, I've started hosting the raw document on GitHub, which can also be used to log issues.


Thursday, 18 February 2010

Are Resource Packages a Good Idea?

Resource Packages is an interesting proposal from Mozilla folks for binding together bunches of related data (e.g., CSS files, JavaScript and images) and sending it in one HTTP response, rather than many, as browsers typically do.

Intuitively, this seems to make sense; fewer HTTP requests is good, right?

Maybe, maybe not. AFAICT, there aren’t any metrics comparing RP vs. traditional sites (has anyone done this?). In any case, a few concerns come to mind about this approach to making the Web faster.

Packaging and the Web

RP doesn’t have any generic metadata mechanism. The files in an RP are just that: bags of bits, whereas on the Web, we work with representations that include metadata.

So, clients will have to sniff the media type on each individual package member — something we’re trying to get away from on the Web. And, forget about using other types of header-based metadata as well.

For example, the draft points out that you can “even use ETags to invalidate the zip file when needed”. However, if a cache already has existing entries, the only things linking them are the URLs; since zip files don’t carry much more metadata than the modification time, a cache doesn’t know their ETags.
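To illustrate the metadata gap, here is a minimal Python sketch that builds a small ZIP in memory and lists what the standard library reports for each entry. The file names and contents are made up for illustration, and the helper `package_member_metadata` is hypothetical. The point is only that the per-entry record has a name, a timestamp and sizes, with no field for a media type or an ETag.

```python
import io
import zipfile


def package_member_metadata(files):
    """Build an in-memory ZIP from {name: bytes} and report what each entry records."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    buf.seek(0)
    with zipfile.ZipFile(buf) as zf:
        return [
            {
                "name": info.filename,
                "modified": info.date_time,  # (year, month, day, hour, minute, second)
                "size": info.file_size,
            }
            for info in zf.infolist()
        ]


# Hypothetical package members; note there is nowhere to read a Content-Type or ETag from.
for entry in package_member_metadata({"site.css": b"body{}", "app.js": b"var x;"}):
    print(entry)
```

A client unpacking this has to guess the media type from the name or the bytes, which is exactly the sniffing problem described above.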

Much better would be a generic, Web-centric packaging format, like MIME Multipart or Atom. It’s true that ZIP tools are more prevalent, but I’d be surprised if that would be a barrier once browsers deployed another format; when that happens, developers tend to fill the gaps quickly.

Getting Granularity and Ordering Right

Another concern I have is that Web sites are complex, and it’s difficult to choose exactly what to package up and what to leave separate.

The effects of packaging up too much could be profound; for example, a site that doesn’t use every bit of JavaScript and CSS on every page, but puts them all in a package, will cause a client to download more than it needs to start working on a given page.

While that isn’t a big deal if you’re sitting on a fat connection with a fast computer in your office, it matters when you’re across the world, or just browsing across a mobile network.

It'll also create a lot of duplication in proxy and accelerator caches, since clients that don't use RP will request the same things separately.

Likewise, if you don’t order the items in your package as the browser needs them, it will have a negative impact on performance, because the rendering engine will end up sitting around waiting for a required asset to come down the pipe. In effect, it’s enforcing head-of-line blocking on every response contained in the package.
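As a rough illustration of the ordering problem, the sketch below models a package as one serial download and works out when the assets needed for first render have all arrived. The asset names, sizes and bandwidth are invented numbers, and the model ignores latency and compression; it only shows how member order changes the wait.

```python
def arrival_times(package, bytes_per_second):
    """Return {name: seconds until that member has fully arrived} for a serial package."""
    elapsed = 0.0
    times = {}
    for name, size in package:
        elapsed += size / bytes_per_second
        times[name] = elapsed
    return times


def time_to_first_render(package, needed_first, bytes_per_second):
    """The renderer can start only when every asset in needed_first has arrived."""
    times = arrival_times(package, bytes_per_second)
    return max(times[name] for name in needed_first)


# Hypothetical assets: the page needs the CSS and JS before it can render.
needed = ["site.css", "app.js"]
images_first = [("photos.png", 150_000), ("app.js", 40_000), ("site.css", 10_000)]
css_first = [("site.css", 10_000), ("app.js", 40_000), ("photos.png", 150_000)]

print(time_to_first_render(images_first, needed, 50_000))  # roughly 4 seconds
print(time_to_first_render(css_first, needed, 50_000))     # roughly 1 second
```

Same bytes, same connection, but the badly ordered package makes the rendering engine wait about four times as long in this toy case.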

I strongly suspect that choosing the right package granularity and ordering is going to be a very difficult and performance-sensitive task, and for many sites the interdependencies between JS, CSS and images will burn a lot of developer time tweaking packages.

The worst case is that some RP-enabled sites will resemble a Flash site from a UX perspective; one big serial download with a “waiting” graphic, followed by snappy performance. I don’t know about you, but I hate that.

Working with TCP

Finally, RP seems to be built on the argument that using fewer TCP connections is better. While it’s true that the browsers currently limit a page to six or eight connections, and any connections over that queue up, this is a) changing (see this issue), and b) not necessarily a bad thing.

It’s not (necessarily) bad because of TCP slow-start. As pointed out by Google and many others, a brand-new TCP connection’s throughput is fairly restricted until it has completed a number of round trips, and congestion (e.g., buffers in intervening routers filling up) can slow it right back down again.

In other words, downloading 20 10k assets across eight parallel connections is often faster and more reliable than downloading one 200k asset over one connection. Browsers — intentionally or not — exploit this by using multiple parallel connections.
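A back-of-the-envelope sketch of that comparison follows. It assumes an idealised slow-start in which each connection starts with a window of 3 segments of 1460 bytes and doubles it every round trip; those parameters are assumptions, not measurements. It also ignores connection setup, the extra round trip for each request, packet loss and any shared bottleneck, so treat the output as an illustration of the shape of the effect only.

```python
import math


def round_trips(total_bytes, initial_window=3, segment_size=1460):
    """Round trips to deliver total_bytes if the window doubles every round trip."""
    segments_left = math.ceil(total_bytes / segment_size)
    window = initial_window
    trips = 0
    while segments_left > 0:
        segments_left -= window
        window *= 2
        trips += 1
    return trips


def parallel_round_trips(asset_sizes, connections):
    """Spread assets round-robin over connections; the busiest connection sets the total."""
    per_connection = [0] * connections
    for i, size in enumerate(asset_sizes):
        per_connection[i % connections] += size
    return max(round_trips(load) for load in per_connection if load > 0)


KB = 1024
print(round_trips(200 * KB))                    # one 200k asset, one connection: 6
print(parallel_round_trips([10 * KB] * 20, 8))  # twenty 10k assets, eight connections: 4
```

Under these assumptions the single large download needs six round trips to the parallel case's four, and the gap matters more the longer the round trip is.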

As such, putting all of your data eggs in one basket (as it were) can actually slow you down, never mind the ordering and granularity issues discussed above.

So, is RP a good idea?

Well, it’s certainly an interesting one, and in some cases (e.g., when you have a lot of very small assets that you know you’re going to use) it makes a lot of sense.

However, as it sits I don’t see any numbers quantifying a benefit (again, please correct me if I’m wrong!), and the existing recommendations (“serve all the resources... required by a page in a single HTTP request”) are a bit worrisome.

Putting that aside, this doesn’t feel like a long-term solution; it’s more of a band-aid over one set of specific problems in the 2010 Web.

I’m pretty biased towards a long-term solution here, because the cost of deploying clients is so high. While it’s true that more aggressive solutions like SPDY require both client and server support, server support isn’t hard to get once it’s in clients, and RP requires client support anyway.

So, I'd put forth that if we’re going to go to the effort to change clients, we should get the most bang for our buck, and make sure it lasts. Just my .02.


Friday, 15 January 2010

WS-REST (heh, heh)

If you haven’t seen it already, check out the Call for Papers for the First International Workshop on RESTful Design (WS-REST 2010), where I’m on the program committee, along with many of the usual suspects.

Submissions are due February 8, 2010, 23:59 Hawaii time. If only I were there to receive them…


Wednesday, 16 December 2009

HTTP + Politics = ?

Australia has apparently decided, through its elected leaders, to filter its own Internet connection.

Since many, many other people are discussing whether this is advisable or indeed effective, I’ll focus here on what this will do to HTTP, and by extension the Web.

What’s on the Table

Reading the white paper, there are three different technologies for filtering the Web on the table: pass-through filters, proxies, and a “pass-by hybrid” of the two.

Most of the ISPs that participated in the pilot chose the “pass-by hybrid” solution, for the very good reason that it doesn’t require an ISP to shove all of their traffic through a single box and hope it can keep up, thereby supporting claims that filtering won’t hurt Web performance.

However, if a site’s IP address is on the list, it does get sent to another box. Presumably, this is a box that acts as a pass-through filter or a proxy, so it inherits their problems for those sites. Given that some of those sites are likely to be YouTube, Flickr and so on, this isn’t just a corner case.

Pass-through filters need to be able to parse the entire request stream to pull out request-URIs and make a filtering decision. When they’re not blocking a URL and not overloaded, presumably they’ll perform adequately.

The interesting part comes when they do decide to block a URL. A simple implementation will presumably just block the HTTP response and splice in a canned, generic “blocked” one. However, that will break — sometimes spectacularly — a client that’s doing HTTP pipelining.

For example, if Alice and Bob are behind a corporate proxy which is pipelining away through a pass-through filter, and Alice makes a request to get blocked content, it can affect Bob’s request. Worse, if Bob requests a blocked URL after Alice does, a naive implementation could block Alice’s request.

The only way to properly block requests like this is to keep state about the request and the response around, so as to assure that you’re inserting the “blocked” response in the right place. In other words, you might as well be a proxy.
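To make the state-keeping concrete, here is a minimal sketch of what a filter would have to track for one pipelined connection. The class name, the `BLOCKED` placeholder and the URLs are all hypothetical, and responses are plain strings rather than real HTTP messages. It relies on the one rule that matters here: pipelined responses must come back in the same order as the requests.

```python
from collections import deque

BLOCKED = "403 blocked"  # stand-in for the canned "blocked" response


class PipelineFilter:
    """Remembers the order of in-flight requests so responses stay matched to them."""

    def __init__(self, blocked_urls):
        self.blocked_urls = set(blocked_urls)
        # One flag per unanswered request, oldest first:
        # True = forwarded upstream, False = blocked locally.
        self.pending = deque()

    def _drain_blocked(self):
        """Emit canned responses for blocked requests that have reached the head."""
        out = []
        while self.pending and self.pending[0] is False:
            self.pending.popleft()
            out.append(BLOCKED)
        return out

    def on_request(self, url):
        """Return (forward_upstream, responses_to_send_to_client_now)."""
        forward = url not in self.blocked_urls
        self.pending.append(forward)
        return forward, self._drain_blocked()

    def on_upstream_response(self, response):
        """Match an upstream response to the oldest forwarded request."""
        if not self.pending or self.pending[0] is not True:
            raise RuntimeError("upstream response with no forwarded request pending")
        self.pending.popleft()
        return [response] + self._drain_blocked()


f = PipelineFilter({"/blocked"})
for url in ["/alice", "/blocked", "/bob"]:
    f.on_request(url)
print(f.on_upstream_response("alice's page"))  # ["alice's page", '403 blocked']
print(f.on_upstream_response("bob's page"))    # ["bob's page"]
```

Even this toy version has to hold the canned response back until the earlier response has gone out, which is the per-connection bookkeeping a proxy already does.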

I will grant that pipelining isn’t widely used on the open Internet (although Opera does use it, and Firefox can be convinced to), but I can’t help but see the irony, given that it is one of the primary techniques for speeding up an HTTP connection, especially over long distances, which I hear we have in abundance down here.

Proxies, for better or worse, are a much more well-understood beast. Generally, you’re at the mercy of a proxy; if it decides to forbid certain HTTP methods (as is common), you can’t use them. If it doesn’t support Upgrade, Expect/Continue or chunked encoding, you won’t be able to use these HTTP features.

What this Means for the Web, and Australia

People don’t just use HTTP for browsing Web pages any more; it’s used for everything from desktop weather widgets to major system software updates to online gaming to document editing. People are also using HTTP in weird and wonderful ways to get things like Comet, BOSH and WebSockets happening.

By forcing ISPs to deploy middleboxes — without regard to protocol conformance or impact on these uses — we’re effectively profiling what people can do on the Web in Australia. This hurts the Web’s ability to grow and evolve, and it hurts Australia, by putting us at a competitive disadvantage to the rest of the world.

Furthermore, if “additional content” is filtered by ISPs, that means that — by the government’s own calculations — somewhere around 3% of HTTP requests will either get a non-standard error page, or mysteriously drop connections.

Think about that for a second; depending on how it’s calculated, you could easily be looking at several blank Web pages throughout your day, and sometimes your iPhone apps, your desktop widgets, your software updates just won’t work for some reason.
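As a rough worked example of what a 3% rate could mean, the sketch below treats every request as independently having a 3% chance of being affected. That independence, and the request counts used, are assumptions for illustration only; real traffic is concentrated on a few sites, so the true picture depends heavily on how the figure is calculated.

```python
def expected_failures(requests_per_day, blocked_fraction=0.03):
    """Average number of affected requests per day."""
    return requests_per_day * blocked_fraction


def chance_of_any_failure(requests_per_day, blocked_fraction=0.03):
    """Probability that at least one request in the day is affected."""
    return 1 - (1 - blocked_fraction) ** requests_per_day


# Hypothetical volumes: a light user and a heavier one.
for n in (50, 200):
    print(n, expected_failures(n), round(chance_of_any_failure(n), 2))
```

Under these assumptions, even 50 requests a day gives better than a three-in-four chance of hitting at least one failure.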

Companies like Google, Yahoo!, Amazon and Akamai spend lots of time and money making the Web go faster. While the white paper claims that filtering doesn’t slow the Web down in their tests, this ignores the opportunity cost that it introduces. Optimising YouTube, Flickr, GMail or any other performance-sensitive site is going to be much more difficult through a morass of content filters.

It’s true that Web sites already have to deal with a multitude of proxies and other middleboxes on the open Internet anyway, but the difference is that if users don’t like what an ISP does to their packets, they can vote with their feet. There is no such option when the middlebox is mandated.

Making it Less Bad

If the Government persists in mandating these filters (again, I’m just looking at the technical side here!), there are a few things that they can do to help, including:

One final thought. What will the Government’s reaction be once sites start deploying protocols like SPDY, which are going to be much less amenable to filtering, but much more powerful? Will we block them completely, thereby shutting ourselves off from the rest of the world?


Friday, 13 November 2009

Will HTTP/2.0 Happen After All?

A couple of nights ago, I had a casual chat with Google’s Mike Belshe, who gave me a preview of how their “Let’s make the Web faster” effort looks at HTTP itself.

SPDY (née FLIP) is an alternate application protocol that’s in Chromium, but buried so deeply that you have to enable it with a command-line option (--use-flip). AFAICT there aren’t even any public servers that support it yet, but it’s still a very exciting development.

Why? In a nutshell, it’s a binary, frame-based protocol for multiplexing bidirectional data streams over TCP (to start with). See flip_protocol.h for an idea of what it looks like, as well as the whitepaper.
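To give a feel for what "binary, frame-based multiplexing" means, here is a toy sketch. The frame layout (a 4-byte stream id followed by a 4-byte payload length) is invented for illustration and is not SPDY's actual wire format; see flip_protocol.h for that. The idea is simply that frames from different streams can be interleaved on one connection and reassembled by stream id.

```python
import struct

# Hypothetical header: stream id and payload length, both unsigned 32-bit big-endian.
HEADER = struct.Struct(">II")


def encode_frame(stream_id, payload):
    """Prefix a payload with its stream id and length."""
    return HEADER.pack(stream_id, len(payload)) + payload


def decode_frames(data):
    """Walk the byte string frame by frame and reassemble each stream's payload."""
    streams = {}
    offset = 0
    while offset < len(data):
        stream_id, length = HEADER.unpack_from(data, offset)
        offset += HEADER.size
        streams[stream_id] = streams.get(stream_id, b"") + data[offset:offset + length]
        offset += length
    return streams


# Two responses interleaved on one connection.
wire = (
    encode_frame(1, b"<html>")
    + encode_frame(3, b"body{")
    + encode_frame(1, b"</html>")
    + encode_frame(3, b"}")
)
print(decode_frames(wire))  # {1: b'<html></html>', 3: b'body{}'}
```

Because each frame says which stream it belongs to, a slow response on one stream no longer has to hold up the others at the HTTP level.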

HTTP’s Limits

HTTP-over-TCP has some pretty basic limits; most seriously, you can practically only have one request or response in flight on a connection at the same time.

Pipelining was designed to alleviate this, but at best it’s only a partial fix (head-of-line blocking is still an issue), and implementation problems mean it’s almost unusable on the open Web (although Serf has had success in using pipelining in Subversion). It also can’t be used for methods like POST, which is important for interactive applications.

This drives people to use multiple, parallel TCP connections — something that we’ve accommodated in HTTPbis by lifting the two-connection limit for clients. However, that’s not a great solution either; TCP doesn’t allow you to share connection state between them, which brings problems when dealing with congestion.

What about WAKA?

These problems are well-known and have been discussed for years, all the way back to HTTP-NG, WebMUX and other efforts. More recently, Roy Fielding has been working behind the scenes on WAKA, with similar goals. So similar that I had to smile when Belshe explained what they were doing; it’s very similar to how Roy explains WAKA’s use of the transport.

However, I wouldn’t say that SPDY is competing with WAKA (yet). Belshe goes out of his way to point out that SPDY is more about doing real-world experimentation than about saying “this is the protocol we’ll use.” In his words:

We're hoping to put theories to the test; while many of the ideas are not new, we're aggregating them, making them cooperate together, implementing them, and then measuring them. We hope that others will appreciate and expand this effort so that we can all evolve toward a protocol we think is universally better in a relatively quick timeframe.

In other words, they seem to be positioning this as input to the eventual design of HTTP/2.0, WAKA or whatever, rather than a browser-specific push to define a new protocol alone.

… and the IETF

The other interesting aspect, of course, is the relationship to WebSockets, especially since there was a pretty strong sense in the IETF earlier this week in Hiroshima that a Working Group to standardise it should be started. If SPDY really does eventually follow the path of WAKA, it could be that some HTTP-like use cases that people have planned for WebSockets may have another outlet instead.

Finally, you might ask what bearing this has on our efforts in HTTPbis. Right now, the answer is “nothing”, in that we’re chartered explicitly NOT to create a new version of HTTP. However, I think that our work — especially in splitting up the spec (a decision driven by Roy a long time ago) — will help any eventual successor protocol, whether it be WAKA, SPDY, their child or something completely different.

That’s because the minimum bar to entry for replacing HTTP/1.1 is to exactly support its semantics and capabilities, while making it more efficient. The fact that all of the wire-level goop in HTTP is now moving to a single, separate document helps that.

The last thing that I’d mention is that when we started HTTPbis a couple of years ago, there was a strong sentiment against creating a new protocol, both because of the can of worms it would open, and because of deployment problems in doing so. However, I’ve recently heard many people complaining about the limitations of HTTP over TCP, and it seems that one way or another, we’re going to start tackling that problem soon.

