Ideal HTTP Performance

Friday, 22 April 2016

The implicit goal for Web performance is to reduce end-user perceived latency; to get the page in front of the user and interactive as soon as possible.

As far as HTTP is concerned, that implies that an ideal protocol interaction looks something like this:

Client->Server: GET /page
Server->Client: just enough data...
Server->Client: ...to show the page

That is, a page load should send the minimal amount of data to the server, and download the minimal amount of data needed from it, in the fewest possible round trips.

Extra data in either direction means more time to transfer it and more chances to have a problem like congestion or packet loss throw a spanner in the works and seriously affect performance.

Extra round trips due to protocol “chattiness” add latency, especially on mobile networks (where 100ms round trips are about the best you can expect; see Ilya’s post for more).

If this is the ideal, how does HTTP measure up? And, how are we looking to improve it?

HTTP/1.1

HTTP/1.1 is a good protocol for a variety of reasons, but unfortunately the way the modern Web works means that performance isn’t one of them. A typical page load looks something like this:

Client->Server: GET /page (with cookies, referer, user-agent...)
Server->Client: Some HTML
Client->Server: GET /style.css /script.js (with cookies, referer, user-agent...)
Server->Client: A big blob of CSS and a few big JS frameworks
Client->Server: GET /image.jpg (with cookies, referer, user-agent...)
Server->Client: Some images
Client->Server: GET more images (with cookies, referer, user-agent...)
Server->Client: Some more images...

Not exactly ideal.

The Web’s use of HTTP/1 is “chatty”, because the client needs to go back to the server multiple times to discover new things that it finds; first in the HTML, and then later in the CSS and JavaScript. Each one of these exchanges adds a new round trip (or more) of latency to the page load, violating our “least number of round trips” ideal.

Furthermore, just making the requests for the page adds up to a lot of data, violating our “minimal amount of data to the server” ideal. This is because verbose headers like Referer, User-Agent and of course Cookie are repeated on every request, and multiplied by the (often) hundreds of assets needed by the average Web page.
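To get a feel for how quickly this adds up, here is a rough back-of-the-envelope sketch. The header values, the 200-byte cookie and the figure of 100 assets are all made-up assumptions for illustration, not measurements, and `request_bytes` is a hypothetical helper.

```python
# Rough estimate of the bytes sent upstream just for HTTP/1.1 request headers.
# Every header value here is an invented example, not a measurement.

def request_bytes(method: str, path: str, headers: dict) -> int:
    """Size in bytes of a body-less HTTP/1.1 request as it appears on the wire."""
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Lines are joined with CRLF; the last line's CRLF plus the blank
    # line that ends the header section add four more bytes.
    return len("\r\n".join(lines).encode("ascii")) + 4

example_headers = {
    "Host": "www.example.com",
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
    "Referer": "https://www.example.com/page",
    "Cookie": "session=" + "x" * 200,  # assumed cookie size
}

per_request = request_bytes("GET", "/image.jpg", example_headers)
total = per_request * 100  # assume the page needs 100 assets
print(per_request, "bytes per request,", total, "bytes for the page")
```

Almost all of those bytes are identical from one request to the next, which is what makes them such an obvious target.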

Finally, because of HTTP/1’s head-of-line blocking, it’s become common practice to combine multiple assets into one, a la CSS spriting, inlining and concatenation. These are nifty HTTP/1 performance hacks, but they have a cost; they download more data than the client needs to show the page, which violates our ideal and means that we’re not showing the page as fast as we possibly could.

All of that said, HTTP/1.1 isn’t all bad, performance-wise; for example, it has caching, which allows you to avoid using the network at all when you have a fresh copy, and conditional requests, which allow you to avoid transferring big things if you have a stale copy in cache.
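As a minimal sketch of that decision, assuming a cache entry that only records its age, its freshness lifetime and an optional ETag (the `CachedResponse` type and `plan_request` function are hypothetical names, and real caches consider much more than this):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CachedResponse:
    age_seconds: int        # how long the copy has been in cache
    max_age_seconds: int    # freshness lifetime, e.g. from Cache-Control: max-age
    etag: Optional[str]     # validator, if the server supplied one

def plan_request(cached: Optional[CachedResponse]) -> dict:
    """Decide whether to touch the network, and with which extra headers."""
    if cached is None:
        # Nothing cached: a plain request is the only option.
        return {"network": True, "headers": {}}
    if cached.age_seconds < cached.max_age_seconds:
        # Fresh copy: no network use at all.
        return {"network": False, "headers": {}}
    if cached.etag is not None:
        # Stale copy with a validator: a conditional request lets the
        # server answer 304 Not Modified instead of resending the body.
        return {"network": True, "headers": {"If-None-Match": cached.etag}}
    # Stale copy without a validator: fetch it again in full.
    return {"network": True, "headers": {}}

print(plan_request(CachedResponse(age_seconds=30, max_age_seconds=60, etag='"abc"')))
print(plan_request(CachedResponse(age_seconds=90, max_age_seconds=60, etag='"abc"')))
```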

HTTP/2

HTTP/2 tries to address the issues in 1.1 in a few ways:

Full multiplexing means that head-of-line blocking is no longer an issue; you can load a Web page on a single HTTP connection, and not worry about how many requests are made. Data-wasting “optimisation” techniques can be left behind.
Header compression removes per-message overhead caused by verbose headers; now you can fit tens (or even hundreds) of requests in just a couple of IP packets, getting closer to the “minimal data” ideal in both directions.
HTTP/2 Server Push allows a server to anticipate what the client is going to need, avoiding the round trips of chattiness.
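
To illustrate why header compression helps so much with repeated headers, here is a toy sketch of the indexing idea. It is not HPACK, the real HTTP/2 header compression format, which also has a static table and Huffman coding; `ToyHeaderEncoder` and its output tuples are invented purely for illustration.

```python
class ToyHeaderEncoder:
    """Remembers header fields already sent on a connection.

    A field seen before is replaced by a small index; only new fields
    are sent as literals. This shows the idea, not a wire format.
    """

    def __init__(self):
        self.table = {}  # (name, value) -> index assigned on first use

    def encode(self, headers):
        out = []
        for field in headers:
            if field in self.table:
                out.append(("index", self.table[field]))
            else:
                self.table[field] = len(self.table)
                out.append(("literal", field))
        return out

encoder = ToyHeaderEncoder()
common = [("user-agent", "ExampleBrowser/1.0"), ("cookie", "session=abc123")]

first = encoder.encode([(":path", "/style.css")] + common)
second = encoder.encode([(":path", "/script.js")] + common)
print(first)   # everything is a literal the first time
print(second)  # only the new :path is a literal; the rest are indexes
```

After the first request, each repeated header costs roughly an index rather than its full text, which is how many requests can fit in a few packets.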

So, an HTTP/2 interaction looks more like this:

Client->Server: GET /page
Server->Client: Some HTML
Server->Client: Push page-specific CSS, JavaScript
Server->Client: Push page-specific images

Here, you can see that the server is sending CSS, JavaScript and images to the client without being asked for them; it knows that the client is probably going to ask for them, so it uses Server Push to send a synthetic request/response pair to the client, saving a round trip. It’s a less “chatty” protocol, and it uses the network more fully as a result.

Mind you, that’s not to say that this is all easy; there are still lots of open questions about HTTP/2, especially around when to push something. I’ll talk about that more separately soon.

HTTP/2 + Cache Digests

A common question about Server Push is “what if the client already has a copy in cache?” Because Push is inherently speculative, there’s always the chance that you’re sending something that the browser doesn’t need.

HTTP/2 allows the client to cancel the push in this situation, with a RST_STREAM. However, even then, there’s roughly a round trip’s worth of wasted data in flight that could have been used for better things. Remember, the ideal is to send only the data that the client needs to show the page.

A proposed solution for this is for the client to use a compact Cache Digest to tell the server what it already has in cache, so that the server knows what’s needed.

Client->Server: GET /page (and BTW here's what I have in cache... )
Server->Client: Some HTML
Server->Client: Push page-specific CSS, JS and images...
Server->Client: ...that the client doesn't already have.

Because Cache Digests use Golomb Compressed Sets, it’s realistic to think that a browser cache can tell the server what it has in less than a thousand bytes, sending it to the server in the first few packets of a connection.
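
For the curious, here is a rough sketch of the idea behind a Golomb-coded set: hash each cached URL into a small range, sort the hashes, and encode the gaps between them with Golomb-Rice coding. This is not the wire format from the proposal. The truncated SHA-256 hash, the `p_bits=7` parameter, the bit-string output and the function names are all assumptions made for illustration.

```python
import hashlib

def golomb_rice_encode(values, p_bits):
    """Encode a set of non-negative ints as a string of '0'/'1' characters.

    Each gap between sorted values is split into a quotient (written in
    unary as 1s ended by a 0) and a p_bits-wide remainder. p_bits >= 1.
    """
    bits = []
    previous = -1
    for value in sorted(set(values)):
        delta = value - previous - 1
        previous = value
        quotient = delta >> p_bits
        remainder = delta & ((1 << p_bits) - 1)
        bits.append("1" * quotient + "0")
        bits.append(format(remainder, f"0{p_bits}b"))
    return "".join(bits)

def golomb_rice_decode(bits, p_bits):
    """Inverse of golomb_rice_encode; returns the sorted values."""
    values = []
    previous = -1
    i = 0
    while i < len(bits):
        quotient = 0
        while bits[i] == "1":
            quotient += 1
            i += 1
        i += 1  # skip the 0 that ends the unary part
        remainder = int(bits[i:i + p_bits], 2)
        i += p_bits
        previous = previous + 1 + ((quotient << p_bits) | remainder)
        values.append(previous)
    return values

def cache_digest(urls, p_bits=7):
    """Hash URLs into a range of N * 2**p_bits, then Golomb-Rice code them."""
    modulus = max(len(urls), 1) << p_bits
    hashes = {
        int.from_bytes(hashlib.sha256(u.encode("utf-8")).digest()[:8], "big") % modulus
        for u in urls
    }
    return golomb_rice_encode(hashes, p_bits)

urls = [f"https://www.example.com/asset{i}.js" for i in range(500)]
digest_bits = cache_digest(urls)
print((len(digest_bits) + 7) // 8, "bytes for", len(urls), "URLs")
```

The server can run the same hash over a URL it is about to push and check whether the result is in the decoded set. Like any structure of this kind, it can give false positives, so the trade-off is occasionally skipping a push the client actually needed.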

Now, we’ve avoided the chattiness of extra round trips, the wasted data of concatenation, inlining and similar hacks, and the wasted data of pushing unneeded requests. This gets us ever-closer to our ideal!

Cache Digests are just a proposal, but there’s a fair amount of interest in them in the HTTP community. Hopefully we’ll start seeing them on the wire in the not-too-distant future.

TCP

So far, I haven’t talked about the performance impact of the other protocols that a browser uses to load a Web page.

However, there’s more than the diagrams above imply; TCP requires a three-way handshake before HTTP starts, to negotiate the parameters of a new connection:

Client->Server: SYN ("I'd like to talk. Here's my parameters.")
Server->Client: SYN+ACK ("I got those parameters. Here are my parameters.")
Client->Server: ACK ("I got those parameters. Let's talk.")
Client->Server: GET /page ("Finally, some HTTP!")

This means that a minimum of a full round trip is taken for connection setup, adding chattiness to every new connection.

TCP Fast Open allows applications to send data on the SYN and SYN+ACK packets to avoid this. Unfortunately, it’s currently only supported by Linux and OS X, and furthermore, there are some tricky consequences of using TFO with HTTP that the community is just starting to work through.

Namely, TFO doesn’t guarantee that the data sent with the SYN packets will appear only once; it’s vulnerable to duplication (thanks to retransmits) or even malicious replay attacks. So, HTTP POST isn’t a good idea for the first request on a TFO connection. More problematically, some GETs still have side effects too, but browsers don’t have a good way to detect which URLs do this.
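
Here is a small simulation of why duplication matters. `ToyServer`, its `/orders` endpoint and `ok_for_first_flight` are all hypothetical. The check is by method only, which is exactly the weakness described above: it cannot spot a GET that has side effects.

```python
SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}  # methods HTTP defines as safe

class ToyServer:
    """A pretend origin server that records an order on each POST."""

    def __init__(self):
        self.orders = []

    def handle(self, method, path):
        if method == "POST" and path == "/orders":
            self.orders.append("widget")
        return 200

def deliver(server, method, path, copies):
    """Simulate the network handing the same first-flight request over several times."""
    for _ in range(copies):
        server.handle(method, path)

def ok_for_first_flight(method):
    """Conservative check a client might make before sending data on the SYN."""
    return method.upper() in SAFE_METHODS

server = ToyServer()
deliver(server, "GET", "/page", copies=2)     # duplicated GET: no change
deliver(server, "POST", "/orders", copies=2)  # duplicated POST: two orders
print(len(server.orders))
print(ok_for_first_flight("GET"), ok_for_first_flight("POST"))
```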

TLS

TLS adds another kind of connection startup chattiness, after the TCP handshake is done. It looks like this:

Note over Client, Server: TCP Handshake
Client->Server: ClientHello
Server->Client: ServerHello, Certificate, ServerHelloDone
Client->Server: ClientKeyExchange, ChangeCipherSpec, Finished
Server->Client: ChangeCipherSpec, Finished
Client->Server: GET /page

That’s two full round trips before HTTP can send any data; chatty indeed. When the client has been to the server before, session tickets allow you to avoid one of those round trips:

Note over Client, Server: TCP Handshake
Client->Server: ClientHello
Server->Client: ServerHello, ChangeCipherSpec, Finished
Client->Server: ChangeCipherSpec, Finished
Client->Server: GET /page

Soon, TLS 1.3 will allow a “zero round trip” handshake when the client has been to the server previously – in other words, HTTP will be able to send data on the first round trip, avoiding any added latency. However, like TCP Fast Open, you need to be able to make sure that repeated data in that first round trip won’t break anything.
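
Putting the connection setup numbers from the last two sections together, here is a simple sketch of the delay before the first HTTP request can even be sent. The round trip counts come from the diagrams above. The 100ms round trip time is the mobile figure mentioned earlier, the table keys and `setup_delay_ms` are names I've made up, and server processing time is ignored.

```python
# Round trips spent before HTTP can send anything, per the diagrams above.
TCP_ROUND_TRIPS = {"handshake": 1, "fast_open": 0}
TLS_ROUND_TRIPS = {"full": 2, "resumed": 1, "tls13_0rtt": 0}

def setup_delay_ms(rtt_ms, tcp, tls):
    """Milliseconds of pure connection setup before the first request goes out."""
    return rtt_ms * (TCP_ROUND_TRIPS[tcp] + TLS_ROUND_TRIPS[tls])

RTT_MS = 100  # assumed mobile round trip time

print(setup_delay_ms(RTT_MS, "handshake", "full"))        # 300
print(setup_delay_ms(RTT_MS, "handshake", "resumed"))     # 200
print(setup_delay_ms(RTT_MS, "fast_open", "tls13_0rtt"))  # 0
```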

HTTP/next

TCP Fast Open and TLS 1.3 are both ways to reduce the chattiness of opening a new connection to a server. Another way to do this is to reuse the connections you already have open as much as possible.

To that end, there’s discussion of how to use HTTP/2’s connection coalescing more aggressively; not only does it help avoid the overhead of opening new connections, but it also makes existing ones more efficient, as TCP works best with long-lived, busy flows.

This includes things like pushing certificates to the client, to prove that the connection can be used for more origins than it was originally negotiated for.

A much more radical change is also under discussion now; swapping out TCP for UDP, a la QUIC. There are lots of moving parts to QUIC, but from a performance perspective, the ability to have a zero-round trip handshake without changing the client’s Operating System is very attractive. Furthermore, being able to access data in buffers out of order means that the implicit head-of-line blocking in TCP (thanks to it being an in-order protocol) is no longer an issue; you can cherry-pick HTTP messages (or parts thereof) behind a dropped packet to get them in front of the user faster.
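
A toy model of the head-of-line blocking difference, under heavy simplifying assumptions: each packet carries data for exactly one HTTP message, one packet is lost, and we ask what can be handed to the application before the retransmission arrives. The function names and the packet layout are invented for illustration and say nothing about how QUIC is actually implemented.

```python
def deliverable_in_order(packets, lost_seq):
    """TCP-like: nothing after the lost packet is delivered until it is resent."""
    return [(seq, stream) for seq, stream in packets if seq < lost_seq]

def deliverable_per_stream(packets, lost_seq):
    """QUIC-like: only the stream that lost a packet has to wait."""
    lost_stream = next(stream for seq, stream in packets if seq == lost_seq)
    return [
        (seq, stream)
        for seq, stream in packets
        if seq != lost_seq and (stream != lost_stream or seq < lost_seq)
    ]

# (sequence number, which HTTP message the packet belongs to)
packets = [(1, "html"), (2, "css"), (3, "js"), (4, "css"), (5, "html")]

print(deliverable_in_order(packets, lost_seq=2))    # [(1, 'html')]
print(deliverable_per_stream(packets, lost_seq=2))  # [(1, 'html'), (3, 'js'), (5, 'html')]
```

With in-order delivery, one dropped CSS packet holds up the HTML and JavaScript behind it. With per-stream delivery, only the rest of the CSS waits.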

It’s still early days for QUIC, so we may not see a UDP-based HTTP standardised and widely deployed for a while – or even at all. One possible future discussed is that QUIC is a testbed that we use to learn more about what we want from TCP performance, and that we can apply those lessons to the Web “in flight.” How possible that is on a heterogeneous Internet remains to be seen.

Mark Nottingham
