mark nottingham

Why Revise HTTP?

Sunday, 9 December 2007

HTTP

I haven’t talked about it here much, but I’ve spent a fair amount of time over the last year and a half working with people in the IETF to get RFC2616 — the HTTP specification — revised.

That effort reached a milestone last week when the HTTPbis Working Group had its first face-to-face meeting in Vancouver. It’s still early days, but we’ve already made good progress; based on what we saw in the room, for example, it looks like Roy’s partitioned drafts will become the basis of the new work, and Roy, Julian and Yves will work together as editors on the new documents.

When I talk to people about this, however, I often get asked why we’re doing it. Revising HTTP certainly isn’t as sexy as coming up with another format or protocol for the world to adopt, and many people see what’s going on as boring, or as irrelevant to developers.

I couldn’t disagree more; this work must take place, and now is the best time.

HTTP started as a protocol just for browsers, and its task was fairly simple. Yes, persistent connections and ranged requests make things a bit more complex, but the use cases were relatively homogeneous almost a decade ago, and the people doing the implementations were able to ensure interop for those common cases.

Now, a new generation of developers are using HTTP for things that weren’t even thought of then; AJAX, Atom, CalDAV, “RESTful Web Services” and the like push the limits of what HTTP is and can do. The dark corners that weren’t looked at very closely in the rush to get RFC2616 out are now coming to light, and cleaning them up now will help these new uses, rather than encourage them to diverge in how they use HTTP.

So, while the focus of the WG is on implementers, to me that doesn’t just mean Apache, IIS, Mozilla, Squid and the like; it also means people using HTTP to build new protocols, like OAuth and the Atom Publishing Protocol. It means people running large Web sites that use HTTP in not-so-typical ways.

Another reason to revise HTTP is that there are a lot of things the spec doesn’t say. The people who were there in the late ’90s understand the context, and those who have been around HTTP long enough have learned to understand the thinking behind its design and the intent of its features. However, there’s a whole new generation of implementers and extension builders who haven’t been exposed to this. If we can document the philosophy of HTTP with regard to extensibility, error handling, etc., they have a better chance of understanding the right way to use it.

Last, but certainly not least, getting a bunch of HTTP implementers together and actively discussing the spec also leads to the possibility of interop work. While the IETF doesn’t do formal test suites, we can still come up with informal tools and a test corpus for improving interop.

In the end, my personal goal for this effort is fairly selfish; my job involves helping people inside my company understand how to use and extend HTTP in the best way possible. The current spec makes it very hard to do that, but a revision gives us the chance of improving the spec, people’s understanding of it, and how well it’s implemented.


9 Comments

Gabe Wachob said:

Mark-

I had a lot of questions about why this work was going forward. It’s great to hear that the focus is on explaining the original intent of HTTP, and not on changing the protocol itself. I wholeheartedly agree that most folks have no idea how flexible and usable HTTP is, even with today’s traditional implementations, and without extensions.

There are indeed a number of dark corners being addressed - is a list of those “dark corners” being put together somewhere that I could cite?

-Gabe

Sunday, December 9 2007 at 7:45 AM

Breton said:

Just my two little grey areas in the HTTP spec. First: clarify the role of the semicolon. It’s in the spec as a kind of odd man out, having no special meaning, and yet still reserved in a certain weird, unspecified way. Tim Berners-Lee had some ideas about Matrix URIs, but those went nowhere. A suggestion I made to Ruby on Rails creator DHH involving semicolons ended up breaking Rails in Safari.
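
That odd status even shows up in tooling. A quick sketch with a made-up URI (modern Python, though the behaviour dates back to the old urlparse module): one standard-library parser peels semicolon parameters off the path, while its sibling leaves them alone.

```python
from urllib.parse import urlparse, urlsplit

# A made-up "matrix-style" URI that uses semicolon parameters.
uri = "http://example.com/map/point;lat=50;long=20?zoom=5"

# urlparse() splits whatever follows ';' in the last path segment
# into a separate, rarely used 'params' component...
print(urlparse(uri).path)    # /map/point
print(urlparse(uri).params)  # lat=50;long=20

# ...while urlsplit() treats the semicolons as plain path characters.
print(urlsplit(uri).path)    # /map/point;lat=50;long=20
```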

The other grey area: an application may have two views of the same resource with the same media type, but HTTP has no obvious way of handling this, even though it does have a mechanism, content negotiation, for serving different media types for the same resource at the same URI. For instance, you can’t have an edit view and a read-only view, both text/html, at the same URI. But you can have, say, a GIF and a PNG at the same URL.

Typically this is handled by putting the different views at different URIs. That may be sufficient, but it isn’t codified in the spec, and doesn’t quite fit the tone of the rest of it.
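
To make the gap concrete, here’s a minimal sketch of what proactive negotiation can express, with hypothetical stored variants (and deliberately ignoring q-values and wildcards); there’s no equivalent request-header axis for choosing between two views that are both text/html.

```python
# Hypothetical stored variants of one resource, keyed by media type.
REPRESENTATIONS = {
    "image/png": b"...png bytes...",
    "image/gif": b"...gif bytes...",
}

def negotiate(accept_header: str) -> str:
    """Pick the first acceptable media type the server can offer.
    Deliberately naive: ignores q-values and wildcards."""
    for part in accept_header.split(","):
        media_type = part.split(";")[0].strip()
        if media_type in REPRESENTATIONS:
            return media_type
    return "image/png"  # fall back to a default representation

print(negotiate("image/gif, image/*;q=0.5"))  # -> image/gif
```

The Accept header gives you a slot for “GIF rather than PNG”, but nothing for “the edit view, still as text/html”, which is why everyone falls back to minting separate URIs.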

These are dumb nitpicks, but I thought I’d throw them in the hat, since they’re what I’ve run into personally.

Sunday, December 9 2007 at 7:51 AM

Rafael de F. Ferreira said:

I wonder how this (http://www.snellspace.com/wp/?p=803) fits into the picture? Are new HTTP authentication schemes being considered?

Sunday, December 9 2007 at 12:57 PM

Vincent Murphy said:

Breton: I would regard the ‘edit’ and ‘read-only’ views as distinct resources derived from a common root. Therefore I feel OK about minting two extra URIs for them.

Mark: I see this work you are leading as a noble endeavour. There are a ton of undocumented assumptions in HTTP which need to be surfaced given the new broad audience for REST.

Monday, December 10 2007 at 3:18 AM

Dave Pawson said:

Thanks Mark and the others on this. My only hope is that the resulting text is both specific and clear. Seems you have the bright sparks working on the specificity. If you want it checked for (mis)understanding, I’d be glad to help.

Best Wishes.

Tuesday, December 11 2007 at 2:36 AM

Steve Clay said:

I’d love to see a more thorough section devoted to Cache-Control and caching in general.

Tuesday, December 11 2007 at 6:41 AM

Dan Kubb said:

One of those not-so-typical ways people are using HTTP is in Comet apps. Various techniques are used to keep connections open to clients, but it’s unclear whether or not they are spec-compliant.

In particular, there’s a technique called “long-polling”, where a client performs a GET request but the response is not sent immediately. Instead, the connection is held open until state changes on the server, at which point the server responds. After receiving a response (or timing out), the client may reconnect and “long-poll” again.
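
A minimal sketch of that client loop, using only Python’s standard library; the /events URL and the 90-second timeout are made up for illustration.

```python
import urllib.error
import urllib.request

# Hypothetical endpoint that holds the response open until
# server-side state changes (or a server timeout elapses).
EVENTS_URL = "http://example.com/events"

def handle_event(body: bytes) -> None:
    print("server state changed:", body[:80])

def long_poll() -> None:
    while True:
        try:
            # From HTTP's point of view this is an ordinary GET;
            # the response just takes a long time to arrive.
            with urllib.request.urlopen(EVENTS_URL, timeout=90) as resp:
                handle_event(resp.read())
        except (urllib.error.URLError, TimeoutError):
            pass  # timed out or connection dropped; just reconnect
        # Immediately issue the next GET, i.e. "long-poll" again.
```

On the wire each iteration is just a request/response pair with a slow response, which is exactly why its spec-compliance is debatable rather than obvious.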

Friday, December 14 2007 at 6:41 AM

Jon said:

I certainly agree that old hands understand the intent and model behind HTTP in a way that newcomers to it don’t. However, HTTP has now become quite a fundamental and low-level protocol (amazing to those of us who remember life before it!). I think small tweaks to such protocols are rarely beneficial. HTTP is well designed, and is clearly highly extensible. Areas like Cache-Control require better documentation, rather than changes.

The current difficulties with REST and URIs point more toward the need for a new protocol than toward changes to HTTP. You can do what SOAP does and use HTTP (rather pointlessly, IMO) as a transport layer beneath a whole other protocol, or you can create a new protocol from scratch (presumably using TCP/IP as a transport layer), which is a perfectly reasonable thing to do. It’s not ‘forking’ HTTP, any more than LDAP is.

The thing that needs extending is not HTTP, but our concept of the ‘Web’. Most browsers support protocols beyond HTTP (FTP being the obvious one). Why shouldn’t they support a few more?

In the ‘old’ days people used many protocols to access the internet - FTP, HTTP, SMTP, NNTP, Gopher (briefly), Telnet. While HTTP improves on many of these, it’s silly to throw out the baby with the bathwater. We need to learn to live in a heterogeneous environment again - the right protocol for the right job, not one protocol for everything.

I look forward to LAMP - lightweight API message protocol ;-)

Friday, December 14 2007 at 9:50 AM