mark nottingham

What is the Web?

Thursday, 4 December 2014

Internet and Web

This post is mostly for folks who haven’t been following Web standards closely — especially IETF folks. If you have been, there’s probably not much new here (but feel free to poke holes!).

It seems like a simple thing; after all, we all use the Web, and many of us have done so for the better part of a quarter century. However, defining the Web precisely, and especially what it means to be “on the Web”, turns out to be contentious.

For the longest time, the most widely accepted definition of the Web has been anything that has a URL. This is the “web as information system” view. It’s long been observed that of the three pillars of the Web (identifiers, formats and protocols), URLs are the most universal and stable, despite some arguments about what to call them; formats like HTML and protocols like HTTP can evolve and even be swapped out without affecting whether any particular thing is “on the Web.”

While easy to understand and apply, this view has limits. Strictly applied, it puts all of the WS-* muck “on the Web”, even though WS-* had so little to do with the Web as to be laughable. Most “RESTful Web APIs” fall into the same bucket; while they all have URLs and get benefits from using HTTP (like caching), you usually can’t use them in a browser in any meaningful sense. At best you’d get back a blob of XML or JSON that makes little use of links. There are exceptions (such as the folks industriously trying to do hypermedia APIs), but they’re rare.

This has led to a more restricted view of what’s “on the Web”: roughly, anything that you can do with a Web browser. By nature, this builds on the “anything that has a URL” definition, but it excludes things that use URLs, HTTP and even HTML if they don’t involve a browser.

The “what you can do in a Web browser” view has evolved into what’s now called the Web Platform (it even has a Web site and a Twitter account), and it’s a central plank of what’s going on in the W3C. There, the focus is very much on turning the Web into a platform that can compete with those that threaten it: namely, iOS and Android*.

And that platform is being built; we now have browser APIs for everything from device vibration to video capture. Tellingly, in these circles “Web API” means “an API in a Web browser”; the interface that a resource exposes on the Web doesn’t get that title.

This doesn’t sit well with some.

While I agree with Roy that the Web is fundamentally an information space, that there are non-browser users who are stakeholders, and that this will always be a useful view, the platform view has its charms as well, especially when we start thinking about interoperability and how we define what the Web is in standards.

That’s because if you assume that browsers define the Web as a “platform”, lots of questions that we thought were settled get thrown up in the air again.

Coordinating Extension

For example, it’s long been held that when you define an extension point in a standard, you generally need some way to coordinate it. The IETF does this with registries; the W3C had a fashion for using URIs as namespaces for a time (and then vendor prefixes, but that’s another rant). If browsers themselves become the linchpin, you don’t need registries or namespaces; you just edit the spec, provided that the spec faithfully reflects what the browsers implement. The argument goes that in a browser-ruled Web, other software using the specification doesn’t want to diverge from the behaviour of a Web browser, because doing so would cause interoperability problems and thereby reduce that software’s value. So, just make sure the browsers are walking in lockstep and document what they do in the specs; you don’t need no stinking registry.

This has already played out more than once; for example, in WebCrypto’s approach to documenting which algorithms are available. Since this is a browser API, and the browsers implementing it have a strong interest in interoperating tightly, this makes sense. I’m very much reminded of the arguments I used against namespaces in SOAP (also informed by Roy, BTW) way back when; tight coordination in the spec arguably ensures that extensibility won’t be misused to flex market power.

And, considering the downsides of namespaces and the considerable pain involved in running registries, this seems like a good outcome — as long as the community using a specification wants to track browsers closely, and there’s a reasonable chance that the browsers will work together.

That view carried the day for WebCrypto, and is likely to for other browser-oriented APIs. Elsewhere, it’s not as clear-cut; in HTTP-land, for example, we acknowledge that other user agents like Web crawlers and testing tools want to stay close to browsers, while other uses of the protocol don’t involve a browser at all. Link relations are used too broadly for editing the HTML5 specification to be a practical way of coordinating them, so HTML5 uses a wiki, and we’re talking about doing the same for the IETF spec.
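To make the two coordination mechanisms concrete: in the IETF’s Web Linking spec (RFC 5988), a link relation type is either a short name registered with IANA, or an “extension” relation type written as an absolute URI, which its owner coordinates instead of a central registry. The Python sketch below classifies a `rel` value accordingly. The `REGISTERED` set is a hypothetical, hand-picked subset of the real registry, and `classify_rel` is an illustrative helper, not part of any library.

```python
from urllib.parse import urlsplit

# Hypothetical, hand-picked subset of the IANA Link Relations registry;
# the real registry is larger and changes over time.
REGISTERED = {"next", "prev", "alternate", "stylesheet", "canonical"}


def classify_rel(rel: str) -> str:
    """Report how a single link relation type is coordinated."""
    # Registered names are compared case-insensitively.
    if rel.lower() in REGISTERED:
        return "registered"
    # Extension relation types are absolute URIs, so whoever owns the
    # URI coordinates the name; no registry entry is needed.
    parts = urlsplit(rel)
    if parts.scheme and (parts.netloc or parts.path):
        return "extension"
    # A bare name nobody registered: nothing coordinates it.
    return "unknown"


for rel in ["next", "Stylesheet", "http://example.com/rel/widget", "widget"]:
    print(rel, "->", classify_rel(rel))
```

The “unknown” case is exactly what coordination is meant to prevent: a bare name like `widget` could mean different things to different parties, whereas a registered name or a URI someone controls cannot collide.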

Specification Versioning

Another example is how we version specifications, if at all. Since many browsers have gone “evergreen” (i.e., moved to short release cycles combined with automatic self-update), releasing specific versions of HTML, CSS, DOM and so on doesn’t make as much sense; what’s relevant is what’s currently implemented. This leads to specs becoming “Living Standards,” as espoused by the WHATWG: constantly updated documents that track not only the natural evolution of what they document (whether it be an API, format or concept), but also incorporate bug fixes, improved examples, and better alignment with the reality of what the Web actually is. Given that the browsers themselves have to deal with the amorphous mass of content that is the Web, whose authors are notoriously poor at actually declaring versions or sticking to them, this makes a certain kind of sense.

Much has been said about this topic (and undoubtedly more will be), but again we have parties that don’t necessarily want to align with the browsers asserting that their needs are important too. For example, people who want to define conformance criteria for a government will find this maddening. It also makes referencing a specification problematic, especially if its authors aren’t extremely careful about how they make changes. Still, the central point of a Living Standard is that versioned specifications are a fiction, and on something as wild and woolly as the Web (where, again, browsers tend to flock together closely, producing a huge network effect), this makes a certain amount of sense.

I suspect that the answer here will again be case-by-case; careful versioning is really important for some things (e.g., HTTP, although we’ve got some bits wrong) and not for others (such as HTML and associated APIs). It also looks like the W3C is moving towards a model where it publishes “snapshots” of the Living Standards for those audiences that really can’t stomach an unstable reference. “But, but, but…” I can hear you saying; “how will they handle breaking changes?” The answer in the case of HTML is that they either a) won’t, or b) will coordinate the change and eat the interop problems (presumably because there was already an interop issue, and breaking things was judged the lesser evil). For other things like API changes, releasing things under new names (even if the new name ends in a digit) allows them to be rolled out incrementally, keeping in mind that the spec for that new thing is still likely to be “living.”

Consensus

Viewing the Web as browser-centric is either a power grab of monstrous proportions or merely a recognition of the reality of the Web since its very beginning. Either way, the nature of consensus is changing. This is especially true in the W3C, which has been under enormous pressure from the WHATWG over the last 5+ years, with no end in sight; Jeff Jaffe, the CEO of the W3C, has given a polished overview of the situation.

I’ll abstain from making too many observations here, except to say that in my experience, committees with full consensus models seldom create successful specs.

What is a Browser?

Finally, defining the Web in terms of browsers as a “platform” is starting to turn the notion of a browser on its head; phones, TVs, cars and much more are now becoming part of the Web. This pressure to include more kinds of devices, along with the non-traditional browsers emerging from places like China and India, is, I suspect, going to put the “follow the browsers” model under a certain amount of stress; it’ll be interesting to see how well it serves.

* An observant and slightly cynical reader, at this point, might notice that some of the very same parties that are expressing such concern about this are also putting serious coin into these proprietary platforms.