Saturday, 23 August 2003
Registering Media Types
I’ve had a fairly large and annoying bee in my bonnet for the past few months, regarding media type registration. It started buzzing when I tried (and failed) to register a media type for RSS, and has continued to grow as I attempt to do the same for SOAP, on behalf of the XML Protocol Working Group.
Those who defend the registration process (as embodied in RFC2048) will say that it accommodates a number of scenarios, and is adequate to the task. I disagree. The process worked for the MIME world, where a few companies and standards organisations created formats infrequently. It utterly fails on the Web, arguably media types’ biggest use today.
In fact, I assert that media type registration restrains the Web from reaching its full potential, and that the W3C – an organization with exactly that as its stated goal – has an obligation to do something about it.
Why? Formats are one of the three legs of the Web, along with identifiers and protocols. In the old days, formats were highly specialized and created from the ground up, often as packed binary data representations. The W3C changed all of this with XML; almost anyone can design their own data format using it, whether it’s the Widget Markup Language or the Foo Data Representation Format. Now, you can design your own format, put its syntax in a namespace that you control and publish a schema for it on your Web site without contacting or getting the approval of anyone; it just happens. Magic.
This conforms to an underlying theme in the Web architecture - a reluctance to centralize anything that doesn’t need to be. URIs do a very clever thing by leveraging domain names; they delegate naming authority and gain a dereference mechanism by using something that already exists and is widely deployed.
Why is it, then, that when you come up with your brand new format for your business or pleasure, you have to register it with IANA to make it a first-class citizen on the Web? Unless you do, you have to use “application/xml” (or the more frowned-upon “text/xml”). You can’t use content negotiation, you can’t do media type dispatch in browsers or in servers, and you’re forced to use some other identifier (like QNames in WSDL, yuck) to identify the format.
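To make that concrete, here is a rough sketch of media type dispatch and a very simplified form of content negotiation. Everything in it is an assumption for illustration: the “application/widgetml” type is hypothetical and unregistered, the handler names are made up, and the negotiation only does exact matches (no wildcards), which is much less than what HTTP actually specifies.

```python
# Hypothetical handler table; the media type and handler names are illustrative only.
HANDLERS = {
    "application/widgetml": "widget viewer",
    "application/xml": "generic XML viewer",
}


def dispatch(content_type):
    """Pick a handler from the media type alone, ignoring parameters."""
    media_type = content_type.split(";")[0].strip().lower()
    return HANDLERS.get(media_type, "save to disk")


def parse_accept(header):
    """Parse an Accept header into (media_type, q) pairs, highest q first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    # sorted() is stable, so equal q values keep their header order.
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)


def negotiate(accept_header, available):
    """Return the most preferred available type (exact matches only), or None."""
    for media_type, q in parse_accept(accept_header):
        if q > 0 and media_type in available:
            return media_type
    return None
```

With a type of its own, a widget document reaches the widget viewer, and a client can say it prefers that format over generic XML. Labelled as “application/xml”, every XML format lands in the same generic handler, and there is nothing left to negotiate on.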
The consequences of this are far-reaching; if it’s difficult for the author of a format to register a media type, they won’t. They don’t care, because the format will still get used. However, it won’t be used well in the context of the Web; instead, people will invent their own negotiation and dispatch mechanisms. In short, the format doesn’t suffer (because this is the status quo); the Web suffers, and consequently, people who use the format on the Web suffer, because it isn’t a first-class format on the Web.
Why is it so hard?
If, like most people, you want “application/widgetML” for your media type, you have to go through the process of publishing an RFC, getting IESG approval, and then getting it into IANA. This is not a simple process; getting an RFC published can take anywhere from a couple of months to a length of time that requires me to make provisions in my will for completing it. IESG approval has ill-defined criteria and an even more opaque process.
It’s true that one can use the “prs” and “vnd” trees, but, for whatever reason, these are seen as ghettos. Browse through the registry if you don’t believe me; except for a few erstwhile vendors and a few IETF oldtimers, no one uses them. More to the point, to actually understand that you have these options available, you have to wade through the IANA Web site and the truly monstrous RFC2048, and then wait an unbounded amount of time for IANA to assign you a subtree and get back to you. While this seems trivial to an IETF old hand, it’s quite intimidating to a newbie who just wants an identifier for their format. In fact, the W3C has even prepared a guide so that W3C Working Groups can pick their way through the minefield of media type registration, and the associated scheduling problems.
Even more to the point, what benefit is there to centralized registration? There’s no oversight or even documentation required for the “prs” and “vnd” trees. In fact, the only thing that IANA brings to the table is a repository so that people can see what media types are registered.
Please understand, I’m not saying that the IESG or IANA are bad, or are doing a bad thing (well, I am saying that the IESG is a bottleneck, but I’ll have to get in line behind everyone else on that one). They’re generally good, dedicated and smart people who are overloaded doing a difficult job (in addition to their day jobs, in many cases). I am saying that in this case, the effort put into registering media types could be better spent – on both sides.
How? Well, let’s see. We need a way to get globally unique identifiers, ideally without any (additional) centralized registry. Wouldn’t it be good to take a page from URIs and leverage domain names in some fashion?
I can think of a couple of ways to do this; the solution might be universal (e.g., a new media type tree) or maybe XML-specific (e.g., adding a parameter to “application/xml”). However, I’d like to hear what other people say first.
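Purely to give the discussion something to poke at, here is one way the XML-specific option might look in code. This is a sketch under loud assumptions, not a proposal: the “ns” parameter on “application/xml” is hypothetical (no such parameter is defined anywhere), the example.com URI is made up, and the parser is deliberately naive.

```python
from urllib.parse import urlsplit


def parse_content_type(value):
    """Split a Content-Type value into (media_type, params).

    Naive on purpose: it splits on ';', so a parameter value that itself
    contains ';' would be mangled.
    """
    media_type, *rest = [piece.strip() for piece in value.split(";")]
    params = {}
    for item in rest:
        name, sep, val = item.partition("=")
        if sep:
            params[name.strip().lower()] = val.strip().strip('"')
    return media_type.lower(), params


def format_id(content_type):
    """Return the identifier to dispatch on.

    Assumption: a hypothetical 'ns' parameter on application/xml carries a
    URI that the format's author controls.
    """
    media_type, params = parse_content_type(content_type)
    ns = params.get("ns")
    if media_type == "application/xml" and ns:
        return ns
    return media_type


def naming_authority(identifier):
    """Return the domain name behind a URI identifier, or None if there is none."""
    return urlsplit(identifier).hostname
```

Under those assumptions, `format_id('application/xml; ns="http://widgets.example.com/ns/widgetml"')` yields the URI, and `naming_authority` of that URI is `widgets.example.com`; uniqueness comes from whoever controls the domain, with no additional registry involved.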