Wednesday, 24 August 2011
Distributed Hungarian Notation doesn't Work
It used to be that when you registered a media type, a URI scheme, an HTTP header or another protocol element on the Internet, it was an opaque string that served as a unique identifier, nothing more.
Sure, there are some substructures (e.g., vnd. and prs. in media types) to aid in avoiding collisions, but they don’t actually do anything. And even so, they need to be used judiciously (e.g., the problems inherent in x-).
However, it’s now becoming fashionable to hang specific behaviours off of prefixes and suffixes.
For example, XMLHttpRequest gives special status to the Sec- prefix in HTTP headers; basically, if a very modern browser sees a header name with that prefix, it won’t allow the header to be set using setRequestHeader().
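To make the mechanism concrete, here is a minimal sketch in Python of the kind of check a browser might run before honouring setRequestHeader(). The names RESERVED_PREFIXES and may_set_from_script are made up for illustration, and the single-prefix tuple is a simplification; this is not the actual XHR algorithm.

```python
# Hypothetical sketch: prefix-based gating of script-set request headers.
# RESERVED_PREFIXES and may_set_from_script are illustrative names only.
RESERVED_PREFIXES = ("sec-",)


def may_set_from_script(header_name: str) -> bool:
    """Return False when the header name falls under a reserved prefix."""
    # HTTP header names are case-insensitive, so compare in lower case.
    return not header_name.lower().startswith(RESERVED_PREFIXES)


if __name__ == "__main__":
    for name in ("Sec-WebSocket-Key", "X-Requested-With", "SEC-Whatever"):
        print(name, may_set_from_script(name))
```

The point to notice is that the behaviour hangs entirely off the spelling of the name: nothing in a registry has to say so for the check to fire.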
That’s all fine and great for the folks who set up XHR; their use case has been met. However, what about the next lot that come along and want to give their headers a special prefix to do something cool? Do we come up with a weird Sec-Foo-Bar syntax?
In other words, using a prefix / suffix notation for your special use case is very workable the first — and only the first — time.
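A rough sketch of why the second time is harder, assuming an invented second convention called Cool- alongside Sec- (the table, the behaviour strings and the function name are all hypothetical):

```python
# Hypothetical sketch: two independent groups each claim a header prefix.
# "Sec-" comes from the text above; "Cool-" is an invented second convention.
CONVENTIONS = {
    "sec-": "not settable by script",
    "cool-": "invented second behaviour",
}


def leading_convention(header_name: str):
    """Return the behaviour triggered by the header's leading prefix, if any."""
    lowered = header_name.lower()
    for prefix, behaviour in CONVENTIONS.items():
        if lowered.startswith(prefix):
            return behaviour
    return None


if __name__ == "__main__":
    # A header that wants both behaviours has to pick an order,
    # and a leading-prefix check only ever sees one of them.
    for name in ("Sec-Cool-Foo", "Cool-Sec-Foo", "Foo"):
        print(name, "->", leading_convention(name))
```

A name can only start with one thing, so a header that wants both behaviours triggers just one of them unless every implementation agrees on how to peel off stacked prefixes.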
That’s not to say that XHR is alone. Many people assume that Content- is a special prefix in HTTP, and RFC2616 can be read as giving headers with that prefix special treatment in some circumstances; however, HTTPbis is removing that particular inference.
Likewise, once upon a time PEP (later RFC2774) devised a whole system of dynamic prefixes to allow distributed extensibility in HTTP. I’ve talked before about why this is philosophically flawed (and, indeed, evil), but it’s practically unworkable too, because the HTTP header registries don’t disallow people from registering new headers with those prefixes*. Luckily, exactly no one uses PEP**.
Most recently, HTML5 has defined a “web+” URI prefix to sandbox off a set of identifiers for Web applications. Again, it’s great for them, but what about for the next lot that want to put some semantic sauce into their URIs?
There’s a useful comparison to be made to a very similar syntactic convention in media types, +xml suffixes. This is actually being codified in the latest drafts of the media type registration procedure, so that you can register other formatting conventions, such as JSON, as +suffixes.
It’s easy to say “look, they’re using suffixes in media types, why can’t we use them in URIs too?” However, there are two crucial differences. First, the registration procedures are being updated to reflect the convention, and second, media type suffixes describe exactly one dimension, so there’s no potential for conflicting uses.
In other words, you’ll never have a case where two different +suffixes can both occur on a media type, because they’re defined to be mutually exclusive. application/foo+xml+json doesn’t make any sense. OTOH, unconstrained definition of these kinds of conventions can tie the hands of the whole Web without anyone even realising it, which is enough reason for caution.
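As a minimal sketch of the "one dimension" point, here is how a consumer might read the suffix, assuming the suffix is simply whatever follows the last "+" in the subtype. The function name structured_suffix is my own, not something taken from the registration drafts.

```python
def structured_suffix(media_type: str):
    """Return the text after the last '+' in the subtype, or None."""
    # Drop any parameters such as "; charset=utf-8".
    essence = media_type.split(";", 1)[0].strip().lower()
    _type, _, subtype = essence.partition("/")
    # rpartition splits on the LAST '+', so at most one suffix is ever seen.
    base, plus, suffix = subtype.rpartition("+")
    return suffix if plus and base and suffix else None


if __name__ == "__main__":
    for mt in ("application/atom+xml", "application/json",
               "application/foo+xml+json"):
        print(mt, "->", structured_suffix(mt))
```

Under that assumption, application/foo+xml+json has exactly one suffix (json); the "+xml" is just part of the name and triggers nothing, so there is only ever one answer to "what format is this?".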
All of this will become painfully obvious as soon as the second group comes along and designs a killer-app, must-use prefix for HTTP headers or URI schemes.
Yes, I’m looking at you, browser folks. Good on you for moving first and winning the land grab; now let’s make sure we don’t all have to live with a mess for the next 30 years.
First, such conventions need to at least consider what the space of other values is. I’d argue that establishing a prefix for just one use case — even if it is huge, like Web browsing — is wasteful overkill, and should be avoided. The +suffix on media types makes sense, because having a formatting convention is a very common thing, and there’s real value in putting that information in an identifier.
While it might be pragmatic to stuff things into protocol elements, it’s not a long-term solution, and you have to think long term when you’re doing this stuff. OTOH, if your use case is important enough to justify the convention (and hey, XHR security might just be that use case), and there’s no other way to do it (I doubt…), maybe it’s the right thing to do. Which brings us to…
Second, if you’re going to establish a syntactic convention for a protocol element, it really, really needs to be reflected in the registration procedures. Work with the appropriate people at the IETF; URI scheme registration procedures are already under revision, and there’s talk brewing for HTTP headers too. Shoving your convention down other people’s throats by shipping it first and asking questions never isn’t just anti-social, it’s actively counter-productive. Look how long it took to clean up the Cookie mess, after all.
So, in the case of Sec- headers, I’ve argued before that the benefits in browser maintenance don’t justify the overall system costs that this approach incurs. It’d be easier to have the browsers auto-update the list of sensitive headers from your servers (isn’t this the direction browsers are going in anyway?).
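Here is a rough sketch of that alternative: an explicit, replaceable list rather than a naming convention. The class, its method names and the example entries are all hypothetical, and how the list would actually be fetched from a server is left out.

```python
class SensitiveHeaders:
    """Hypothetical explicit list of headers that scripts may not set."""

    def __init__(self, names):
        # Header names are case-insensitive, so store them in lower case.
        self._names = {n.lower() for n in names}

    def update(self, names):
        """Replace the list, e.g. with a fresh copy obtained from the vendor."""
        self._names = {n.lower() for n in names}

    def may_set_from_script(self, header_name: str) -> bool:
        """Return False only for names that are explicitly listed."""
        return header_name.lower() not in self._names


if __name__ == "__main__":
    headers = SensitiveHeaders(["Origin"])
    print(headers.may_set_from_script("Foo-Token"))
    headers.update(["Origin", "Foo-Token"])  # a later list adds a new name
    print(headers.may_set_from_script("Foo-Token"))
```

The trade-off is that new sensitive headers are only protected once the list is refreshed, but no part of the header namespace has to be fenced off to get there.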
The web+ URI scheme is similarly unnecessary, from what I’ve seen so far. E.g., why not just define a single URI scheme for the sandbox and trigger application-specific behaviours on some other aspect of the request? I haven’t seen the use cases for this one yet, so maybe I’m missing something.
Doubtless this will all be ignored, because it’s already being baked into code. All I can do is plead with people to think more than a release ahead; we’re going to be using this stuff — especially URIs! — for a long time, so let’s not muck it up.
And let’s not blame everything on the browsers. Recently Julian discovered that RFC5825 had modified the message header registry, to give special status to anything starting with Downgraded-. Although this was intended to just cover e-mail, it briefly ended up applying to HTTP and NNTP headers too. Oops.
* Granted, the registry post-dates PEP, and in the spirit of full disclosure, I was one of the people who set up the registry. But still.
** Except, apparently, Julian.