Distributed Hungarian Notation Doesn't Work

Wednesday, 24 August 2011

It used to be that when you registered a media type, a URI scheme, an HTTP header or another protocol element on the Internet, it was an opaque string that was a unique identifier, nothing more.

Sure, there are some substructures (e.g., vnd. and prs. in media types) to aid in avoiding collisions, but they don’t actually do anything. And even so, they need to be used judiciously (e.g., the problems inherent in x-).

However, it’s now becoming fashionable to hang specific behaviours off of prefixes and suffixes.

For example, XMLHttpRequest gives special status to the Sec- prefix in HTTP headers; basically, if a very modern browser sees a header with it, it won’t allow that header to be set using setRequestHeader().
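To make the mechanism concrete, here is a minimal sketch of what a prefix-based check amounts to. It is written in Python purely for illustration and is not how any browser actually implements it; the function name script_may_set and the RESERVED_PREFIXES tuple are hypothetical, and the sketch assumes Sec- is the only reserved prefix.

```python
# Simplified illustration of a prefix-based header check.
# Assumption: "sec-" is the only reserved prefix; names here are hypothetical.
RESERVED_PREFIXES = ("sec-",)


def script_may_set(header_name: str) -> bool:
    """Return False when the header name starts with a reserved prefix."""
    # Header names are case-insensitive, so compare in lower case.
    return not header_name.lower().startswith(RESERVED_PREFIXES)


# A real Sec- header is refused to script...
assert script_may_set("Sec-WebSocket-Key") is False
# ...an ordinary header is allowed...
assert script_may_set("X-Requested-With") is True
# ...and so is refused any future header that merely happens to start
# with "Sec-", whatever its author intended (Sec-Foo is made up).
assert script_may_set("Sec-Foo") is False
```

The point of the sketch is the last assertion: the check never consults a registry, so the prefix claims every name that will ever begin with it.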

That’s all fine and great for the folks who set up XHR; their use case has been met. However, what about the next lot that come along and want to give their headers a special prefix to do something cool? Do we come up with a weird Sec-Foo-Bar syntax?

In other words, using a prefix / suffix notation for your special use case is very workable the first — and only the first — time.

That’s not to say that XHR is alone. Many people assume that Content- is a special prefix in HTTP, and RFC2616 can be read as giving headers with that prefix special treatment in some circumstances; however, HTTPbis is removing that particular inference.

Likewise, once upon a time PEP (later RFC2774) devised a whole system of dynamic prefixes to allow distributed extensibility in HTTP. I’ve talked before about why this is philosophically flawed (and, indeed, evil), but it’s practically unworkable too, because the HTTP header registries don’t disallow people from registering new headers with those prefixes*. Luckily, exactly no one uses PEP**.

Most recently, HTML5 has defined a “web+” URI prefix to sandbox off a set of identifiers for Web applications. Again, it’s great for them, but what about for the next lot that want to put some semantic sauce into their URIs?

There’s a useful comparison to be made to a very similar syntactic convention in media types, +xml suffixes. This is actually being codified in the latest drafts of the media type registration procedure, so that you can register other formatting conventions, such as JSON, as +suffixes.

It’s easy to say “look, they’re using suffixes in media types, why can’t we use them in URIs too?” However, there are two crucial differences. First, the registration procedures are being updated to reflect the convention, and second, media type suffixes describe exactly one dimension, so there’s no potential for conflicting uses.

In other words, you’ll never have a case where two different +suffixes can both occur on a media type, because they’re defined to be mutually exclusive. application/foo+xml+json doesn’t make any sense. OTOH, unconstrained definition of these kinds of conventions can tie the hands of the whole Web without anyone even realising it, which is enough reason for caution.
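The "one dimension" point can be sketched in a few lines. This is an illustrative parser, not a registry-defined algorithm: the function name split_media_type is hypothetical, it assumes the suffix is whatever follows the last "+" in the subtype, and it ignores media type parameters.

```python
from typing import Optional, Tuple


def split_media_type(media_type: str) -> Tuple[str, str, Optional[str]]:
    """Split 'type/subtype+suffix' into (type, base subtype, suffix).

    Assumption: the suffix is the text after the LAST '+' in the subtype.
    Parameters such as '; charset=utf-8' are not handled.
    """
    type_, _, subtype = media_type.strip().lower().partition("/")
    base, plus, suffix = subtype.rpartition("+")
    if not plus:
        # No '+' at all: the whole subtype is the base and there is no suffix.
        return type_, subtype, None
    return type_, base, suffix


assert split_media_type("application/atom+xml") == ("application", "atom", "xml")
assert split_media_type("text/html") == ("text", "html", None)
# There is only one suffix slot, so a second '+' does not add a second
# dimension; it just becomes part of the base subtype.
assert split_media_type("application/foo+xml+json") == ("application", "foo+xml", "json")
```

Because the return value has exactly one suffix slot, two conventions can never compete for it, which is the property a general-purpose header or URI scheme prefix lacks.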

All of this will become painfully obvious as soon as the second group comes along and designs a killer-app, must-use prefix for HTTP headers or URI schemes.

Some Guidelines

Yes, I’m looking at you, browser folks. Good on you for moving first and winning the land grab, now let’s make sure we don’t all have to live with a mess for the next 30 years.

First, such conventions need to at least consider what the space of other values is. I’d argue that establishing a prefix for just one use case — even if it is huge, like Web browsing — is wasteful overkill, and should be avoided. The +suffix on media types makes sense, because having a formatting convention is a very common thing, and there’s real value in putting that information in an identifier.

While it might be pragmatic to stuff things into protocol elements, it’s not a long-term solution, and you have to think long term when you’re doing this stuff. OTOH, if your use case is important enough to justify the convention (and hey, XHR security might just be that use case), and there’s no other way to do it (I doubt…), maybe it’s the right thing to do. Which brings us to…

Second, if you’re going to establish a syntactic convention for a protocol element, it really, really needs to be reflected in the registration procedures. Work with the appropriate people at the IETF; URI scheme registration procedures are already under revision, and there’s talk brewing for HTTP headers too. Shoving your convention down other people’s throats by shipping it first and asking questions never isn’t just anti-social, it’s actively counter-productive. Look how long it took to clean up the Cookie mess, after all.

So, in the case of Sec- headers, I’ve argued before that the benefits in browser maintenance don’t justify the overall system costs that this approach incurs. It’d be easier to have the browsers auto-update the list of sensitive headers from your servers (isn’t this the direction browsers are going in anyway?).
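As a rough sketch of the list-based alternative: the client holds an explicit set of sensitive header names and swaps it for a newer one when an update arrives. Everything here is hypothetical (the class name SensitiveHeaderList, its methods, and the sample entries), and how the updated list would actually be fetched and authenticated is left out entirely.

```python
from typing import Iterable


class SensitiveHeaderList:
    """An explicit, replaceable list of header names script may not set."""

    def __init__(self, names: Iterable[str]) -> None:
        # Store lower-cased names because header names are case-insensitive.
        self._names = frozenset(n.lower() for n in names)

    def replace(self, names: Iterable[str]) -> None:
        """Swap in a newer list, e.g. one delivered by an auto-update."""
        self._names = frozenset(n.lower() for n in names)

    def blocks(self, header_name: str) -> bool:
        """Return True only for names that are explicitly listed."""
        return header_name.lower() in self._names


# Illustrative starting list; the entries are examples, not a spec.
headers = SensitiveHeaderList(["Cookie", "Host", "Referer"])
assert headers.blocks("cookie") is True
assert headers.blocks("Sec-WebSocket-Key") is False

# A later update adds a new sensitive header by name.
headers.replace(["Cookie", "Host", "Referer", "Sec-WebSocket-Key"])
assert headers.blocks("Sec-WebSocket-Key") is True
# Unlisted names stay available to everyone else, whatever they start with.
assert headers.blocks("Sec-Foo") is False
```

The trade-off is that the list has to be kept current, but no part of the header namespace is claimed in advance.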

The web+ URI scheme is similarly unnecessary, from what I’ve seen so far. E.g., why not just define a single URI scheme for the sandbox and trigger application-specific behaviours on some other aspect of the request? I haven’t seen the use cases for this one yet, so maybe I’m missing something.

Postscript

Doubtless this will all be ignored, because it’s already being baked into code. All I can do is plead with people to think more than a release ahead; we’re going to be using this stuff — especially URIs! — for a long time, so let’s not muck it up.

And let’s not blame everything on the browsers. Recently Julian discovered that RFC5825 had modified the message header registry, to give special status to anything starting with Downgraded-. Although this was intended to just cover e-mail, it briefly ended up applying to HTTP and NNTP headers too. Oops.

* Granted, the registry post-dates PEP, and in the spirit of full disclosure, I was one of the people who set up the registry. But still.

** Except, apparently, Julian.

4 Comments

https://me.yahoo.com/a/TexiesAZsNIThc_3YLLThR4ADxVB11WWgu_m#e8386 said:

Tagging like this implicitly partitions the namespace. When you make such a partition, the semantic benefits need to be very clear.

X- failed. Whether something is experimental or not has virtually zero semantic value. In other spaces, such as CSS names, the informal division is accompanied by rigorous strictures on its use, so that it doesn’t break so badly.

The Sec- tag has some semantic value, though it’s of extremely narrow applicability.

Originally in response to: https://plus.google.com/117348597427239540873/posts/64PbQ3oSRy3

Monday, August 29 2011 at 11:40 AM

Anne van Kesteren said:

Still waiting for your insightful alternative for the Sec- prefix because as we have said on the mailing list this can still be changed. https://lists.w3.org/Archives/Public/public-webapps/2011JanMar/0623.html

Tuesday, August 30 2011 at 6:21 AM

http://openid.open.ac.uk/oucu/jk5837 said:

Anne, Mark suggested simply a list of header names that would be considered security-sensitive, rather than a pattern-matching approach. I’d agree with Mark’s implied evaluation that the performance, memory and manageability hit would be negligible compared to the badness of the prefix precedent.

Mark, I’m a bit surprised that you seem to accept media types +xml; you may remember the same concerns as you voice here being raised around application/soap+xml (e.g. something like application/purchaseorder+soap+xml). I don’t remember your position at the time (if you were there then in the SOAP group and cared about this issue). Can it be that this particular suffix convention won on practicality grounds, in absence of people actually trying to do multiple prefixes? Or is there a deeper reason why +xml yes, but sec- no, that I missed?

Jacek Kopecky

Wednesday, August 31 2011 at 8:23 AM

Mark Nottingham said:

Hi Anne.

I guess the question is whether browsers can arrange to update the list of such headers themselves. AIUI much in the browser world is moving in this direction.

Wednesday, August 31 2011 at 12:53 PM

Mark Nottingham
