Wednesday, 5 May 2004
Without pointing fingers, some people have a bee in their collective bonnet about the dangers of allowing binary content to be represented in XML, care of XOP. Others are up in arms about re-inventing HTTP in SOAP, courtesy of the Representation Header. Both of these are products of the XML Protocol WG, of which I’m a member, so I’d like to share my viewpoint (which is not that of either my employer or the working group, etc., ad nauseam).
XOP is an alternate encoding of the XML Infoset — i.e., elements, attributes and whatnot — that makes it easier to handle and ship around certain kinds of content, namely that which has been encoded with the base64 algorithm. For more information about XOP, see a previous entry about it.
This seems innocuous enough, but some people claim that by doing so, we’re allowing proprietary formats to rule the day once again; all of the transparency and nifty markup tools that XML gives you go away, and Evil Vendors will smite us with their Terrible, Proprietary Formats.
Which, I say, is hogwash. Consider this XML:
<foo> <a>1</a> <b>bob</b> <c>878</c> </foo>
Can you claim to know what it represents? I sure can’t, because I pulled it from thin air. No one could, because it’s already possible to put opaque data into XML, which is what I’ve done here. Now consider this markup:
<bar> 1;bob;878 </bar>
Same data, different encoding. You can’t do much with this at all, because — gasp — there’s structure in the content that isn’t reflected in the XML! Once again, possible today, even without XOP!
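To make the difference concrete, here is a minimal Python sketch that runs both snippets through a stock XML parser. The use of xml.etree.ElementTree and the variable names are my own choices for illustration, and the idea that the semicolon is a field separator is exactly the kind of out-of-band assumption the markup doesn't tell you.

```python
import xml.etree.ElementTree as ET

structured = "<foo> <a>1</a> <b>bob</b> <c>878</c> </foo>"
delimited = "<bar> 1;bob;878 </bar>"

foo = ET.fromstring(structured)
bar = ET.fromstring(delimited)

# Generic XML tooling can address each field of <foo> by element name,
# even though it has no idea what the fields mean.
fields_from_markup = {child.tag: child.text for child in foo}

# For <bar>, the parser sees a single text node and no child elements.
# Splitting on ';' relies on knowledge that lives outside the XML
# (an assumption made here purely for illustration).
fields_from_convention = bar.text.strip().split(";")

print(fields_from_markup)
print(len(bar), fields_from_convention)
```

Both documents are perfectly well-formed; only the first exposes its structure to XML tools, and neither tells you what the data is for.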
XML also allows encoded binary data; indeed, XML Schema encourages it with its built-in base64Binary and hexBinary types.
The only thing that XOP does, from the what-you-can-do-with-XML standpoint, is make it a bit more realistic to use base64-encoded data like this. While some people imagine that Big Evil Companies will use this as a lever to do all sorts of unspeakable things to Shy, Innocent and Virginal XML, in fact the motivation is much more mundane; people want to leverage legacy formats like PDF, GIF, JPEG and similar things, while still using an XML model. Until Moore’s Law catches up with these use cases so that we can either define JPEG-in-XML or just not care about the overhead of encoding, something like XOP will be necessary.
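For a rough feel of the encoding overhead in question, here is a small Python sketch. The payload is a made-up stand-in for a JPEG or PDF, not real image data, and the sketch only measures size; it says nothing about the CPU cost of encoding and decoding on each end.

```python
import base64

# Hypothetical stand-in for a binary file such as a JPEG (30,720 bytes).
payload = bytes(range(256)) * 120

# base64 maps every 3 input bytes to 4 output characters.
encoded = base64.b64encode(payload)

overhead = len(encoded) / len(payload)
print(len(payload), len(encoded), round(overhead, 3))
```

That is about a one-third size increase for anything carried inline as base64, before counting the work of encoding it on the way out and decoding it on the way in.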
Now, you can argue that having alternate representations of XML dilutes its power (a problem that David Orchard and I cautioned against at the Binary XML Workshop), and I’ll acknowledge that it’s a real concern. XOP seems to hit a good 80/20 point, though, because it’s still human-readable, leverages well-known and widely implemented specifications, and doesn’t try to be all things to all people.
Another argument you can make against this approach is that XML isn’t a good general-purpose packaging mechanism, and I agree that it’s a stretch. However, the industry has chosen to rally around XML as a way to describe data in their applications, and XOP does reuse MIME for the actual packaging on the wire. XOP just gives people a way to think about MIME messages in a data model that they’re comfortable with and already using for a lot of other data.
The Representation Header
Here’s the scoop on the now-infamous Representation header. It’s not trying to duplicate HTTP in SOAP; it’s trying to duplicate MIME in SOAP. Why? Well, before XOP and MTOM there was SOAP Messages with Attachments (SwA). People started to build a lot of software using SwA, and as a result had to model applications as an XML message + attachments.
In the end, it turned out that this was a horrible mess, which was one of the driving forces behind XOP. Unfortunately, people had already built their apps with an attachments approach in mind, and vendors had built tools anticipating them. So, to keep these people happy and show them a migration path, we needed a means of representing an attachment in a XOP-encoded SOAP message; hence, the Representation Header.
That’s it. No evil plan to replace HTTP, at least in this round. Personally, I’d be just as happy to forget the Representation header, as I agree there are too many specs, and think that early adopters need to expect some pain. Other people really wanted it, though, and I didn’t imagine it would cause this much kerfuffle. Sheesh.
Final Thought (with added heresy)
The real question here (and boy, is this the elephant in the virtual room) is whether XML is the best way to model data. This is something I’ve been musing on lately, and I’ll get back to you soon with some preliminary results.