SOAP: Protocol or Format?

Wednesday, 30 June 2004

Way back when the XML Protocol Working Group started kicking around, Henrik and I had a long-running, low-level “discussion” about whether SOAP was a protocol or a format.

Henrik won, and SOAP is known as a protocol* today (despite the fact that the ‘P’ no longer stands for anything).

I’m wishing I’d fought just a bit harder to keep the distinction.

What Makes SOAP A Protocol?

Although people call SOAP a protocol, it’s really more of a protocol construction toolkit; it gives you the ability to define your own protocols using a few common tools.

This includes a format — the SOAP Envelope — and its associated processing model, roles like “sender”, “receiver” and “intermediary”, and message exchange patterns. SOAP ties all of this information up into a “protocol binding” which is just a way of saying “here’s how you use the SOAP format in a certain way on top of an underlying protocol, like HTTP.”

Protocol bindings also expose little knobs and bits to turn on and off (known as Properties and Features) that allow you to tweak their settings; however, the binding doesn’t itself provide a way to communicate these settings to someone else.

To do that, you need a description format like WSDL. Although that acronym stands for “Web Services Description Language,” I think protocol description format is nearer the truth — WSDL is nothing but a way of writing down a protocol without going through the trouble of publishing an RFC or W3C Recommendation.

So far, so good, but there are two problems with this arrangement:

1. Unnecessary Linkage

In the current Web services world, this is all tied together; the available knobs and bits are catalogued in the SOAP protocol binding, so you can twist and flip them in the WSDL that describes your Web service.

This is cumbersome. To add something new, such as gzip compression in HTTP, you have to standardise a new SOAP feature (or is it a property? I always get them mixed up) for gzip, give it a URI, and say how it relates to existing SOAP protocol bindings. Then, you’ve got to go and standardise how to describe that mechanism in WSDL descriptions, based on its idea of properties and features. Whew.

It gets worse. If you want to use gzip compression in HTTP with non-SOAP messages, you can’t reuse what you did before, because it was SOAP-specific. Instead, you have to go and come up with a completely different mechanism for a completely different WSDL binding and standardise that. Ouch.

To top it off, if you want to make sure that you can talk about gzip compression in HTTP whether or not you’re using SOAP, you have to come up with an “abstract feature” that talks about HTTP compression without respect to the underlying format. Is that enough specs for you?
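To make the point concrete, here is a rough Python sketch of gzip compression treated as a purely HTTP-level concern. The function name `gzip_http_body` and the sample payloads are made up for illustration; nothing here comes from the SOAP or WSDL specifications. It only shows that the compression step never needs to know what format it is carrying.

```python
import gzip


def gzip_http_body(headers, body):
    """Compress an HTTP entity body and label it with Content-Encoding.

    Hypothetical helper: it looks only at bytes and HTTP headers,
    never at the format of the payload.
    """
    compressed = gzip.compress(body)
    new_headers = dict(headers)  # leave the caller's headers untouched
    new_headers["Content-Encoding"] = "gzip"
    new_headers["Content-Length"] = str(len(compressed))
    return new_headers, compressed


# A SOAP payload and a non-SOAP payload (both invented for this example).
soap_payload = (
    b'<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">'
    b"<env:Body><getQuote/></env:Body></env:Envelope>"
)
plain_payload = b"<getQuote/>"

# The same function handles both; only the Content-Type differs.
soap_headers, soap_body = gzip_http_body(
    {"Content-Type": "application/soap+xml"}, soap_payload
)
plain_headers, plain_body = gzip_http_body(
    {"Content-Type": "application/xml"}, plain_payload
)
```

One mechanism covers both cases, which is the reuse that the SOAP-specific feature route rules out.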

2. Duplicative Constraint

An even bigger problem is brought about because SOAP bindings duplicate and constrain a lot of the things that WSDL is capable of describing.

For example, WSDL binding sections should be perfectly capable of mapping abstract messages to concrete, protocol-specific messages on the wire dynamically; that’s what the binding section is supposed to do. Instead, they don’t, because SOAP protocol bindings do it for them already.

So, if you want to describe a new mapping of abstract onto concrete messages, you have to describe a new message exchange pattern in SOAP, and then get it standardised and adopted by all of the vendors and partners you care to interoperate with. This is what the Liberty Alliance had to do with its PAOS “reverse” binding of SOAP onto HTTP.

Considering SOAP as a Format

Contrast this with the simplicity of calling SOAP a format. There would be no SOAP protocol binding, just a format with a processing model.

If you wanted to transfer that format around using an existing protocol (like HTTP, SMTP, Jabber, etc.) you could; it’s just a format that the person receiving it (including intermediaries, potentially) knows how to work with.
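As a small illustration, the following Python sketch hands one and the same SOAP Envelope to two different carriers using only the standard library. The helper names, the URL and the mail addresses are assumptions made up for the example, and nothing is actually sent; the objects are only constructed.

```python
import urllib.request
from email.message import EmailMessage

# One SOAP 1.2 Envelope, treated as nothing more than a format (sample content).
ENVELOPE = (
    b'<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">'
    b"<env:Body><getQuote><symbol>EXMP</symbol></getQuote></env:Body>"
    b"</env:Envelope>"
)


def as_http_request(payload, url):
    """Carry the payload as the body of an HTTP POST (hypothetical helper)."""
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/soap+xml"},
        method="POST",
    )


def as_mail_message(payload, sender, recipient):
    """Carry the same payload as the body of a mail message for SMTP."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "SOAP message"
    msg.set_content(payload, maintype="application", subtype="soap+xml")
    return msg


http_request = as_http_request(ENVELOPE, "http://example.org/quotes")
mail_message = as_mail_message(ENVELOPE, "a@example.org", "b@example.org")
```

Neither carrier needs a SOAP-specific binding here; each just moves bytes with a media type, and the receiver applies the SOAP processing model to what arrives.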

If you wanted to get fancy and describe your interfaces and interchanges in WSDL, you could without any fuss; as long as your underlying protocols were able to ship around XML, the only intrusion of SOAP would be the fact that you’re using a particular format for your XML that has some implied processing semantics.

In other words, there would be no linkage, because SOAP is just another format, whose carriage in various protocols would be described in exactly the same way as other formats. There would be no constraint upon the message exchange patterns and other things specific to a protocol binding, because there would be no protocol binding; all of the description details would be in the WSDL itself.

Making a Clean Break

Calling SOAP a protocol was a great way to bootstrap its early adoption; if it were just a format, all of the cool tooling and easy-to-understand use cases would have gone out the window.

However, now that we have WSDL, we effectively have two poorly-integrated ways to describe a SOAP-based protocol.

This is relatively easy to fix. WSDL need only allow SOAP format-specific binding information (e.g., SOAP header processing, XML serialisation) to be layered onto an arbitrary, separate underlying protocol binding (e.g., HTTP, SMTP, Jabber).

This would do away with the linkage and the constraints of the SOAP protocol binding, and get rid of the completely useless SOAP MEP information in the bargain.

Effectively, WSDL bindings would become descriptions of messaging pipelines, with each stage having the ability to affect serialisation and protocol mechanisms. For example, a SOAP pipeline would start with some abstract data models (e.g., Infosets) and message exchange patterns in the WSDL, put them through a SOAP pipeline component which adds headers and adapts the input Infosets into SOAP Envelopes, and then hand them off to an HTTP pipeline component which serialises the Envelope Infoset as bits and takes care of the HTTP-specific protocol behaviours.

In pseudo-code, this would allow you to dictate an AbstractMessageInfoset’s serialisation like this:

SoapInfoset = SoapBinding(AbstractMessageInfoset, SoapBindingProperties)
WireFormat = HttpBinding(SoapInfoset, HttpBindingProperties)

as long as SoapBinding and HttpBinding had well-defined input and output formats (here, Infoset to Infoset and Infoset to bits, respectively).
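For the curious, the two-stage pipeline can be sketched in runnable Python. This is only one possible reading of the pseudo-code: the function names, the property dictionaries and the sample message are all assumptions invented for the example, with an ElementTree element standing in for an Infoset.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://www.w3.org/2003/05/soap-envelope"  # SOAP 1.2 envelope namespace


def soap_binding(abstract_infoset, soap_props):
    """Infoset to Infoset: wrap an abstract message in a SOAP Envelope."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    header = ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
    for block in soap_props.get("header_blocks", []):
        header.append(block)  # header blocks supplied by the description
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    body.append(abstract_infoset)
    return envelope


def http_binding(infoset, http_props):
    """Infoset to bits: serialise any infoset as an HTTP POST request."""
    payload = ET.tostring(infoset, encoding="utf-8")
    head = (
        f"POST {http_props['path']} HTTP/1.1\r\n"
        f"Host: {http_props['host']}\r\n"
        f"Content-Type: {http_props['content_type']}\r\n"
        f"Content-Length: {len(payload)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + payload


# Hypothetical abstract message and binding properties.
abstract_message = ET.Element("getQuote")
ET.SubElement(abstract_message, "symbol").text = "EXMP"

soap_infoset = soap_binding(abstract_message, {"header_blocks": []})
wire_format = http_binding(
    soap_infoset,
    {"path": "/quotes", "host": "example.org", "content_type": "application/soap+xml"},
)
```

Because `http_binding` accepts any infoset, the SOAP stage could be dropped or swapped for another format stage without touching the HTTP stage.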

* “Protocol” is an oft-abused term, but in this context, I mean a wire protocol; an interaction between parties that comprises a message exchange using certain formats, with certain well-defined roles, etc.

Mark Nottingham
