
JSON or XML: Just Decide

Friday, 13 April 2012


When people create HTTP APIs, one of the common decisions is what format to use, usually framed as “JSON or XML?”

The thinking often goes like this: HTTP lets clients negotiate for the representation they prefer, so why not offer both JSON and XML and let each consumer choose?

Unfortunately, it’s not that easy; just because HTTP allows you to negotiate for formats doesn’t mean it’s a good idea to use more than one.

Why?

1. Metamodels R Us

When you decide to support both JSON and XML formats for the same data, you have three choices:

  1. Start with JSON, and map it to XML
  2. Start with XML, and map it to JSON
  3. Create a new data model and map it to both

None is a good option. Starting with either JSON or XML is going to create an abomination in the other format; a JSON-based model won’t use all of the power of XML and will feel “alien” to XML-friendly developers, while an XML-based model will come out verbose, ugly and distinctly non-JSONic when rendered as JSON.

This is because the underlying metamodels are fundamentally different. JSON is based upon a handful of common programming language data structures, and is inherently simple. XML, on the other hand, is based upon the Infoset, which is legendarily complex and hard to map to those data structures.
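To make the mismatch concrete, here is a contrived record (the names are invented for illustration). The JSON form is obvious; the XML form is not, because the XML metamodel offers several equally defensible encodings:

    {"id": "42", "tags": ["http", "json"]}

could become any of:

    <item id="42" tags="http json"/>

    <item><id>42</id><tag>http</tag><tag>json</tag></item>

    <item><id>42</id><tags><tag>http</tag><tag>json</tag></tags></item>

Nothing in the JSON model dictates the choice between attributes and elements, or how to wrap the list; going the other direction is worse, since XML attributes, mixed content and namespaces have no natural JSON equivalent at all.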

That leaves creating a new metamodel and mapping it to JSON and XML. This is time-consuming, tricky, and still isn’t likely to feel “native” in either of the formats; i.e., you’ll get the worst of both worlds.

There is actually a fourth option, followed by many: pretending this isn’t a problem, and defining separate JSON and XML message formats in parallel. This might seem like a good idea at first (especially after looking at the options above), but it catches up with you in the long term; little differences will appear between the serialisations, becoming magnified over time. It also makes your API harder to QA, perf test, support, document and understand.
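As a hypothetical illustration of that drift: six months in, the JSON serialisation grows a numeric timestamp while the XML one gets an ISO 8601 attribute, and clients of the two formats now see subtly different data:

    {"id": "42", "created": 1334275200}

    <item id="42" created="2012-04-13T00:00:00Z"/>

Each change looks harmless in isolation; together, they mean the two formats can no longer be documented, tested or supported as one.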

Maintaining both and keeping alignment between them is a significant effort, not for the faint of heart.

2. Data Binding

It gets worse. When you give a programmer using a static language some data in XML format, the first thing they’ll usually do is to shove it into a “data binding” tool, which generates a set of objects that map to the XML, so the developer doesn’t have to dirty their hands with angle brackets.

That’s understandable; these programmers are used to having excellent tool support. However, when you change the underlying XML (for example, by adding an extension), you can unintentionally break any clients using such a data binding, because the binding is generated from the XML Schema, and XML Schema is notoriously difficult to evolve.
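To see the failure mode, consider a hypothetical schema fragment (the element names are invented). xs:sequence both closes the content model and fixes element order, so a message that adds a new child element is invalid against the old schema, and clients whose bindings were generated from it will reject it:

    <xs:complexType name="Account">
      <xs:sequence>
        <xs:element name="id" type="xs:string"/>
        <xs:element name="name" type="xs:string"/>
        <!-- no xs:any wildcard here, so adding e.g. a <plan>
             element breaks validating clients; so does changing
             the order of the elements above -->
      </xs:sequence>
    </xs:complexType>

Schema authors can leave extension points (an xs:any wildcard with processContents="lax"), but few do, and data binding tools handle wildcards unevenly.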

The result of using XML with these clients, then, is brittleness: exactly the kind of coupling that using HTTP in a RESTful way is supposed to avoid, not encourage.

Furthermore, data binding tools expose the structure of XML in different ways, which means you’ll need to shape your use of XML around their practices to give these developers a good experience. In the worst possible world, they may take “hidden” dependencies on what you see as inconsequential structure in your document (the ordering of elements, say), causing interop problems as well.

What’s the Answer?

The solution is deceptively simple: just decide. Pick a single format and stick with it; for Web data formats, that format should be JSON.

JSON has great language support, and for those who need it, providing your own bindings can assure a good programmer experience without losing control over how your API is consumed. Win.

Does this mean that everybody needs to provide Java/C#/etc. bindings for their HTTP APIs? Of course not; but if you think you’re serving that crowd by offering both JSON and XML, publishing a binding is a better alternative.
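As a minimal sketch of what “providing your own bindings” might look like, here is a hand-rolled JSON binding using Jackson; the Account class and its fields are invented for illustration. Because unknown properties are explicitly ignored, the server can add fields later without breaking deployed clients:

    import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
    import com.fasterxml.jackson.databind.ObjectMapper;

    // Hypothetical resource class, deliberately tolerant of additions.
    @JsonIgnoreProperties(ignoreUnknown = true)
    public class Account {
        public String id;
        public String displayName;

        public static void main(String[] args) throws Exception {
            // "plan" is a field this client doesn't know about yet;
            // it is skipped rather than treated as an error.
            String body = "{\"id\":\"42\",\"displayName\":\"Alice\",\"plan\":\"pro\"}";
            Account a = new ObjectMapper().readValue(body, Account.class);
            System.out.println(a.displayName); // prints "Alice"
        }
    }

Because you wrote the binding instead of generating it from a schema, its tolerance for change is a design decision you control, not an accident of tooling.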

P.S.

Note the qualifier above; it’s “Web data formats”, not “all formats.” XML is good at markup — you know, documents, where you need things like attributes and processing instructions, maybe even namespaces. If you’re creating these documents, please go ahead and use XML. Also, if there’s already an XML format that’s designed just for your use case, go ahead and use it.

Die-hard XML heads will say that JSON isn’t as good for archiving as XML, and that’s true; however, for 99% of real-world cases, this isn’t a valid concern.

Also, this shouldn’t be read to say that JSON is perfect for data; only that it’s tons better to decide on one format and not try to support two.


Comments

http://openid.elfisk.dk/jornwildt said:

Mark, I have heard the comment about brittle XML serializers a few times now, yet I haven’t experienced it in my own work, so I am curious why XML (Schema) is considered brittle.

I have been playing with .NET’s built-in XML serializer (the “old one”, not the new data annotations serializer). It ignores unknown XML elements and attributes in the input stream, and I don’t think the property/element sequence has to match, which reduces the “brittleness” of object serialization.

I may have been lucky to only work with this, I may have overlooked or misunderstood something, or simply not seen relevant examples that would break the XML serialization.

Could you please clarify why it is that XML serialization is considered brittle? Thanks.

/Jørn Wildt

Friday, April 13 2012 at 3:18 AM

Mark underplank said:

I really like JSON over XML. I would use it any day of the week.

However, after watching this presentation by Steve Klabnik (http://www.youtube.com/watch?v=0PB_pO_jU38), he points out that XML is much better for hypermedia than JSON, because it actually has the concept built in (kinda).

JSON and XML also don’t have a concept of datetimes (that I know of), so I would be interested whether others think this is an issue.

Friday, April 13 2012 at 5:35 AM

Andy Davies said:

I’ve preferred JSON over XML for a while…

To me the JSON structure is clearer, whereas in XML you always seem to end up debating whether to add an attribute or an element, etc.

Of course, part of my dislike of XML is linked to SOAP, which is so opaque when you’re actually trying to monitor it at the HTTP level (unless you’ve got something to peek inside the packets).

Friday, April 13 2012 at 6:22 AM

Jakub Nesetril said:

I agree that supporting both formats transparently is extremely tricky. XML was certainly entrenched before JSON even got its name, but the writing is on the wall: the industry is moving away from XML and towards JSON. However, the transition is slower (and more painful) than it could be, given the quantity of XML tools that just don’t have JSON equivalents at this point.

One thing that’s particularly painful for us at Apiary right now is validators. There’s JSON Schema, but it has poor traction and poor expressivity, and that’s it. For larger API infrastructure this is painful. I’ve been on two projects where we had to roll our own JSON validation framework, sadly usually involving translation to XML as the first step.

Friday, April 13 2012 at 9:51 AM

Lee Dale said:

It’s simple: if you want to support both formats, just use .NET’s WCF REST services. They allow you to specify web service endpoints that return both XML and JSON by putting the format you want in the query string. You can pass back the same model object and let WCF handle the serialization to the correct format.

Friday, April 13 2012 at 9:54 AM

Christopher Ferris said:

The problem with JSON is that it isn’t good at things such as federated extensibility, which requires namespaces. Further, as a comment above alludes, people try to do unnatural things with it, such as validation, which leads some to think it would be a good idea to have JSON Schema (oh noes! run for your lives!).

What about RDF?

Friday, April 13 2012 at 10:06 AM

matthewjosephtaylor.com said:

Excellent article. I completely agree with your stance.

XML in its day was far superior to random binary formats for EDI, and it’s still a good format for specifying a contract, à la XML Schema, that two partners who don’t trust each other 100% can work towards.

In most cases, however, when you either control both sides of the conversation or have the freedom to dictate (or have a reasonable interlocutor), JSON is the obvious choice for now.

Now if we could just do something about XML config files….

Saturday, April 14 2012 at 1:44 AM

Christopher Ferris said:

I agree that the RDF/XML syntax is an abomination. I much prefer n3 or turtle.

I think that XML gets a bad rap not for inherent issues in XML itself, but because of XML Schema in particular. XSD 1.1 is now a REC. It enables a better extensibility model, but only if there is also commensurate adoption and tooling developed to exploit the new features. I am not hopeful.

As for the comment about federated extensibility, I guess we’ll have to agree to disagree.

Saturday, April 14 2012 at 10:57 AM

Blair Zajac said:

Hi Mark, good article.

One question: why is XML better for archiving? I’m guessing one would also archive any XML Schema files, which would help people understand the documents years later, more so than JSON?

Blair

Saturday, April 14 2012 at 11:30 AM

Erik Wilde said:

i really like the metamodel part in section 1, and generally agree that defining and dealing with metamodels is non-trivial. but sometimes you have to create metamodels anyway, because of the type of service you provide. for us, for example, our services allow users to extend the type system that we’re natively supporting, which means that they can use services that POST new types. allowing users to POST an XSD would be insanity, as maybe about 10 people on the planet can actually write good XSDs based on existing XSDs. so we’re pretty much bound to define our own metamodel that allows users to define new types in that model world, which is much simpler than XSD. and now that we have a metamodel and a mapping to XSD, adding one for JSON is actually not that much effort, and since many people like JSON and customers are asking for JSON, we will very likely end up defining rule-based mappings from our metamodel to both XML and JSON. and if we ever have actual customers asking for RDF, adding such a third route will not be all that difficult, either. the world we’re actually living in, though, will always be our metamodel. and while that approach takes a while to get right and certainly is too heavyweight for smaller and more specific applications, for services intending to be a long-lived platform, i do think it makes sense.

Saturday, April 14 2012 at 11:48 AM

Noah Mendelsohn said:

I mostly agree with Mark. JSON tends to be adequate for simple “data” exchanges; XML tends to be better at dealing with structured text. The example I often give is: if an insurance company wants to publish a list of policy holders and the amounts of their coverage, JSON probably does fine, and is likely to be more convenient than XML. If that same company wants to publish the text of actual policies, or else to create a template that can be tailored to include just the right clauses for each particular policy, XML is likely to be a better bet. JSON doesn’t do mixed content well at all. In some cases, but not all, the use of XML for the templates might suggest using the XML for the data after all – not because it’s an easier way to do data, but because you then have a unified framework in which XML tooling or things like MarkLogic servers can be applied.

Also: the need for self-description and “versioning” support in JSON is reduced in some cases by the fact that often, but not always, the client code that interprets the data is dynamically downloaded from the same server that sources the data, and the two can be updated in sync. If you wanted to choose a format in which to publish an archival record of, say, a list of policy holders, I think it would be a closer call between JSON and XML. JSON remains easier to deal with, and has no impedance mismatch going into JavaScript, but it feels somewhat less self-describing as a long-term storage format.

Sunday, April 15 2012 at 2:46 AM

Berta Blogger said:

I think the problem of choice vs. coexistence is aggravated by the fact that HTTP/1.1 has no support for client-driven content negotiation. There is no standard way of asking for the supported media types of a resource. Instead, the client has to announce a preference using Accept-* headers, and the server makes the choice for it (server-driven negotiation). That saves one round-trip but destroys the self-descriptiveness of the REST/HTTP interface.

There is the OPTIONS method to find out about supported methods, and the Vary header to express possible variants on a per-header basis, but the supported media types of a given resource are at least as important, and they are undiscoverable without custom representations (for instance, 300 Multiple Choices returning a JSON document listing the possible representations/media types).

Have there been any thoughts in the HTTPbis group about including a new header like “Content-Type-Variants: application/json, application/xml”? It would enable developers to design self-describing REST/HTTP APIs with support for multiple media types, without assuming prior knowledge of the API.

Sunday, April 15 2012 at 3:11 AM


https://me.yahoo.com/a/TexiesAZsNIThc_3YLLThR4ADxVB11WWgu_m#e8386 said:

My experience with JAXB was different to that of Julian. We came down on the side of XML; I guess I fit your “big company” persona in that instance. I used schemas. Tooling was a big determining factor at the time.

Arguably, tooling hasn’t changed significantly. Just as Java has first-class tooling support, XML is pretty well supported… if you know what you are doing and are willing to invest. JavaScript and JSON still enjoy second-class support, though that continues to change.

As for first-class linking in the language? Well, XLink didn’t seem like a good fit, which meant that you needed to be schema-aware to know about the links… or understand the data format. But then it’s hard to escape having to understand the semantics of fields. Standardized formats only get you so far.

–Martin (openid still kinda poor)

Tuesday, April 17 2012 at 1:58 AM