mark nottingham

JSON and XML

Monday, 24 January 2005

XML

I’m intrigued by the JSON effort. While many people (and vendors) have chosen XML for data interchange because it’s not platform- or vendor-specific, these folks have chosen the other path; by leveraging the serialisation of data structures in ECMAScript (nee JavaScript) — a nearly ubiquitous language, on every desktop that has a browser — they get an automatic installed base and at least one API for free.

Then, by defining mappings to other languages (e.g., Java, Perl and C#; by coincidence or design, Python doesn’t require anything extra), they suddenly get a data interchange format that’s pretty darn useful for what’s becoming a very common task — turning those browsers into an application platform.
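
To make the mapping idea concrete, here is a minimal sketch in Python. It assumes the standard-library `json` module, which only arrived in Python some years after this was written, and the sample record is made up for illustration.

```python
import json

# A made-up JSON text, as a browser-side script might send it.
text = '{"name": "widget", "tags": ["a", "b"], "price": 9.5, "stock": null, "active": true}'

# Each JSON type lands on a native Python type:
# object -> dict, array -> list, string -> str, number -> int/float,
# true/false -> bool, null -> None.
record = json.loads(text)

# Encoding and decoding again gives back an equal structure.
round_trip = json.loads(json.dumps(record))
```

Note that `null` and `true` are not Python literals, so even here the mapping is done by a parser rather than by the language itself.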

Some XML people will scoff, whilst others will have fear in their eyes; as discussed before, XML isn’t so great for data modeling.

It Always *Starts* Simple…

So, will XML be a distant memory in a few years? Will world-wide inventories of angle brackets shoot up thanks to JSON? Not quite. While on the face of it, it’s a very attractive solution, I have a feeling JSON is going to run into a few problems.

First of all, it’s still a tree; there isn’t any way to represent a graph in JSON. This isn’t a big loss over XML — also a tree — but it does present a problem in some situations. It would be better if a reference mechanism were built in.
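
A rough illustration of the tree limitation, sketched in Python with the standard `json` module. The `nodes`/`edges` layout and the `neighbours` helper are hypothetical conventions invented for this example, not anything JSON defines.

```python
import json

# Two nodes that point at each other: a cycle, so not a tree.
a = {"name": "a"}
b = {"name": "b", "next": a}
a["next"] = b

try:
    json.dumps(a)
    cycle_error = None
except ValueError as exc:
    # The standard encoder refuses circular references.
    cycle_error = exc

# Workaround by convention (hypothetical layout): give each node an id
# and express edges as id references instead of nesting.
graph = {
    "nodes": [{"id": "a"}, {"id": "b"}],
    "edges": [{"from": "a", "to": "b"}, {"from": "b", "to": "a"}],
}
encoded = json.dumps(graph)


def neighbours(doc, node_id):
    """Resolve the id references by hand; JSON itself knows nothing of them."""
    return [edge["to"] for edge in doc["edges"] if edge["from"] == node_id]
```

The graph survives the trip, but only because both ends agree on what `from` and `to` mean; the format itself still sees a tree.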

More seriously, JSON also doesn’t have a language-neutral schema mechanism; while you might be able to describe something in prose, or in a language-specific way, it would be really nice to be able to validate data and generate code, and it’s critical to have well-described interfaces.

Next, JSON’s type system is fairly limited; for example, there are no time or date types. You can say that it’s an integer offset from an epoch, but then you get into implementation-specific concerns. Whoops.
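
A small sketch of the date problem, again assuming Python's standard `json` and `datetime` modules. The `posted` member name is made up, and both encodings shown are conventions the two ends would have to agree on separately.

```python
import json
from datetime import datetime, timezone

when = datetime(2005, 1, 24, 12, 0, tzinfo=timezone.utc)

try:
    json.dumps({"posted": when})
    type_error = None
except TypeError as exc:
    # There is no JSON type for a date, so the encoder gives up.
    type_error = exc

# Convention 1: an integer offset from an epoch (which epoch? which unit?).
as_epoch = json.dumps({"posted": int(when.timestamp())})

# Convention 2: an ISO 8601 string, which the receiver must know to parse.
as_iso = json.dumps({"posted": when.isoformat()})

decoded = json.loads(as_iso)
restored = datetime.fromisoformat(decoded["posted"])
```

Either way, what arrives is just a number or a string; nothing in the data says it is a date.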

All of these problems can be addressed by extending JSON, which leads us to our final issue; JSON doesn’t have any mechanism for extension or versioning. In other words, how does one change the data structure you’re pushing across the wire over time, whilst still remaining compatible with processors that are expecting or generating the old version? How do you disambiguate fourteen different “item” structures when you want to combine different data sources? For bonus points, what do you do when one of those extensions isn’t valid JavaScript?
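
One way people tend to fudge this, sketched in Python under stated assumptions: the consumer ignores members it doesn't recognise, and keys are prefixed with a name the source controls. The `rating` and `sku` members, the `KNOWN_FIELDS` set, and the example.com and example.org URIs are all invented for illustration; none of it is part of JSON.

```python
import json

# Version 1 of a hypothetical message.
v1 = '{"item": {"title": "JSON and XML"}}'

# A later producer adds a member the old consumer has never heard of.
v2 = '{"item": {"title": "JSON and XML", "rating": 4}}'

KNOWN_FIELDS = {"title"}


def read_item(text):
    """Old consumer: keep the fields it understands, ignore the rest."""
    item = json.loads(text)["item"]
    return {key: value for key, value in item.items() if key in KNOWN_FIELDS}


# Telling apart two sources that both say "item": prefix the key with a
# name each source controls, a poor man's namespace.
merged = {
    "http://example.com/feed#item": {"title": "JSON and XML"},
    "http://example.org/shop#item": {"sku": "A-1"},
}
merged_round_trip = json.loads(json.dumps(merged))
```

This works only as long as every party follows the same unwritten rules, which is exactly the gap described above.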

Don’t Get Cocky, XML

That’s not to say that XML is so hot either; while these problems have been recognised, XML can’t represent a graph, XML Schema is not exactly user-friendly or even implementer-understandable, and while the XML Schema type system is pretty good, extending and versioning XML is still dangerous territory, and Namespaces in XML are trickier than they appear.

So, although JSON clearly has shortcomings and limitations, XML shares some of them, and extracts an arguably high tax for those it doesn’t.

Considering that it’s tangentially associated with oh-so-cool technologies like GMail and Google Suggest, and is a sop to the XML-is-too-slow-and-bloated contingent, I wouldn’t be surprised if, once mature, JSON takes a bite out of a lot of the “low-end” (translate: non-enterprise) projects out there, because XML will fail to justify its cost in non-markup applications. In short, some developers won’t care about the limitations above, because they don’t think they’ll push the envelope that much, or if they do, they can fudge it. Fair enough.

That said, right now I still think of JSON more as an expression of frustration with XML for data modeling and representation than the ultimate solution; while it’s extremely attractive to couple it closely with existing languages, more is required to interchange data robustly between distributed systems. YMMV.

P.S.

The “O” stands for “Object.” Here we go again…


16 Comments

Dimitre Novatchev said:

Dimitre,

You can represent a graph with XML + something else; for example, RDF/XML allows you to represent a graph. However, the data model of the Infoset is a tree, not a graph; you can’t validate the graph using XML Schema without extra machinery.

This is the problem that SOAP Encoding came up against, and the main reason why we dropped it in the WS-I Basic Profile.

There’s no extra machinery in the case of XGMML – the xml schema defines the vocabulary and it is used to represent graphs. Which is the “extra machinery” in this case?

I have used XGMML in several XSLT applications for some quite generic and complex processing, such as Eulerisation of a graph to be used in a “Chinese Postman” - type of algorithm. In my experience using XGMML has been natural and convenient.

Dimitre.

Tuesday, January 25 2005 at 1:08 AM

Christian Metts said:

Another thing to keep an eye on is YAML ( http://yaml.org ); it doesn’t have the advantage of speaking natively with JavaScript, but it doesn’t have the type system limitations of JSON.

Tuesday, January 25 2005 at 1:45 AM

Dimitre Novatchev said:

XML can’t represent a graph

Quite the opposite:

http://www.cs.rpi.edu/~puninj/XGMML/draft-xgmml.html

Probably it’s a good idea to conduct some research before making absolute/extreme statements.

Dimitre Novatchev.

Tuesday, January 25 2005 at 2:31 AM

Geoff said:

Dimitre,

Any data type can represent any other data type with a proper encoding. A string of tally-marks could represent a graph as such: “11111111111111111111111111111111111” but that’s not very easy to use by machine or human, is it? Graphs are a generalization of trees, and have the simplest encodings, as far as I know, for all other structures.

Tuesday, January 25 2005 at 9:39 AM

bryan said:

‘You can represent a graph with XML + something else; for example, RDF/XML allows you to represent a graph.’ I’m sorry, but RDF/XML is not XML + something else; it’s just XML. A very ugly XML dialect for representing a graph, true, but that is all.

Tuesday, January 25 2005 at 11:09 AM

Mark Baker said:

RDF/XML is more than just XML, it’s XML + a bunch of new stuff to agree upon, like rdf:about, rdf:parseType, semantics for what it means to have one element contained within another, extensibility semantics, etc..

Tuesday, January 25 2005 at 12:46 PM

Fredrik said:

“Python doesn’t require anything extra”

That’s not entirely true; both the comment syntax and the string literal syntax differ from Python’s. To reliably generate and parse JSON, you need a library. It does not necessarily have to be a large library, but the builtins (repr/eval) don’t cut it.

The spec seems to be somewhat lacking in the êncödìng depärtmënt äs wéll, but that might just be me…

Wednesday, January 26 2005 at 9:01 AM

James said:

An advantage of json over xml is that, for me at least, the client code will be javascript looking for structured data. In some cases the client will be fetching content for (nearly) direct rendering, and XML seems to work better there.

But I would prefer to avoid having to do XML DOM tricks to navigate over an XML doc as part of some selective logic on a data set. Why parse a DOM to get JavaScript objects when I can just get the serialized objects directly?

In those cases where the client is JavaScript then the limited data types and such are not an issue. In more complex cases I prefer not to couple the data with formal types anyway. Just give me text (probably XML). If typing is important, then I’d use YAML.

Tuesday, February 22 2005 at 8:30 AM

Duncan said:

The issue with JSON and Python isn’t just that the syntax doesn’t quite match what Python would eval. Much more importantly you would have to be certifiably crazy to use eval on a string sent from an untrusted web client. Fortunately googling for JSON & Python finds http://json-rpc.org/pyjsonrpc/index.xhtml which does the parsing the hard (and probably safe) way.

Wednesday, March 16 2005 at 8:17 AM

Tom said:

I think the point that is being missed is that in the AJAX scenario, the XML needs to be processed into the object with a DTD and a lot of JavaScript. The JSON just needs to be evaled into the object.

Monday, December 19 2005 at 5:55 AM

Marc said:

I just posted similar thoughts.

http://marc.abramowitz.info/archives/2006/01/05/json-a-light-weight-alternative-to-xml/

I agree that JSON won’t replace XML, but that JSON works at least as well as XML for a lot of common things like AJAX. By the way, YAML, it turns out, is a superset of JSON, which is kind of cool, because any YAML parser should be able to handle JSON as well. YAML seems to be popular with the Ruby crowd and is used in Ruby on Rails.

Thursday, January 5 2006 at 11:40 AM

Rich said:

The delivery package of choice depends on the problem at hand. We have a large RSS-type feed which contains some 600 records. I converted this to JSON and the size went from ~400k characters to ~370k characters (not much of a saving). Additionally, since you have to ‘eval’ the JSON string into a JS object for DHTML-type purposes, a string of this size crashed the browser. The browser chews on the 400k character XML file just fine. I am probably missing something, but these are interesting observations nonetheless.

Wednesday, February 15 2006 at 1:25 AM

Sean Hagen said:

I just discovered JSON a few days ago, and have been toying around with it and the PHP json_encode() function. Combined with sajax ( a PHP library for easily exporting PHP functions to AJAX ), it makes doing all sorts of things a lot easier.

That said, there are a few limitations, as you said. With the combination I’m using, since you have control over the implementation, it isn’t a very big deal. But in larger projects, I can see how the implementation of things like dates and such could pose a problem.

Wednesday, February 13 2008 at 6:36 AM

Haris said:

Your article is fairly good, but I am still confused about JSON and XML. Would you explain it more clearly?

Sunday, August 23 2009 at 6:55 AM