mnot’s blog

Design depends largely on constraints.” — Charles Eames

Wednesday, 12 October 2011

Thinking about Namespaces in JSON

Filed under: Protocol Design, Web

Since joining Rackspace to help out with OpenStack, one of the hot topics of conversation I’ve been involved in has been extensibility and versioning.

I think most of my readers (yes, all six of you) are fairly familiar with, if not tired of (hi, Dave!), the various arguments and counter-arguments in this space. However, there is one new-ish bit: how to do distributed extensibility in JSON.

That’s because OpenStack’s API allows vendors to add extensions in various ways, in an uncoordinated fashion. And while that’s a well-understood (if still somewhat tricky) problem in XML, it hasn’t been approached at all in JSON, which has fast become the format of choice for data-bearing APIs.

JSON has a head start in that it embodies the mustIgnore rule; if you put extra data in a JSON document (for example, an extra property on an object), implementations that don’t recognise it will generally just ignore it. Great. However, the problem comes in when multiple people want to extend a document without colliding with each other.

For example, given this straw-man JSON document:

{
	"foo": "bar",
	"version": 1
}

and both FooCorp and BarProject add a “widget” property, they’ll be fighting over who owns it. Bad luck.
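To make the clash concrete, here’s a rough Python sketch. The two vendor extensions and the extend() helper are made up for illustration; nothing here comes from OpenStack’s API.

```python
import json

base = json.loads('{"foo": "bar", "version": 1}')

# Hypothetical extensions from two vendors who never talked to each other.
foocorp_ext = {"widget": {"colour": "red"}}
barproject_ext = {"widget": 42}


def extend(document, *extensions):
    """Return a copy of document with each extension's properties merged in, in order."""
    merged = dict(document)
    for ext in extensions:
        merged.update(ext)  # a later extension silently overwrites an earlier one
    return merged


result = extend(base, foocorp_ext, barproject_ext)
print(json.dumps(result))  # {"foo": "bar", "version": 1, "widget": 42}
```

FooCorp’s widget is gone without any error being raised, which is exactly the kind of quiet failure that a naming convention is meant to prevent.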

So, some way to coordinate these parties and ensure that they don’t conflict is necessary. In XML, this is done with Namespaces in XML, and so solutions to this problem are generally called Namespaces too, even though they don’t have to look or work the same way.

Prior Art

I’m not the first person to wonder in this direction, of course.

Yaron made the first proposal, as far as I can tell. His approach looks like this:

{
    "org.goland.schemas.projectFoo.specProposal": {
        "title": "JSON Extensions",
        "author": {
            "firstName": "Yaron",
            "com.example.schemas.middleName": "Y",
            "org.goland.schemas.projectFoo.lastName": "Goland"
        }
    }
}

It’s sort of a Java-ish approach: names are rooted in the DNS, as URIs are, but without the syntactic awkwardness of putting URIs in JSON. He also states that there’s an implicit namespace for descendants; e.g., here, “title” is also in the org.goland.schemas.projectFoo namespace.
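As I read it, that inheritance rule could be sketched like this in Python. The split_name() and qualify() helpers are my own illustrative names, and treating everything before the last dot as the namespace is my assumption, not something spelled out in Yaron’s proposal.

```python
def split_name(key):
    """Split 'a.b.c.local' into ('a.b.c', 'local'); no dot means no explicit namespace."""
    namespace, dot, local = key.rpartition(".")
    return (namespace if dot else None), local


def qualify(node, inherited=None):
    """Yield (namespace, local_name) for every property, applying implicit inheritance."""
    if isinstance(node, dict):
        for key, value in node.items():
            explicit, local = split_name(key)
            namespace = explicit or inherited  # unprefixed names take the ancestor's namespace
            yield namespace, local
            yield from qualify(value, namespace)
    elif isinstance(node, list):
        for item in node:
            yield from qualify(item, inherited)


doc = {
    "org.goland.schemas.projectFoo.specProposal": {
        "title": "JSON Extensions",
        "author": {
            "firstName": "Yaron",
            "com.example.schemas.middleName": "Y",
            "org.goland.schemas.projectFoo.lastName": "Goland",
        },
    }
}

names = list(qualify(doc))
for namespace, local in names:
    print(f"{local}: {namespace}")
```

Note that a consumer has to walk down from the root to learn which namespace “title” belongs to; the name alone doesn’t say.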

There was another proposal in the JSON-schema mailing list in 2008. It looks very, very similar to XML schemas, except that the namespaces, as far as I can figure out, are bound inside the schema itself, rather than the document. It seems to have been shot down, because it required schema parsing to be able to identify things; never a good idea, especially in the JSON world.

Some Observations

Starting with the obvious, I’d say that if you can use JSON without namespaces, you really, really should. In other words, if you really need distributed extensibility, you need something like namespaces, but for all other purposes, they should be avoided like the plague; they make the format too complex, and simplicity is the name of the game in JSON.

A bit more subtly, I think this isn’t just a document-by-document decision, but a node-by-node one in the document. I.e., you should identify the specific places in a document that need extensibility and allow namespaces there, but they shouldn’t pollute the rest of the document, if they aren’t needed there.
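One way to picture node-by-node extensibility is a check like the following Python sketch. The EXTENSION_POINTS set, the sample document and the rule that a dot marks a namespaced name are all assumptions for the sake of illustration.

```python
# Hypothetical: only the "author" object accepts vendor-defined properties.
EXTENSION_POINTS = {("author",)}


def misplaced_extensions(node, path=()):
    """Return paths of namespaced (dotted) properties that sit outside an extension point."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if "." in key and path not in EXTENSION_POINTS:
                found.append(path + (key,))
            found.extend(misplaced_extensions(value, path + (key,)))
    elif isinstance(node, list):
        for item in node:
            found.extend(misplaced_extensions(item, path))
    return found


doc = {
    "title": "JSON Extensions",
    "author": {"firstName": "Yaron", "com.example.middleName": "Y"},
    "com.example.rating": 5,
}

print(misplaced_extensions(doc))  # [('com.example.rating',)]
```

The namespaced name inside “author” passes, while the one at the top level is reported, because the top level was never declared extensible.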

I suppose what I’m saying is that namespaces should be a purely syntactic convention to avoid collisions where distributed extensibility is allowed, rather than some magical thing that allows you to uniquely and globally identify every bit of data in the document. I know that’s going to rile up some of the linked data and semweb folks, but we’re talking JSON here, not Turtle or RDF.

This implies that Yaron’s inheritance is unnecessary; the very fact that the “title” property is a member of “org.goland.schemas.projectFoo.specProposal” is sufficient to ensure a lack of collisions (unless he wants to allow extensibility at that level too, in which case the namespaces should be explicit at that level).

Another Straw-Man

Given all of that, I wonder if the problem can be simplified enough to make some progress. I think Yaron’s proposal makes a certain amount of sense, with a few modifications:

This would tweak Yaron’s sample to something like (assuming that a registry were used):

{
    "FOO.specProposal": {
        "title": "JSON Extensions",
        "author": {
            "firstName": "Yaron",
            "EXAMPLE.middleName": "Y",
            "lastName": "Goland"
        }
    }
}

I like this because it’s not very painful, it doesn’t require a schema to process, and it gets the job done: it allows distributed extensibility. The important thing is to stop looking at namespaces as something you should slather over your format like butter (more is better!) and start seeing them as a specialised tool that should only be used when it can do some good.
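Here’s a minimal Python sketch of how a consumer might treat such prefixes. The REGISTRY contents and the owner_of() helper are hypothetical; no such registry exists, and splitting on the first dot is just one possible convention.

```python
# Hypothetical registry mapping short prefixes to whoever registered them.
REGISTRY = {"FOO": "Project Foo", "EXAMPLE": "Example Corp"}


def owner_of(key):
    """Return the registered owner of a prefixed name, or None for a core (unprefixed) name."""
    prefix, dot, _local = key.partition(".")
    if not dot:
        return None  # no separator, so the name belongs to the base format
    if prefix not in REGISTRY:
        raise KeyError(f"unregistered prefix: {prefix}")
    return REGISTRY[prefix]


author = {"firstName": "Yaron", "EXAMPLE.middleName": "Y", "lastName": "Goland"}

for key in author:
    print(key, "->", owner_of(key))
```

Nothing here needs a schema or a walk up the tree; the name of each property is enough to tell who is responsible for it.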


11 Comments

https://me.yahoo.com/a/TexiesAZsNIThc_3YLLThR4ADxVB11WWgu_m#e8386 said:

I think that this is something that you, as an owner of a JSON object definition, can do by making a simple promise.

For instance, “I promise to never use a ‘.’ in any of my names.”

That might be enough. “If you want to extend, use a name with a ‘.’ and fight amongst yourselves.”

Though, as you described, if you want to go to the effort of establishing and maintaining a registry, partition off some space for that and request cooperation from those who want to use that part of the namespace.

You could further partition off some of the registry space for registry-free extension. For instance, one separator indicates that you are using my registry, two separators indicate a DNS-based extension name like “com.example.middleName”, or you might find a way to cram a URI in there somewhere instead, like “{http://example.com/}middleName”.

I should point out that there’s no need to force an extension with an object value to use your namespace for that object.

{ "title" : "JSON Extensions", "foo.extension" : { "title" : "using my own definition of ‘title’" } }

The only drawback being that the extension then doesn’t gain the associated benefits of having a managed namespace. But if the extending object wants to play by your rules, then that’s probably better all ‘round.

Regarding your “no inheritance” conclusion: great! That would get f—-d up really quickly. Avoiding a reliance on schemas is a smart choice too.

p.s., I think that your examples are a little broken. Check out jsonlint.com.

Wednesday, October 12 2011 at 6:11 AM

Kenneth Falck said:

What about the clashing of dot as a namespace separator and as object attribute separator in many languages? I.e. that:

{"foo.bar":"baz"}
{"foo":{"bar":"baz"}}

Both logically become:

foo.bar.baz

Or possibly the first form cannot be directly accessed as an attribute at all. That seems very inconvenient.

Wednesday, October 12 2011 at 8:39 AM

Patrick Mueller said:

Kenneth, there is no confusion there, at the programming level.

The first object is referenced as anObject["foo.bar"]

The second object can be referenced as:

  • anObject.foo.bar
  • anObject["foo"].bar
  • anObject["foo"]["bar"]
  • anObject.foo["bar"]

There is a danger that humans will get it wrong though. Which might be a case for NOT using “.”. Howzabout “-”?

Wednesday, October 12 2011 at 9:43 AM

David Carver said:

You may be interested in a similar implementation for doing namespaces with JSON.

http://www.ebayopensource.org/wiki/display/TURMERICDOC100GA/JSON

The Turmeric SOA project at ebayopensource supports a variety of data formats, and having the ability to represent XML multi-namespaced documents as JSON was a key requirement.

Thursday, October 13 2011 at 1:09 AM

Manu Sporny said:

Mark, you really need to read up on JSON-LD - it addresses most, if not every single one, of your concerns. Here’s the main website:

http://json-ld.org/

And the latest draft of the spec:

http://json-ld.org/spec/latest/

Thursday, October 13 2011 at 3:26 AM

Manu Sporny said:

“I’m not looking for frameworks you have to buy into to use name spaces; more for lightweight conventions that JSON authors can use to avoid conflict, nothing more.”

Ah, but JSON-LD doesn’t require you to buy into a framework at all. You just need to specify a context (or set of contexts) and you’re golden. For example,

{ "@context": ["http://purl.org/openstack/v1", "http://example.com/mycontext"], "title": "mnot’s blog", "author": "Mark Nottingham", "myext": "My extension" }

The power is in the context and you don’t have to use a JSON-LD aware /anything/ to use the markup above /and/ make it collision-free. If you want to see if there are any collisions, you can always put it through a JSON-LD processor… but you only have to do that once, and only if you care about collisions.

Thursday, October 13 2011 at 10:02 AM

paulehoffman said:

Manu: “You just need to specify a context (or set of contexts) and you’re golden” can also be read as “if you don’t understand all this goop, you’re hosed”.

I’m all for unmanaged namespaces that are signalled by a “.”, with an agreement that the base document never uses that character in its own names. An informal rule to prevent collisions is “use your company’s domain name plus something else” for the new name.

Friday, October 14 2011 at 8:24 AM

Manu Sporny said:

“You just need to specify a context (or set of contexts) and you’re golden” can also be read as “if you don’t understand all this goop, you’re hosed”.

What I mean is that as long as two applications are using the same context, all of the key-value pairs are guaranteed to mean the exact same thing. You don’t /need/ to understand JSON-LD any further, although doing so will help you understand /why/ they’re guaranteed to mean the exact same thing. Why do you think “you’re hosed” if you don’t understand “all this goop”?

JSON-LD was designed to be additive. That is, you take the JSON you already have and add a few bits into it to make the key-value pairs mean something - but to only the people that care about that sort of thing. Everybody else can continue to use JSON as they had been before.

Really, this boils down to whether you want your JSON document w/ decentralized extensions to actually mean something to other people. If the answer is no, then use whatever convention you want… this would be just fine:

{ { "foo": "bar" } }

This also works:

{ "com.rackspace.opencloud.experimental": { "foo": "bar" } }

However, if you want your document to have meaning, for your extensions to be re-used by a greater community, it would be smart of you to publish the vocabulary extensions somewhere on the Web and point to them in your data. Something like:

{ "@context": ["http://opencloud.rackspace.com/v1", "http://extensions.example.com/v2"], "foo": "bar", "fooext": "baz" }

People usually learn from copying examples, so you just make this point: “If you want to use our extensions, make sure that your ‘@context’ has those two URLs in it.”

Unmanaged namespaces for decentralized extensibility are an awful idea if the purpose of the extension is to gather adoption and promote external use of that extension. Having an unmanaged, undocumented, unsupported namespace is a recipe for disaster, IMHO.

Mark Nottingham wrote: [versioning] hasn’t been approached at all in JSON, which has fast become the format of choice for data-bearing APIs.

That’s one of the first things we tackled (successfully) in JSON-LD. We do it through the context URL used in the JSON data. Here is how OpenStack could say “we are using the v1 OpenStack vocabulary for all of the key-value pairs in this object”:

{ "@context": "http://opencloud.rackspace.com/v1", "foo": "bar" }

and when you want to switch to version 2, you do this:

{ "@context": "http://opencloud.rackspace.com/v2", "foo": "bar" }

Mark Nottingham said: I’m not looking for frameworks you have to buy into to use name spaces; more for lightweight conventions that JSON authors can use to avoid conflict, nothing more.

There isn’t really anything to “buy into” here other than adding “@context” to your JSON data, is there?

To come at this from another direction: decentralized extensibility is a terrible idea if you don’t document the extension somewhere on the Web. It’s not good enough to just say “this subtree of the JSON object is claimed by XYZ Ltd.”; that doesn’t help people to understand how to use the extension. Worse, they have no idea how to get to documentation on the extension. A centralized registry is easy, but not ideal for two reasons: 1) people have to know that the registry exists and where to find it, and 2) it’s centralized. You want to let things grow organically.

We standards developers forget that it’s incredibly difficult, in that most developers feel extremely intimidated, to add stuff to these centralized registries. Even when they do exist, they are difficult to actually find. For example, I know there is a Link Type registry, but I have no idea who is paying attention to which one: the XHTML Link Types, the HTML 5 Link Types, the IETF Link Types? I spend a great deal of time developing standards and even I don’t know which centralized registry is being used by a particular application from year to year. If there is no “correct” documented way to use the extension, it becomes an interoperability nightmare.

We did consider domain names, but found that a URL gave us three things: 1) it allowed people to put the link into a browser and get a human-readable description of the extension, 2) it provided an unambiguous identifier, and 3) we could use tokens that mapped to IRIs, which is very compatible with JSON.

Take a look at the examples in the JSON-LD playground. You will find that the JSON data looks just like regular JSON to developers - no ugly DNS-style names, no centralized registry, but with full support for documented, decentralized extensibility.

http://json-ld.org/playground/

If the “@context” values look too scary, you can always hide them behind an IRI.

@mnot tweeted: @manusporny I’m pretty sure that my problem looks like a nail to your hammer, yes ;)

There is a reason every toolbox contains a hammer. :)

Saturday, October 15 2011 at 12:41 PM
