mnot’s blog

“Design depends largely on constraints.” -- Charles Eames

Friday, 25 November 2011

Linking in JSON

To be a full-fledged format on the Web, you need to support links -- something sorely missing in JSON, which many have noticed lately.

In fact, too many; everybody seems to be piling on with their own take on how a link should look in JSON. Rather than adding to the pile (just yet), I thought I'd look around a bit first.

What am I looking for? Primarily, a way to serialise typed links (as defined by RFC5988, "Web Linking") into JSON, just like they can be into Atom, HTTP headers and (depending on the HTML5 WG's mood today), HTML.

5988 isn't perfect by any stretch (primarily because it was an after-the-fact compromise), but it does sketch out a path for typed links to become a first-class, format-independent part of the Web -- as they well should be, since URIs are the most important leg holding up the Web.

My immediate use case is being able to generically pull links out of JSON documents so that I can "walk" an HTTP API, as alluded to previously.

I'm also going in with a bit of caution, because we have at least one proof that getting a generic linking convention to catch on is hard; see XLink.

Now, to the contenders.

JSON-LD: JSON for Linking Data

JSON-LD is a JSON format with a Linked Data (née Semantic Web) twist.

{
  "@context": "http://purl.org/jsonld/Person",
  "@subject": "http://dbpedia.org/resource/John_Lennon",
  "name": "John Lennon",
  "birthday": "10-09",
  "member": "http://dbpedia.org/resource/The_Beatles"
}

Obviously, if you want to fit RDF into JSON, this is what you'd be looking at. Which is great, but most of the developers in the world aren't (yet) interested in this (no matter how hard the proponents push for it!). It also fails to provide a mapping from 5988; where do I put the link relation type?

I've seen a fair bit of advocacy for JSON-LD, especially on Twitter, but in almost every instance, I've seen the non-believers push back.

JSON Reference

If JSON-LD is too complex / high-level, its opposite would be JSON Reference, a new-ish Internet-Draft by Kris Zyp and Paul Bryan (who are also working on JSON Schema, JSON Pointer and JSON Patch, currently being discussed in the APPSAWG).

It's effectively a one-page spec, where a link looks like:

{ "$ref": "http://example.com/example.json#/foo/bar" }

This is effectively a static serialisation of a type they've defined in JSON Schema, a sort of "meta-schema for links." I'd previously pushed back on that, because it effectively requires schema support / understanding to get the links out of the document -- a real non-starter in many scenarios.

So, I like the concreteness. However, it still lacks any way to talk about relation types, or pop on other metadata; while this could be grafted on separately, the whole point is to have one way to do it.
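To make the "generically pull links out" use case concrete, here is a minimal Python sketch that walks any parsed JSON document and reports each object carrying a "$ref" member, with no schema involved. The function name find_refs and the sample document are illustrative assumptions; only the "$ref" marker itself comes from the draft.

```python
import json

def find_refs(node, path=""):
    """Yield (pointer, target) for every object that has a string "$ref" member."""
    if isinstance(node, dict):
        target = node.get("$ref")
        if isinstance(target, str):
            yield path, target
        for key, value in node.items():
            # Escape "~" and "/" so the path is a valid JSON Pointer.
            token = key.replace("~", "~0").replace("/", "~1")
            yield from find_refs(value, f"{path}/{token}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_refs(value, f"{path}/{index}")

# Hypothetical document; only the "$ref" convention comes from the draft.
doc = json.loads("""
{
  "order": {"customer": {"$ref": "http://example.com/example.json#/foo/bar"}},
  "items": [{"$ref": "/items/1"}]
}
""")
refs = list(find_refs(doc))
```

Each result pairs where the link was found with where it points. There is nowhere in that pair to hang a relation type, which is exactly the gap noted above.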

HAL - Hypertext Application Language

Another attempt is HAL, by Mike Kelly. The JSON portion of his serialisation looks like this:

{
  "_links": {
    "self": { "href": "/orders" },
    "next": { "href": "/orders?page=2" },
    "search": { "href": "/orders?id={order_id}" }
  }
}

Here, links are an object whose members are link relations (yay!).

I'm a bit concerned, however, that this object might be a bit awkward to drop into some formats; it relies on _links to identify the structure, so in some places, you'd need a two-level deep object to convey just a simple link.

Also, there doesn't appear to be any way to link to more than one URI of a given relation type.
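A small Python sketch, assuming the "_links" shape shown above, illustrates both points: extraction is easy because relations are member names, but a relation repeated as a member name cannot carry two links. The helper name hal_links is made up for this example.

```python
import json

def hal_links(document):
    """Return (relation, href) pairs from a HAL-style "_links" object."""
    links = document.get("_links", {})
    return [(rel, link["href"]) for rel, link in links.items()
            if isinstance(link, dict) and "href" in link]

orders = json.loads("""
{"_links": {"self": {"href": "/orders"},
            "next": {"href": "/orders?page=2"},
            "search": {"href": "/orders?id={order_id}"}}}
""")

# Repeating a relation as an object member does not survive parsing:
# Python's json module keeps only the last value for a duplicated name.
two_next = json.loads('{"_links": {"next": {"href": "/a"}, "next": {"href": "/b"}}}')
```

Here hal_links(orders) yields the three links in document order, while hal_links(two_next) only ever sees "/b".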

What I'd Really Like

I don't have a specific thing in mind yet, and it's entirely possible that any of these proposals could be adapted to my needs (or others, I'm sure that they're out there).

I do have some requirements for consideration, though, along with a few sketches of ideas.

Discoverable

It should be really easy to find links in a document; as discussed above, requiring use of a schema is a non-starter.

This means that there needs to be some sort of marker in the link data structure to trigger the link semantics (e.g., how JSON Reference uses "$ref"), and ideally some way to indicate on the document itself that it's using that convention (to avoid collisions, and help processors find these documents).

JSON Reference does this with a media type parameter:

Content-Type: application/json; profile=http://json-schema.org/json-ref

While at a glance that seems reasonable, I have two concerns; first, that JSON itself doesn't define a profile parameter on the media type (this needs to be done properly), and more seriously, that you can't declare conformance to multiple such conventions using this mechanism.

For example, if I want to say that my JSON conforms to this convention on links, and another convention about (say) namespaces, I'm out of luck.

I was in Boston recently (for the OpenStack design summit), and during a lull went up to the W3C offices to have lunch. Having this very much on the mind, I asked TimBL for his take, and we sketched out a sort of JSON metadata container, something like:

{
  "_meta": {
    "json-linking": null,
    "json-namespaces": ["foo", "bar", "baz"]
  },
  // … rest of doc here
}

The exact format and content aren't important here; the idea is having a controlled way (i.e., the keys would probably be in a registry somewhere, and/or URIs) of adding annotations to a JSON document about its contents.

This is not an envelope; it's just a document metadata container, much like HEAD in HTML. You could put links up there too, I suppose, if it'd help. The important part is that you'd know to look for one of *those* because the media type of the document (or one of its parameters, if we go that way) indicates it's in use.
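As a rough Python sketch of how a processor might use such a container, assuming the "_meta" member name and the two keys from the example above (none of which is a defined convention), checking for several conventions at once becomes a simple set lookup:

```python
def declared_conventions(document):
    """Return the names listed in a document's "_meta" member, if any."""
    meta = document.get("_meta") if isinstance(document, dict) else None
    return set(meta) if isinstance(meta, dict) else set()

# Hypothetical document using the sketched container.
doc = {
    "_meta": {"json-linking": None, "json-namespaces": ["foo", "bar", "baz"]},
    "name": "example",
}
uses_links = "json-linking" in declared_conventions(doc)
```

Unlike a single profile parameter, nothing stops the container from naming as many conventions as the document needs.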

What do people think? Is this JSONic enough (whatever that means)?

Self-Contained

As mentioned in the discussion of HAL above, the link convention needs to be easy to insert in a format, and not be too convoluted. This probably means an object that's "marked" with a particular, reserved key, much as JSON Reference does it. A list of links then becomes an array of objects, which seems pretty natural.

Mappable to RFC5988

Again, as discussed, a mapping to RFC5988 is important to me -- both so that links can be serialised in various formats, with reasonable fidelity, and so that we can pivot from talking about "RESTful" APIs in terms of URIs to talking about them in terms of formats and link relations, as Roy advocates:

A REST API should spend almost all of its descriptive effort in defining the media type(s) used for representing resources and driving application state, or in defining extended relation names and/or hypertext-enabled mark-up for existing standard media types. Any effort spent describing what methods to use on what URIs of interest should be entirely defined within the scope of the processing rules for a media type (and, in most cases, already defined by existing media types).
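To show what "mappable" could mean in practice, here is a hedged Python sketch that turns link objects into an RFC5988 Link header value. The object shape (an "href" member plus parameters such as "rel" and "title") is an assumption made for this example, not a proposal from any of the specs above, and the sketch does no escaping of quotes or non-ASCII characters in parameter values.

```python
def to_link_header(links):
    """Serialise link objects as an RFC 5988 Link header field value."""
    values = []
    for link in links:
        # Every member other than "href" becomes a quoted link parameter.
        params = "".join(f'; {name}="{value}"'
                         for name, value in link.items() if name != "href")
        values.append(f'<{link["href"]}>{params}')
    return ", ".join(values)

# Hypothetical link objects.
links = [
    {"href": "/orders?page=2", "rel": "next"},
    {"href": "/orders", "rel": "self", "title": "All orders"},
]
header = to_link_header(links)
```

Because the links are an array of objects, two links with the same relation type serialise without any special handling.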

Extensible

The object defined needs to be explicitly extensible by the format it's in, so that link-specific metadata can be added. See the discussion of namespaces in JSON.

Anchorable

A complement to linking in JSON is linking to JSON. While JSON Pointer looks really promising in this regard, and is getting a fair amount of buzz, I wonder if another mechanism, analogous to xml:id, is necessary.

The use case here is when you want to link to a specific object in a document that may be changing over time; a JSON pointer can be brittle in the face of such change, while a document-unique identifier is much more stable.
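The brittleness is easy to demonstrate with a short Python sketch. It uses a deliberately minimal JSON Pointer resolver (no error handling) and an "id" member as the document-unique identifier; the "id" name and the sample data are assumptions for illustration only.

```python
def resolve_pointer(document, pointer):
    """Follow a JSON Pointer such as "/items/0" through parsed JSON."""
    node = document
    for token in pointer.split("/")[1:]:
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node

def find_by_id(node, wanted):
    """Depth-first search for the object whose "id" member equals wanted."""
    if isinstance(node, dict):
        if node.get("id") == wanted:
            return node
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_by_id(child, wanted)
        if found is not None:
            return found
    return None

before = {"items": [{"id": "widget", "price": 5}]}
# The same document after another item was added at the front.
after = {"items": [{"id": "gadget", "price": 9}, {"id": "widget", "price": 5}]}
```

The pointer "/items/0" resolves to the widget in the first version and to the gadget in the second, while find_by_id(after, "widget") still lands on the same object.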

This makes a lot of sense in XML, which is primarily a document format. I'm not sure about whether the justification is as strong in JSON, which is primarily a data representation format, but it's worth talking about.

Just One, Please

Again, there's not much value in having fifteen ways to serialise a link in JSON; it will end up a pretty ugly mess.


Tuesday, 25 October 2011

Web API Versioning Smackdown

A lot of bits have been used over on the OpenStack list recently about versioning the HTTP APIs they provide.

This over-long and rambling post summarises my current thoughts on the topic, both as background for that discussion, as well as for review in the wider community.

The Warm-up: Software vs. Web Versioning

Developers are used to software versioning; e.g., for every release, you bump an identifier. There are usually major versions, minor versions, and sometimes things like package identifiers.

This fine level of granularity is useful to both developers and users; each of these things has precise semantics that helps in figuring out compatibility and debugging.

For example, on my Fedora box, I can do:

cloud:~> yum -q list installed httpd
Installed Packages
httpd.x86_64    2.2.17-1.fc14    @updates

… and I’ll know that Apache httpd version 2.2.17 is installed, and it’s the first package of that version for Fedora 14.

This lets me know that any modules I want to use with the server will need to work with Apache 2.2; and, that if there are security bugs found in httpd 2.2.15, I’m safe. Furthermore, when I install software that depends upon Apache, it can specify a specific version — and even packaging — to require, so that if it wants to avoid specific bugs, or require specific features, it can.

These are good and useful things to use software versioning for; it’s evolved into best practice that’s pretty well-understood. See, for example, Fedora’s package versioning guidelines.

However, they don’t directly apply to versioning on the Web. While there are similar use cases — e.g., maintaining compatibility, enabling debugging, dependency control — the mechanisms are completely different.

For example, if you throw such a version identifier into your URI, like this:

http://api.example.com/v2.2.17-1.fc14/things/foo

then every time you make a minor change to your software, you’ll be minting an entire new set of resources on the Web;

http://api.example.com/v2.2.17-2.fc14/things/foo

Moreover, you’ll need to still support the old ones for old clients, so you’ll have a massive footprint of URIs to support. Now consider what this does to caches in the middle; they have to maintain duplicates of the same thing — because it’s unlikely that foo has changed, but it can’t be sure — and your cache hit rate goes down.

Likewise, anybody holding onto a link from the previous version of the API has to decide what to do with it going forward; while they can guess that there’ll be compatibility between the two versions, they can’t really be sure, and they’ll still need to be rewriting a bunch of APIs.

In other words, just sticking software versions into Web URLs removes a lot of the value we get from using HTTP, and if you do this, you might as well be using a ‘dumb’ RPC protocol.

So what does work, on the Web?

The answer is that there is no one answer; there are lots of different mechanisms in HTTP to meet the goals that people have for versioning.

However, there is an underlying principle to almost any kind of versioning on the Web: not breaking existing clients.

The reasoning is simple; once you publish a Web API, people are going to start writing software that relies upon it, and every time you introduce a change, you introduce the potential to break them. That means that changes have to happen in predictable and well-understood ways.

For example, if you start using the Foo HTTP header, you can’t change its semantics or syntax afterwards. Even fixing bugs in how it works can be tricky, because clients will start to work around the bugs, and when you change things, you break the workarounds.

In other words, good mechanisms are extensible, so that you can introduce change without wiping the slate clean, and it means that any change that doesn’t fit into an extension needs to use a new identifier, so it doesn’t confuse clients expecting the old behaviour.

So, if you want to change the semantics of that Foo header, you can either take advantage of extensibility (if it allows it; see the Cache-Control header’s extensibility policy for a great example), or you have to introduce another header, e.g., Foo2.
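The Cache-Control policy boils down to "ignore directives you don't recognise", which a few lines of Python can sketch. The set of understood directives here is an arbitrary choice for the example, and the parser is naive: it splits on commas, so a quoted argument containing a comma would be mishandled.

```python
# Directives this hypothetical recipient implements.
UNDERSTOOD = {"max-age", "no-cache", "no-store", "private", "public"}

def parse_cache_control(value):
    """Parse a Cache-Control value into {directive: argument or None}."""
    directives = {}
    for part in value.split(","):
        name, _, argument = part.strip().partition("=")
        if name:
            directives[name.lower()] = argument.strip('"') or None
    return directives

def directives_to_apply(value):
    """Drop unrecognised directives instead of failing on them."""
    parsed = parse_cache_control(value)
    return {name: arg for name, arg in parsed.items() if name in UNDERSTOOD}

applied = directives_to_apply('private, community="UCI", max-age=3600')
```

The extension directive (community="UCI", the example used in the HTTP/1.1 specification) is skipped, and the recipient carries on with "private" and "max-age"; old clients keep working while new ones can act on the extension.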

This approach extends to lots of other things, whether they be media types, URI parameters, and potentially URIs themselves (see below).

Because of this, versioning is something that should not take place often, because every time you change a version identifier, you’re potentially orphaning clients who “speak” that language.

The fundamental principle is that you can’t break existing clients, because you don’t know what they implement, and you don’t control them. In doing so, you need to turn a backwards-incompatible change into a compatible one.

This implies that API versioning absolutely cannot be tied to software versioning in any way; doing so will needlessly limit (and often break) your clients, and generally upset people.

There’s an interesting effect to observe here, by the way; this approach to versioning is inherently non-linear. In other words, every time you mint a new identifier, you’re minting a fundamentally new thing, whether it be an HTTP header, a format identified by a media type, or a URI. You might as well use “foo” and “bar” as “v1” and “v2”. In some ways, that’s preferred, because people read so much into numbers (especially when there are decimal points involved).

The tricky part, as we’ll see in a bit, is what identifiers you nominate to pivot interoperability around.

An Aside: Debugging with Product Tokens

So, if you don’t put minor version information into URIs, media types and other identifiers, how do you debug when you have an implementation-specific problem? How do you track these minor changes?

HTTP’s answer to this is product tokens. They appear in things like the User-Agent, Server and Via headers, and allow software to identify itself, without surfacing minor versioning and packaging information into the protocol’s “core” identifiers (whether it’s a URI, a media type, an HTTP header, or whatever).

These sorts of versions are free (or even encouraged, security considerations aside) to contain fine-grained identifiers for what version, package, etc. of software is running. It’s what they’re for.
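For debugging, pulling those tokens apart takes very little code. This Python sketch handles User-Agent and Server style values; it strips simple parenthesised comments and does not attempt nested comments or quoted pairs. The function name is an assumption for the example.

```python
import re

def product_tokens(header_value):
    """Split a User-Agent or Server value into (product, version) pairs."""
    # Remove comments such as "(Fedora)"; nested comments are not handled.
    without_comments = re.sub(r"\([^()]*\)", " ", header_value)
    tokens = []
    for item in without_comments.split():
        name, _, version = item.partition("/")
        tokens.append((name, version or None))
    return tokens

server = product_tokens("Apache/2.2.17 (Fedora)")
agent = product_tokens("CERN-LineMode/2.15 libwww/2.17b3")
```

The fine-grained version lives here, in a header meant for it, and never touches the URI or the media type.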

The Main Event: Resource Versioning

All of that said, the question remains of how to manage change in your Web application’s interface. These changes can be divided into two rough categories; representation format changes and resource changes.

Representation format changes have been covered fairly well by others (e.g., Dave), and they’re both simple and maddeningly complex. In a nutshell, don’t make backwards-incompatible changes, and if you do, change the media type.

JSON makes this easier than XML, because it has both a simpler metamodel and a default mustIgnore rule.

Resource changes are what I’m more interested in here. This is doing things like adding new methods, changing the URIs that clients use (including query parameters and their semantics), and so forth.

Again, many (if not most) changes to resources can be accommodated by turning them into backwards-compatible changes. For example, rather than bumping a version when you want to modify how a resource handles query parameters, you mint a new, sibling resource with a different name that takes the alternate query parameters.

However, there comes a time when you need to “wipe the slate clean.” Perhaps it’s because your API has become overburdened with such add-on resources, or you’ve got some new insights into your problem that benefit from a fresh sheet. Then, it’s time to introduce a new API version (which again, shouldn’t happen often). The question is, “how?”

In this Corner: URI Versioning

The most widely accepted way to version the resources of Web APIs currently is in the URI. A typical example might be:

http://api.example.com/v1/things/foo

Here, the first path segment is a major version identifier, and when it changes, everything under it does as well. Therefore, the client needs to decide what version of the API it wants to interact with; there isn’t any correlation between the URIs in v1 and v2, for example.

So, even if you have:

http://api.example.com/v2/things/foo

There isn’t necessarily any correlation between the two URIs. This is important, because it gives you that clean slate; if there were correlation between v1 and v2 URIs, you’d be tying your hands in terms of what you could do in v2 (and beyond).

You can see evidence of this in lots of popular Web APIs out there; e.g., Twitter and Yahoo.

However, it’s not necessary to have that version number in there. Consider Facebook; their so-called old REST API has been deprecated in favour of their new Graph API. Neither has “v1” or “v2” in them; rather, they just use the hostname to name space the different interfaces (“api.facebook.com” vs. “graph.facebook.com”). Old clients are still supported, and new clients can get new functionality; they just called their new version something less boring than “v2”.

Fundamentally, this is how the Web works, and there’s nothing wrong with this approach, whether you use “v1” and “v2” or “foo” and “bar” — although I think there’s less confusion inherent in the latter approach.

The Contender: HATEOS

However, there is one lingering concern that gets tied up into this; people assume — very reasonably — that when you document a set of URIs and ship them as a version of an interface, clients can count on those URIs being useful.

This violates a core REST principle called “Hypertext As The Engine of Application State”, or HATEOS for short.

RESTafarians have long searched for signs of HATEOS in Web APIs, and Roy has lamented its absence in the majority of them.

Tying your clients into a pre-set understanding of URIs tightly couples the client implementation to the server; in practice, this makes your interface fragile, because any change can inadvertently break things, and people tend to like to change URIs over time.

In a HATEOS approach to an API, you’d define everything in terms of media types (what formats you accept and produce) and link relations (how the resources producing those representations are related).

This means that your first interaction with an interface might look like this:

GET / HTTP/1.1
Host: api.example.com
Accept: application/vnd.example.link_templates+json

HTTP/1.1 200 OK
Content-Type: application/vnd.example.link_templates+json
Cache-Control: max-age=3600
Connection: close

{
  "account": "http://accounts.example.com/{account_id}",
  "server": "/servers/{server_id}",
  "image": "https://images.example.com/{image_id}"
}

Please don’t read too much into this representation; it’s just a sketch. The important thing is that the client uses information from the server to dynamically generate URIs at runtime, rather than baking them into the implementations.
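On the client side, "dynamically generate URIs at runtime" might look like the following Python sketch. It assumes the entry document above has already been fetched and parsed, and it supports only simple {name} substitution rather than full URI Templates; uri_for and expand are hypothetical helper names.

```python
import re
from urllib.parse import quote, urljoin

def expand(template, variables):
    """Replace each {name} with the percent-encoded value of that variable."""
    return re.sub(r"\{(\w+)\}",
                  lambda match: quote(str(variables[match.group(1)]), safe=""),
                  template)

def uri_for(entry_point, link_templates, relation, **variables):
    """Build an absolute URI from the template the server gave for a relation."""
    return urljoin(entry_point, expand(link_templates[relation], variables))

entry_point = "http://api.example.com/"
# The parsed body of the response sketched above.
link_templates = {
    "account": "http://accounts.example.com/{account_id}",
    "server": "/servers/{server_id}",
    "image": "https://images.example.com/{image_id}",
}
server_uri = uri_for(entry_point, link_templates, "server", server_id="web 01")
account_uri = uri_for(entry_point, link_templates, "account", account_id=42)
```

The client code only knows the relation names "server" and "account"; if the server later moves accounts to another host or path, nothing here changes.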

All of the semantics are baked into those link relations — they should probably be URIs if they’re not registered, by the way — and in the formats produced. URIs are effectively semantic-free.

This gives a LOT of flexibility in the implementation; the client can choose which resources to use based upon the link relations it understands, and changes are introduced by adding new link relations, rather than new URIs (although that’s likely to be a side effect). The URIs in use are completely under control of the server, and can be arranged at will.

In this manner, you don’t need a different URI for your interface, ever, because the entry point is effectively used for agent-driven content negotiation.

The downsides? This approach requires clients to make requests to discover URIs, and not to take shortcuts. It’s therefore chatty — a fairly damning condemnation.

However, notice the all-important Cache-Control header in that response; it may be chatty without caching, but if the client caches, it’s not that bad at all.
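A client does not need a full HTTP cache to benefit from that header. The Python sketch below keeps the entry document until its max-age runs out; the fetch callable (returning a headers dict and a parsed body) and the injectable clock are assumptions that keep the example self-contained, and real freshness calculation involves more than max-age.

```python
import re
import time

class LinkTemplateCache:
    """Re-fetch the entry document only after its max-age has elapsed."""

    def __init__(self, fetch, clock=time.monotonic):
        self._fetch = fetch      # callable returning (headers, parsed_body)
        self._clock = clock      # injectable so the behaviour can be tested
        self._body = None
        self._expires = 0.0

    def get(self):
        now = self._clock()
        if self._body is None or now >= self._expires:
            headers, self._body = self._fetch()
            match = re.search(r"max-age=(\d+)", headers.get("Cache-Control", ""))
            # Without a max-age the document is treated as immediately stale.
            self._expires = now + (int(match.group(1)) if match else 0)
        return self._body
```

With the max-age=3600 response shown earlier, a client using this would contact the entry point at most once an hour, however many URIs it generates in between.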

The main issues with going HATEOS for your API, then, are the requirements it places upon clients. If client-side HTTP tools were more widely capable, this wouldn’t be a big deal, but currently you can only assume a very low-level, bare HTTP API without caching, so it does place a lot of responsibility on your client developer’s shoulders — not a good thing, since there are usually many more of them than there are server-side.

So, there are arguments for and against HATEOS, and one could say the trade-offs are somewhat balanced; both are at least reasoned positions. However, there’s one more thing…

Enter Extensibility

Extensibility and Versioning are the peanut butter and jelly of protocol engineering. Sure, my kids’ cohort in Australian primary schools are horrified by this combination, but stay with me.

OpenStack has an especially nasty extensibility problem; they allow vendors to add pretty much arbitrary things to the protocol, from new resources to new representations, as well as extensions inside their existing formats.

Allowing such freedom with “baked-in” URIs is hard. You have to carve out extension prefixes to avoid collisions, and then hope that that’s good enough. For example, what if an API uses URIs like this:

http://api.example.com/users/{userid}

and HP wants to add a new subresource to the users collection? Does it become

http://api.example.com/users/hp

? No, that’s bad, because then no userid can be “hp”, and special cases are evil, especially when they’re under the control of others.

You could do:

http://api.example.com/users/ext/hp

and special-case only one thing, “ext”, but that’s pretty nasty too, especially when you can still potentially add “hp” to any point in the URI tree.

Instead, if you take a HATEOS approach, you push extensibility into link relations, so that you have something like:

GET / HTTP/1.1
Host: api.example.com
Accept: application/vnd.example.link_templates+json

HTTP/1.1 200 OK
Content-Type: application/vnd.example.link_templates+json
Cache-Control: max-age=3600
Connection: close

{
  "users": "http://api.example.com/users/{userid}",
  "hp-user-stuff": "http://api.example.com/users/{userid}/stuff"
}

Now, the implementation has full control over the URIs used for extensions, and it’s responsible for avoiding collisions. All that HP (or anyone else wanting an extension) has to do is mint a new link relation type, and describe what it points to (using existing or new media types).

This isn’t the whole extensibility story, of course; format extensions are independent of URIs, for example. However, the freedom of extensibility that taking a HATEOS approach gives you is too good to pass up, in my estimation.

The key insight here, I think, is that URIs are used for so many things — persistent identifiers, cache keys, bases for relative resolution, bookmarks — that overloading them with versioning and extensibility information as well makes them worse for all of their various purposes. By pushing these concerns into link relations and media types using HATEOS, you end up with a flexible, future-proof system that can evolve in a controllable way, without giving up the benefits of using HTTP (never mind REST).


Friday, 21 October 2011

Why ESI is Still Important, and How to Make it Better

More than ten years ago, I was working at Akamai and got involved in the specification of Edge Side Includes (ESI), sort of a templating language for intermediaries.

In that time, interest in ESI has grown, waned and been reborn. As far as I can tell, it's implemented not only by Akamai and Oracle (the main forces behind it), but also in Varnish, Squid, and lots of other places too.

Back then, I had a strong suspicion that it'd die because people would see it as locking them into Akamai (or some other vendor). Why, then, is this limited, funny, embarrassingly simple little templating language still around?

In a word, it's concurrency.

In the last couple of years, it's become hot to build massively scalable Web servers by re-thinking how they handle concurrency; often using asynchronous, non-blocking single-process servers, rather than threads or multiple processes.

The benefits of this approach have been known for a long time; way before Dan Kegel wrote the C10K page, Web proxy servers like Squid (and its predecessor, Harvest) were using this approach because it's the only sensible way to scale for them.

However, as folks are finding out when they use newer tools that implement these methods (e.g., Twisted, Node.JS), writing event-driven code is something you either love or hate. Many developers can't stand it, especially for debugging (personally, I love it, but that's just me).

So, ESI is a way to offer the massive concurrency of non-blocking, asynchronous servers in a way that's easy to digest. Since fetching a URI doesn't block, the only overhead is in stitching the page together, and you can control the overhead of that by limiting the language's capability.
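The idea can be sketched in Python with asyncio. This is not how any real ESI processor is implemented; it handles only a self-closing esi:include with a src attribute, and fetch is a stand-in coroutine supplied by the caller. What it shows is that all fragments are requested concurrently and the only remaining work is stitching.

```python
import asyncio
import re

INCLUDE = re.compile(r'<esi:include src="([^"]+)"/>')

async def render(template, fetch):
    """Fetch every included fragment concurrently, then stitch the page."""
    sources = INCLUDE.findall(template)
    # gather() returns results in the same order as the sources.
    bodies = await asyncio.gather(*(fetch(src) for src in sources))
    fragments = iter(bodies)
    return INCLUDE.sub(lambda match: next(fragments), template)

async def fake_fetch(src):
    """Stand-in for a non-blocking HTTP fetch."""
    await asyncio.sleep(0.01)
    return f"[{src}]"

template = '<div><esi:include src="/header"/><esi:include src="/body"/></div>'
page = asyncio.run(render(template, fake_fetch))
```

The template author never sees an event loop or a callback; the concurrency is a property of the processor, not of the page.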

This makes ESI a great tool for building highly scalable dynamic Web sites without writing and debugging new code. Win.

Making ESI Better

ESI is, as mentioned, more than a decade old, and the Web has changed a lot in the intervening time. Even putting that aside, ESI isn't exactly what we'd call Web-friendly. We can do better.

Over that time, I've had a number of thoughts about how to improve ESI as a language, which I've shared with some interested people privately. One of my back-burner projects has been to implement this, but I have to admit that this isn't going to happen soon, since I'm busy doing several other things.

Instead, I'm going to dump those ideas here, and hope someone runs with them. Here are a few:

The biggest single way I can see to improve ESI is to make it possible to source variables from a URI. In other words, it should be possible to fetch a URI, parse the response (probably in JSON), and then reference the data returned when evaluating the template.

This would enable some really exciting things. Because variables are now just state, you can do things like cache user preferences -- using plain old HTTP caching -- and have that state be local to where it's needed. When you update that state, it can be invalidated. ESI expressions now can have arbitrary, application-relevant input, instead of being limited to a few paltry request headers.

This could be what it looks like:

<esi:load name="user_prefs" src="http://prefs.example.com/{request.cookie.userid}"/>
<!-- … -->
<esi:include src="/{user_prefs.top_left_module}"/>

Here, you see some JSON being loaded into the user_prefs variable, from a URI that's templated using a cookie that identifies the user, to drive how the page loads. This is very similar to a set of techniques I discussed a while back for composing services "RESTfully", and it still works.
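A rough Python sketch of the variable model this implies: loaded JSON simply joins the variables that templates can reference by dotted path. The esi:load element is only a proposal, so everything here (the helper names, the sample preferences, the absence of any HTTP fetch) is an assumption for illustration.

```python
import json
import re

def lookup(variables, dotted_path):
    """Resolve a path such as "request.cookie.userid" in nested dicts."""
    node = variables
    for part in dotted_path.split("."):
        node = node[part]
    return node

def expand(template, variables):
    """Replace each {dotted.path} with the referenced variable's value."""
    return re.sub(r"\{([\w.]+)\}",
                  lambda match: str(lookup(variables, match.group(1))),
                  template)

variables = {"request": {"cookie": {"userid": "bob"}}}
prefs_uri = expand("http://prefs.example.com/{request.cookie.userid}", variables)

# Pretend this JSON is the (cacheable) response fetched from prefs_uri.
variables["user_prefs"] = json.loads('{"top_left_module": "weather"}')
include_src = expand("/{user_prefs.top_left_module}", variables)
```

The preferences document is ordinary HTTP state, so it can be cached and invalidated like any other response.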

JSON also presents a way to clean up the variable model generally; instead of the random collection of variables, ESI 2.0 could instantiate a request object, with appropriate members like .method, .cookie, .headers, and so forth. It also brings about the possibility of making response attributes available as well, at least in the context of an include.

Going even further, JavaScript presents an opportunity to rally around a common, well-understood syntax for things like variable references, operators, and even common functions (e.g., string manipulation).

esi:include desperately needs a timeout parameter, and a sensible means of specifying fallback content (probably as a child of the include element).
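In processor terms, the behaviour being asked for is roughly the following Python sketch: give the fetch a deadline, and substitute the fallback content if it is missed or the fetch fails. The function and parameter names are hypothetical; no existing ESI syntax is implied.

```python
import asyncio

async def include_with_fallback(fetch, src, timeout, fallback):
    """Return the fetched fragment, or the fallback on timeout or I/O error."""
    try:
        return await asyncio.wait_for(fetch(src), timeout)
    except (asyncio.TimeoutError, OSError):
        return fallback

async def slow_fetch(src):
    """Stand-in for an origin that takes too long to answer."""
    await asyncio.sleep(1)
    return "late content"

fragment = asyncio.run(
    include_with_fallback(slow_fetch, "/sidebar", 0.01, "<!-- sidebar unavailable -->")
)
```

One slow fragment then costs the page at most the timeout, rather than holding up the whole response.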

Deeper integration with HTTP is necessary; not only should it be possible to access arbitrary aspects of the incoming request, but it should be possible to affect more of the outgoing response; e.g., the status code. Likewise, finer-grained control over outgoing requests (generated by include as well as load) would be good (e.g., via attributes on the element).

There are lots of smaller, easier wins. Not requiring valid XML is an obvious one; integrating URI Templates is likewise a no-brainer. Cleaning up some of the cruft in the syntax would be nice; there are some elements that people just don't need in there (e.g., esi:inline, the alt attribute).

Anybody up for it?


Wednesday, 12 October 2011

Thinking about Namespaces in JSON

Since joining Rackspace to help out with OpenStack, one of the hot topics of conversation I’ve been involved in has been extensibility and versioning.

I think most of my readers (yes, all six of you) are fairly familiar with, if not tired of (hi, Dave!) the various arguments and counter-arguments in this space. However, there is one new-ish bit; how to do distributed extensibility in JSON.

That’s because OpenStack’s API allows vendors to add extensions in various ways, in an uncoordinated fashion. And while that’s a well-understood (if still somewhat tricky) problem in XML, it hasn’t been approached at all in JSON, which has fast become the format of choice for data-bearing APIs.

JSON has a head start in that it embodies the mustIgnore rule; if you put extra data in a JSON document (for example, an extra property on an object), all implementations will just ignore it. Great. However, the problem comes in when multiple people want to extend a document, but avoid collisions.

For example, given this straw-man JSON document:

{
	"foo": "bar",
	"version": 1
}

and both FooCorp and BarProject add a “widget” property, they’ll be fighting over who owns it. Bad luck.
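Both halves of the problem fit in a few lines of Python. The extension values are invented for the example; the point is that ignoring unknown members is trivial for a consumer, while two uncoordinated extensions using the same name quietly overwrite each other.

```python
import json

KNOWN_MEMBERS = {"foo", "version"}

def read_known(document):
    """Keep the members this consumer understands; ignore the rest."""
    return {name: value for name, value in document.items()
            if name in KNOWN_MEMBERS}

base = json.loads('{"foo": "bar", "version": 1}')
foocorp_extension = {"widget": {"colour": "red"}}   # hypothetical
barproject_extension = {"widget": 42}               # hypothetical

# Both extensions applied to the same object: the later one silently wins.
extended = {**base, **foocorp_extension, **barproject_extension}
```

A base consumer calling read_known(extended) is unaffected, but FooCorp's widget has been replaced by BarProject's without anyone being told.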

So, some way to coordinate these parties and assure that they don’t conflict is necessary. In XML, this is done with Namespaces in XML, and so solutions to this problem are generally called Namespaces too, even though they don’t have to look or work the same way.

Prior Art

I’m not the first person to wonder in this direction, of course.

Yaron made the first proposal, as far as I can tell. His approach looks like this:

{
    "org.goland.schemas.projectFoo.specProposal": {
        "title": "JSON Extensions",
        "author": {
            "firstName": "Yaron",
            "com.example.schemas.middleName": "Y",
            "org.goland.schemas.projectFoo.lastName": "Goland"
        }
    }
}

It’s sort of a Java-ish approach, based on the DNS, like URIs, but without the syntactic awkwardness of putting URIs in JSON. He also states that there’s an implicit name space for descendants; e.g., here, “title” is also in the org.goland.schemas.projectFoo name space.

There was another proposal in the JSON-schema mailing list in 2008. It looks very, very similar to XML schemas, except that the namespaces, as far as I can figure out, are bound inside the schema itself, rather than the document. It seems to have been shot down, because it required schema parsing to be able to identify things; never a good idea, especially in the JSON world.

Some Observations

Starting with the obvious, I’d say that if you can use JSON without namespaces, you really, really should. In other words, if you really need distributed extensibility, you need something like namespaces, but for all other purposes, they should be avoided like the plague; they make it too complex, and simplicity is the name of the game in JSON.

A bit more subtly, I think this isn’t just a document-by-document decision, but a node-by-node one in the document. I.e., you should identify the specific places in a document that need extensibility and allow namespaces there, but they shouldn’t pollute the rest of the document, if they aren’t needed there.

I suppose what I’m saying is that namespaces should be a purely syntactic convention to avoid collisions where distributed extensibility is allowed, rather than some magical thing that allows you to uniquely and globally identify every bit of data in the document. I know that’s going to rile up some of the linked data and semweb folks, but we’re talking JSON here, not Turtle or RDF.

This implies that Yaron’s inheritance is unnecessary; the very fact that the “title” property is a member of “org.goland.schemas.projectFoo.specProposal” is sufficient to assure lack of collisions (unless he wants to allow extensibility at that level too, in which case they should be explicit at that level).

Another Straw-Man

Given all of that, I wonder if the problem can be simplified enough to make some progress. I think Yaron’s proposal makes a certain amount of sense, with a few modifications:

This would tweak Yaron’s sample to something like (assuming that a registry were used):

{
    "FOO.specProposal": {
        "title": "JSON Extensions",
        "author": {
            "firstName": "Yaron",
            "EXAMPLE.middleName": "Y",
            "lastName": "Goland"
        }
    }
}

I like this because it’s not very painful, it doesn’t require schema to process, and it gets the job done; it allows distributed extensibility. The important thing is to stop looking at namespaces as something you should slather over your format like butter — more is better! — and start seeing them as a specialised tool that should only be used when it can do some good.
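Processing such a document needs no schema, as this Python sketch suggests. It assumes one reading of the straw-man, namely that a namespaced member name is a registered prefix, a dot, and a local name; that rule and the helper names are assumptions, not a defined convention.

```python
def split_member(name):
    """Split "EXAMPLE.middleName" into ("EXAMPLE", "middleName")."""
    prefix, dot, local = name.partition(".")
    return (prefix, local) if dot else (None, name)

def extensions(obj):
    """Group an object's namespaced members by prefix."""
    found = {}
    for name, value in obj.items():
        prefix, local = split_member(name)
        if prefix is not None:
            found.setdefault(prefix, {})[local] = value
    return found

# The "author" object from the sample above.
author = {"firstName": "Yaron", "EXAMPLE.middleName": "Y", "lastName": "Goland"}
author_extensions = extensions(author)
```

Only the one extended member carries a prefix; the rest of the object stays plain JSON.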


Friday, 2 September 2011

RFC6266 and Content-Disposition

HTTPbis published RFC6266 a little while ago, but the work isn’t finished.

This is the RFC that clarifies how the Content-Disposition header is used in HTTP; in a nutshell, while basic file downloads worked OK, there wasn’t any broad interoperability between browsers for non-ASCII filenames.
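For producers, the interoperable form for a non-ASCII filename pairs a plain filename parameter with an encoded filename* parameter. The Python sketch below builds such a value; the function name is made up, and it assumes the caller supplies an ASCII fallback containing no double quotes or backslashes.

```python
from urllib.parse import quote

def content_disposition(filename, ascii_fallback):
    """Build an attachment header value with an RFC 5987 encoded filename*."""
    # Percent-encode the UTF-8 bytes of everything except unreserved characters.
    encoded = quote(filename, safe="")
    return f"attachment; filename=\"{ascii_fallback}\"; filename*=UTF-8''{encoded}"

header = content_disposition("€ rates", "EURO rates")
```

Recipients that understand filename* use the encoded name, and the rest fall back to the ASCII one.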

Julian Reschke did the hard work of coming up with a test suite to find out how bad things were, filing bugs with browsers (such as here and here), and finally writing the draft that eventually became the RFC. Thanks again, Julian.

Since publication, it’s become apparent that browsers are indeed moving towards better interop. So, the next step was to start to publicise this interop so that people on the content side can take advantage of it.

As a result, we’ve come up with a page giving advice to people who produce Content-Disposition — especially Web frameworks.

There’s also a sample implementation for Node.JS called sweet, so you can see how it should work, and support for checking Content-Disposition in REDbot, so you can confirm that you’re doing it well.

So, if you’re involved in a Web framework, please have a look and publicise this within your community. We’ve already filed a request with Django; if your framework has a tracking URL, please add it to the producer advice wiki page.
