Structured URIs

Monday, 11 August 2003

I just found a draft finding that the W3C TAG published about a month ago, regarding the use of metadata in URIs. This is very cool, and I especially like the emphasis on authorities’ ability to embed metadata in URIs.

To me, though, it raises a big question. Right now, there isn’t any formal way for an authority to describe how that metadata is structured; one has to resort to prose descriptions, which doesn’t really scale.

What I’ve wanted for a long time is a format to formally describe the structure of a URI, so that you can look at a URI and extract metadata about the resource from it, and conversely can take a handful of metadata and predictably build a URI from it.

This is the first step towards allowing people to describe the interface exposed by a Web application in a machine-readable way.

There are already a few ways to do this, but none measure up. WSDL’s support for GET and POST is pretty bad, but then again my dislike of WSDL is well known. HTML forms and XForms actually do approach this, but they are limited to query strings, which has encouraged a lot of bad practice out there.

What would such a format look like? Here’s my back-of-an-envelope answer, which models an address book application:

<URIStructure base="http://www.example.com/addressbook/" id="addressBook">
  <pathSegment name="people">
    <pathSegment type="http://foo.com/types#person">
      <pathSegment name="editForm"/>
    </pathSegment>
  </pathSegment>
  <pathSegment name="addForm"/>
  <pathSegment name="searchForm">
    <query id="searchTerms">
      <arg key="firstName" type="http://foo.com/types#givenName"/>
      <arg key="lastName" type="http://foo.com/types#sn" required="1"/>
    </query>
  </pathSegment>
</URIStructure>

The important thing to notice here is that you walk down the XML tree, finding the URI that interests you (either the one you’re trying to suck metadata out of, or the one you’re trying to build). Some components (e.g., the children of the “people” path segment, and the query arguments) don’t have declared names, but rather just syntactic typing, so that you know that with a “http://foo.com/types#person” of “Bob”, the appropriate URI would be “http://www.example.com/addressbook/people/Bob”, and that the “firstName” argument on searchForm must have a “http://foo.com/types#givenName” value.
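To make the tree walk concrete, here is a rough Python sketch of both directions: pulling typed metadata out of a URI, and building a URI from metadata. It is only an illustration under several assumptions that the format above doesn’t spell out: the function names extract and build are hypothetical, a literal name match is preferred over a typed segment, None in the segment list stands for “the typed child here”, and the person segment is written as an open element so that editForm nests beneath it.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qsl, quote, unquote, urlencode

# The address book description from above, embedded so the sketch is self-contained.
STRUCTURE = """\
<URIStructure base="http://www.example.com/addressbook/" id="addressBook">
  <pathSegment name="people">
    <pathSegment type="http://foo.com/types#person">
      <pathSegment name="editForm"/>
    </pathSegment>
  </pathSegment>
  <pathSegment name="addForm"/>
  <pathSegment name="searchForm">
    <query id="searchTerms">
      <arg key="firstName" type="http://foo.com/types#givenName"/>
      <arg key="lastName" type="http://foo.com/types#sn" required="1"/>
    </query>
  </pathSegment>
</URIStructure>
"""
PERSON = "http://foo.com/types#person"
GIVEN_NAME = "http://foo.com/types#givenName"
SN = "http://foo.com/types#sn"


def extract(root, uri):
    """Walk the tree along the URI's path and return {type: value} metadata."""
    base = root.get("base")
    if not uri.startswith(base):
        raise ValueError("URI is outside this structure's base")
    path, _, query_string = uri[len(base):].partition("?")
    node, found = root, {}
    for segment in filter(None, path.split("/")):
        children = node.findall("pathSegment")
        # Assumption: a literal name wins; otherwise fall back to a typed segment.
        match = next((c for c in children if c.get("name") == segment), None)
        if match is None:
            match = next((c for c in children if c.get("type")), None)
            if match is None:
                raise ValueError("no segment matches %r" % segment)
            found[match.get("type")] = unquote(segment)
        node = match
    query = dict(parse_qsl(query_string))
    for arg in node.findall("query/arg"):
        key = arg.get("key")
        if key in query:
            found[arg.get("type")] = query[key]
        elif arg.get("required") == "1":
            raise ValueError("missing required argument %r" % key)
    return found


def build(root, segments, metadata):
    """Build a URI; a None entry in segments means 'fill the typed child here'."""
    node, parts = root, []
    for name in segments:
        children = node.findall("pathSegment")
        if name is None:
            node = next(c for c in children if c.get("type"))
            parts.append(quote(metadata[node.get("type")], safe=""))
        else:
            node = next(c for c in children if c.get("name") == name)
            parts.append(name)
    pairs = []
    for arg in node.findall("query/arg"):
        value = metadata.get(arg.get("type"))
        if value is not None:
            pairs.append((arg.get("key"), value))
        elif arg.get("required") == "1":
            raise ValueError("missing required argument %r" % arg.get("key"))
    uri = root.get("base") + "/".join(parts)
    return uri + ("?" + urlencode(pairs) if pairs else "")


root = ET.fromstring(STRUCTURE)
person_uri = build(root, ["people", None], {PERSON: "Bob"})
search_uri = build(root, ["searchForm"], {GIVEN_NAME: "Al", SN: "Smith"})
```

Here person_uri comes out as “http://www.example.com/addressbook/people/Bob”, and calling extract on that URI with “/editForm” appended hands back the same person value. A real implementation would need to decide things this sketch glosses over, such as what to do when a segment could match more than one typed child.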

This is just a first step, mind you, but if the format were extensible, it would allow you to add metadata about the nature of the resources themselves, and how to use them. There’s a lot of fertile ground here; you could describe how they vary over time and other dimensions, what the semantic relationships of the resources are, what kind of RESTful interactions are available, and more mundane things like whether the query strings are case-sensitive, and whether they can be reordered.

More detail soon; I’ve still got some issues to work on, and a Python implementation to scratch together.

Mark Nottingham
