mnot’s blog

“Design depends largely on constraints.” — Charles Eames

Monday, 11 August 2003

Structured URIs

Filed under: Web

I just found a draft finding that the W3C TAG published about a month ago, regarding the use of metadata in URIs. This is very cool, and I especially like the emphasis on authorities’ ability to embed metadata in URIs.

To me, though, it raises a big question. Right now, there isn’t any standard way for an authority to describe how that metadata is structured; one has to resort to prose descriptions, which don’t really scale.

What I’ve wanted for a long time is a format to formally describe the structure of a URI, so that you can look at a URI and extract metadata about the resource from it, and conversely can take a handful of metadata and predictably build a URI from it.

This is the first step towards allowing people to describe the interface exposed by a Web application in a machine-readable way.

There are already a few ways to do this, but none measure up. WSDL’s support for GET and PUT is pretty bad, but then again my dislike of WSDL is well known. HTML forms and XForms actually do approach this, but they are limited to query strings, which has encouraged a lot of bad practice out there.

What would such a format look like? Here’s my back-of-an-envelope answer, which models an address book application:

<URIStructure base="http://www.example.com/addressbook/" id="addressBook">
  <pathSegment name="people">
    <pathSegment type="http://foo.com/types#person">
      <pathSegment name="editForm"/>
    </pathSegment>
  </pathSegment>
  <pathSegment name="addForm"/>
  <pathSegment name="searchForm">
    <query id="searchTerms">
      <arg key="firstName" type="http://foo.com/types#givenName"/>
      <arg key="lastName" type="http://foo.com/types#sn" required="1"/>
    </query>
  </pathSegment>
</URIStructure>

The important thing to notice here is that you walk down the XML tree, finding the URI that interests you (either the one you’re trying to suck metadata out of, or the one you’re trying to build). Some components (e.g., the children of the “people” path segment, and the query arguments) don’t have declared names, but rather just syntactic typing, so that you know that with a “http://foo.com/types#person” of “Bob”, the appropriate URI would be “http://www.example.com/addressbook/people/Bob”, and that the “firstName” argument on searchForm must have a “http://foo.com/types#givenName” value.
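To make that walk concrete, here is a rough Python sketch of both directions. It is only an illustration under a few assumptions: the function names `build` and `extract` are made up for this example, the query part is left out, and the typed segment is left open so that “editForm” nests inside it. It is not a definition of how the format should be processed.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote, unquote

PERSON = "http://foo.com/types#person"

# Path-only subset of the address book description (query part omitted).
DESCRIPTION = """
<URIStructure base="http://www.example.com/addressbook/" id="addressBook">
  <pathSegment name="people">
    <pathSegment type="http://foo.com/types#person">
      <pathSegment name="editForm"/>
    </pathSegment>
  </pathSegment>
  <pathSegment name="addForm"/>
</URIStructure>
"""


def match_child(node, segment):
    """Return (child, type) for a URI segment; named children win over typed ones."""
    children = node.findall("pathSegment")
    for child in children:
        if child.get("name") == segment:
            return child, None
    for child in children:
        if child.get("name") is None and child.get("type") is not None:
            return child, child.get("type")
    return None, None


def extract(root, uri):
    """Walk the tree along the URI path; return {type: value}, or None if no match."""
    base = root.get("base")
    path = uri.split("?", 1)[0]
    if not path.startswith(base):
        return None
    node, metadata = root, {}
    for segment in (unquote(s) for s in path[len(base):].split("/") if s):
        child, type_uri = match_child(node, segment)
        if child is None:
            return None
        if type_uri is not None:
            metadata[type_uri] = segment
        node = child
    return metadata


def build(root, steps):
    """Build a URI from steps: a literal segment name, or a (type, value) pair."""
    node, parts = root, []
    for step in steps:
        children = node.findall("pathSegment")
        if isinstance(step, tuple):
            type_uri, value = step
            child = next((c for c in children
                          if c.get("name") is None and c.get("type") == type_uri), None)
            part = quote(value, safe="")
        else:
            child = next((c for c in children if c.get("name") == step), None)
            part = step
        if child is None:
            raise ValueError(f"no path segment matches {step!r}")
        parts.append(part)
        node = child
    return root.get("base") + "/".join(parts)


root = ET.fromstring(DESCRIPTION)
bob_uri = build(root, ["people", (PERSON, "Bob")])
bob_metadata = extract(root, bob_uri + "/editForm")
```

With the description above, `bob_uri` comes out as “http://www.example.com/addressbook/people/Bob”, and `bob_metadata` maps the person type to “Bob”. Letting named segments win over typed ones is a choice made for this sketch; the format as described doesn’t say how such a conflict should be resolved.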

This is just a first step, mind you, but if the format were extensible, it would allow you to add metadata about the nature of the resources themselves, and how to use them. There’s a lot of fertile ground here; you could describe how they vary over time and other dimensions, what the semantic relationships of the resources are, what kind of RESTful interactions are available, and more mundane things like whether the query strings are case-sensitive, and whether they can be reordered.
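As a small illustration of the more mundane end of that list, here is a Python sketch of how a client might compare two query strings once it knows whether they are case-sensitive and whether they can be reordered. The `same_query` function and its two flags are hypothetical; nothing in the format above defines them yet.

```python
from urllib.parse import parse_qsl


def same_query(a, b, case_sensitive=True, reorderable=True):
    """Compare two query strings under hypothetical per-resource rules."""
    def normalise(query):
        pairs = parse_qsl(query, keep_blank_values=True)
        if not case_sensitive:
            # Fold both keys and values; a real format might treat them separately.
            pairs = [(key.lower(), value.lower()) for key, value in pairs]
        # If order carries no meaning, sort so that order is ignored.
        return sorted(pairs) if reorderable else pairs

    return normalise(a) == normalise(b)


reordered_ok = same_query("firstName=Bob&lastName=Smith",
                          "lastName=Smith&firstName=Bob")
strict_order = same_query("firstName=Bob&lastName=Smith",
                          "lastName=Smith&firstName=Bob",
                          reorderable=False)
```

Here `reordered_ok` is true and `strict_order` is false, so a cache or client that knew these two facts about a resource could tell when two differently written URIs name the same thing.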

More detail soon; I’ve still got some issues to work on, and a Python implementation to scratch together.


1 Comment

Eamonn Neylon said:

Interesting, the power of the W3C’s marketing: just writing something about metadata gets the world’s attention! There is a group of library information specialists (and some publishers) who have been working in this area, on the OpenURL Framework, for several years. Initially conceived as a way of transporting metadata to a service endpoint using a URL, the framework has grown and been generalized into a means of transporting contextualized metadata both inline and through by-reference mechanisms. The work is being standardized through NISO Committee AX and represents a serious effort to provide a means of structuring metadata into functional packages for use in general applications.

Thursday, August 21 2003 at 12:37 PM
