Questions Leading to a Web Description Format

Friday, 29 April 2005

A while back, I published a series of entries (1, 2, 3, 4) about would-be Web Description Formats, with the intent of figuring out which (if any) is suitable, or whether a new one is required.

I’d like to keep this moving, by documenting some requirements for such a format. Although a few requirements and use cases did come up in that discussion, I’d like to work through a more general set of questions that need to be asked as a way of generating a realistic set of requirements. So, without further ado, feedback on the following brain dump would be most appreciated.

What is being described?

One of the first questions you’d have to answer about a description format is “what are you describing?”, and there are a few aspects to it.

First of all, there’s the decision of what kinds of Web sites and resources are interesting. While the Web is currently thought of as just for human-to-machine interactions, more people are becoming interested in using it for machine-to-machine communication as well. While description helps with use cases in both, it’s obvious that it’s most helpful in machine-to-machine communication, because of the limitations of machines.

So, describing resources that are dedicated to machine-to-machine communication is a clear win. I also think that there’s value in describing some human-to-machine interactions, especially if microformats catch on like I hope they will. Mixing presentation with data isn’t a bad thing at all at this layer, for some applications, as long as the publisher understands that they have more of an audience than an HTML browser, and that changes to the format can break things.

That’s not to say that a description is necessarily useful for a traditional Web site; e.g., describing www.apple.com’s retail section wouldn’t be too useful, but parts of store.apple.com, as well as the more structured parts of support.apple.com, could find one handy.

What’s the point of view?

Another aspect is what part of the system is being described. WSDL describes just the server interface, from the server’s point of view, and thereby has ducked a lot of hard questions about intermediaries. It also forces some creativity when describing more interesting client-side behaviours, as well as non-traditional message exchanges.

I’m not sure, but my intuition tells me this might be OK for a format that’s restricted to describing the Web. That’s because WSDL tries to describe a much lower-level abstraction — really, just bare messaging (a.k.a. SOA) — than the Web uses, with its established client/server RESTful model.

Do new-fangled things like AJAX or ARRESTED invalidate that? Not sure, but I think the basis of such a format has to be stateful resources; otherwise, you’re just WSDLing again. Describing the Web is describing resources; all behaviours are in relation to them, in some way.

What is a Web site?

Another good question is determining how a Web site is delineated; some people think of it as anything on a particular server (in URI-speak, with the same authority), while others are happy to call http://webhoster.example.com/~bob and http://webhoster.example.com/~mary different Web sites.

I’m inclined to take the latter view; I think the authority should be able to delegate control of subsections of its URI space to others, creating new logical sites that can be described as separate applications. Practically speaking, this is the only way to go; imagine if you had to describe the entire space of your Web server just to get the functionality of a stock quote script you installed.
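To make the delegation idea a bit more concrete, here is a minimal sketch in Python of how a tool might decide which logical site a URI belongs to. The registry of delegated prefixes and the function name are assumptions made up for illustration; nothing here comes from an existing format.

```python
from urllib.parse import urlsplit

# Hypothetical registry: (authority, path prefix) pairs that the authority
# has delegated as separate logical sites. These entries are illustrative.
DELEGATED_SITES = [
    ("webhoster.example.com", "/~bob"),
    ("webhoster.example.com", "/~mary"),
]


def site_for(uri, delegated=DELEGATED_SITES):
    """Return the (authority, prefix) of the logical site that owns `uri`.

    Falls back to the whole authority (prefix "/") when no delegation matches.
    """
    parts = urlsplit(uri)
    authority = parts.netloc.lower()
    path = parts.path or "/"
    best = "/"
    for site_authority, prefix in delegated:
        if site_authority != authority:
            continue
        # Match on whole path segments so "/~bob" does not claim "/~bobby".
        if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
            # Prefer the longest (most specific) delegated prefix.
            if len(prefix) > len(best):
                best = prefix
    return (authority, best)
```

With this, a stock quote script under http://webhoster.example.com/~bob/ would be described as part of Bob's site alone, without reference to anything else on the server.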

How will the format be used?

The question “why?” will undoubtedly come up. This is a pretty big one; REST asserts that hypermedia is the engine of application state. Some will say that this means a description format is pointless, because the application should be dynamically walking through the representations presented by the server, following links as it goes along.

I don’t think that necessarily holds for machine-to-machine communication on the Web today (or at least until the Semantic Web Skynet is fully operational). In all seriousness, the Semantic Web may give us those capabilities, but we’re not there yet, and until we are, developers need to know the structure and parameters of a site ahead of time, so they can develop code.

Also, I see a description format as at least partially qualifying as hypermedia, as long as it’s URI-centric (not the case in WSDL, but just about required in a Web description format) and available on the Web.

As I said, I’ve talked about specific use cases for a Web description format before, but to recap, the big ones are:

Code Generation — It’s very useful to start with a description of the site, and then code by filling in the blanks. In the Web services world, this is referred to as ‘coding to the contract’ or ‘contract-first’ development, and it makes a lot of sense (although I think “contract” is needlessly legalistic, and misleading too; it implies a closed world, when in reality a description is very open, in that it’s always subject to additional information being discovered).

A couple of ways that this might manifest are stub and skeleton generation, and auto-completion and other whizz-bang hinting in tools. Wouldn’t it be nice for Eclipse to give you a drop-down of the valid URIs that will give you a certain type when you’re coding?

One thing to keep in mind: generation of server-side code is an important use case, but clients and intermediaries shouldn’t be left out in the cold. While there’s only one server, there are countless clients, and I think intermediaries are going to play an ever-increasing role in the Web.

Dynamic Configuration — I’ve complained about the poor state of Web server configuration before, so I’ll spare you a repeat of the full polemic. A proper description format would be one mechanism to allow more transparent configuration of servers, and better use of the Web (and HTTP).

As before, this isn’t just about configuring servers. Clients can use hints in descriptions to optimise requests (rather than probing and waiting for error responses), and intermediaries can use descriptions to configure how they treat messages in both directions. This was the use case behind URISpace, which underlies how Akamai servers handle traffic.

(I do have some concern about the fuzzy line between a description format and a configuration format, because sometimes configuration needs to be done on a grand scale, and I’m not sure that a format that’s optimised for describing an interface in a very detailed way is also optimal for describing a Web site with 100,000 resources. That said, configuration is important for the small-scale use case too, so it has to be addressed somehow; it just might be that a separate configuration format using the same vocabulary is necessary.)

Application Modeling and Visualisation — Finally, there’s a considerable amount of value in having a standard representation (that’s intentional, folks) of a site’s layout and configuration; you can discuss it with peers, evolve it over time in a manner that’s independent of the implementation, develop tools to manipulate and visualise it, and so forth.
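As a rough illustration of the code generation use case above, here is a minimal Python sketch that turns a description into a client stub, plus the sort of lookup an editor might use to offer URIs of a given type. The description layout, the field names, and quotes.example.com are all hypothetical; this is one possible shape, not a proposal for the format itself.

```python
from urllib.parse import urlencode

# Hypothetical description of one small site; the field names here are
# invented for illustration and do not come from any existing format.
DESCRIPTION = {
    "base": "http://quotes.example.com",
    "resources": {
        "quote": {"path": "/quote", "params": ["symbol"], "type": "text/xml"},
        "history": {"path": "/history", "params": ["symbol", "days"], "type": "text/xml"},
        "logo": {"path": "/logo", "params": [], "type": "image/png"},
    },
}


def make_stub(description):
    """Build a client stub class with one URI-building method per resource."""
    def make_method(resource):
        allowed = resource["params"]

        def method(self, **params):
            # Reject parameters the description does not mention.
            unknown = set(params) - set(allowed)
            if unknown:
                raise TypeError(f"unexpected parameters: {sorted(unknown)}")
            # Emit parameters in the order the description declares them.
            query = urlencode([(name, params[name]) for name in allowed if name in params])
            uri = description["base"] + resource["path"]
            return uri + ("?" + query if query else "")

        return method

    methods = {name: make_method(res) for name, res in description["resources"].items()}
    return type("SiteStub", (object,), methods)


def uris_for_type(description, media_type):
    """List the URIs declared to produce `media_type`, as a tool might for auto-completion."""
    return sorted(
        description["base"] + res["path"]
        for res in description["resources"].values()
        if res["type"] == media_type
    )
```

A client would then call something like `make_stub(DESCRIPTION)().quote(symbol="AAPL")` to get a request URI, and fill in the actual HTTP handling itself. The same description could just as easily drive a server skeleton or an intermediary's routing table.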

How much does it try to chew off?

Digging at a different angle into use cases, there are a few different aspects to a Web interaction that might need description.

Format — what is the structure of a representation? This could be anything from XML Schema to DTDs to RelaxNG to RDF Schema to a set of XPath expressions. I’ve talked about my preferences before, but I imagine that a good place for a description format to start would be towards the simpler end of the scale, with extensibility accommodating other solutions.

Protocol — what are the resources available, what representations do they accept/generate, what URI parameters do they take, etc? This is HTTP-centric, and necessarily so.

Coordination — a step up from that is how resources relate to each other; i.e., what state transfers / interactions need to take place to achieve a particular goal? This is very close to the choreography / business process space filled by BPEL and WS-Choreography, but it’s fundamentally different, because it’s state- and resource-oriented, rather than message- and process-oriented.

I think it’s necessary for a format to at least take a stab at coordination description, because the finer interface granularity that uniformity brings (due to having an identifier for every interesting thing) takes away the ability to hide complex semantics behind a single method call (unless it’s POST).
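To show how the protocol and coordination aspects might fit together, here is a small Python sketch. The `Resource` structure, the order-handling resources and the `PLACE_ORDER` sequence are assumptions invented for this example; the point is only that a coordination can be written as a series of state transfers against described resources, and checked against the protocol-level description.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Protocol-level facts about one resource (all field names are hypothetical)."""
    path: str
    # Maps an HTTP method to the media types it accepts and generates.
    methods: dict
    query_params: list = field(default_factory=list)


RESOURCES = {
    "orders": Resource("/orders", {
        "GET": {"accepts": [], "generates": ["application/xml"]},
        "POST": {"accepts": ["application/xml"], "generates": ["application/xml"]},
    }, query_params=["status"]),
    "order": Resource("/orders/{id}", {
        "GET": {"accepts": [], "generates": ["application/xml"]},
        "PUT": {"accepts": ["application/xml"], "generates": []},
        "DELETE": {"accepts": [], "generates": []},
    }),
}

# Coordination: an ordered list of (method, resource name) state transfers
# needed to reach a goal, here "place an order and then confirm it".
PLACE_ORDER = [("POST", "orders"), ("GET", "order"), ("PUT", "order")]


def problems(request_method, resource, content_type=None):
    """Return a list of reasons the request does not fit the description."""
    if request_method not in resource.methods:
        return [f"{request_method} not allowed on {resource.path}"]
    accepts = resource.methods[request_method]["accepts"]
    if content_type is not None and content_type not in accepts:
        return [f"{content_type} not accepted by {request_method} {resource.path}"]
    return []


def unsupported_steps(steps, resources):
    """Return the steps of a coordination that the protocol description does not allow."""
    return [
        (method, name) for method, name in steps
        if name not in resources or method not in resources[name].methods
    ]
```

Note that the coordination is expressed entirely in terms of resources and uniform methods, rather than named operations, which is the difference from the message- and process-oriented approaches mentioned above.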

What protocol patterns and practices does it accommodate? Encourage?

In a perfect world, the format would be able to describe every Web site out there, to a fidelity that would allow developers to easily work with anything that had a description document.

The trouble with this is that it’s never possible to completely describe anything, because a description is nothing but a collection of things that people (or machines) think about something, and that’s an open set. So, we need to figure out what types of things we need to describe that are common enough on the Web to get enough bang for our specified buck.

One thing that’s easy to put on the list is REST, which pulls in things like generative identifiers, media types for dispatch, an extensible but small and uniform set of methods, and documents as representations of state. While not all (or even most) Web sites are strictly RESTful, a lot of the mechanisms of the Web are, and encouraging developers towards uniform semantics is worthwhile in and of itself; that’s what interoperability is all about.

Next on the list would be accommodation of common HTTP extensions like caching, authentication, WebDAV, content negotiation and so forth. Once again, easy wins.

If that’s all the format were able to describe, much of today’s Web would be out in the cold. I’d argue that isn’t good, because while the format would (and should) encourage good practices, it would also shut most people out, and would fail to get adoption. The trick is to accommodate what’s existing, whilst making the best practices easiest. So, what else do we need?

Cookies are pretty prevalent, and while they’re not really good for the Web, a Web description format probably needs to at least have a hack to accommodate them, if not full-blown support. Some hacks will also need to be in place for query dispatchers (e.g., http://www.example.com/index.jsp?page=/foo/bar/baz.htm). I’m sure there are lots more such anti-patterns that will need to be supported to some degree.
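As an example of what such a hack might amount to for query dispatchers, here is a minimal Python sketch that recovers the logical resource from a dispatcher URI. The `dispatch_param` argument is an assumption: a description would have to say which query parameter the dispatcher keys on.

```python
from urllib.parse import urlsplit, parse_qs


def logical_path(uri, dispatch_param="page"):
    """Return the path of the resource a dispatcher URI actually identifies.

    `dispatch_param` is an assumption: the name of the query parameter the
    dispatcher uses to select a resource. URIs without it are returned by path.
    """
    parts = urlsplit(uri)
    values = parse_qs(parts.query).get(dispatch_param)
    if values:
        # Use the first value if the parameter is repeated.
        return values[0]
    return parts.path or "/"
```

Given the example above, `logical_path("http://www.example.com/index.jsp?page=/foo/bar/baz.htm")` yields `/foo/bar/baz.htm`, so the rest of a description could talk about that resource as if it had a proper URI of its own.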

Then there’s the issue of how to represent patterns like asynchrony and reliability. Doubtless there will be more than one way to do them, but the format should nudge people in one direction, at least.

I’d put forth that we don’t need to describe Web services, and I’d have to be seriously convinced before including SOAP support. As I’ve said before, I think SOAP makes a better protocol than a format, and as such doesn’t need special accommodation in a Web description format. If we start to accommodate SOAP as a layered protocol, we lose the benefits that having a constrained interface (REST) brings to the description format, let alone the Web.

Doubtless there are lots of other bad hacks and good patterns that should be accommodated and avoided (suggestions welcome). In the end, I suspect the best way to answer this question will be to look at existing sites and existing Web tools (e.g., Amazon, Google, Flickr APIs, PHP, Perl, Python, JSP, Struts, ColdFusion) and see what patterns of usage they accommodate and encourage, and make a judgment call. At the same time, upgrading bad toolkits to promote good practices will help immeasurably.

Wrap-up

So, those are my current thoughts about the requirements for a Web description format. There are more, of course, but I wanted feedback about the general approach before diving into the specifics. Comments and more questions appreciated.

Mark Nottingham
