mark nottingham

XQuery on the Web

Monday, 12 January 2004

XML

There’s a lot of interest out there about exposing XQuery 1.0 / XPath 1.0 / XPath 2.0 in Web interfaces. On the face of it, this is quite a compelling idea; it allows you to reuse a generic query mechanism (goodness) to access arbitrary data based on the client’s needs (more goodness) and only the bits of data that you want go across the wire (yet more goodness).

However, as many have noted, there’s a security problem; if you let someone execute arbitrarily complex code on your server, they can bring about a Denial of Service. Additionally, it’s very difficult for you to properly size your server(s), because one day you might be serving the easiest queries in the world, but the next, people could be asking for stuff that borders on the insane.

Jon Udell seems to think this can be overcome technologically; I’m not sure I agree. I see it as the same sort of problem that plagued Web caching: a misalignment of interests. While the server is resource-conscious, there’s no reason for the client to be, and the client is calling the shots if you let it perform arbitrary queries.

This leads me to believe that it’s better to perform the query on the client. If you give the client a hunk of XML, they get benefits number one (generic query mechanism) and two (access to arbitrary data), and we only need to solve number three, the performance issue.
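As a rough sketch of what that looks like — assuming Python with lxml, a hypothetical widgets resource, and made-up element names — the client pulls the XML once and then runs whatever XPath it likes locally:

```python
# Client-side querying: fetch the XML once, then run XPath locally.
# The endpoint and element names are invented for illustration.
from urllib.request import urlopen

from lxml import etree

DATA_URI = "http://example.org/widgets.xml"  # hypothetical resource

with urlopen(DATA_URI) as response:
    doc = etree.parse(response)

# Cheap or expensive, the query only costs the client its own CPU time.
names = doc.xpath("/widgets/widget/name/text()")
pricey = doc.xpath("//widget[number(price) > 100]")
```

The server’s job shrinks to handing out a representation it can cache and size for, which is exactly the property that was missing above.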

Dave Orchard wants to find a way to limit the query in some fashion, and I think he’s on the right track. I’d do it by deciding what views of the data I want to expose, and then offering each one up on a separate URI; for example, I could have a WidgetsView, a PartNumberView, and a SalesSummaryView, all based on the same dataset. This allows people to get at the data in several different ways while still being able to refine their queries on the client side, with no danger to the server.
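To make that concrete, here’s a minimal sketch, again in Python with lxml; the view names, paths, and the widgets.xml dataset are all invented, and a real publisher would pick whatever queries suit their data:

```python
# Publisher-chosen views: each URI maps to a query the publisher wrote,
# so clients never submit query text of their own.
from copy import deepcopy

from lxml import etree

DATASET = etree.parse("widgets.xml")  # hypothetical dataset

VIEWS = {
    "/WidgetsView": "/widgets/widget",
    "/PartNumberView": "/widgets/widget/part-number",
    "/SalesSummaryView": "/widgets/widget/sales/summary",
}

def serve_view(path):
    """Return a serialized result for a known view, or None for a 404."""
    expression = VIEWS.get(path)
    if expression is None:
        return None
    root = etree.Element("view")
    for node in DATASET.xpath(expression):
        root.append(deepcopy(node))  # copy, so the source document stays intact
    return etree.tostring(root, pretty_print=True)
```

Whatever framework actually routes requests to serve_view doesn’t matter much; the point is that the set of queries is fixed, and sized, by the publisher ahead of time.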

I could also allow parameters on each of those resources, so that they could be queried in limited ways; e.g., list the first five items, list items in my account, etc.
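A parameterized view could be sketched the same way; the count parameter and its ceiling below are assumptions, but they show how the query shape stays fixed while the client fills in a bounded blank:

```python
# A parameterized view: the client picks a number, the publisher picks the limits.
from copy import deepcopy

from lxml import etree

DATASET = etree.parse("widgets.xml")  # hypothetical dataset, as above

def serve_widgets_view(params):
    """Serve the widgets view, optionally limited to the first N items."""
    try:
        count = int(params.get("count", "5"))
    except ValueError:
        count = 5
    count = max(1, min(count, 25))  # the ceiling belongs to the publisher, not the client
    # The expression itself never changes; only the bounded value does.
    results = DATASET.xpath("/widgets/widget[position() <= $n]", n=count)
    root = etree.Element("view")
    for node in results:
        root.append(deepcopy(node))
    return etree.tostring(root, pretty_print=True)
```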

“But, but, but,” I hear you saying, “that’s no better than the Web is now!” That’s true. However, it’s not really that different from a constrained XQuery language, and I think it’s better than a one-size-fits-all solution. If a publisher chooses and controls what kind of processing they’re willing to perform, and then exposes that as a collection of URIs, they can talk about those URIs (given a suitable Web description language) and tell clients how to get the data they want.

Another approach would be to come up with a restricted version of XPath 2.0 and allow that on URIs. I have a suspicion, though, that enabling it to be dynamic based on the publishers’ wishes would make it pretty difficult to describe or use.
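To give a flavour of why, a restricted profile might look something like this sketch, assuming the server accepts an XPath expression in a query parameter; the particular restrictions are arbitrary examples, not any real profile:

```python
# One possible restriction profile for client-supplied XPath: short,
# child-axis-only paths with numeric predicates. Everything else is refused.
import re

MAX_LENGTH = 200
# Only steps like /widgets/widget[3]/name are allowed; no descendant axes
# (which can force a full-document scan), no functions, no free-form predicates.
ALLOWED = re.compile(r"(/[A-Za-z][\w.-]*(\[\d+\])?)+\Z")

def is_allowed(expression):
    """Decide whether a client-supplied expression fits this profile."""
    if len(expression) > MAX_LENGTH:
        return False
    return ALLOWED.match(expression) is not None
```

Even a toy profile like this is awkward to communicate to clients, and every publisher would draw the lines somewhere different.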

One other avenue that shouldn’t be overlooked is HTTP Delta Encoding, which obviates a lot of the concern about frequent updates of very large representations.
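For reference, RFC 3229 is the relevant spec; a delta-aware request looks roughly like this sketch (the server and entity tag are made up, and actually applying a vcdiff delta would need a separate decoder):

```python
# The shape of an RFC 3229 exchange: ask for a delta against a copy we hold.
import http.client

conn = http.client.HTTPConnection("example.org")
conn.request("GET", "/widgets.xml", headers={
    "If-None-Match": '"v41"',  # entity tag of the copy we already have (made up)
    "A-IM": "vcdiff",          # we can accept a delta against that copy
})
response = conn.getresponse()

if response.status == 226:     # 226 IM Used: the body is a delta, not the full document
    delta = response.read()    # would be fed to a vcdiff decoder along with the old copy
else:
    body = response.read()     # otherwise, ordinary 200 (or 304) handling applies
```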