Modularity by reference

Thursday, 2 October 2003

Many XML-based formats could benefit from using references to promote modularity. For example, imagine a catalogue format:

<x:catalogue owner="Bob">
  <x:widget id="foo" name="FooWidget">
    <x:description>The Foo Widget</x:description>
  </x:widget>
  <x:widget id="bar" name="BarWidget">
    <x:description>The Bar Widget</x:description>
  </x:widget>
</x:catalogue>

The format’s designer wants to allow Mary’s catalogue to refer to items in Bob’s, so that she doesn’t have to redefine them.

There are a few ways to do this. XInclude is the first that comes to mind, but it has a problem. Imagine Mary’s catalogue with an XInclude reference:

<x:catalogue owner="Mary">
  <xi:include href="http://www.example.com/bob/catalogue#foo"/>
</x:catalogue>

This works fine if you run an XInclude processor over the entire document and use the output as a catalogue. However, imagine that Mary’s catalogue has not one, but thousands of entries in it, and it includes not just x:widgets, but also y:widgets and z:things. If Mary wants to know which x:widgets, and only x:widgets, are in her catalogue, she’ll have to dereference every XInclude first, because the include element carries no type information. In other words, she can only process the document as a single unit, and that is a non-starter for many applications.

One way around this is to wrap the includes in elements that tell us what they contain, for example:

<x:catalogue owner="Mary">
  <x:widgetContainer>
    <xi:include href="http://www.example.com/bob/catalogue#foo"/>
  </x:widgetContainer>
</x:catalogue>

But this is cumbersome, to say the least.

An easy way to fix this would be to make the include an attribute instead of an element, for example:

<x:catalogue owner="Mary">
  <x:widget xj:ref="http://www.example.com/bob/catalogue#foo"/>
</x:catalogue>

This way, we still know that the element is an x:widget, and the result after processing is:

<x:catalogue owner="Mary">
  <x:widget id="foo" name="FooWidget">
    <x:description>The Foo Widget</x:description>
  </x:widget>
</x:catalogue>
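
Because the element name survives, a consumer can pick out just the x:widget references without fetching anything. Here is a minimal Python sketch of that idea. The x: and y: namespace URIs and the second catalogue URL are made up for illustration; only the xj namespace comes from the XSLT in this post.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs for x: and y: (the post never gives them).
NS = {"x": "http://example.com/ns/x", "y": "http://example.com/ns/y"}
XJ_REF = "{http://ns.mnot.net/xj/01}ref"

# A hypothetical version of Mary's catalogue that mixes element types.
mary = ET.fromstring("""<x:catalogue xmlns:x="http://example.com/ns/x"
    xmlns:y="http://example.com/ns/y"
    xmlns:xj="http://ns.mnot.net/xj/01" owner="Mary">
  <x:widget xj:ref="http://www.example.com/bob/catalogue#foo"/>
  <y:widget xj:ref="http://www.example.com/alice/catalogue#baz"/>
  <x:widget xj:ref="http://www.example.com/bob/catalogue#bar"/>
</x:catalogue>""")

# Select by element name alone; no reference is dereferenced here.
x_widget_refs = [w.get(XJ_REF) for w in mary.findall("x:widget", NS)]
print(x_widget_refs)
```

With xi:include elements in place of the widgets, the same query would find nothing until every include had been resolved.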

To make this work, we only need to follow two simple rules:

1. When the ref attribute is present, the element MUST NOT have any other attributes or children.
2. The ref attribute contains a URI, which MUST resolve to an element (using fragid syntax) with the same qualified name as the element that contains the ref attribute. That node replaces the element containing the ref attribute during processing.
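
To make the two rules concrete, here is a rough Python sketch of a processor that enforces them. It rests on several assumptions: the DOCS dictionary is a stand-in for actually fetching the URI, the x: namespace URI is invented, any attribute named id is treated as an XML ID (as if a DTD declared it), and names are compared as namespace-plus-local-name rather than by prefix.

```python
import xml.etree.ElementTree as ET

XJ_REF = "{http://ns.mnot.net/xj/01}ref"  # namespace from the XSLT example

# Hypothetical in-memory "web": document URI -> XML text (replaces a fetch).
DOCS = {
    "http://www.example.com/bob/catalogue": """<x:catalogue
    xmlns:x="http://example.com/ns/x" owner="Bob">
  <x:widget id="foo" name="FooWidget">
    <x:description>The Foo Widget</x:description>
  </x:widget>
  <x:widget id="bar" name="BarWidget">
    <x:description>The Bar Widget</x:description>
  </x:widget>
</x:catalogue>""",
}

def resolve(elem):
    """Return the element that elem's xj:ref points to, checking both rules."""
    # Rule 1: xj:ref is the only attribute, and there are no children or text.
    if len(elem.attrib) != 1 or len(elem) != 0 or (elem.text or "").strip():
        raise ValueError("an element with xj:ref must otherwise be empty")
    uri, _, fragment = elem.get(XJ_REF).partition("#")
    root = ET.fromstring(DOCS[uri])
    for candidate in root.iter():
        if candidate.get("id") == fragment:
            # Rule 2: the target must have the same name as the referrer.
            if candidate.tag != elem.tag:
                raise ValueError("referenced element has a different name")
            return candidate
    raise LookupError("no element with id %r in %s" % (fragment, uri))

def expand(parent):
    """Replace each descendant carrying xj:ref with its target, in place."""
    for index, child in enumerate(list(parent)):
        if XJ_REF in child.attrib:
            target = resolve(child)
            target.tail = child.tail  # keep the surrounding whitespace
            parent[index] = target
        else:
            expand(child)

mary = ET.fromstring("""<x:catalogue xmlns:x="http://example.com/ns/x"
    xmlns:xj="http://ns.mnot.net/xj/01" owner="Mary">
  <x:widget xj:ref="http://www.example.com/bob/catalogue#foo"/>
</x:catalogue>""")
expand(mary)
print(mary[0].get("name"))  # FooWidget
```

This sketch does not expand references inside the fetched element; a real processor would have to decide whether to do so and how to guard against cycles.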

As a bonus, as long as XML IDs are used for the references (and declared in the DTD), it’s this easy to implement in XSLT:

<xsl:template match="x:widget" xmlns:xj="http://ns.mnot.net/xj/01">
 <xsl:choose>
  <xsl:when test="@xj:ref">
   <xsl:apply-templates select="document(@xj:ref, document(saxon:base-uri()))"/>
  </xsl:when>
  <xsl:otherwise>
   <!-- processing for x:widget here -->
  </xsl:otherwise>
 </xsl:choose>
</xsl:template>

This does require the use of an undocumented (and very useful) Saxon extension. I’m curious as to why the XSLT folks made URIs so useless in the base spec…

UPDATE: After writing this, I realized that you can get almost the same magic without saxon:base-uri() by using document(@xj:ref, /). You don’t get xml:base support that way, but that wasn’t really what I was looking for anyway; I just found it strange that document() will use the stylesheet instead of the instance as the base if you don’t specify a second argument.

Mark Nottingham
