mark nottingham

sparta.py 0.4: Data Binding for RDF in Python

Saturday, 15 May 2004

Semantic Web

After a short pause (OK, nearly three years), I’ve released version 0.4 of sparta.py.

Sparta is a simple API for RDF that binds RDF nodes to Python objects and RDF arcs to attributes of those Python objects. As such, it can be considered a “data binding” from RDF to Python.

For an example of its use, see spartaTest.py and its output.
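To make the idea of binding nodes to objects and arcs to attributes a bit more concrete, here is a minimal, self-contained sketch. It is not Sparta's code and does not use rdflib: the `Thing` class, the set-of-tuples store, and the example URIs are all assumptions made up for illustration. The only thing borrowed from Sparta is the prefix_localname convention for attribute names.

```python
# Toy illustration of RDF-to-Python data binding; NOT Sparta's implementation.
# The "store" is just a set of (subject, predicate, object) string tuples.

class Thing:
    """Wraps one RDF node; attribute access reads and writes its arcs."""

    def __init__(self, store, prefixes, node):
        # Use object.__setattr__ so internal state bypasses our own __setattr__.
        object.__setattr__(self, "_store", store)
        object.__setattr__(self, "_prefixes", prefixes)
        object.__setattr__(self, "_node", node)

    def _expand(self, name):
        # "foaf_name" -> namespace registered for "foaf" + "name"
        prefix, _, local = name.partition("_")
        if prefix not in self._prefixes or not local:
            raise AttributeError(name)
        return self._prefixes[prefix] + local

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for arc names.
        if name.startswith("_"):
            raise AttributeError(name)
        predicate = self._expand(name)
        return {o for s, p, o in self._store
                if s == self._node and p == predicate}

    def __setattr__(self, name, value):
        # In this sketch, assignment adds a triple rather than replacing one.
        self._store.add((self._node, self._expand(name), value))


FOAF = "http://xmlns.com/foaf/0.1/"      # hypothetical example data
store = set()
bob = Thing(store, {"foaf": FOAF}, "http://example.org/bob")

bob.foaf_name = "Bob"
print(bob.foaf_name)   # {'Bob'}
```

Because the store is an ordinary object that the wrapper only reads and writes, it stays usable through its own interface alongside the attribute-style access.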

New in this Version

This version is based on rdflib, which makes it much easier to install and simpler to use. A side effect of this is that you can instantiate a ThingFactory with a generic rdflib Store, and still access the state of the Store through its usual methods.

Preliminary support for automatic typing has also been added; if a property has an rdf:type property, Sparta will convert its values to and from the appropriate Python type. This only works for a subset of the XML Schema datatypes so far. See spartaTest.py for an example of this.
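As a rough sketch of what converting "to and from the appropriate Python type" can look like, the snippet below maps a handful of XML Schema datatypes to Python types and back. The particular datatypes chosen and the function names `to_python` and `to_lexical` are assumptions for illustration; they are not taken from Sparta, and the subset Sparta actually handles may differ.

```python
from decimal import Decimal

XSD = "http://www.w3.org/2001/XMLSchema#"


def _parse_boolean(lexical):
    # xsd:boolean allows "true"/"1" and "false"/"0".
    if lexical in ("true", "1"):
        return True
    if lexical in ("false", "0"):
        return False
    raise ValueError("not an xsd:boolean: %r" % lexical)


# Datatype URI -> function turning the lexical form into a Python value.
TO_PYTHON = {
    XSD + "string": str,
    XSD + "integer": int,
    XSD + "decimal": Decimal,
    XSD + "double": float,
    XSD + "boolean": _parse_boolean,
}


def to_python(lexical, datatype):
    """Convert a literal's lexical form; unknown datatypes stay strings."""
    converter = TO_PYTHON.get(datatype)
    return converter(lexical) if converter else lexical


def to_lexical(value):
    """Return (lexical form, datatype URI) for a Python value."""
    # bool must be tested before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return ("true" if value else "false"), XSD + "boolean"
    if isinstance(value, int):
        return str(value), XSD + "integer"
    if isinstance(value, Decimal):
        return str(value), XSD + "decimal"
    if isinstance(value, float):
        return repr(value), XSD + "double"
    return str(value), XSD + "string"


print(to_python("42", XSD + "integer") + 1)   # 43
print(to_lexical(True)[0])                    # true
```

Falling back to the plain string for unrecognised datatypes keeps the binding usable on data whose types the mapping does not cover.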

As before, this software is experimental; it is not complete, has not been tested well, and may contain bugs. The API may (and probably will) change in the future.

The Back Story

I’ve updated this software for two reasons: to support the API style that I prefer, and to prove a point about RDF vs. XML vis-à-vis data binding.

The original version of Sparta, in 2001, was intended to be a demonstration of a simple API for RDF. Aaron Swartz picked up the ball and came up with TRAMP, an alternate style for doing this, and then xmltramp, which takes a similar approach to XML.

I liked TRAMP at first, because it treated the properties of a node as a dictionary, which seemed like goodness because a datatype has a more constrained and well-known interface than an arbitrary object.

However, after following this path to its logical conclusion with my HTTP header parsing library (still a work in progress), I’ve become convinced that a) an object’s attributes are also a dictionary (at least in Python), and b) less line noise is inherently more developer-friendly. I expect that Aaron has thoughts on this as well, but I’d characterise any difference between us as merely stylistic.

xmltramp is a different beast; because the underlying data model is an Infoset, it is much more complex, and I don’t think that any such API is a really good way to work with data, unless you’re forced to by an existing format that doesn’t have a data model above the Infoset (see recent discussion regarding this).

I point this out not to trash xmltramp (it’s very good to have, if you need that sort of thing), but to show that if your underlying data model is based upon the Infoset, it’s difficult to keep all of the information therein and still provide an intuitive interface. If the data model is simple — e.g., RDF — it’s very easy and intuitive to work with it, and to retain all of the relevant information. Things like typing and validation during data binding become absolutely dirt simple.


4 Comments

James Tauber said:

Very cool. I’ve written up an entry on how this could be used for Naked Objects at http://jtauber.com/blog/2004/05/15/naked_objects_in_sparta

Saturday, May 15 2004 at 6:32 AM

Ken MacLeod said:

One thing to note is that using “prefix[underscore]localname” syntax means that prefix-mappings are scoped to the storage object and not the caller. This means that if a storage object is passed to modules authored by different developers, each module will have to register its prefix-mappings on the storage object and be prepared for the (hopefully rare) case where a prefix-conflict occurs (the same prefix used with two different namespaces; it’s ok to have two different prefixes used with the same namespace, of course).

xmltramp, by comparison, uses a Namespace feature to create a local-variable-scoped namespace object which will translate Namespace.localname to its corresponding internal representation. If you pass a storage object to another module there’s never any conflict because, in effect, the prefix-namespace mapping is occurring lexically within the caller. xmltramp’s Namespace feature can’t be used as attribute names, unfortunately.

The best solution would seem to be to create a Python PEP for a new lexically-scoped syntax for supporting namespaces as attribute names and stand-alone literal values (as used for dictionary keys and parameters, for example).

Wednesday, June 2 2004 at 6:40 AM

Laurian said:

It would be cool to feed in a schema, and be allowed to create only instances that conform to it.

Tuesday, July 13 2004 at 8:14 AM