mark nottingham

XML Language Bindings Done Right

Wednesday, 23 June 2004

XML

John Schneider was in the office last week and gave me a demo of something he’s been working on for a while, E4X — by far one of the coolest technologies I’ve seen in some time. I think that every language is going to want one when they see this stuff.

In a nutshell, E4X is a native XML binding for JavaScript (sorry, ECMAScript); it makes XML a first-class datatype, rather than stuffing it into an object model. John explains it much better than I could, but this deals with many of the problems that come up because of the complexity of the Infoset, particularly around ordering.

This approach makes it dirt simple to work with XML; no DOM (yuck!), no events, just intuitive access and manipulation. It’s so deeply ingrained that XML is a new literal type (examples courtesy of John):

var order = <order>
    <customer>
        <firstname>John</firstname>
        <lastname>Doe</lastname>
    </customer>
    <item>
        <description>Big Screen Television</description>
        <price>1299.99</price>
        <quantity>1</quantity>
    </item>
    <item>
        <description>DVR-5000</description>
        <price>289.99</price>
        <quantity>3</quantity>
    </item>
</order>;

Then, you can access elements’ contents just as if they were properties:

var name = order.customer.firstname + " " + order.customer.lastname;

including access by position, in document order:

print("The second item is:\n" + order.item[1]);

You can add new children just by assigning them:

var x = <x/>;
x.a = "one";
x.b = "two";

or append content like this:

order.item += 
<item><description>Catapult</description><price>139.95</price></item>;

If you want to dynamically construct some content, just use curly braces:

var tagname = "name";
var attributename = "id";
var attributevalue = 5;
var content = "Fred";
var x = <{tagname} {attributename}={attributevalue}>{content}</{tagname}>;

And, critically, both namespaces and attributes are very intuitive yet syntactically distinct:

// message is assumed to be an XML value holding a SOAP envelope
var soap = new Namespace("http://schemas.xmlsoap.org/soap/envelope/");
var encodingStyle = message.@soap::encodingStyle;

This makes working with XML completely natural:

var totalprice = 0;
for each (var i in order.item) {
    totalprice += i.price * i.quantity;
}
print("The total price of the order is: " + totalprice);

Wow — very cool stuff, and reputedly done Real Soon Now. When can we get this in Python?

UPDATE: I’m getting a number of e-mails and comments suggesting Python XML-to-Object bindings. That’s fine, but they don’t generally allow access to all of the information in the Infoset, or if they do, they require pretty torturous syntax. See the example above re: attributes and namespaces.
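
To make the comparison concrete, here is a rough sketch of the total-price loop and the namespaced attribute lookup using ElementTree (available in the Python standard library as xml.etree.ElementTree). The SOAP envelope string is a made-up stand-in for the message value in the E4X snippet above, and the variable names are mine, not John's. It only illustrates the syntax gap; it says nothing about what any particular binding can or cannot do.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

# The order has to be parsed from a string; there is no XML literal.
order = ET.fromstring("""
<order>
  <item>
    <description>Big Screen Television</description>
    <price>1299.99</price>
    <quantity>1</quantity>
  </item>
  <item>
    <description>DVR-5000</description>
    <price>289.99</price>
    <quantity>3</quantity>
  </item>
</order>
""")

# E4X: for each (var i in order.item) { totalprice += i.price * i.quantity; }
# Text content must be converted to numbers explicitly.
total_price = sum(
    float(item.findtext("price")) * int(item.findtext("quantity"))
    for item in order.findall("item")
)

# Hypothetical SOAP envelope, standing in for `message` (an assumption).
message = ET.fromstring(
    '<soap:Envelope xmlns:soap="%s" '
    'soap:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"/>' % SOAP_NS
)

# E4X: message.@soap::encodingStyle
# Here the namespace URI is spelled out inside the attribute name.
encoding_style = message.get("{%s}encodingStyle" % SOAP_NS)

print("The total price of the order is: %.2f" % total_price)
print("encodingStyle: %s" % encoding_style)
```

The "{uri}name" string is the part I mean by torturous: it works, but the namespace ends up inside a string key rather than being part of the language.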


9 Comments

Karl Dubost said:

Python…. Something ala XMLObject?

Or did I miss something?

http://xmlobject.base-art.net/doc.html

Wednesday, June 23 2004 at 8:06 AM

Stub said:

Might be fun for toy applications that are all ASCII, or should I say subset of ASCII that doesn’t clash with reserved words or characters.

foo = <xsl:apply-templates match="foo" />

Hmm…

Thursday, June 24 2004 at 1:48 AM

Jeff Bone said:

Similar interest: http://research.microsoft.com/~emeijer/Papers/XS.pdf

IMHO, Meijer’s a hellishly smart guy. I think there’s some deep theoretical stuff here; he doesn’t come out and draw the parallel between his typed streams and e.g. pipes, pi-calculus, etc. but it’s lurking there. Given his involvement in functional languages and possible influence on MSH / Monad, there’s some real innovation going on there that makes me worry that UNIX is about to lose the high ground in a couple of key areas.

FYI, too: Rebol has a similar integration w/ markup-based datatypes; tagged structures (documents) are a datatype that’s directly represented. I.e., markup is literal syntax (as above.) It’s also agnostic as to the actual markup language; it just recognizes tagged structures syntactically, the interpretation of the structure is left to code.

Thursday, June 24 2004 at 2:13 AM

Nelson Minar said:

Very cool, thanks for posting this! How well does it handle attributes and namespaces? What does it do with entities?

Python has something a bit like this - xmltramp. http://www.aaronsw.com/2002/xmltramp/

Thursday, June 24 2004 at 9:06 AM

Robert Sayre said:

Check out Stan from the Nevow package by Donovan Preston (Quotient/Twisted).

http://nevow.com

Stan and list comps make a pretty killer combination.

Thursday, June 24 2004 at 10:03 AM

Karl Dubost said:

I had forgotten this one too.

http://effbot.org/zone/element-index.htm

I have looked at the pages of E4X… and I’m still asking myself… yes, good, and?

Saturday, June 26 2004 at 12:58 PM

Doug Landauer said:

Also check out Scala at http://scala.epfl.ch/ , especially http://scala.epfl.ch/intro/xml.html and http://scala.epfl.ch/intro/regexppat.html .

Monday, June 28 2004 at 3:24 AM

Hendy Irawan said:

For all of you PHP users something like this is available in PHP 5’s SimpleXML extension (which comes preinstalled by default).

It makes things easy, and no DOM :-)

Wednesday, March 2 2005 at 10:23 AM