Notes on Atom

Friday, 12 December 2003

As you may know, I’m editing the Atom format draft in my copious spare time, but not actively participating in the community (I am watching, but I don’t have the time to really dig in).

I think this is healthy, because it forces me to concentrate on the quality and clarity of the specification. So many efforts produce specifications that are unreadable (and therefore unimplementable, unless you were at the table) because you “had to be there” to understand what they actually meant.

So, I’ve just posted a new rev of the specification; overall, I think the quality of the technical work the Atom folks are doing is outstanding, and I’m fairly pleased with how the editorial end is shaping up.

However, as I was incorporating the most recent changes that the community agreed upon, I made a few notes that I’d like to share. Without further ado:

The version attribute - I thought this was one of the reasons we wanted to move on from RSS; numeric versions are linear, are apt to have lots of things read into them, and carry information that is redundant with the namespace. See the TAG finding.

Adding extensions - There’s no way to identify whether an extension module (e.g., in an entry) is required to understand the feed; this is our one opportunity to put a mustUnderstand semantic into Atom. Why isn’t it there? Once again, see the TAG finding (DavidO did some fantastic work on this).
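To make the mustUnderstand idea concrete, here is a minimal sketch of how a consumer might use such a flag. Everything named here is an assumption for illustration: the Atom draft defines no such attribute, and the namespaces, the attribute name, and the function are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical: the draft defines neither this namespace nor this attribute.
EXT_NS = "http://example.org/ns/atom-ext"
MUST_UNDERSTAND = "{%s}mustUnderstand" % EXT_NS


def unsupported_required_extensions(entry_xml, understood_namespaces):
    """Return the tags of child elements that are flagged as required
    but live in a namespace this processor does not understand."""
    entry = ET.fromstring(entry_xml)
    missing = []
    for child in entry:
        if child.get(MUST_UNDERSTAND) != "true":
            continue  # optional extension or core element: safe to ignore
        # ElementTree spells qualified names as "{namespace}local".
        namespace = child.tag[1:].split("}", 1)[0] if child.tag.startswith("{") else ""
        if namespace not in understood_namespaces:
            missing.append(child.tag)
    return missing


# Placeholder namespaces throughout; none of these are real vocabularies.
SAMPLE_ENTRY = """
<entry xmlns="http://example.org/ns/atom-draft"
       xmlns:x="http://example.org/ns/atom-ext"
       xmlns:geo="http://example.org/ns/geo"
       xmlns:rate="http://example.org/ns/rating">
  <title>Example</title>
  <geo:point x:mustUnderstand="true">45.0 -93.0</geo:point>
  <rate:stars>4</rate:stars>
</entry>
"""
```

A processor that gets a non-empty list back would know it cannot safely interpret the entry, while unflagged extensions such as the rating element can simply be skipped.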

The mode attribute - The list of possible values isn’t qualified; is this a completely closed list that will never change? If not, how does it get added to in the future? E.g., what if someone comes up with a really nifty encoding that they want to use for their application? I’d suggest using a URI rather than a token here.

The rel attribute - Same as the mode attribute: is the list of values closed, and if not, how does it get extended? A URI would work here too.

The type attribute - This is probably too radical for some, but why not use a URI rather than a media type here? You can identify media types with URIs (e.g., urn:ietf:params:media-type:image/jpeg), and you can also identify more ad hoc formats (e.g., business-specific ones) without the pain and uncertainty of media type registration.
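As a rough sketch of what URI-valued types could look like to a consumer, the helpers below map a media type onto the URN form used in the example above and tell such URNs apart from ad hoc format URIs. The function names and the example.com URI are hypothetical, and the sketch takes the URN prefix from the example at face value rather than from any registration; media type parameters (such as charset) are not handled.

```python
# Prefix copied from the example URN above; treated here as an assumption.
MEDIA_TYPE_URN_PREFIX = "urn:ietf:params:media-type:"


def media_type_to_uri(media_type):
    """Turn a bare media type such as 'text/html' into a URN-style identifier."""
    return MEDIA_TYPE_URN_PREFIX + media_type.strip().lower()


def describe_type(type_uri):
    """Classify a type URI as a wrapped media type or an ad hoc format."""
    if type_uri.lower().startswith(MEDIA_TYPE_URN_PREFIX):
        return ("media-type", type_uri[len(MEDIA_TYPE_URN_PREFIX):])
    return ("ad-hoc", type_uri)
```

With this shape, a business-specific format needs nothing more than a URI its owner controls, for instance `describe_type("http://example.com/formats/invoice")`.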

@mode="escaped" - It’s unclear what’s meant by ‘escaped’ here; is it XML? HTML? URI?
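To show why the distinction matters, this small sketch runs one string through three common escaping schemes using Python's standard library. It makes no claim about which one the draft intends; the function name is made up for illustration.

```python
from html import escape as html_escape
from urllib.parse import quote
from xml.sax.saxutils import escape as xml_escape


def escape_variants(text):
    """Return the same text under three different notions of 'escaped'."""
    return {
        "xml": xml_escape(text),    # escapes &, < and > only
        "html": html_escape(text),  # also escapes quote characters
        "uri": quote(text),         # percent-encodes reserved characters
    }


variants = escape_variants('Fish & "chips"')
# variants["xml"]  -> 'Fish &amp; "chips"'
# variants["html"] -> 'Fish &amp; &quot;chips&quot;'
# variants["uri"]  -> 'Fish%20%26%20%22chips%22'
```

Three different byte sequences for the same content means a consumer that guesses wrong will display or parse the wrong thing.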

Modularity - I like that the link-related elements have been collapsed into one, with an attribute to qualify what kind of link they are. To simplify things and make it symmetric, the same should be done with the content-related elements; i.e., rather than having a title element, content element, summary element, etc. just have a content element and an attribute that says what kind of content it is, just as link does. Otherwise, link should be split up into separate elements, to be consistent with the content-related elements (it’s confusing to mix the two styles in the same format).

multipart/alternative - Having a special case for this is bad design; why can’t you just have multiple content elements with different types, and have the application choose from them? To me, this is the most glaring problem in the spec, because it’s misusing both the type attribute and the media type; the type attribute specifies the format, not the semantic, but the use here is the multipart semantic, and not the format.
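Here is a minimal sketch of the alternative, assuming an entry were simply allowed to carry several content elements with different types. The namespace is a placeholder, the sample entry is invented, and the function is hypothetical; the point is only that the choice can live in the application.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace, not the draft's real one.
ATOM_NS = "http://example.org/ns/atom-draft"


def choose_content(entry_xml, preferred_types):
    """Pick the first content element whose type the application prefers.

    Returns a (type, text) pair, or None if nothing acceptable is present.
    """
    entry = ET.fromstring(entry_xml)
    by_type = {}
    for content in entry.findall("{%s}content" % ATOM_NS):
        # Keep the first element seen for each type.
        by_type.setdefault(content.get("type"), content)
    for media_type in preferred_types:
        if media_type in by_type:
            return media_type, by_type[media_type].text
    return None


SAMPLE_ENTRY = """
<entry xmlns="http://example.org/ns/atom-draft">
  <content type="text/plain">Hello, world.</content>
  <content type="text/html" mode="escaped">&lt;p&gt;Hello, &lt;em&gt;world&lt;/em&gt;.&lt;/p&gt;</content>
</entry>
"""
```

A rich client might call `choose_content(SAMPLE_ENTRY, ["text/html", "text/plain"])`, while a text-only one would list just `text/plain`; neither needs a multipart wrapper to do so.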

Allowing multiple formats - Now that the content-related elements (e.g., copyright) allow typing and encoding, shouldn’t multiples be allowed? E.g., you could have a text/plain copyright statement and a text/html one. Also, I don’t see the need for the prohibition against machine-readable copyright statements, now that you can associate a media type with them to differentiate them.

Mark Nottingham
