mnot’s blog

“Design depends largely on constraints.” — Charles Eames

Friday, 12 December 2003

Notes on Atom

Filed under: Syndication

As you may know, I’m editing the Atom format draft in my copious spare time, but not actively participating in the community (I am watching, but I don’t have the time to really dig in).

I think this is healthy, because it forces me to concentrate on the quality and clarity of the specification; so many efforts come up with unreadable (and therefore unimplementable, unless you were at the table) specifications because you “had to be there” to understand what they actually meant.

So, I’ve just posted a new rev of the specification; overall, I think the quality of the technical work the Atom folks are doing is outstanding, and I’m fairly pleased with how the editorial end is shaping up.

However, as I was incorporating the most recent changes that the community agreed upon, I made a few notes that I’d like to share. Without further ado:

The version attribute - I thought this was one of the reasons we wanted to move on from RSS; numeric versions are linear, are apt to have lots of things read into them, and carry information that is redundant with the namespace. See the TAG finding.

Adding extensions - There’s no way to identify whether an extension module (e.g., in an entry) is required to understand the feed; this is our one opportunity to put a mustUnderstand semantic into Atom. Why isn’t it there? Once again, see the TAG finding (DavidO did some fantastic work on this).
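To make the mustUnderstand idea concrete, here is a minimal sketch of how a consumer might check for required extensions. Everything specific in it is an assumption for illustration: the `mustUnderstand` attribute is not in any Atom draft, and the namespace URIs are placeholders.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace; not the real Atom namespace.
ATOM_NS = "http://example.org/atom-draft"
KNOWN_NAMESPACES = {ATOM_NS}

def namespace_of(tag):
    """Return the namespace URI of an ElementTree tag such as '{ns}local'."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""

def unsupported_required_extensions(entry):
    """List extension children that are flagged as required but not understood."""
    problems = []
    for child in entry:
        if namespace_of(child.tag) in KNOWN_NAMESPACES:
            continue
        # 'mustUnderstand' is a hypothetical attribute, used only for this sketch.
        if child.get("mustUnderstand") == "true":
            problems.append(child.tag)
    return problems

ENTRY_XML = """
<entry xmlns="http://example.org/atom-draft" xmlns:ext="http://example.org/ext">
  <title>Example</title>
  <ext:rating>5</ext:rating>
  <ext:drm mustUnderstand="true">locked</ext:drm>
</entry>
"""

entry = ET.fromstring(ENTRY_XML.strip())
blocking = unsupported_required_extensions(entry)
```

With a flag like this, a consumer could safely skip `ext:rating` but would know to reject (or warn about) the entry because of `ext:drm`.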

The mode attribute - The list of possible values isn’t qualified; is this a completely closed list that will never change? If not, how does it get added to in the future? E.g., what if someone comes up with a really nifty encoding that they want to use for their application? I’d suggest using a URI rather than a token here.
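As a rough sketch of what URI-identified modes could buy, consider a decoder table keyed by URI. The URIs below are invented for illustration (the draft only has the tokens), and "escaped" is assumed here to mean HTML/XML entity escaping.

```python
import base64
import html

# Hypothetical URIs standing in for the draft's "xml", "escaped" and "base64" tokens.
MODE_BASE = "http://example.org/atom/mode/"
DECODERS = {
    MODE_BASE + "xml": lambda text: text,
    # Assumption: "escaped" means entity escaping, which html.unescape reverses.
    MODE_BASE + "escaped": html.unescape,
    MODE_BASE + "base64": lambda text: base64.b64decode(text).decode("utf-8"),
}

def decode_content(mode_uri, text):
    """Decode text according to the mode identified by mode_uri."""
    try:
        decoder = DECODERS[mode_uri]
    except KeyError:
        raise ValueError(f"unsupported mode: {mode_uri}") from None
    return decoder(text)

# An application can mint its own mode under a URI it controls,
# without colliding with anyone else's token.
DECODERS["http://example.com/modes/reversed"] = lambda text: text[::-1]
```

Because each mode lives in a URI space its author controls, a new encoding needs no central list; consumers that don't recognise the URI simply report it as unsupported.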

The rel attribute - Same as the mode attribute.

The type attribute - This is probably too radical for some, but why not use a URI rather than a media type here? You can identify media types with URIs (e.g., urn:ietf:params:media-type:image/jpg), and you can also identify more ad hoc formats (e.g., business-specific ones) without the pain and uncertainty of media type registration.
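A small sketch of how the two kinds of identifier could coexist, assuming the URN form from the example above and assuming bare media types without parameters. The function names are hypothetical.

```python
# Prefix taken from the example above; treat it as illustrative.
MEDIA_TYPE_URN_PREFIX = "urn:ietf:params:media-type:"

def type_to_uri(value):
    """Map a type value to a URI: keep URIs as-is, wrap bare media types."""
    value = value.strip()
    # A bare media type (no parameters) cannot contain ':', so a colon
    # is taken to mean the value is already a URI.
    if ":" in value:
        return value
    return MEDIA_TYPE_URN_PREFIX + value.lower()

def uri_to_media_type(uri):
    """Return the media type named by a media-type URN, or None for other URIs."""
    if uri.startswith(MEDIA_TYPE_URN_PREFIX):
        return uri[len(MEDIA_TYPE_URN_PREFIX):]
    return None
```

Registered types round-trip through the URN form, while an ad hoc format such as `http://example.com/formats/invoice` (a made-up URI) passes through untouched.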

@mode="escaped" - It’s unclear what’s meant by ‘escaped’ here; is it XML? HTML? URI?

Modularity - I like that the link-related elements have been collapsed into one, with an attribute to qualify what kind of link they are. To simplify things and make it symmetric, the same should be done with the content-related elements; i.e., rather than having a title element, content element, summary element, etc. just have a content element and an attribute that says what kind of content it is, just as link does. Otherwise, link should be split up into separate elements, to be consistent with the content-related elements (it’s confusing to mix the two styles in the same format).

multipart/alternative - Having a special case for this is bad design; why can’t you just have multiple content elements with different types, and have the application choose from them? To me, this is the most glaring problem in the spec, because it’s misusing both the type attribute and the media type; the type attribute specifies the format, not the semantic, but the use here is the multipart semantic, and not the format.
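Here is a minimal sketch of the alternative: several sibling content elements, with the application picking by its own preference order. The namespace is a placeholder and `choose_content` is a hypothetical helper, not anything from the draft.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace; not the real Atom namespace.
NS = "http://example.org/atom-draft"

def choose_content(entry, preferred_types):
    """Pick the content element whose type ranks highest in preferred_types.

    Falls back to the first content element, or None if there are none.
    """
    contents = entry.findall(f"{{{NS}}}content")
    by_type = {}
    for content in contents:
        # Keep the first element seen for each type.
        by_type.setdefault(content.get("type"), content)
    for media_type in preferred_types:
        if media_type in by_type:
            return by_type[media_type]
    return contents[0] if contents else None

ENTRY_XML = """
<entry xmlns="http://example.org/atom-draft">
  <content type="text/plain" mode="escaped">Hello, world</content>
  <content type="text/html" mode="escaped">&lt;p&gt;Hello, world&lt;/p&gt;</content>
</entry>
"""

entry = ET.fromstring(ENTRY_XML.strip())
html_first = choose_content(entry, ["text/html", "text/plain"])
plain_only = choose_content(entry, ["application/xhtml+xml", "text/plain"])
```

Each element's type still describes only its own format; the "alternatives" relationship is implied by the elements being siblings, so no multipart wrapper is needed.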

Allowing multiple formats - Now that the content-related elements (e.g., copyright) allow typing and encoding, shouldn’t multiples be allowed? E.g., you could have a text/plain copyright statement and a text/html one. Also, I don’t see the need for the prohibition against machine-readable copyright statements, now that you can associate a media type with them to differentiate them.


Danny said:

Great work on the spec - I’ll comment to list asap. I’d somehow missed the TAG note on extensions - marvellous!

Re. mode and rel attributes - yep, URIs would be better (in fact I suggested this for rel on list a while ago). But there seems to be a little stubbornness to retain the simple rel="blah". I think the RFC 2731 (rel="Schema.blah") approach is a reasonable compromise - gives the mules the hard-wired token while allowing a means for extension.

Friday, December 12 2003 at 2:09 AM

Danny said:

PS. Modularity, collapsing content links - there is a precedent, e.g. :

… and for links:

Friday, December 12 2003 at 4:23 AM

Danny said:


for content:

<dc:title rdf:parseType="Literal">

for links:

<foaf:homePage rdf:resource="" />

Friday, December 12 2003 at 4:25 AM

Dave Orchard said:

In the absence of a “processing model flag”, what is the default processing model for extensions? Is it Ignore, Fault, something else? This seems like a potentially big interoperability problem. The TAG documents also say that a processing model should be specified for an extension.

Regardless of whether just namespaces or namespaces plus versioning are used, the policy for compatibility guarantees needs to be specified. Assuming the (IMO very wrong) version attribute is kept, will “1.1” feed messages work in “1.0” aggregators and will “1.0” feed messages work in “1.1” aggregators?

Where is extensibility allowed in the elements? Is it an open content model?

Friday, December 12 2003 at 5:43 AM

Mark said:

It was my understanding that @mode was closed; only “xml”, “escaped”, and “base64” are allowed. This was the case in 0.2 and the Feed Validator currently enforces this.

It was also my understanding that @rel was closed; only “alternate”, “start”, “prev”, “next”, “service.edit”, “”, and “service.feed” are allowed. This exactly mirrors Joe’s Atom API draft 09. The next version of the Feed Validator will enforce this as well.

Friday, December 12 2003 at 9:45 AM

Danny said:

Mark P. - according to the Wiki and recent posts on list and Sam’s slides, rel at least can be extended.

Joe lists “well-known values” for type application/x.atom+xml, but makes no statement about closure in the API doc (which is as it should be; this is an issue for the format spec).

So the extensible version is also consistent with the API doc.

Saturday, December 13 2003 at 5:27 AM

Mark said:

According to it would appear that best practice is to put the major version in the namespace, and the minor version in an attribute.

Also, in the language of that document, I would say that Atom is a “Must Ignore All” language – that is, if consumers come up against an element they don’t understand, they must ignore it and all its children.
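As a rough illustration of that rule, a consumer could prune anything outside the namespaces it knows before processing. This is only a sketch; the namespace URIs are placeholders and `prune_unknown` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

def prune_unknown(element, known_namespaces):
    """Remove, in place, every descendant in an unknown namespace, with its children."""
    for child in list(element):  # copy the list, since we remove while iterating
        tag = child.tag
        ns = tag[1:].split("}", 1)[0] if tag.startswith("{") else ""
        if ns not in known_namespaces:
            # The whole subtree goes, even if it contains known elements.
            element.remove(child)
        else:
            prune_unknown(child, known_namespaces)
    return element

ENTRY_XML = """
<entry xmlns="http://example.org/atom-draft" xmlns:ext="http://example.org/ext">
  <title>Example</title>
  <ext:geo><ext:lat>1.0</ext:lat><title>nested</title></ext:geo>
</entry>
"""

entry = ET.fromstring(ENTRY_XML.strip())
pruned = prune_unknown(entry, {"http://example.org/atom-draft"})
remaining = [child.tag for child in pruned]
```

Note that the `title` nested inside `ext:geo` is dropped along with its parent, which is what "ignore it and all its children" implies.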

Monday, December 22 2003 at 6:19 AM
