Counting the ways that rev="canonical" hurts the Web

Tuesday, 14 April 2009

I had a lovely holiday weekend in Canberra with the family, without Web access. Perhaps I’ll blog about that soon (Canberra being, in my opinion, one of the nicest overlooked cities in the world), but that will have to wait. Going offline for a few days always brings a certain dread of what your inbox will hold when you get back, and this one was no exception.

That’s because while I was watching the kids rolling down the grass slope on top of Parliament House, rev="canonical" started to gain some serious momentum, billing itself as a way to shorten URLs that “doesn’t hurt the Internet.” In my opinion, this is an interesting idea with a very unfortunate execution that’s bad for the Web, and I’m going to enumerate the reasons here.

1. Misapplied Trust

If a resource with URL A has a rev="canonical" link to URL B, A is essentially saying that it’s the canonical URL for B. In other words, anybody who uses that information is trusting A to make assertions on behalf of B. A naive consumer of these links will allow A to put words in B’s mouth no matter what their real relationship is; http://evil.attacker.org/ can say that it’s the canonical link for http://innocent.bystander.com/.

Or, more subtly, http://example.edu/~user1/ can say that it’s the canonical link for http://example.edu/~user2/. The important thing to note here is that A isn’t asserting what its relationship to B is; it’s asserting what B’s relationship to A is, which it may or may not have the right to do.

An easy answer to this is that “we only are using canonical to mean that it’s a short link” — but the point is that the canonical link relation already has a de facto meaning, and it’s not being used for that purpose. Reusing canonical for this purpose only dilutes its semantics, reducing its value.
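The trust problem in this section can be made concrete with a minimal sketch. It assumes a hypothetical consumer policy (the function names are illustrative, not part of any proposal): accept A's rev="canonical" claim about B only when both URLs share an origin. It rejects the cross-site case, but note that it still accepts the ~user1 / ~user2 case, which is why the subtler example above is a real problem.

```python
from urllib.parse import urljoin, urlsplit


def origin(url):
    """Return the (scheme, host, port) triple for a URL."""
    parts = urlsplit(url)
    return (parts.scheme, parts.hostname, parts.port)


def may_speak_for(asserting_url, target_href):
    """Hypothetical policy: trust A's claim about B only if they share an origin."""
    # Resolve the href against the asserting page, since links may be relative.
    target_url = urljoin(asserting_url, target_href)
    return origin(asserting_url) == origin(target_url)


# Cross-site claim: rejected by this policy.
print(may_speak_for("http://evil.attacker.org/", "http://innocent.bystander.com/"))

# Same host, different users: accepted, even though user1 may have
# no right to make assertions on behalf of user2.
print(may_speak_for("http://example.edu/~user1/", "http://example.edu/~user2/"))
```

A same-origin check is only one possible mitigation; it says nothing about who controls which paths on a shared host.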

2. Rev is a Trap

#1 scratches at the surface of a much deeper problem — that the rev mechanism is very powerful and very tricky, because while it doesn’t change the semantics of a link relation, it does change the relationships between the parties, with many consequences that aren’t obvious. Compounding this confusion is the single-letter difference between rev and rel; people often use them interchangeably.

99% of the time, rev gets people into trouble. That is both why it never really took off and why HTML5 and my Link draft have deprecated it. Using rel with a separate relation is much clearer and much less prone to misinterpretation.
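To illustrate how a single letter flips the parties, here is a small sketch that turns a link into a statement of the form "owner has a relation resource at value". The function name and the example.com URLs are assumptions made up for illustration.

```python
def read_link(context_url, attribute, relation, target_url):
    """Return (owner, relation, value): 'owner has a <relation> resource at value'."""
    if attribute == "rel":
        # rel: the page containing the link describes itself.
        return (context_url, relation, target_url)
    if attribute == "rev":
        # rev: same relation, but the page is now describing the target.
        return (target_url, relation, context_url)
    raise ValueError("expected 'rel' or 'rev', got %r" % (attribute,))


PAGE_A = "http://example.com/a/long/article/title"  # hypothetical long URL
PAGE_B = "http://example.com/x1"                     # hypothetical short URL

print(read_link(PAGE_A, "rel", "canonical", PAGE_B))
print(read_link(PAGE_A, "rev", "canonical", PAGE_B))
```

The relation name is identical in both calls; only the owner of the statement changes, which is exactly what is easy to miss when rev and rel are used interchangeably.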

3. Unilateral Action

Finally, rev="canonical" has been launched with a Web site, a blog, and a Slashdot article, but AFAICT with zero discussion within the communities that care about this (HTML5 and HTTPbis), and without coordination with the people who defined* the canonical link relation.

Launching a new library, service or Open Source project with these sorts of Web 2.0 marketing techniques is pretty much business as usual these days, so it’s understandable that the same techniques have been used here.

However, it’s important to understand that protocol and markup elements aren’t a standalone project — they’re very much the shared commons that keep us communicating with each other, instead of past each other. By unilaterally repurposing the semantics of an existing element, the already shaky agreement that our computers have when talking to each other just got shakier, with another special case.

Some Suggestions (in both directions)

OK, enough pointing out what’s wrong. The idea of rev="canonical" is a good one; the only thing that really needs to change is the syntax. Something as simple as rel="shorturl" should do the trick; i.e., allowing URL A to assert that it’s also available through URL B, which is shorter than A.
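As a rough sketch of what consuming such a link might look like, the following uses Python's standard html.parser to collect rel="shorturl" links from a page. The relation name comes from the suggestion above and is not a registered link relation; the class name, function name and example.com URLs are assumptions for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ShortLinkFinder(HTMLParser):
    """Collect href values of <link> elements whose rel includes 'shorturl'."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel is a space-separated list of relation names, compared case-insensitively.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "shorturl" in rel_values and attributes.get("href"):
            self.hrefs.append(attributes["href"])


def find_short_urls(page_url, html_text):
    """Return absolute short URLs that the page at page_url advertises for itself."""
    finder = ShortLinkFinder()
    finder.feed(html_text)
    return [urljoin(page_url, href) for href in finder.hrefs]


PAGE = '<html><head><link rel="shorturl" href="http://example.com/x1"></head></html>'
print(find_short_urls("http://example.com/a/long/article/title", PAGE))
```

Because the link uses rel, the page is only making a statement about itself, so the consumer does not have to decide whether it may speak for some other URL.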

It does appear that some people have made that suggestion, but because the discussion has been spread across Twitter, at least one Google Group and countless blogs, it’s impossible to tell what the real state of things is. I’ve seen at least one example of someone not agreeing with the rev="canonical" approach, and as a result starting a new group to discuss an alternative, to “come to consensus.” The problem, of course, is that this is the consensus of a highly self-selected group, not one representative of the wider community. This is where reusing established infrastructure such as the IETF APPS-discuss list or the W3C www-talk list would come in handy.

To be fair, the means of extending the Web in this fashion aren’t readily apparent to those who aren’t part of the process, so it’s not surprising that they just went and tried to do it. We’re trying to fix this somewhat for links in the link draft, but I’m sure it could do a better job. Suggestions are welcome, either to me directly or on the HTTPbis list.

Stepping back, I think this sort of thing is going to happen more often, not less. Microsoft and Netscape unilaterally extended the Web with MARQUEE and BLINK, and it was ugly, but the impact wasn’t nearly as bad as countless Web developers all extending the Web in their own way could be. The onus is clearly upon organisations like the W3C and IETF to make themselves as transparent and approachable to developers as possible, so that the latent experience and expertise in them can be drawn upon by these innovators, instead of being seen as either irrelevant or impediments.

* disclaimer: I work for one of them, but have nothing to do with that department; I found out about canonical after they announced it.

Mark Nottingham
