Monday, 29 December 2003
Comment Spam and Google
Hyperlinks have been disallowed in comment bodies on this blog for a while now, and I've just removed the link associated with comment authors as well.
This is based on the assumption that the lion's share of comment spam is coming from people who want a link to their site for the benefit of their Google PageRank.
That's fine for the time being, but it seems a bit drastic. Links are the primary reason that the Web works; Google is just one (clever) use of them. Disallowing links hobbles the Web substantially and, in the long term, reduces its effectiveness.
I've always thought that Google has a unique opportunity to introduce de facto metadata standards for Web pages; let's face it, if they say "put this tag on your page to get richer Google results," it's an overnight, industry-wide standard. The problem they face is stopping people from lying; for example, if they introduce a "category" tag this afternoon, chances are that by tomorrow, every porn, pill and nutter site out there will categorise itself as being related to Saddam Hussein, Survivor and whatever else people are looking for these days.
However, there is an easy win here, and it would help us get rid of comment spam. Google could define a new HTML attribute that, when present on a link, would indicate that the link shouldn't be perceived as an endorsement of whatever is at the other end of the URI. For example:
<a href="http://porn.example.com/" authoritative="0">Joe Commenter</a>
The authoritative attribute could be applied to all links, and would indicate whether or not the link was made on behalf of the authority (i.e., the "www.example.com" part of your URI). In this manner, Google (and others) would know not to infer anything from that link.
That way, links could be allowed in blog comments, bulletin boards and elsewhere without fear of people spamming for better PageRanks. It would also allow linking to interesting things without lending your PageRank "weight" to them, if you so choose.
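To make the consuming side concrete, here is a rough sketch of how a crawler might skip disclaimed links when collecting endorsements. It is only an illustration under my own assumptions: the names `EndorsedLinkCollector` and `endorsed_links` are made up, and I'm assuming that only the literal value "0" disclaims a link, while a link without the attribute counts as an endorsement. How Google would actually treat such links is anyone's guess.

```python
from html.parser import HTMLParser


class EndorsedLinkCollector(HTMLParser):
    """Collect hrefs from <a> tags, skipping links marked authoritative="0"."""

    def __init__(self):
        super().__init__()
        self.endorsed = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return
        # Assumption: only the literal value "0" disclaims the link;
        # a missing attribute is treated as an endorsement.
        if attributes.get("authoritative") == "0":
            return
        self.endorsed.append(href)


def endorsed_links(html):
    """Return the hrefs that should count towards link-based ranking."""
    collector = EndorsedLinkCollector()
    collector.feed(html)
    collector.close()
    return collector.endorsed
```

Fed a page containing both an ordinary link and the "Joe Commenter" link from the example above, `endorsed_links` would return only the ordinary one.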
How about it? Any Google metadata-heads out there? I think the three things that need to happen to make this sort of thing work are 1) support from Google, 2) support from tools (e.g., appropriate HTML rewriting in blog engines), and 3) the comment spammers realising that they've lost.
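For the tools part, the HTML rewriting in a blog engine could be quite small. The following is a minimal sketch, not how any particular blog engine does it: `CommentLinkRewriter` and `mark_links_non_authoritative` are hypothetical names, and it assumes the engine has the comment body as an HTML string before publishing it. It forces authoritative="0" onto every link, overriding anything the commenter supplied, and it silently drops HTML comments and declarations.

```python
from html import escape
from html.parser import HTMLParser


class CommentLinkRewriter(HTMLParser):
    """Re-serialise comment HTML, forcing authoritative="0" onto every <a>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Discard any commenter-supplied value, then add our own.
            attrs = [(name, value) for name, value in attrs
                     if name != "authoritative"]
            attrs.append(("authoritative", "0"))
        rendered = "".join(
            f" {name}" if value is None
            else f' {name}="{escape(value, quote=True)}"'
            for name, value in attrs
        )
        self.parts.append(f"<{tag}{rendered}>")

    def handle_startendtag(self, tag, attrs):
        # Emit "<br/>"-style tags once, without a separate closing tag.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Character references were decoded by the parser; re-escape them.
        self.parts.append(escape(data, quote=False))


def mark_links_non_authoritative(comment_html):
    """Return comment_html with every link marked as non-authoritative."""
    rewriter = CommentLinkRewriter()
    rewriter.feed(comment_html)
    rewriter.close()
    return "".join(rewriter.parts)
```

Running a comment through `mark_links_non_authoritative` before publishing would mean the blog's own links stay untouched while everything a commenter writes is disclaimed.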