mnot’s blog

“Design depends largely on constraints.” - Charles Eames

Monday, 29 December 2003

Comment Spam and Google

Hyperlinks have been disallowed in comment bodies on this blog for a while now, and I've just removed the link associated with comment authors as well.

This is based on the assumption that the lion's share of comment spam is coming from people who want a link to their site for the benefit of their Google PageRank.

That's fine for the time being, but it seems a bit drastic. Links are the primary reason that the Web works, and Google is just one (clever) use of them. Disallowing links hobbles the Web substantially and, in the long term, reduces its effectiveness.

I've always thought that Google has a unique opportunity to introduce de facto metadata standards for Web pages; let's face it, if they say "put this tag on your page to get richer Google results," it's an overnight, industry-wide standard. The problem they face is stopping people from lying; for example, if they introduce a "category" tag this afternoon, chances are that by tomorrow, every porn, pill and nutter site out there will categorise itself as being related to Saddam Hussein, Survivor and whatever else people are looking for these days.

However, there is an easy win here, and it would help us get rid of comment spam. Google could define a new HTML attribute that, when present on a link, would indicate that the link shouldn't be read as an endorsement of whatever is at the other end of the URI. For example:

<a href="http://porn.example.com/" authoritative="0">Joe Commenter</a>

The authoritative attribute could be applied to all links, and would indicate whether or not the link was made on behalf of the authority (i.e., the "www.example.com" part of your URI). In this manner, Google (and others) would know not to infer anything from that link.
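To make the idea concrete, here is a rough sketch of how a crawler might treat the proposed attribute. It is only an illustration: the class and function names are made up for this example, and the rule that only an explicit "0" withdraws the endorsement is an assumption, not anything Google has defined.

```python
from html.parser import HTMLParser


class EndorsementCollector(HTMLParser):
    """Split a page's links by whether the page vouches for them."""

    def __init__(self):
        super().__init__()
        self.endorsed = []    # links that may count toward ranking
        self.unendorsed = []  # links marked authoritative="0"

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return
        # Assumed rule: only an explicit "0" withdraws the endorsement.
        # Links without the attribute are counted exactly as they are today.
        if attributes.get("authoritative") == "0":
            self.unendorsed.append(href)
        else:
            self.endorsed.append(href)


def classify_links(html):
    """Return (endorsed, unendorsed) link targets found in the HTML."""
    collector = EndorsementCollector()
    collector.feed(html)
    collector.close()
    return collector.endorsed, collector.unendorsed


page = (
    '<p><a href="http://www.example.com/">A link I vouch for</a></p>'
    '<p><a href="http://porn.example.com/" authoritative="0">Joe Commenter</a></p>'
)
endorsed, unendorsed = classify_links(page)
```

Because a missing attribute changes nothing, pages that never adopt it would be ranked just as before.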

That way, links could be allowed in blog comments, bulletin boards and elsewhere without fear of people spamming for better PageRanks. It would also allow linking to interesting things without lending your PageRank "weight" to them, if you so choose.

How about it? Any Google metadata-heads out there? I think three things need to happen to make this sort of thing work: 1) support from Google, 2) support from tools (e.g., appropriate HTML rewriting in blog engines), and 3) the comment spammers realising that they've lost.
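The tool support could be as simple as a rewriting pass over each comment before it is published. The following is a minimal sketch, assuming comments arrive as small HTML fragments; CommentLinkMarker and mark_comment_links are hypothetical names, not part of any blog engine, and HTML comments and declarations are simply dropped.

```python
from html import escape
from html.parser import HTMLParser


class CommentLinkMarker(HTMLParser):
    """Re-emit an HTML fragment with authoritative="0" on every link."""

    def __init__(self):
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def _start_tag(self, tag, attrs, closing=""):
        if tag == "a":
            # Discard whatever the commenter supplied, then force "0".
            attrs = [(k, v) for k, v in attrs if k != "authoritative"]
            attrs.append(("authoritative", "0"))
        rendered = "".join(
            f" {name}" if value is None
            else f' {name}="{escape(value, quote=True)}"'
            for name, value in attrs
        )
        return f"<{tag}{rendered}{closing}>"

    def handle_starttag(self, tag, attrs):
        self.parts.append(self._start_tag(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        self.parts.append(self._start_tag(tag, attrs, closing=" /"))

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")


def mark_comment_links(fragment):
    """Return the fragment with every <a> marked as non-authoritative."""
    marker = CommentLinkMarker()
    marker.feed(fragment)
    marker.close()
    return "".join(marker.parts)


marked = mark_comment_links(
    '<a href="http://porn.example.com/">Joe Commenter</a>'
)
```

Overwriting any value the commenter supplies matters here: otherwise a spammer could simply submit the link with authoritative="1" already set.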


11 Comments

Stefan Tilkov said:

Since you're using MovableType, have you considered MT-Blacklist ( http://www.jayallen.org/projects/mt-blacklist/ )?

I really like your suggestion, since MT-Blacklist (and similar solutions) will work only if you don't have too many comments. It works extremely well for me, though.

Monday, December 29 2003 at 12:30 PM +10:00

Dave Seidel said:

Nice suggestion, very elegant, especially since the use of an attribute is entirely under the spamee's control.

Monday, December 29 2003 at 1:11 PM +10:00

Sérgio Nunes said:

Why don't you use a "gateway" to redirect all links in your comments?

Example:

www.porn.com would be changed to www.mnot.net/redirect?l=www.porn.com

This way "porn.com" doesn't get references from your site.

Monday, December 29 2003 at 1:17 PM +10:00

Mark Nottingham said:

Regarding a gateway: That won't work if Google follows redirects (HTTP and/or HTML) and still considers it an authoritative link. Anybody know whether they do?

BTW, such an attribute would have to be namespace-qualified in XHTML, I think.

Monday, December 29 2003 at 1:24 PM +10:00

Mark Baker said:

Nice. How about rel="nonauth"?

Monday, December 29 2003 at 4:43 PM +10:00

Karl Dubost said:

rel is there for that, but we would first have to define the format of profiles at W3C, because it's not yet universally defined.

See:

http://www.w3.org/TR/1999/REC-html401-19991224/struct/links.html#adef-rel

http://www.w3.org/TR/1999/REC-html401-19991224/types.html#type-links

http://www.w3.org/TR/1999/REC-html401-19991224/struct/global.html#profiles

Tantek has made an attempt at creating a format, but it is poorly designed: vocabulary conflicts, etc. There's a need for a wider consensus on this topic.

http://gmpg.org/xmdp/

Some of the issues with profiles are described here:

French - http://www.la-grange.net/2003/12/17#xfn

Use automatic translation to read it in English.

I plan to write up the issues with profiles in detail, to explain what the problems are.

Tuesday, December 30 2003 at 11:44 AM +10:00

Mark Nottingham said:

It looks like they're using automated agents to submit comments, because I'm still getting some, even after making it plain that links won't be allowed. To fix it, I've patched MT/App/Comments.pm:

57a58,61
>     if ($q->param('url')) {
>         return $app->handle_error($app->translate(
>             "URLs are not allowed in entries."));
>     }

Tuesday, December 30 2003 at 12:08 PM +10:00

Mark Nottingham said:

Just a thought — what if Google (and others) just made it a practice to ignore links that were descendants of the blockquote element?

Saturday, January 3 2004 at 1:38 PM +10:00

Hans Gerwitz said:

Context would be a good way to indicate the nature of links, maybe even preferable to explicit relation attributes. Blockquote is used too often for layout purposes, though; I'm not sure [X]HTML has any suitable context.

Sunday, January 4 2004 at 7:37 PM +10:00

Luke Francl said:

Since no existing links use this attribute, wouldn't this totally screw Google's index? It's not like most people would update billions of links in any timely fashion, if ever.

Tuesday, March 9 2004 at 2:24 PM +10:00

Mark Nottingham said:

The attribute’s presence would block indexing, not its absence.

Tuesday, March 9 2004 at 3:59 PM +10:00

Creative Commons