
Doubly-indirected story flow



> Let's say "originator.com" writes a story and posts it on their website,
> then it gets snagged via scraping, say by moreover. When I ask moreover for
> stories in RSS format, I get an (indirect) pointer to the original story,
> but there is no indication of its original source.

Just to be fair, Moreover does include the original source, though only as
some text in the description tag.

That said, this is a problem which has already been faced in the educational
community. You get an article which is, say, indexed, abstracted, reviewed,
or whatever, in some other journal. Replies, citations, and the like pile up
on each other until you've lost the original source.

It is not a good idea to try to place all of this information in a single RDF
page. Imagine what would happen after ten cycles through the loop!

In an ideal world, each original article has its own metadata, say a
Dublin Core or similar set of descriptors, either contained in the original
article (perhaps as metadata in an HTML page) or published as a separate
XML page.

Aggregator sites could (should) then point to this original metadata using
an 'about' attribute:

<item about="originalxmlfile.xml">
   <description> Which may be content produced by the aggregator</description>
   <link>which links to the actual article text</link>
   <title>which is the title given by the aggregator</title>
</item>

In other words, the information contained in the aggregator's RSS should be
content produced by the aggregator, either as a result of an RSS read, or as
a result of a human editor or reviewer. Any other information in the aggregator's
RSS file should be aggregator-related content, not article-related content.
Leave article-related content (such as author, publisher, website, etc) to
the original publisher.
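
To make the split concrete, here is a rough Python sketch of how a reader
might combine the two sources: the aggregator's item supplies the
aggregator's own title, link, and description, while author and publisher
are looked up in the original metadata that the 'about' attribute points to.
Everything specific here is an assumption for illustration: the URLs, the
<metadata> wrapper element, the describe() helper, and the DOCUMENTS
dictionary, which stands in for fetching the file over HTTP. Only the Dublin
Core namespace is real.

```python
import xml.etree.ElementTree as ET

# Dublin Core element namespace, in ElementTree's "{uri}name" form.
DC = "{http://purl.org/dc/elements/1.1/}"

# Hypothetical stand-in for an HTTP fetch: URL -> XML text.
DOCUMENTS = {
    "http://originator.example/story1.xml": """
        <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Original headline</dc:title>
          <dc:creator>Jane Author</dc:creator>
          <dc:publisher>originator.example</dc:publisher>
        </metadata>""",
}

# An aggregator item that carries only aggregator-produced content
# plus a pointer to the publisher's own metadata.
AGGREGATOR_ITEM = """
<item about="http://originator.example/story1.xml">
  <title>Aggregator's headline</title>
  <link>http://originator.example/story1.html</link>
  <description>Summary written by the aggregator</description>
</item>"""


def describe(item_xml, documents):
    """Merge aggregator content with the original publisher's metadata."""
    item = ET.fromstring(item_xml.strip())
    # Follow the 'about' pointer to the original metadata record.
    original = ET.fromstring(documents[item.get("about")].strip())
    return {
        # Aggregator-related content comes from the item itself.
        "aggregator_title": item.findtext("title"),
        "link": item.findtext("link"),
        # Article-related content comes from the original publisher.
        "original_title": original.findtext(DC + "title"),
        "creator": original.findtext(DC + "creator"),
        "publisher": original.findtext(DC + "publisher"),
    }


if __name__ == "__main__":
    for key, value in describe(AGGREGATOR_ITEM, DOCUMENTS).items():
        print(key, "=", value)
```

If a second aggregator re-syndicates the item and keeps the same 'about'
value, the lookup still lands on the publisher's record, however many
cycles the story has been through.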

IMHO.

--
Stephen Downes - Information Architect - University of Alberta
stephen.downes@ualberta.ca  http://www.atl.ualberta.ca/downes