
RSS thoughts and questions



I'm in the process of writing an RSS aggregator, and as a result have a few
comments and questions about RSS in general, and RSS 1.0 in particular
(although I believe that most of the RSS 1.0-specific questions might
illuminate greater ground than just that spec).

I haven't followed RSS 1.0 development or discussion closely, so I apologise
if some of the questions pointed at RSS-DEV have already been answered.

Without further ado:

* The Semantic of an RSS feed  
  What does an RSS feed's model represent? The assumed default seems to be
  'the chronologically last n items in the channel'.  This leads to one
  considering a feed as a moving window, or a queue; one can combine the
  most recent model with previously retrieved models, generating an
  eternally growing model. Although this is useful, it isn't always true;
  some channels are complete (i.e., if it isn't in the representation of the
  feed that you get, it isn't there).
  
  This isn't an issue specific to RDF RSS; IMHO it would be nice to
  explicitly tag the semantic of feeds to allow aggregators to be 'smarter'
  about what they do with the information in them. My aggregator, for
  instance, adds the channel items to persistent storage as they come in,
  enabling users to browse the history of the channel, as well as mark parts
  of it 'read'. While most feeds are conducive to this, some are not, and it
  would be nice to be able to detect these. 
  
  For example, I can imagine a feed that's returned by a search engine like
  Google. If I only want to see new items in the feed, I need to know that
  it's not chronological, and that every invocation of the feed represents
  the entire universe of the feed. I'm sure there are other ways to
  characterise a feed that might be useful in similar ways.
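
  A minimal Python sketch of how an aggregator might act on such a tag,
  assuming it existed. The labels WINDOW and COMPLETE and the ChannelStore
  class are hypothetical names for illustration; no RSS version defines
  them.

```python
from dataclasses import dataclass, field

# Hypothetical labels; no RSS version defines such a tag.
WINDOW = "window"      # feed carries only the chronologically last n items
COMPLETE = "complete"  # feed carries every item that currently exists


@dataclass
class ChannelStore:
    """Persistent view of one channel, keyed by item link."""
    semantic: str
    items: dict = field(default_factory=dict)

    def update(self, fetched):
        """Merge one retrieval of the feed; return links not seen before."""
        new_links = [link for link in fetched if link not in self.items]
        if self.semantic == COMPLETE:
            # The retrieval is the entire universe: drop anything absent.
            self.items = dict(fetched)
        else:
            # Moving window: keep history and append only the new items.
            for link in new_links:
                self.items[link] = fetched[link]
        return new_links
```

  With WINDOW the store keeps growing across retrievals; with COMPLETE it
  always mirrors the latest retrieval, yet still reports which items are new.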
  

* The context of an RSS 1.0 RDF Model
  RSS 1.0 uses RDF to leverage its model and available tools. Indeed, it's
  very easy to read an RSS 1.0 channel into an RDF parser and get an RDF
  model out the other side, which one can then query for statements and
  generate representations from.

  However, the model in an RSS 1.0 feed is specific to that channel; the
  statements cannot be used in a global context. For example, an item in the
  feed might produce a statement:
    ("http://www.mnot.net/news1.html", "Title", "a test")
  This statement is assumed to be in the context of the channel; if it is
  added to a global model, it loses that context, and may conflict with
  statements added by other channels. In other words, I can't use a single
  RDF model/database to store all statements made by channels I'm interested
  in, because two channels might make statements about a particular
  resource, resulting in corruption.
  
  This seems to be a fundamental limitation of RSS 1.0; it requires me to
  store channels in separate models, thereby losing the benefits of
  cross-searching channels, etc. One way to address it would be to reify all
  of the statements in the channels, but this would produce a serialisation
  that would be pretty painful for many people to deal with, IMHO.
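
  To make the problem concrete, here is a rough Python sketch using plain
  tuples rather than a real RDF library. The two channel URIs, the second
  title and the titles_of helper are made up for illustration.

```python
from collections import defaultdict

# Hypothetical channels that both describe the same resource.
ITEM = "http://www.mnot.net/news1.html"
CHANNEL_A = "http://example.org/a.rss"
CHANNEL_B = "http://example.org/b.rss"

statements_a = [(ITEM, "Title", "a test")]
statements_b = [(ITEM, "Title", "another title")]

# One global model: both titles end up on the same resource, and nothing
# records which channel asserted which.
global_model = defaultdict(set)
for subject, predicate, obj in statements_a + statements_b:
    global_model[(subject, predicate)].add(obj)

# One model per channel: the context survives, but any cross-channel
# query has to walk every model separately.
per_channel = {CHANNEL_A: set(statements_a), CHANNEL_B: set(statements_b)}


def titles_of(resource):
    """Search every channel's model for titles of one resource."""
    return {channel: obj
            for channel, model in per_channel.items()
            for subject, predicate, obj in model
            if subject == resource and predicate == "Title"}
```

  In the global model the resource simply has two titles; only the
  per-channel lookup can say which channel supplied each one.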

  Has this issue been thought about?


* RSS's Processing Model
  Currently, RSS uses entities to 'hide' elements intended for markup during
  presentation; for example, HTML <b> elements would be encoded as:
    &lt;b&gt;this is bold&lt;/b&gt;
  
  To me, this seems needlessly tortured, and potentially limiting, as an
  intermediate processor needs to correctly re-encode the markup. 
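
  A small Python sketch of the round trip, using only the standard
  library. The item and the naive intermediary are invented for
  illustration; the point is what happens when a processor forgets to
  re-encode.

```python
from html import escape
import xml.etree.ElementTree as ET

markup = "<b>this is bold</b>"
escaped = escape(markup, quote=False)  # entity-encode the inline markup
feed_item = f"<item><description>{escaped}</description></item>"

# Parsing decodes the entities once, giving back the original markup
# as a plain text string.
text = ET.fromstring(feed_item).findtext("description")

# A careful intermediary re-encodes on output (ElementTree escapes
# text content when serialising).
out = ET.Element("description")
out.text = text
reserialised = ET.tostring(out, encoding="unicode")

# A naive intermediary pastes the decoded text back in, which turns
# the hidden markup into a real child element of the description.
naive = f"<description>{text}</description>"
leaked = ET.fromstring(naive).find("b")
```

  Here reserialised carries the same entities as the original feed, while
  the naive output now contains an actual b element.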
  
  I'd prefer:
  - if namespaces are not in use, non-RSS inline tags (i.e., those between
    the rss tags; *not* HTML inline elements) should be ignored and
    passed through (or discarded, if the processor chooses). e.g.,
      <item>
        <title>Foo!</title>
        <link>http://www.foo.com/</link>
        <description>this is the <i>foo</i> item.</description>
      </item>
  - if namespaces are in use, a default namespace should identify HTML (or
    other inline markup), and rss namespaces should be explicitly prefixed.
    e.g.,
      <rss:item rdf:about="http://www.foo.com/"
       xmlns="http://www.w3.org/TR/REC-html4">
        <rss:title>Foo!</rss:title>
        <rss:link>http://www.foo.com/</rss:link>
        <rss:description>this is the <i>foo</i> item.</rss:description>
      </rss:item>
    With this approach, one could then identify (perhaps in the channel
    header?) the namespaces that should be passed through (i.e., they
    contain information for the presentation of the channel, not the rss
    parser).
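
  As a sketch of how a parser might treat the namespaced form, assuming
  the rss and rdf prefixes are bound to the RSS 1.0 and RDF namespace
  URIs (the example above leaves them undeclared). PASS_THROUGH,
  namespace_of and split_description are hypothetical names, not part of
  any spec.

```python
import xml.etree.ElementTree as ET

RSS = "http://purl.org/rss/1.0/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
HTML = "http://www.w3.org/TR/REC-html4"  # default namespace from the example

doc = f"""<rss:item xmlns:rss="{RSS}" xmlns:rdf="{RDF}" xmlns="{HTML}"
    rdf:about="http://www.foo.com/">
  <rss:title>Foo!</rss:title>
  <rss:link>http://www.foo.com/</rss:link>
  <rss:description>this is the <i>foo</i> item.</rss:description>
</rss:item>"""

# Hypothetical: namespaces a channel has declared as presentation-only.
PASS_THROUGH = {HTML}


def namespace_of(element):
    """Return the namespace URI of an element, or '' if it has none."""
    tag = element.tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def split_description(item):
    """Return the plain text and the pass-through child elements."""
    desc = item.find(f"{{{RSS}}}description")
    plain = "".join(desc.itertext())
    passed = [child for child in desc if namespace_of(child) in PASS_THROUGH]
    return plain, passed
```

  An RSS parser can use the plain text for its own purposes and hand the
  pass-through elements to whatever renders the channel.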

  I know this has been discussed a fair bit in the past. Comments? If a
  generic XML processing model were defined, it would be interesting to see
  how it would affect RSS.


* A few thoughts/nits about the RSS 1.0 spec

  - There are a number of suggestions and limitations in the spec that seem
    to be designed to promote interoperability with RSS 0.9 (e.g., the
    limits on URI schemes in link elements). Perhaps it would be good to
    move these into an appendix, so that implementations that choose to be
    RSS 0.9-compatible (i.e., those that are used for 'traditional' channels)
    can interoperate, whilst those that are using it for other purposes
    aren't constrained.

  - encoding section: "HTTP's default header encoding" -> "HTTP's default encoding" 

  - it would be interesting to see RSS 1.0 use XML Infoset to describe
    syntax.


-- 
Mark Nottingham
http://www.mnot.net/