RSS: Introducing Myself
Hi. My name is Dan Libby.
Disclaimer 1: My apologies if this sounds pompous, or you disagree with
my recommendations. Feel free to take it with a grain of salt.
Disclaimer 2: I speak only for myself, and not for my employer(s) - past or present.
I was the primary author of the RSS 0.9 and 0.91 spec and the architect behind
the My Netscape Network (a separate project from My Netscape, which I also
worked on). I left Netscape in 1999, in part because of what I felt was
mis-handling (non-handling?) of RSS and the MN platform. I fully expected
the format to die an ignominious death, and I was pleasantly surprised
recently to poke my head out of the sand and find so many people still using
it. I am glad that the net community has begun adopting RSS, and would
like to see it realize the original vision. So I have been watching the
recent discussions with interest. It is my hope that some background will
at least make my original intent and reasoning clear and perhaps even help
us avoid a fork, though perhaps that is inevitable?
The original My Netscape Network Vision:
We would create a platform and an RDF vocabulary for syndicating metadata
about websites and aggregating them on My Netscape and ultimately in the
web browser. Because we only retrieved metadata, the website authors would
still receive users' click-throughs to view the full site, thus benefitting
both the aggregator and the publisher. My Netscape would run an RDF database
that stored all the content. Preferences, akin to mail filters, would allow
the user to filter only the data in which they are interested onto the page,
from the entire pool of data. For example, a user interested in articles
about "Football" would be able to set up a personalized channel that simply
consisted of a filter for Football, or even for a particular team or player.
Or for all references to Slashdot.org, or whatever. This fit our personalization
scheme well, and would (I hoped) give us the largest selection of content,
with the greatest degree of personalization available. Tools would be made
available to simplify the process of creating these files, and to validate
them, and life would be good.
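
To make the mail-filter analogy concrete, here is a minimal sketch in Python. It is not how My Netscape worked (that part was never built); the Item fields, the substring match rule, and the sample pool are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """Metadata for one syndicated item (hypothetical shape, not from any spec)."""
    title: str
    link: str
    source: str  # site the item was syndicated from


def matches(item: Item, keyword: str) -> bool:
    """Case-insensitive substring match on title or source, like a simple mail filter rule."""
    needle = keyword.lower()
    return needle in item.title.lower() or needle in item.source.lower()


def personalized_channel(pool, keyword):
    """Return the items from the whole pool that pass the user's filter."""
    return [item for item in pool if matches(item, keyword)]


# Made-up sample data standing in for "the entire pool".
POOL = [
    Item("Football season opens", "http://example.com/a", "example.com"),
    Item("New kernel released", "http://slashdot.org/b", "slashdot.org"),
    Item("Transfer rumours in football", "http://example.org/c", "example.org"),
]

football = personalized_channel(POOL, "Football")
slashdot = personalized_channel(POOL, "slashdot.org")
```

Here football holds the two football items and slashdot holds the single Slashdot item. Only metadata is filtered; the links still point back to the publisher's site.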
What Actually Happened:
1) A decision was made that for the first implementation, we did not actually
need a "real" RDF database, which did not even really exist at the time.
Instead we could put the data in our existing store, and display the
data one "channel" at a time. This made publishers happier anyway, because
they would get their own window and logo. We could always do the "full"
implementation later.
2) The original RDF/RSS spec was deemed "too complex" for the "average user".
The RDF data model itself is complex to the uninitiated, and thus the placement
of certain XML elements representing arc types seemed redundant and arbitrary
to some. Support for XML namespaces was basically non-existent. My (poor)
solution was to create a simpler format, RSS 0.9, that was technically valid
RDF, but dropped namespaces and created a non-connected graph. We decided
that it could always be "transformed" into a graph for the to-be-built RDF
database, but this imposed a 1 channel per file limitation. People were
willing to live with it. (note: The "inChannel" tag in RSS 1.0 proposal
solves this problem neatly). This marked the beginning of the Full Functionality
vs Keep It Simple Stupid debate that continues to this day. It is interesting
to note that the _original_ spec I wrote is actually much closer to RSS
1.0 than to either 0.9 or 0.91. At the time, I insisted that we publish
it, if only to make the RDF crowd happy, and we ironically called it the
Futures Document.
3) We shipped the first implementation, sans tools. Basically, there was
a spec for RSS 0.9, some samples, and a web-based validation tool. No further
support was given for a while, and I was kept busy working on other projects.
Even still, channels started coming in, and the system worked in a rudimentary
fashion.
4) At some point, it was decided that we needed to rev the RSS spec to allow
things like per-item descriptions, i18n support, ratings, and image width
and height. Due to artificial (in my view) time constraints, it was again
decided to continue with the current storage solution, and I realized that
we were *never* going to get around to the rest of the project as originally
conceived. At the time, the primary users of RSS (Dave Winer the most vocal
among them) were asking why it needed to be so complex and why it didn't
have support for various features, e.g. update frequencies. We really had
no good answer, given that we weren't using RDF for any useful purpose.
Further, because RDF can be expressed in XML in multiple ways, I was uncomfortable
publishing a DTD for RSS 0.9, since the DTD would claim that technically
valid RDF/RSS data conforming to the RDF graph model was not valid RSS.
Anyway, it didn't feel "clean". The compromise was to produce RSS 0.91,
which could be validated with any validating XML parser, and which incorporated
much of UserLand's vocabulary, thus removing most (I think) of Dave's major
objections. I felt slightly bad about this, but given actual usage at the
time, I felt it better suited the needs of its users: simplicity, correctness,
and a larger vocabulary, without RDF baggage. (I also had a really fun
time writing a vocab-independent XML validation system in Python, which
it turns out is pretty similar to XML-Schema.)
5) We shipped the thing in a very short time, meeting the time constraints,
then spent a month or two fixing it all. :-) It was apparently not deemed
"strategic", and thus was never given more than maintenance attention.
6) People on the net began creating all sorts of tools on their own, and
publishing how-to articles, and all sorts of things, and using it in ways
not envisioned by, err, some. And now we are here, debating it all over
again. Fortunately, this time it is in an open forum.
My Perspective On "The Right Thing":
1) I agree with Dave and others that ease/simplicity of USE is very important.
I think the success of RSS 0.9* has been because it was so simple. Anyone
who knew HTML could do it, which was good, because they had to do it by
hand.
2) Simplicity and ease of use do not require a simple format. Microsoft
Word is pretty simple to use, but try reading the binary representation
of a saved file sometime. Or even their new XML representation. The important
thing is that the end-user tools be simple. This means pre-built scripts
for script-writers and field-by-field hand-holding entry for those who would
otherwise hand-code, and a validator for both.
3) Flexibility and extensibility are necessities and supersede even the need
for simplicity. Without them, the format will assuredly split and will
be used in ways never intended. With them, it is safe to add your own random
data type, and the receiver is free to interpret or ignore as it sees fit.
As long as everyone agrees on the core, RSS remains a useful mechanism.
For this reason, I would suggest that Dublin Core be added to the core spec,
in the way that RSS 0.91 has been (as a core 'module'). This was originally
intended anyway, as evidenced by the "futures" document.
4) Validation is extremely important -- important enough to be listed apart
from "tools". Someone publishing a document *must* be able to validate
that the document is correct before sending it, particularly when setting
up an automated system. Validation further helps prevent the format from
splitting, particularly in areas where the spec may be unclear. For XML,
validation requires minimally a DTD, and optimally XML-Schema and/or further
application level processing. For RDF, validation requires an RDF-Schema
aware processor (I believe).
5) Given the above points, I (for the most part) support the RSS 1.0 spec,
as written. I believe it has a high degree of flexibility while maintaining
a relatively simple core set. However, to be *practical*, we must first
create the tools for 1) validation, 2) processing, and 3) generation, pretty
much in that order. With proper validation tools, people can begin writing
processors and generators, or even producing files by hand. Without them,
it is like shooting in the dark.
5a) Another note on this, and a caveat -- given that the RSS 1.0 spec utilizes
RDF, I believe that the tools and format itself should be RDF aware _from
the start_. A solid foundation is key to building anything that is going
to last. This means that it is the *data model* that is important, not the
physical syntax of "start with channel, then several items, etc". In fact,
I believe the spec itself should be an RDF Schema depicting the data model,
with simple examples of how to express it in XML. Anything less results
in confusion and a mish-mash of incompatible tools, where some are simple
XML processors and some are full RDF-aware processors. I see this as the
largest hurdle for RSS as RDF, given how few RDF tools there are compared
to XML tools. If we are not willing to commit to this in the spec and tools,
then we may as well go back to a plain XML format. In other words, put
up or shut up.
6) Is RDF Necessary? Well, no. Not for plain syndication anyway. That's
why I got rid of it in 0.91. But it is pretty cool. Now, after a year
of working with it on a day-to-day basis, I have a fairly good understanding
of what it is and is not good at. It is good at expressing a data model
and allowing one to refer to arbitrary things without duplicating data,
something the XML tree structure is weak at. Since RSS has "Summary" as
its third word (regardless of version), that seems like a pretty good match.
I think that basing the format in RDF will add value as more and more people
are using it and are able to refer to things in databases all over the web
without physically re-bundling the data. In other words, the value at the
beginning will be small or non-existent, but will grow non-linearly over
time.
7) I think that the original vision mentioned above is still do-able, particularly
given the existence of Guha's RDFDB and similar tools, and that someone
could build a very kickass personalization/filtering and syndication system
that way. (Of course, given proper transformations and a suitable backend,
you could regardless of the format.)
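
As a rough illustration of the validate-before-you-publish idea in point 4, here is a small Python sketch of an application-level check for a plain-XML, 0.91-style file. It is a stand-in, not DTD or XML-Schema validation, and the lists of required elements are assumptions chosen for the example; the real lists belong to the spec.

```python
import xml.etree.ElementTree as ET

# Assumed for illustration only; consult the actual spec for the real lists.
REQUIRED_CHANNEL_FIELDS = ("title", "link", "description")
REQUIRED_ITEM_FIELDS = ("title", "link")


def validate(document: str) -> list:
    """Return a list of problems; an empty list means every check passed."""
    try:
        root = ET.fromstring(document)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]

    problems = []
    if root.tag != "rss":
        problems.append(f"root element is <{root.tag}>, expected <rss>")
    channel = root.find("channel")
    if channel is None:
        problems.append("missing <channel>")
        return problems
    for name in REQUIRED_CHANNEL_FIELDS:
        if channel.find(name) is None:
            problems.append(f"<channel> is missing <{name}>")
    for index, item in enumerate(channel.findall("item"), start=1):
        for name in REQUIRED_ITEM_FIELDS:
            if item.find(name) is None:
                problems.append(f"item {index} is missing <{name}>")
    return problems


# Made-up sample documents.
GOOD = (
    '<rss version="0.91"><channel><title>T</title>'
    "<link>http://example.com/</link><description>D</description>"
    "<item><title>A</title><link>http://example.com/a</link></item>"
    "</channel></rss>"
)
BAD = (
    '<rss version="0.91"><channel><title>T</title>'
    "<link>http://example.com/</link>"
    "<item><title>A</title></item>"
    "</channel></rss>"
)

good_report = validate(GOOD)
bad_report = validate(BAD)
```

An automated publisher could refuse to send a file whenever the returned list is non-empty. Note that this checks physical XML structure only; it says nothing about the RDF data model discussed in 5a.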
That's my $.02. My congrats and thanks to the authors and champions of the
RSS 1.0 spec and all of you who have given RSS renewed life after Netscape
all but abandoned it, and to Rael Dornfest for making me take notice.
-danda