Re: [syndication] shared feed lists
* Dave Winer (dave@userland.com) [031014 08:54]:
> First let's take out the emotionally charged words, blindly, waste, clog up,
> etc.
>
> Do the math. I answered this question in the Q&A. I don't know how to answer
> it again without just repeating the answer.
>
> But let's try anyway. ;->
>
> Assume you look for a link to the directory file in the HTML of the home
> page of the site.
>
> To find the directory, you:
>
> 1. Read the index file.
>
> 2. Look for the link element.
>
> 3. Read the directory file it points to.
>
> In the approach I'm advocating you:
>
> 1. Read the directory file.
>
> Now please explain why the first approach is more efficient.
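For concreteness, here's the fetch-count difference in a minimal
Python sketch (the rel value "subscriptions" and the fixed filename
default.opml are my illustrative assumptions, not anything this
thread has standardized):

    import urllib.request
    from urllib.parse import urljoin
    from html.parser import HTMLParser

    class LinkFinder(HTMLParser):
        """Collect href values from <link rel="subscriptions"> tags."""
        def __init__(self):
            super().__init__()
            self.hrefs = []

        def handle_starttag(self, tag, attrs):
            a = dict(attrs)
            if tag == "link" and a.get("rel") == "subscriptions":
                self.hrefs.append(a.get("href"))

    def via_link_element(site):
        # Approach 1: fetch the home page, find the link element,
        # then fetch the directory file it points to (two fetches).
        page = urllib.request.urlopen(site).read().decode("utf-8", "replace")
        finder = LinkFinder()
        finder.feed(page)
        if finder.hrefs:
            return urllib.request.urlopen(urljoin(site, finder.hrefs[0])).read()
        return None

    def via_well_known_name(site):
        # Approach 2: one fetch of a fixed name under the root.
        return urllib.request.urlopen(site.rstrip("/") + "/default.opml").read()

Granted: the second is one fetch where the first is two. The fetch
count was never my objection, though.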
Let me pose a question which may help illuminate why I think some people
(including myself) find your proposal far less than satisfactory:
How did you, the searcher for feeds, reach the site in question?
I suggest that there are 3 possible answers to this question:
(a) you traversed a link from a document written in some parseable
structured format
(b) you're just sweeping through IP addresses trying to find what's
on the other side
(c) you were given a URL out-of-band (say in an email, over the
phone, or otherwise), i.e., not derived from parsing a structured
document.
I'm going to posit first that case (b) is not one which should be
driving our concerns here. Most users are not sweeping IP space for
info, and to those who are: tough. It should continue to be our goal
to architect information systems from which we can derive meaning, not
to build for the mindless slurping of data by those refusing to use the
powerful tools already widely deployed. By stepping outside the realm
of structured protocols and dialects they have chosen to forgo the
meaning that structure carries, and there's no pragmatic reason to
cater any further to their convenience.
Then I'm going to posit that case (c) will be in the minority, with
its own distinct use cases. It should be apparent that most web accesses
for the foreseeable future are going to be from case (a), due to the
relative difficulty in collecting addresses for case (c) document
retrieval. The difference in use case, however, is this: when accessing
a document from a URL obtained out-of-band, it is more than reasonable to
expect that the accessor has no idea what documents may lie at that
site, nor what contents they may contain. The accessor must retrieve at
least one document, possibly many, in order to determine what useful
information may lie on the other side. Some possibilities include:
- /
- robots.txt
- favicon.ico
- index.html
- default.opml
Of these five, the only document which can reasonably be expected to
exist in most cases is '/'. For the other four there are questions and
assumptions built in:
- Assumption: This site is likely to provide the sort of content I'm
after.
- Question: How should the URL be modified to maximize the likelihood
of finding this document?
- Question/Assumption: Does not finding the document at this URL mean
it's not there, or can I munge the URL and find it elsewhere?
- Assumption: Not finding this document means that {site doesn't
support foo, some default policy prevails, etc.}.
- Assumption: This URL represents a single "site" with some coherent
policy enforced/enabled by its owner.
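To make the munging question concrete: a tool hunting for a
well-known file from an out-of-band URL ends up generating guesses
like this (a Python sketch; the filename default.opml is the one
under debate, while the walk-up-the-path heuristic is just one
plausible guessing strategy I've made up):

    from urllib.parse import urlsplit, urlunsplit

    def candidate_urls(url, name="default.opml"):
        """Walk from the URL's directory up toward the server root,
        yielding guesses at where a well-known file might live."""
        parts = urlsplit(url)
        segments = [s for s in parts.path.split("/") if s]
        # Drop a trailing filename component, if any.
        if segments and "." in segments[-1]:
            segments = segments[:-1]
        while True:
            path = "/" + "/".join(segments + [name])
            yield urlunsplit((parts.scheme, parts.netloc, path, "", ""))
            if not segments:
                break
            segments = segments[:-1]

For a URL like http://example.org/archives/2003/oct/index.html that
yields four candidates, of which as many as three are likely 404's
against somebody else's server.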
In short, when dealing with case (c) usage, the accessor of documents
doesn't know much and:
(1) is a robot blindly stumbling around scooping up everything in its
path, making sure that it doesn't chew up bandwidth unnecessarily
(think of this as the "robots.txt grandfather clause")
(2) will access '/' and look for structured information which will
tell the accessor where it's likely to find specific resources related
to the specific URL it accessed
(3) will stumble around in the dark rudely poking at likely non-existent
documents that may have only passing relevance to the resources at the
original URL.
Now, why a non-stumbling-robot would argue that we should be designing
for behavior (3) instead of behavior (2) is presumably a matter for the
neurologists, which is not my area of specialization.
Going back to our original 3 cases (a, b, and c): the remaining case of
concern, (a), comprises the bulk of non-stumbling-robot document
accesses. Accessors are coming in via a link from a structured
document. The only non-moronic thing to do is to load the document at
the end of the URL you pulled from the parsed document. If you find
resources in there that are of interest, then they are almost certainly
highly related to the document you arrived from.
If instead, no such resources are found:
- They probably don't exist: the author of the document in question is
the person best positioned to know where such resources are, and
since they didn't point to any within the document, it stands to
reason that none exist -- or:
- The author doesn't want you to know where such resources are located.
It is possible that any such resources are unrelated to the document
you're viewing.
- It would be moronic, in just the same way argued above, to then begin
drunkenly stumbling around the webserver racking up 404's to find
something likely not to be there -- or likely not to be relevant.
Dave, you're arguing that we design a protocol for robots or stumbling
morons, when everyone else agrees that a protocol usable even by the
below-average would suffice.
On an earlier Web where every site was (at least assumed to be)
monolithic, topical, and had a single owner, stuffing a file with a
known name somewhere around what a consensus agrees looks most like '/'
might have been a reasonable idea. I was sick of that version of the
Web back in 1997, and fortunately so were a lot of other people.
I've got documents on the Web now that were created before there was a
Web to put them on (literally). I threw out Commodore VIC-20 disks
three years ago, but I'm also bringing more and more data online and I'm
not creating a new "web site" every time I create a new archive of
information. Barring catastrophic data loss (knock on wood) I expect to
only add to my online collection, perhaps automatically reformatting
documents occasionally.
New document formats and standards are making it possible for us to
conceptually group documents and place things like RSS feeds at
arbitrary but useful points throughout the collection (it's my belief
that moving to syncato-like systems will expand that flexibility even
further).
Creating "lists of feeds" or "feeds of feeds" is an idea I think we can
all get behind; but if, 5 years from now, while loading in a new set of
video archives, I notice that 50% of my 404's are from tools implementing
this dumbass idea that Dave Winer pushed back in 2003, the 6amwakt [0]
is gonna roll and administer a well-deserved cockpunch.
[0] http://www.rickbradley.com/tour/
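For the record, consuming a list of feeds is the easy part once a
link you actually parsed has handed you its URL (a Python sketch,
assuming the list is OPML using the conventional type="rss"/xmlUrl
outline attributes):

    import urllib.request
    import xml.etree.ElementTree as ET

    def feed_urls(opml_url):
        """Return the feed URLs listed in an OPML subscription list."""
        doc = ET.parse(urllib.request.urlopen(opml_url))
        return [o.get("xmlUrl")
                for o in doc.iter("outline")
                if o.get("type") == "rss" and o.get("xmlUrl")]

Nothing in that requires the list to sit at a fixed name under '/'.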
(I'll wait for thread death before updating Dave's supporting
documentation)
Rick
--
http://www.rickbradley.com MUPRN: 424
| on the intake
random email haiku | manifold. If the gasket
| overhangs trim it.