
Re: [syndication] RFC: myPublicFeeds.opml



> Point: Bandwidth costs money.

Up to a point this is a valid argument.  There /continue/ to be RSS reader
programs (ahem... radio...) that cannot even use gzip compression.  And others
that didn't get on board with ETags and other HTTP headers until /very/ late in
the game.  So even the most basic of the standard HTTP tools for managing this
remain poorly implemented.  The bandwidth argument is quite the can of worms.
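
As a concrete sketch of what 'the basics' look like (Python, with a
hypothetical feed URL and made-up cached header values), a conditional GET
sends If-None-Match and If-Modified-Since and asks for gzip; a well-behaved
server answers an unchanged feed with a tiny 304 instead of the full file:

    import gzip
    import urllib.request
    import urllib.error

    url = "http://somesite.example.com/index.xml"     # hypothetical feed URL
    cached_etag = '"abc123"'                          # saved from a prior response
    cached_modified = "Sat, 01 Feb 2003 00:00:00 GMT"

    req = urllib.request.Request(url, headers={
        "If-None-Match": cached_etag,           # pairs with the server's ETag
        "If-Modified-Since": cached_modified,   # pairs with Last-Modified
        "Accept-Encoding": "gzip",              # ask for compressed transfer
    })

    try:
        with urllib.request.urlopen(req) as resp:
            data = resp.read()
            if resp.headers.get("Content-Encoding") == "gzip":
                data = gzip.decompress(data)    # urllib won't decompress for us
            print("feed changed:", len(data), "bytes")
    except urllib.error.HTTPError as e:
        if e.code == 304:
            print("not modified; the server sent headers only")
        else:
            raise

A reader that skips all of this re-downloads the full feed on every poll,
which is exactly where the bandwidth complaints come from.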

> Assumption: If you don't support a feature, then a 404 error will cost you
> more in bandwidth than if the feature is implemented in a different way.

This also raises a concern for the people who operate the website as a whole.
They're often not directly connected to the people making the content.  I've
gotten more than a few e-mails from website admins asking "what the hell is
this RSS, and why are these programs hammering my server every 10 minutes?"
Why oh why would we want to start promoting an idea that would most certainly
make that situation WORSE?

> If the file exists at the given location, and only at the given location,
> then by not supporting a feature with that name, you're paying the cost of a
> 404 error.  Admittedly, these could add up if you get a whole bunch of 'em.
>
> But if you instead choose not to support a feature and that feature is
> implemented through a link tag, then it would seem that you'd pay even more
> in bandwidth costs.  Someone looking through your site for feeds would need
> to pull the entire index page in order to parse your link tags and determine
> that you don't have what they need.

Sure, but you're serving the pages already.  Now we could split hairs and
question how much that head data adds to the bandwidth freight, but we'd end up
running down some rather absurd degrees of argument.  Providing useful data
will most certainly incur a cost.  The question is whether the practice being
espoused makes it better or worse.

One reason I'm opposed to static URLs and in favor of link tags is that
anything wanting to make use of programmatically consumable data would actually
have to go find it programmatically.  Shocking idea, that.
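
For illustration, a minimal sketch of that discovery (Python, hypothetical
page URL): fetch the page once, pull feed locations out of its link tags,
and never guess at a path:

    import urllib.request
    from html.parser import HTMLParser

    class FeedLinkFinder(HTMLParser):
        # Collects href values from <link rel="alternate"
        # type="application/rss+xml"> elements in a page's head.
        def __init__(self):
            super().__init__()
            self.feeds = []

        def handle_starttag(self, tag, attrs):
            if tag != "link":
                return
            a = dict(attrs)
            if ((a.get("rel") or "").lower() == "alternate"
                    and a.get("type") == "application/rss+xml"):
                self.feeds.append(a.get("href"))

    page = urllib.request.urlopen("http://somesite.example.com/").read()
    finder = FeedLinkFinder()
    finder.feed(page.decode("utf-8", errors="replace"))
    print(finder.feeds)   # hrefs may be relative; resolve against the page URL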

> Comparison here is the size of a 404
> error vs the size of a typical index.  I think that even if you have a fancy
> custom 404 error page, chances are high that your index will be much larger,
> and thus result in much higher bandwidth costs.  If you have to check both
> an index and a file location, then obviously your costs would be higher
> still.

Oh certainly, I've seen sites serving custom error pages without returning
valid HTTP status codes (more than a few), and sites that don't support
Last-Modified and ETag headers (more still).  The situation could get
ridiculous.  Is that an argument for or against blind URLs?  I'm not sure; it's
more an issue of basic website management.  It's certainly an important point
though.
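
For what it's worth, a quick sanity check (Python, hypothetical URL) shows
the difference between a real 404 and a misconfigured custom error page
served with a 200:

    import urllib.request
    import urllib.error

    url = "http://somesite.example.com/myPublicFeeds.opml"   # hypothetical

    try:
        with urllib.request.urlopen(url) as resp:
            # A 200 here may still be a custom "not found" page served
            # with the wrong status code -- the misconfiguration above.
            print(resp.status, resp.headers.get("Content-Type"))
            print("Last-Modified:", resp.headers.get("Last-Modified"))
            print("ETag:", resp.headers.get("ETag"))
    except urllib.error.HTTPError as e:
        print("proper HTTP error status:", e.code)   # a real 404 lands here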

> If someone is already on your site and requests those feeds, then naturally
> your costs would be even lower.  But I think the way many people work is to
> go out hunting for things.

Sure, and without uniform support for features that are well implemented in
browsers like Mozilla, it's no wonder users have to 'go hunting'.  I'd have to
imagine that encouraging support for valuable, well-researched constructs would
do more to help than stabbing around with blind URLs.

> While there are certainly going to be cases
> where someone is on your site and says "show me the feeds", I think it more
> likely that you'll see a reader of some sort going to get that information
> while browsing another feed, not looking up the data while actually viewing
> the page that contains the link tag.  That means pulling the entire index.
> Again, higher costs.

Certainly, and an 'entire' index polled at the inane intervals some readers use
would have site operators screaming bloody murder.  That would hardly help
advance RSS as a cause.

> Please note that I'm not necessarily arguing for a static file name.  I just
> don't seem to understand this as one of the arguments against a static file
> name and I'm trying to get it.  While I implemented both a link and the
> static name proposed by Dave on my site, I'm pretty undecided on which I
> actually prefer.  Luckily my bandwidth is low enough that I don't have to
> worry too much about getting killed with those costs.

The downside to forcing a static URL is that it opens the door to blind polling
of a file that a) may not exist and b) might not contain enough (or might
contain too much) data.  An index on LiveJournal, for example, would be
hideously large.  Likewise, something naively stabbing around 'deep within' a
site would come up empty-handed.

Right now an RSS feed already carries a URL pointing to the site's web page.
That's a fine place to put a link tag leading to more robust data.  Start from
what they're getting already and build on it.  Don't invent some out-of-band
mystery URL.  If they're using a flavor of RSS that supports it, then consider
putting that URL into the feed file itself.  That's somewhat wasteful, though,
in that it adds bytes to the feed that don't really need to be re-downloaded on
every scheduled poll.
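
A rough sketch of starting from what's already in hand (Python, hypothetical
feed URL): read the channel's link element out of the feed, and that page is
where a consumer should look for link tags:

    import urllib.request
    import xml.etree.ElementTree as ET

    feed_xml = urllib.request.urlopen(
        "http://somesite.example.com/index.xml").read()   # hypothetical
    root = ET.fromstring(feed_xml)

    # RSS 0.9x/2.0 carry the site's page URL at /rss/channel/link.
    site_url = root.findtext("./channel/link")
    print("page to scan for link tags:", site_url)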

As an aside, too many feeds use a full document URL as the site link for a
'poke blindly' approach to work on that URL.  While some sites might be
http://somesite.example.com/news/ and thus could conceivably serve as some sort
of 'base' for a blind stab, too many use a full path like
http://othersite.example.com/news/index.php, or even gnarlier examples with
dynamic parameters.

> It sounds like what is needed is a method to determine what someone is
> looking for, be it a separate file containing just links to metadata of this
> sort or extensions to RSD or something.  Maybe even an interface that could
> be queried for the location of the data, and the interface could also tell
> the searcher if it wasn't configured, so that the one performing the search
> doesn't keep digging everywhere trying to find it if it doesn't get a hit in
> the first location.  Adding extra functionality to block that URL if it
> keeps searching anyway might be useful for some.

Sure, a templated file like that suggested by RSD would certainly be good.  The
question is what mechanism should be promoted as a reliable way to find that
file.  Using a static filename needlessly forecloses too many options; using a
link tag gives the most flexibility at the least cost.  It's going to take
/something/ to get the data out, so let's not be naive or quibble excessively
over the bits and bytes.

Of course there's also the glaring absence of anything like a TTL in an OPML
file...  Or a serial-number increment a la DNS zones.  So there's no reliable
way for a site possessing an index to let you know if or when you should come
back to check for an update.  Yet another nail in that coffin...

> In any case, only that single location is ever queried for anything in that
> scenario.  If it's not there, then no looking elsewhere is required.  Not
> saying this is an easy solution - but it sounds like the solution everyone
> is after.

The trouble is, where is that 'single location'?  Is it based on the FQDN
hostname only?  That's really less than ideal for large hosting environments,
isn't it?

-Bill Kearney
Syndic8.com