Re: [syndication] Re: feed list format and such



> > ...  every site has a root document
> > and they are all named '/'.
>
> Not every site has content under /.
> Some sites redirect to another site, some even may return a 403 or 404 error.

I think Roy's point was that it's common behavior that a request to
http://something/ will return a document.  That is, if something /can/ be
returned.

As you rightly point out, that request may well return a 4xx if nothing is
available, or a 3xx if other action should be taken.  Leaving aside the insanity
of redirects done through meta tags inside the HTML head, of course.

The question is, perhaps: IF a request for the default '/' on a domain is
expected to return /something/, is it likewise reasonable to expect that domain
to return a /robots.txt?

What we're aiming for here is a way to allow sites that have content to express
data that helps FIND that content.  If a site has data and they want to play
along with this ease-of-use angle they're going to have to expend SOME sort of
effort.

If they can, part of the plan is to encourage them to use <link> tags in the
HTML <head> section.  For sites that can't do that, we're investigating the use
of a robots.txt entry to carry the same sort of data.
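
To make the two discovery routes concrete, here is a minimal Python sketch.  It
is an illustration under stated assumptions, not a proposed spec: the <link>
attributes (rel="alternate" plus a feed MIME type) and the "Feed:" field name
in robots.txt are both assumptions, since neither has been settled in this
thread.  For brevity it scans every <link> tag rather than only those inside
<head>.

```python
from html.parser import HTMLParser

# Assumed feed MIME types; the thread has not fixed a list.
FEED_TYPES = {"application/rss+xml", "application/rdf+xml"}


class FeedLinkParser(HTMLParser):
    """Collect href values from <link rel="alternate"> tags with a feed type."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = {name: (value or "") for name, value in attrs}
        rels = attr.get("rel", "").lower().split()
        is_feed = attr.get("type", "").lower() in FEED_TYPES
        if "alternate" in rels and is_feed and attr.get("href"):
            self.feeds.append(attr["href"])


def feeds_from_html(html_text):
    """Return feed URLs advertised through <link> tags in the page."""
    parser = FeedLinkParser()
    parser.feed(html_text)
    return parser.feeds


def feeds_from_robots(robots_text, field="feed"):
    """Return values of a HYPOTHETICAL 'Feed: <url>' field in robots.txt."""
    found = []
    for line in robots_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if ":" not in line:
            continue
        name, value = line.split(":", 1)
        if name.strip().lower() == field and value.strip():
            found.append(value.strip())
    return found


def discover(html_text, robots_text):
    """Prefer <link> tags; fall back to robots.txt when the page has none."""
    return feeds_from_html(html_text) or feeds_from_robots(robots_text)
```

The fallback order mirrors the plan above: <link> tags first, robots.txt only
when the page offers nothing.  Neither route guesses at a fixed file name.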

This is meant to do several things:
    avoid raising errors in web site logs
    make it extensible
    not lock the data itself to a fixed URL
    provide best-practice examples of how to do it

The value of the first point cannot be overestimated.  I've gotten countless
mails from website administrators asking what the hell all these requests
digging around for rss.xml, index.xml, index.rdf and the like are.  Their first
impression of RSS is often a bad one.  This is not something we want to
perpetuate.

If we hijack use of robots.txt this traffic won't show up as totally unexpected.
It may show up as /increased/ but it won't be something new and unusual.
Likewise we also have the opportunity to slap the aberrant abusers of this
concept by noting that they've seen robots.txt and need to abide by its /other/
directives as well.
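
As a rough sketch of what abiding by the /other/ directives could look like, a
feed-hunting client that has already fetched robots.txt can check its rules
before requesting anything else.  This uses Python's standard
urllib.robotparser; the agent name "feedbot" and the example.com URLs are
made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_text, user_agent, url):
    """Return True if the robots.txt rules permit user_agent to fetch url."""
    rules = RobotFileParser()
    rules.parse(robots_text.splitlines())  # parse text we already fetched
    return rules.can_fetch(user_agent, url)


# Hypothetical robots.txt content and agent name.
robots = "User-agent: *\nDisallow: /private/\n"
print(allowed_by_robots(robots, "feedbot", "http://example.com/rss.xml"))
print(allowed_by_robots(robots, "feedbot", "http://example.com/private/rss.xml"))
```

With these rules the first URL is permitted and the second is not, so a client
that read robots.txt to find feeds has no excuse for probing the second.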

Not using a fixed URL and making it extensible go hand in hand.  They save us
from the further stupidity of reinventing what's /wrong/ with favicon.ico.  They
also save us from the unacceptable tyranny of a bad format being over-hyped into
becoming a horrendous legacy problem.  I'm all for using a format that works
with existing tools, but not if it hamstrings those tools against intelligent
future growth.

Once we resolve these issues, we've got a number of prominent sources eager to
hop on the bandwagon.  This will help them, of course, but if RSS gets to go
along for the ride, so much the better!

-Bill Kearney
Syndic8.com