
Re: site-wide metadata discovery



Bill Kearney said:

> I'd favor using a single directive inside robots that pointed to an
> external document that had all of what you're suggesting.

Then Chad Everett said:

> Another thought would be to use comments in the robots.txt file.  Something
> akin to embedding javascript in HTML comments.  Comments are widely
> accepted in robots.txt and shouldn't cause any harm.  Something like:
>
> #Public-Feeds: myPublicFeeds.opml
> #Format: http://www.opml.org/spec
> #Title: Public Feed List
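To see why Chad's comment lines should be harmless, here is a minimal sketch. It assumes his `#Key: value` shape, and `read_comment_directives` is a hypothetical helper, not an existing convention. An aggregator can pick the pointers out, while Python's stock `urllib.robotparser` skips them and still applies the real rules:

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
#Public-Feeds: myPublicFeeds.opml
#Format: http://www.opml.org/spec
#Title: Public Feed List
"""

def read_comment_directives(text):
    """Collect '#Key: value' comment lines into a dict (hypothetical convention)."""
    directives = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("#"):
            continue
        # Split at the first colon only, so URLs in the value stay intact.
        key, sep, value = line[1:].partition(":")
        key = key.strip()
        # Skip ordinary prose comments: require "Key:" with no spaces in Key.
        if sep and key and not any(ch.isspace() for ch in key):
            directives[key] = value.strip()
    return directives

directives = read_comment_directives(ROBOTS_TXT)
print(directives["Public-Feeds"])  # myPublicFeeds.opml

# A standard parser ignores the comment lines and honours the Disallow rule.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())
print(parser.can_fetch("AnyBot", "/private/page.html"))  # False
print(parser.can_fetch("AnyBot", "/index.html"))         # True
```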

Bill replied:

> I'd favor using a single directive inside robots that pointed to an
> external document that had all of what you're suggesting.
>
> The issues of breaking existing robots.txt parsers and running afoul of
> size restrictions, not to mention the inevitable technical arguments about
> "polluting the purity of robots.txt"
>
> And one unexpectedly nice side effect:
>
> User-agent: *
> Disallow: /stupidFixedName.url

(and right about this time, I discovered that KMail truncates long original 
messages when quoting them in a reply... grrrr!)

Danny Ayers added:

> RDF Site Summary?
>
> (dives for cover...)

To which Bill replied:

> One example of how using RDF for this will actually be /smaller/ in size:
> http://www.syndic8.com/~wkearney/archives/000251.html

And folks, I think we've struck oil!  One <link> element in the <head>, where 
available, pointing to an RDF file that hosts the full set of metadata about 
the site in question.  Back that up with a comment pointer in robots.txt for 
the sites where the admins work under restraints.
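A minimal sketch of that discovery order, using only the Python standard library. The thread doesn't fix the attribute values, so `rel="meta"` with `type="application/rdf+xml"` is an assumption, as is the robots.txt comment key `Site-Metadata`; `find_metadata_url` and `_LinkFinder` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class _LinkFinder(HTMLParser):
    """Record the href of the first matching <link> element inside <head>."""

    def __init__(self, rel, mime_type):
        super().__init__()
        self.rel, self.mime_type = rel, mime_type
        self.in_head = False
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "body":
            self.in_head = False  # tolerate pages that never close <head>
        elif tag == "link" and self.in_head and self.href is None:
            a = dict(attrs)
            rels = (a.get("rel") or "").lower().split()
            if self.rel in rels and (a.get("type") or "").lower() == self.mime_type:
                self.href = a.get("href")

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False

def find_metadata_url(base_url, home_html, robots_txt=None,
                      rel="meta", mime_type="application/rdf+xml",
                      robots_key="Site-Metadata"):
    """Return an absolute URL for the site's RDF metadata, or None."""
    # 1. Preferred: a <link> in the <head> of the default document.
    finder = _LinkFinder(rel, mime_type)
    finder.feed(home_html or "")
    if finder.href:
        return urljoin(base_url, finder.href)
    # 2. Fallback: a '#Site-Metadata: <url>' comment in robots.txt (hypothetical key).
    for line in (robots_txt or "").splitlines():
        line = line.strip()
        if line.startswith("#"):
            key, sep, value = line[1:].partition(":")
            if sep and key.strip().lower() == robots_key.lower() and value.strip():
                return urljoin(base_url, value.strip())
    # 3. Neither present: the site isn't playing.
    return None

page = '<html><head><link rel="meta" type="application/rdf+xml" href="/site.rdf"></head></html>'
print(find_metadata_url("http://example.org/", page))  # http://example.org/site.rdf
```

Checking the <link> first means a cooperative site costs the client no extra request at all; robots.txt is only consulted when the page offers nothing.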

This consolidates all the metadata in one file for easy updating.  Returning 
clients (aggregators) should cache the RDF file's URL and use conditional GETs, 
so a client need only make one introductory fetch of robots.txt (or none, if 
the <link> is in the <head> of the default document).
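The caching half can be sketched with `urllib.request`, assuming the server sends `ETag` and/or `Last-Modified` validators. `CacheEntry`, `conditional_headers`, `update_entry` and `fetch_metadata` are hypothetical names. A 304 Not Modified reply means the cached copy is still current.

```python
import urllib.request
from dataclasses import dataclass
from typing import Optional
from urllib.error import HTTPError

@dataclass
class CacheEntry:
    url: str                        # metadata URL found via <link> or robots.txt
    body: Optional[bytes] = None    # last RDF document received
    etag: Optional[str] = None
    last_modified: Optional[str] = None

def conditional_headers(entry):
    """Build validator headers so the server can answer 304 Not Modified."""
    headers = {}
    if entry.etag:
        headers["If-None-Match"] = entry.etag
    if entry.last_modified:
        headers["If-Modified-Since"] = entry.last_modified
    return headers

def update_entry(entry, body, response_headers):
    """Store a fresh 200 response together with its validators."""
    entry.body = body
    entry.etag = response_headers.get("ETag")
    entry.last_modified = response_headers.get("Last-Modified")
    return entry

def fetch_metadata(entry, timeout=30):
    """Re-download the RDF file only if it changed; return the current body."""
    request = urllib.request.Request(entry.url, headers=conditional_headers(entry))
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            update_entry(entry, response.read(), response.headers)
    except HTTPError as err:
        # urllib raises on 304; keep the cached body. Anything else is a real error.
        if err.code != 304:
            raise
    return entry.body
```

On a 304 the cached body comes back untouched, so a steady-state poll costs the site one small request with no document transfer.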

It's elegant and as simple as possible, without being any simpler.

Oh, and for sites that don't return a usable document for '/', I guess they 
just don't want to play.