
discovery vs information



More thoughts on mypublicfeeds.

- I wonder if it would help to separate the discovery part of this from the format part and try and solve these independently.

- I was thinking about use cases. A classic one for me is this. I was looking at the wireless section of ZDNet UK (http://news.zdnet.co.uk/communications/wireless/). I'm sure I remember reading that there was some RSS on ZDNet somewhere, but I don't know where, so what should I do next? There's no obvious XML link on the visible page. So do I "view source" and look for <link>, and/or start looking for likely files containing lists in:-
http://news.zdnet.co.uk/communications/wireless/
http://news.zdnet.co.uk/communications/
http://news.zdnet.co.uk/
http://www.zdnet.co.uk/

In the absence of a *really widely* implemented standard like robots.txt, looking for files will just give me loads of 404s. Incidentally, there's no robots.txt or favicon.ico in any of those zdnet directories either.
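The "walk up the directories" search above can be sketched in a few lines of Python. This is only an illustration: the function name `candidate_directories` is made up here, and it derives parent directories on the same host only, so it yields the first three URLs in the list above but not http://www.zdnet.co.uk/, which sits on a different host and would need site-specific knowledge.

```python
from urllib.parse import urlsplit, urlunsplit


def candidate_directories(url):
    """Return the directory of `url` and each parent directory up to the host root.

    Hypothetical helper for illustration; it only builds URLs, it fetches nothing.
    """
    parts = urlsplit(url)
    # Drop whatever follows the last "/" (a file name, or nothing), keep directory names.
    segments = [s for s in parts.path.split("/")[:-1] if s]
    candidates = []
    for depth in range(len(segments), -1, -1):
        path = "/" + "/".join(segments[:depth])
        if not path.endswith("/"):
            path += "/"
        candidates.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return candidates


if __name__ == "__main__":
    for candidate in candidate_directories(
        "http://news.zdnet.co.uk/communications/wireless/"
    ):
        print(candidate)
```

Each candidate would then have to be probed for some agreed file name, which is exactly where the loads of 404s come from when no such name is widely implemented.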

- There are enough different reasons for having lists, and enough different things to list, that it feels to me like we need to solve this generally and not just for RSS. Even for RSS, I feel the need to specify which flavour each entry refers to.

- I think I've got three or four things I'd like to put in the header of every html file. They're all optional and there might be more than one of each.
1) Here's a pointer to a single alternate version of the same content.
2) Here's a pointer to a machine-readable file containing lists of alternate versions of this content or related content, e.g. RSS0.92, RSS1.0, RSS2.0, Atom, WML, Author's FOAF, Assorted metadata.
3) Here's a pointer to a machine-readable file containing lists of files related to this section of the site.
4) Here's a pointer to a machine-readable file containing lists of files related to this whole site.

1) needs more work on identifying the type of the target file. Not type as in text/plain vs text/xml, but type as in RSS0.92 vs FOAF vs Atom

2), 3) and 4) need work on the markup approach and standard. I don't think any of RDFS:seeAlso, OPML or OCS are actually good enough or complete enough yet. If this is going to be general it needs to solve a whole load of cases now and it absolutely must handle new file types. And it had better be really simple to parse and produce too. There's going to be a temptation to start adding all sorts of meta-meta-data about each entry. Please resist this. It should be a simple list of file type, name, URI. Any additional meta-data about each file should be contained in the files themselves and available by collecting them.
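To show how little is needed for "a simple list of file type, name, URI", here is a minimal Python sketch. Everything concrete in it is an assumption made for illustration: the one-entry-per-line, tab-separated layout, the names `Entry`, `produce` and `parse`, the type labels and the example.org URIs. It is not a proposal for the actual markup.

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    file_type: str  # free-form label such as "RSS2.0" or "FOAF" (illustrative only)
    name: str
    uri: str


def produce(entries: List[Entry]) -> str:
    """Serialise entries as one tab-separated line each (hypothetical layout)."""
    return "".join(f"{e.file_type}\t{e.name}\t{e.uri}\n" for e in entries)


def parse(text: str) -> List[Entry]:
    """Parse the layout written by produce(); blank lines are skipped."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        file_type, name, uri = line.split("\t")
        entries.append(Entry(file_type, name, uri))
    return entries


if __name__ == "__main__":
    listing = [
        Entry("RSS2.0", "Site news", "http://example.org/index.rss"),
        Entry("FOAF", "Author", "http://example.org/foaf.rdf"),
        # A type the parser has never heard of still round-trips unchanged.
        Entry("SomeFutureType", "Later addition", "http://example.org/new.xml"),
    ]
    print(produce(listing), end="")
```

Because the type is just an opaque label, new file types need no change to the parser, and anything richer stays in the target files themselves.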

3) and 4) are actually about metadata describing the directory structure. I think there is a case here for the W3C to come up with a standard way for this to be found and created, if they haven't already. It feels like there's a case for a standard file with a standard name in each directory. This would actually help robots because it could contain sitemap lists of pages.

However, I think we actually already have a standard here: web requests to directories with no file name should return something via HTTP, with a MIME type. Either a web page, an index, a 404, a graphic or whatever. Now if the returned doc is of type text/html then we're back to <link>.
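Getting "back to <link>" is cheap for a client. The sketch below uses Python's standard html.parser to collect every <link> element from a page that has already been fetched. The names `LinkCollector` and `find_links` and the sample markup are illustrative assumptions; fetching the page and deciding which rel/type values matter are left out.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the attributes of every <link> element in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lower-cased from HTMLParser.
        if tag == "link":
            self.links.append(dict(attrs))


def find_links(html_text):
    """Return one dict of attributes per <link> element found in html_text."""
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.links


if __name__ == "__main__":
    # Made-up page; the href is a placeholder, not a real feed.
    sample = (
        "<html><head><title>Wireless</title>"
        '<link rel="alternate" type="application/rss+xml" '
        'title="RSS" href="http://example.org/index.rss">'
        "</head><body></body></html>"
    )
    for link in find_links(sample):
        print(link.get("rel"), link.get("type"), link.get("href"))
```

Note that the type attribute here only carries a MIME type, which is the gap described earlier: it cannot say RSS0.92 vs FOAF vs Atom.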

Aside: I wonder if creating new HTTP headers is out of the question? ;-)

So. I think we need the following:-
1) Some standards for specifying target file content type in <link>
2) As well as RSSx.xx, Atom, NITF, FOAF etc, create some content types for cases 1,2,3,4 above.
3) A defined way of creating new content types.
4) A standard file format for lists of <link>s. This needs to include a section which is metadata about this list. We can probably do this in a way that the entries can be inserted into all sorts of other types of files.

Once we've got this far, then we can move to stage 5) viz. evangelising; writing toolkits; writing apps; writing validators; arguing about common locations and file names; arguing about whether it should be xml or RDF or both; arguing about what it all means; and all that other stuff we're so good at.

This seems to me to be a bare minimum. Once we've done that, if the fixed filename camp want to create these files with a fixed filename on their webservers, then they can go ahead. Just as long as they do <link> too. The fixed filename standard can then succeed or fail on its own merits without killing the file format standard in the process.

--
Julian Bond Email&MSM: julian.bond@voidstar.com
Webmaster:              http://www.ecademy.com/
Personal WebLog:       http://www.voidstar.com/
M: +44 (0)77 5907 2173   T: +44 (0)192 0412 433