mnot’s blog

“Design depends largely on constraints.” — Charles Eames

Thursday, 16 October 2008

/site-meta

Filed under: HTTP, Standards, Web, Web Services

Metadata discovery is a nagging problem that’s been hanging around the Web for a while. There have been a few stabs at this problem (including at least one by yours truly), but no real progress.

This is both unfortunate and worrisome, because as the next generation of Web-based protocols informed by REST, Web 2.0 and the like roll out, they’re going to need a way to find and talk about metadata on the Web in an automated fashion.

And no, this isn’t a Semantic Web pitch. Sorry.

The immediate need is for XRDS-Simple; Eran wanted a way to find security metadata for a site, and in discussion we agreed that rather than re-inventing the wheel for the nth time, we’d try to do it right, hopefully for the last time.

And so /site-meta was born: an ultra-simple, lightweight and minimally intrusive way to find a Web site’s metadata. E.g.,

<metadata>
  <meta href="/robots.txt" rel="robots"/>
  <meta rel="privacy" type="application/p3p.xml" href="/w3c/p3p.xml"/>
  <meta type="application/example+xml" rel="http://example.com/rel"
        href="http://other.example.net/example">
    <example-root xmlns="http://www.example.com">
      <!-- some metadata here -->
    </example-root>
  </meta>
  <meta type="text/example">
foo = bar
baz = bat
</meta>
</metadata>
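A consumer could read a document like the example above with nothing more than Python's standard `xml.etree.ElementTree` module. This is a sketch, not part of the proposal: it assumes the `metadata` and `meta` elements carry no namespace (as in the example), and the dictionary keys and helper names are illustrative.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<metadata>
  <meta href="/robots.txt" rel="robots"/>
  <meta rel="privacy" type="application/p3p.xml" href="/w3c/p3p.xml"/>
  <meta type="application/example+xml" rel="http://example.com/rel"
        href="http://other.example.net/example">
    <example-root xmlns="http://www.example.com">
      <!-- some metadata here -->
    </example-root>
  </meta>
  <meta type="text/example">
foo = bar
baz = bat
</meta>
</metadata>"""


def read_site_meta(xml_text):
    """Return one dict per <meta> entry in a site-meta document."""
    root = ET.fromstring(xml_text)
    entries = []
    for meta in root.findall("meta"):
        entries.append({
            "rel": meta.get("rel"),    # None when the entry has no rel
            "type": meta.get("type"),
            "href": meta.get("href"),
            # Inline XML metadata: any child elements of <meta>.
            "inline_xml": list(meta),
            # Inline text metadata, with surrounding whitespace removed.
            "inline_text": (meta.text or "").strip() or None,
        })
    return entries


def find_by_rel(entries, rel):
    """Entries whose rel matches exactly (rel URIs compared as strings)."""
    return [e for e in entries if e["rel"] == rel]
```

For instance, `find_by_rel(read_site_meta(SAMPLE), "privacy")` picks out the P3P entry; resolving relative `href` values against the site's origin (e.g. with `urllib.parse.urljoin`) is left to the caller.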

Please have a read, note the FAQ at the end and send feedback to the www-talk list. I’m particularly interested in whether people think XML is the best choice here.


17 Comments

Anne van Kesteren said:

Having security policies domain-wide does not seem like a good idea; see, e.g., crossdomain.xml. Also, if you call it /site-meta, the Content-Type will most likely be ignored.

Thursday, October 16 2008 at 5:46 AM

Sean Hogan said:

Surely your past interaction with Anne is irrelevant to his comment here. Only saying it cos you haven’t addressed the issues raised… and that’s not consistent with your past writings (or what I’ve read).

Thursday, October 16 2008 at 9:01 AM

Caesar said:

Pardon me if this is a stupid question, but how does this proposal relate to something like WADL, if at all? If a WS-* (er, WS-DeathStar) analogy is possible, is this like WS-MetadataExchange (and WSDL)? Thanks for your time.

Friday, October 17 2008 at 8:05 AM

Toby Inkster said:

Oh no, not another fixed-URI specification. robots.txt and favicon.ico have between them been responsible for eleventy zillion lines in error logs across the web.

They also confuse the issue of a “site” and a “host name”.

Saturday, October 18 2008 at 7:30 AM

MikeMoran said:

I second Toby. I once helped write a product which presented an analysis of a ‘site’ to a user (broken links, hits, etc). The problem of course is that a site is often defined by a shared look and feel, rather than a shared domain. For example, a linked cart page may actually be a hosted page on a separate site.

Could the spec support something like this by defining a scope? Each of the many roots would return the same /site-meta file, and within it the scope would be a list of root URIs. Alternatively, all but one of the URLs could redirect to a canonical URL for /site-meta.

Saturday, October 18 2008 at 9:01 AM
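Mike's scope idea reduces to a membership test: given the list of root URIs carried in a /site-meta file, does a particular URL belong to the site? A sketch, assuming absolute root URIs and that a root covers its own path and everything beneath it; `in_scope` is a hypothetical helper, since the proposal defines no scope element.

```python
from urllib.parse import urlsplit


def in_scope(url, roots):
    """True if url falls under one of the (hypothetical) scope roots."""
    u = urlsplit(url)
    path = u.path or "/"
    for root in roots:
        r = urlsplit(root)
        # Scheme and host must match; urlsplit already lowercases the scheme.
        if (u.scheme, u.netloc.lower()) != (r.scheme, r.netloc.lower()):
            continue
        # Match whole path segments, so /cart covers /cart/x but not /cartoon.
        prefix = r.path if r.path.endswith("/") else r.path + "/"
        if path == r.path or path.startswith(prefix):
            return True
    return False
```

Segment-aware matching matters here: a plain string prefix test would wrongly treat `/cartoon` as part of a site rooted at `/cart`.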

Anne van Kesteren said:

FWIW, I was not asserting that content sniffing is a good idea. I’m just saying that without an extension, the probability of authors and servers doing the right thing decreases, and that there might therefore be an incentive for consumers not to check things properly (besides the fact that enforcing the media type is already an extra cost for consumers).

As for Access Control for Cross-Site Requests, there has been plenty of debate on why having a centralized file for a whole domain is a very bad idea. That the mailing list interactions have not always been friendly does not really change that. I agree that we should have done a better job with that. Sorry.

Monday, October 27 2008 at 11:41 AM

Anne van Kesteren said:

s/that authors //

Monday, October 27 2008 at 11:44 AM
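The consumer-side check Anne mentions is small but not free: before parsing, compare the response's Content-Type with the expected media type and refuse anything else. A sketch using Python's `email.message.Message` to normalise the header value; `application/site-meta+xml` is a hypothetical media type chosen for illustration, not one named in the post.

```python
from email.message import Message

# Hypothetical media type; the post does not name one for /site-meta.
EXPECTED_TYPE = "application/site-meta+xml"


def media_type(content_type_header):
    """Normalise a Content-Type value to a lowercase 'type/subtype'."""
    msg = Message()
    msg["Content-Type"] = content_type_header
    # get_content_type() drops parameters such as charset, lowercases the
    # result, and falls back to text/plain for a malformed value.
    return msg.get_content_type()


def should_parse(content_type_header, expected=EXPECTED_TYPE):
    """Strict check: only parse responses labelled with the expected type."""
    return media_type(content_type_header) == expected
```

A server that labels an extensionless /site-meta as `text/plain` would fail `should_parse`, which is the situation Anne expects to tempt consumers into skipping the check.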

Benjamin Carlyle said:

What this looks like to me is an alternative home page or launching point for machine users. As such I think it could make a reasonable amount of sense. It could always link to another set of URLs to define the set of “sites” if you wanted to disambiguate the authority and site concepts.

The /site-meta page could even be linked from /: a HEAD request to / would return a response containing a rel link to the machine-focused page.
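Benjamin's HEAD-request idea could look like this from the client side: fetch the headers for `/`, then look for a `Link` header whose rel names the machine-focused page. A sketch under stated assumptions: the relation name `site-meta` is hypothetical (nothing in the post registers one), and the regular-expression parser is deliberately simple, so it does not handle commas or semicolons inside quoted parameter values.

```python
import re
from urllib.parse import urljoin
from urllib.request import Request, urlopen

# <target>; param=value; param="value", <target2>; ...
LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)')
PARAM_RE = re.compile(r';\s*([^=;,\s]+)\s*=\s*"?([^";,]*)"?')


def parse_link_header(value, base):
    """Return (absolute_target, params) pairs from a Link header value."""
    links = []
    for m in LINK_RE.finditer(value):
        params = {k.lower(): v for k, v in PARAM_RE.findall(m.group(2))}
        links.append((urljoin(base, m.group(1)), params))
    return links


def find_rel(links, rel):
    """First target whose rel (a space-separated list) contains rel."""
    for target, params in links:
        if rel in params.get("rel", "").split():
            return target
    return None


def head_link_header(url):
    """Issue a HEAD request for url; return its Link headers joined, or ''."""
    with urlopen(Request(url, method="HEAD")) as resp:
        return ", ".join(resp.headers.get_all("Link") or [])
```

For a live site, `find_rel(parse_link_header(head_link_header(home), home), "site-meta")` would return the advertised URL, or `None` if the server sends no such link.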

The use of meta xml elements sticks out at me a little bit. I can think of two obvious alternatives:

  1. Use the rel name as the element name
  2. Use an element name with prior art, eg link or a

Pro of (1)

  • XML processors using XPath or DOM traversal will be able to express their queries a little more simply.

Con of (1)

  • Only rel names that are legal as XML element names would be usable. You couldn’t use URLs… however, this may not be a bad thing in terms of reaching consensus and avoiding namespace hell.

Pro of (2)

  • You could potentially avoid defining a new format completely. Why not just use HTML directly? I have read your FAQ on the use of a microformat on /, but is there any reason why you wouldn’t use a microformat on /site-meta?
  • Heading down the HTML path may allow a human to more easily debug this machine-oriented page.

Con of (2)

  • If you start heading down the HTML path, it could become more complex for a machine to process.

e.g.:

<html xmlns=…>
  <a href="/robots.txt" rel="robots"/>
  <a rel="privacy" type="application/p3p.xml" href="/w3c/p3p.xml"/>
  <a type="application/example+xml" rel="http://example.com/rel"
     href="http://other.example.net/example">
    <example-root xmlns="http://www.example.com">
      <!-- some metadata here -->
    </example-root>
  </a>
  <dl xml:id="example">
    <dt>foo</dt><dd>bar</dd>
    <dt>baz</dt><dd>bat</dd>
  </dl>
</html>

Saturday, November 15 2008 at 3:20 AM
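The difference between the shapes Benjamin lists shows up mostly in the consumer's query. A sketch in Python's standard library using cut-down documents; the `robots` element name for option (1) is hypothetical, since the post defines no such element. ElementTree handles both XML shapes, while the HTML shape is read with the lenient `html.parser` module, which here only recovers links: inline metadata such as `<example-root>` or the `<dl>` would need further conventions.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Shape used in the post: generic <meta> elements carrying a rel attribute.
PROPOSED = '<metadata><meta rel="robots" href="/robots.txt"/></metadata>'
# Option (1), hypothetical: the rel name becomes the element name.
REL_AS_ELEMENT = '<metadata><robots href="/robots.txt"/></metadata>'
# Option (2): links expressed as HTML <a> elements (cut down from the comment).
HTML_SHAPE = '''<html>
  <a href="/robots.txt" rel="robots"/>
  <a rel="privacy" type="application/p3p.xml" href="/w3c/p3p.xml"/>
</html>'''


def hrefs_proposed(xml_text, rel):
    root = ET.fromstring(xml_text)
    # ElementTree's XPath subset supports [@attr='value'] predicates;
    # a rel containing a single quote would break this simple query.
    return [m.get("href") for m in root.findall(f"meta[@rel='{rel}']")]


def hrefs_rel_as_element(xml_text, rel):
    root = ET.fromstring(xml_text)
    # The rel name is the tag, so the query is just a child name; a URL
    # such as http://example.com/rel could not be used here.
    return [m.get("href") for m in root.findall(rel)]


class RelLinkCollector(HTMLParser):
    """Collects (rel, href) pairs from <a> and <link> start tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag in ("a", "link") and "href" in a:
            self.links.append((a.get("rel"), a["href"]))


def hrefs_html(html_text, rel):
    collector = RelLinkCollector()
    collector.feed(html_text)
    collector.close()
    return [href for r, href in collector.links if r == rel]
```

All three return the same list for `robots`; the trade-off is in what each shape lets you name and how much convention the parser has to assume.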

Jon Hanna said:

I dislike “well-known” URIs as a mechanism generally. I’m happy enough with it as something that will be tried, but would much prefer a way to use markup and/or HTTP headers to override the location.

Thursday, November 27 2008 at 12:37 PM
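The behaviour Jon asks for, with an explicit pointer taking precedence and the well-known URI used only when nothing else is advertised, could be expressed as a simple lookup order. A sketch under assumptions: the hints are taken to have been collected already into plain dicts, header hints are checked before markup hints (an arbitrary choice), and both the `site-meta` relation name and the `site_meta_location` function are hypothetical.

```python
from urllib.parse import urljoin, urlsplit

WELL_KNOWN_PATH = "/site-meta"
REL = "site-meta"  # hypothetical link relation name


def site_meta_location(page_url, header_links=None, markup_links=None):
    """Pick the metadata URL for the host serving page_url.

    header_links and markup_links are dicts mapping a rel name to an href,
    gathered beforehand from an HTTP Link header and from <link> elements.
    An explicit hint wins; the well-known location is only the fallback.
    """
    for hints in (header_links, markup_links):
        if hints and REL in hints:
            # Relative hrefs resolve against the page that carried the hint.
            return urljoin(page_url, hints[REL])
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}{WELL_KNOWN_PATH}"
```

With no hints at all, every page on `example.com` maps to `http://example.com/site-meta`; a single `Link` header or `<link>` element is enough to move it.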

Bill de hÓra said:

Mark,

replace the “meta” element with “link” and I think you have something. This is the kind of thing I was saying we needed for Data API sanity at the W3C’s mobile social conf and to the people working on activity streams. You don’t need to rewrite your APIs in RDF/semweb tech (RDFa, FOAF, etc.) to make them sane, or agree on a registry for activity verbs; you want to be able to make statements about your APIs and metadata and point to links.

Btw, if this helps XRDS go away, I’m all for that. XRDS* is an implementation of Greenspun’s 10th rule for RDF.

Saturday, February 7 2009 at 5:54 AM

Michael Hausenblas said:

Mark,

Two things: (1) I like the proposal and would love to learn more about its relation to POWDER (is it either/or, etc.?), and (2) I’m currently gathering material for a recently launched blog [1], so could you point me to some more resources in the metadata discovery area?

Cheers, Michael

[1] http://webofdata.wordpress.com/

Sunday, February 8 2009 at 12:54 PM

Creative Commons