mnot’s Web log

“Design depends largely on constraints.” — Charles Eames

Thursday, 16 October 2008

/site-meta

Metadata discovery is a nagging problem that’s been hanging around the Web for a while. There have been a few stabs at this problem (including at least one by yours truly), but no real progress.

This is both unfortunate and worrisome, because as the next generation of Web-based protocols informed by REST, Web 2.0 and the like roll out, they’re going to need a way to find and talk about metadata on the Web in an automated fashion.

And no, this isn’t a Semantic Web pitch. Sorry.

The immediate need is for XRDS-Simple; Eran wanted a way to find security metadata for a site, and in discussion we agreed that rather than re-inventing the wheel for the nth time, we’d try to do it right, hopefully for the last time.

And so /site-meta was born; an ultra-simple, lightweight and minimally intrusive way to find a Web site’s metadata. E.g.,

<metadata>
  <meta href="/robots.txt" rel="robots"/>
  <meta rel="privacy" type="application/p3p.xml" href="/w3c/p3p.xml"/>
  <meta type="application/example+xml" rel="http://example.com/rel"
        href="http://other.example.net/example">
    <example-root xmlns="http://www.example.com">
      <!-- some metadata here -->
    </example-root>
  </meta>
  <meta type="text/example">
foo = bar
baz = bat
</meta>
</metadata>
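
To give a feel for how little a consumer would need to do, here is a minimal sketch in Python. It assumes the element and attribute names from the example above (`metadata`, `meta`, `rel`, `href`, `type`), which may well change; the function name `find_meta` is made up for illustration and is not part of the proposal.

```python
import xml.etree.ElementTree as ET

# The example document from above, trimmed to the entries that carry links.
SITE_META = """\
<metadata>
  <meta href="/robots.txt" rel="robots"/>
  <meta rel="privacy" type="application/p3p.xml" href="/w3c/p3p.xml"/>
  <meta type="application/example+xml" rel="http://example.com/rel"
        href="http://other.example.net/example"/>
</metadata>
"""


def find_meta(document, rel):
    """Return (href, type) pairs for every <meta> child whose rel matches.

    Hypothetical helper: attribute order does not matter, and a missing
    attribute comes back as None.
    """
    root = ET.fromstring(document)
    return [
        (meta.get("href"), meta.get("type"))
        for meta in root.findall("meta")
        if meta.get("rel") == rel
    ]


print(find_meta(SITE_META, "privacy"))
```

Matching on `rel` alone is a design choice for the sketch; a real consumer would probably also check `type` before fetching anything.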

Please have a read, note the FAQ at the end and send feedback to the www-talk list. I’m particularly interested in whether people think XML is the best choice here.


Filed under: HTTP, Standards, Web, Web Services

discussion of this entry

Anne van Kesteren said…

Having security policies domain wide does not seem like a good idea. See e.g. crossdomain.xml. Also, if you call it /site-meta Content-Type will be ignored most likely.

Thursday, October 16 2008 at 5:46 PM +10:00

Mark Nottingham said…

Anne, if you have concerns about the security use case, I suggest you take them to the OAuth and XRDS-Simple folks; perhaps they'll pay you more attention than you and Ian paid me and others when we've brought architectural concerns to you in the past.

As far as the Content-Type - while there's a place for sniffing content, no one but you is asserting that sniffing file extensions on the Web is a good idea. I knew you were hostile to the Web architecture; I didn't realise it was this bad.

Thursday, October 16 2008 at 8:04 PM +10:00

Sean Hogan said…

Surely your past interaction with Anne is irrelevant to his comment here. Only saying it cos you haven't addressed the issues raised... and that's not consistent with your past writings (or what I've read)

Thursday, October 16 2008 at 9:01 PM +10:00

Mark Nottingham said…

Be that as it may, I answered his comment as well as it was posed. Waving one's hands about something "not being a good idea" and giving one example without any substantiation isn't raising an issue.

I'd try to engage in a dialogue with Anne to draw his objection out of him if past interaction hadn't proven so fruitless, and I weren't feeling so old and therefore jealous of my time.

That may sound harsh, but I think in the greater context of the interactions he and Ian are having, it's actually quite mild. If you happen to be a W3C Member (or invited expert), go and have a look at the w3c-ac-forum archives recently; it's a fun read.

If he (or anyone else) would care to make a reasoned argument on a relevant point, I'd be more than happy to listen, digest and respond. Doesn't mean I'll agree with them, of course :)

Thursday, October 16 2008 at 9:20 PM +10:00

Mark Nottingham said…

P.S. Yes, my first response was very snarky. Forgive me for being human...

Thursday, October 16 2008 at 10:29 PM +10:00

Caesar said…

Pardon me if this is a stupid question, but how does this proposal relate to something like WADL, if at all? If a WS-* (er, WS-DeathStar) analogy is possible, is this like WS-MetadataExchange and WSDL? Thanks for your time.

Friday, October 17 2008 at 8:05 AM +10:00

Mark Nottingham said…

Caesar,

That's actually not a bad analogy, at the 50,000 foot level.

Cheers,

Friday, October 17 2008 at 9:54 AM +10:00

Toby Inkster said…

Oh no, not another fixed-URI specification. robots.txt and favicon.ico have between them been responsible for eleventy zillion lines in error logs across the web.

They also confuse the issue of a "site" and a "host name".

Saturday, October 18 2008 at 7:30 PM +10:00

MikeMoran said…

I second Toby. I once helped write a product which presented an analysis of a 'site' to a user (broken links, hits, etc). The problem of course is that a site is often defined by a shared look and feel, rather than a shared domain. For example, a linked cart page may actually be a hosted page on a separate site.

Could the spec support something like this by defining a scope? Each one of the many roots would return the same /site-meta file and within each of these the scope would be a list of root URIs? Alternatively, all but one of the URLs could do a redirect to a canonical URL for /site-meta?

Saturday, October 18 2008 at 9:01 PM +10:00

Mark Nottingham said…

"Site" is a colloquial name, the technical name for what we're scoping this to is (very appropriately) called an authority.

As Toby points out, there are already a number of specifications that ground authority at this level, so choosing a different approach actually creates more problems than it solves, because it makes things more complex.

As the FAQ briefly mentions, making scoping dynamic brings a number of inefficiencies and potential risks. However, there's nothing stopping you from referring to metadata on another site, thereby consolidating it there, or defining a more fine- (or coarse-) grained way to apply metadata once it's discovered.
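
To make the scoping concrete, here is a rough sketch in Python of how a client could get from any URL to the metadata location for its authority. It assumes the fixed `/site-meta` path from the post; the function name `site_meta_url` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def site_meta_url(url):
    """Map any URL to the /site-meta URL for the same scheme and authority."""
    parts = urlsplit(url)
    # netloc is the authority component: the host plus any port or userinfo.
    # The path, query and fragment of the original URL are discarded.
    return urlunsplit((parts.scheme, parts.netloc, "/site-meta", "", ""))


print(site_meta_url("http://www.example.com/shop/cart?id=1"))
print(site_meta_url("http://cart.example.net:8080/checkout"))
```

Two pages that share a look and feel but live on different hosts map to two different metadata documents, which is the case MikeMoran describes; either document can still point at the other.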


Saturday, October 18 2008 at 10:12 PM +10:00

Anne van Kesteren said…

FWIW, I was not asserting content sniffing is a good idea, I'm just saying that authors that without an extension the probability of authors and servers doing the right thing decreases and that therefore there might be an incentive for consumers to not properly check things (besides the fact that enforcing the media type is already an extra cost for consumers).

As for Access Control for Cross-Site Requests, there has been plenty of debate on why having a centralized file for a whole domain is a very bad idea. That the mailing list interactions have not always been friendly does not really change that. I agree that we should have done a better job with that. Sorry.

Monday, October 27 2008 at 11:41 PM +10:00

Anne van Kesteren said…

s/that authors //

Monday, October 27 2008 at 11:44 PM +10:00

Benjamin Carlyle said…

What this looks like to me is an alternative home page or launching point for machine users. As such I think it could make a reasonable amount of sense. It could always link to another set of urls to define the set of "sites" if you wanted to disambiguate the authority and site concepts.

The /site-meta page could even be linked from / by issuing a HEAD request to it that contains a rel link to the machine-focused page.
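
A rough sketch of that idea in Python, assuming the response to the HEAD request carries an HTTP `Link` header. The relation name `site-meta` and the function name `link_target` are assumptions made for illustration, and the parsing is deliberately simplified; it does not cover every legal header value.

```python
import re


def link_target(link_header, rel):
    """Return the first target in a Link header value carrying the given rel.

    Returns None when no entry matches. Simplified: entries are split on
    commas that are followed by "<", and only the rel parameter is read.
    """
    for item in re.split(r",\s*(?=<)", link_header):
        match = re.match(r"\s*<([^>]*)>(.*)", item)
        if not match:
            continue
        target, params = match.groups()
        rels = re.search(r';\s*rel\s*=\s*"?([^";]*)"?', params)
        # A rel parameter may hold several space-separated relation names.
        if rels and rel in rels.group(1).split():
            return target
    return None


header = '</site-meta>; rel="site-meta", </style.css>; rel=stylesheet'
print(link_target(header, "site-meta"))
```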

The use of meta xml elements sticks out at me a little bit. I can think of two obvious alternatives:
1. Use the rel name as the element name
2. Use an element name with prior art, eg link or a

Pro of (1)
* XML processors using xpath or dom traversal will be able to express their query a little more simply
Con of (1)
* Only rel names that are legal as XML element names would be usable. You couldn't use URLs... however this may not be a bad thing in terms of reaching consensus and avoiding namespace hell.

Pro of (2)
* You could potentially avoid defining a new format completely. Why not just use HTML directly? I have read your FAQ on the use of a microformat on /, but is there any reason why you wouldn't use a microformat on /site-meta?
* Heading down the HTML path may allow a human to more easily debug this machine-oriented page.
Con of (2)
* If you start heading down the HTML path, it could become more complex for a machine to process.

eg:
<html xmlns=...>
<a href="/robots.txt" rel="robots"/>
<a rel="privacy" type="application/p3p.xml" href="/w3c/p3p.xml"/>
<a type="application/example+xml" rel="http://example.com/rel"
href="http://other.example.net/example">
<example-root xmlns="http://www.example.com">
<!-- some metadata here -->
</example-root>
</a>
<dl xml:id="example">
<dt>foo</dt><dd>bar</dd>
<dt>baz</dt><dd>bat</dd>
</dl>
</html>

Saturday, November 15 2008 at 3:20 PM +10:00

Jon Hanna said…

I dislike "well-known" URIs as a mechanism generally. Happy enough with it as something that will be tried, but would much prefer if there was a way to use markup and/or HTTP headers to over-ride the location.

Thursday, November 27 2008 at 12:37 AM +10:00

Mark Nottingham said…

John Panzer has some additional thoughts here:
http://www.abstractioneer.org/2008/11/one-site-meta-to-rule-them-all.html

(more of a note to myself so I don't forget than anything)

Thursday, November 27 2008 at 12:29 PM +10:00
