Network Working Group M. Nottingham
Internet-Draft December 20, 2001
Expires: June 20, 2002
Publishing Site Metadata
draft-nottingham-http-options-metadata-00
Status of this Memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as
Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on June 20, 2002.
Copyright Notice
Copyright (C) The Internet Society (2001). All Rights Reserved.
Abstract
This note describes a means of making available metadata that
describes an entire Web site (or portions thereof), rather than
directly embedding metadata within resources' representations.
The method described does so without prescribing a "well-known
location" for the metadata; instead, it uses the OPTIONS HTTP request
method in conjunction with server-driven content negotiation.
Nottingham Expires June 20, 2002 [Page 1]
Internet-Draft Site Metadata December 2001
1. Introduction
An increasingly common requirement for Web technologies is to
describe metadata about a group of resources, rather than just a
single resource.
The most commonly deployed solutions to this problem involve defining
a "well-known location" for a resource which describes a mapping of
metadata to resources.
For example:
P3P[2] uses the resource on the path '/w3c/p3p.xml' as a 'Policy
Reference File', which maps privacy policies to different portions
of the site.
The Robot Exclusion[3] convention uses the path '/robots.txt' to
direct automated agents as to which portions of the site are not
to be visited.
A recent proposal, WS-Inspection[4], uses a well-known location
'/inspection.wsil' to aid in the location of Web Services.
There are a number of reasons for the well-known location approach:
Scoping - by defining a single metadata source that is tied to the
URI authority, the metadata statements contained within can be
considered authoritative for that site. For example, the P3P
Policy Reference File at http://www.example.org/w3c/p3p.xml is
authoritative for www.example.org, because it is under the control
of www.example.org.
Efficiency - it is cumbersome to directly embed metadata in every
representation of a resource produced, both because of the
management overhead involved, and the byte bloat in the
representations themselves.
Flexibility - often, statements about a resource are separate in
time from its representations. Separating them allows them to be
changed without affecting the resources themselves.
Privacy - some metadata affects the way requests are made (or not
made), bringing the requirement that the metadata is known
beforehand. For example, the metadata in
http://www.example.net/robots.txt must be known before other
requests can be made by a robot, because it might preclude them.
However, use of a well-known location imposes the protocol designers'
choice of identifiers on publishers' URI namespaces. The chosen
identifier may not be easy to make available, depending on the nature
of the server implementation, or it may be impractical to integrate
the well-known location into content management systems. [i18n
concerns?]
This document defines a mechanism whereby protocols can specify site
metadata without tying it to a well-known location, by using
mechanisms in HTTP [1].
2. Site Metadata
A Site Metadata request uses the OPTIONS HTTP method in conjunction
with server-driven content negotiation. For example:
OPTIONS * HTTP/1.1
Host: www.example.com
Accept: application/p3p-prf+xml;q=1, */*;q=0
Here, the request identifies a media type, 'application/p3p-prf+xml',
which is the desired metadata payload.
Compliant responses will typically have an entity body containing the
requested representation. Compliant responses may also be a redirect
or similar message.
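As an illustration only, the request above could be serialized by a
User Agent along the following lines. This is a minimal sketch, not
part of the mechanism itself; the function name
build_site_metadata_request is hypothetical, and the sketch assumes
that a single media type is requested and that every other type is
refused with q=0, as in the example.

```python
def build_site_metadata_request(host: str, media_type: str) -> bytes:
    """Serialize an 'OPTIONS *' request asking for one metadata media type.

    Hypothetical helper: the name and signature are assumptions made
    for illustration, not something this document defines.
    """
    lines = [
        # '*' addresses the server as a whole rather than one resource.
        "OPTIONS * HTTP/1.1",
        f"Host: {host}",
        # q=1 for the wanted metadata format, q=0 for everything else.
        f"Accept: {media_type};q=1, */*;q=0",
        "",  # blank line terminating the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


request = build_site_metadata_request(
    "www.example.com", "application/p3p-prf+xml"
)
```

The resulting bytes can be written to a connection opened to the
origin server; handling of the response is left out of the sketch.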
3. Falling Back to a Well-Known Location
Protocols may define a fallback well-known location which User Agents
can use if the origin server does not support this mechanism.
For example, a User Agent may first attempt the OPTIONS request
above, but receive a response which does not result in the desired
entity body. In this case, the protocol could define a fallback
well-known location, such as '/w3c/p3p.xml'.
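The fallback behaviour described above might be sketched as follows.
The names fetch_site_metadata, options_request and get_fallback are
hypothetical, and the test for a "desired entity body" (a 200 status,
a matching Content-Type and a non-empty body) is an assumption; this
document does not define those criteria. The sketch also assumes that
the supplied callables follow any redirects themselves.

```python
from typing import Callable, Optional, Tuple

# A response is modelled as (status code, Content-Type value, body).
Response = Tuple[int, str, bytes]


def _same_type(content_type: str, media_type: str) -> bool:
    """Compare media types, ignoring parameters such as charset."""
    return content_type.split(";", 1)[0].strip().lower() == media_type.lower()


def fetch_site_metadata(
    options_request: Callable[[], Response],
    get_fallback: Callable[[], Response],
    media_type: str,
) -> Optional[bytes]:
    """Try the OPTIONS mechanism first, then the well-known location.

    Both callables are assumptions: one performs the OPTIONS request,
    the other performs a GET on the protocol's fallback location.
    """
    status, content_type, body = options_request()
    if status == 200 and _same_type(content_type, media_type) and body:
        return body

    # The OPTIONS response did not carry the desired entity body,
    # so fall back to the protocol-defined well-known location.
    status, content_type, body = get_fallback()
    if status == 200 and body:
        return body
    return None
```

Passing the two requests in as callables keeps the decision logic
separate from any particular HTTP library.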
4. Caching Site Metadata
Some metadata representations may be large, or be frequently
accessed. Because OPTIONS is not cacheable, it may be desirable to
return a redirect to another, cacheable resource in these situations.
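On the server side, the redirect approach might look like the sketch
below. The function name respond_to_options and the mapping from
media types to URLs are hypothetical. The use of 303 (See Other) is
also an assumption; this document says only "a redirect". For
brevity, the sketch takes the first acceptable media type rather than
ranking candidates by q-value.

```python
from typing import Dict, Tuple


def respond_to_options(
    accept: str, metadata_urls: Dict[str, str]
) -> Tuple[int, Dict[str, str]]:
    """Pick a status code and headers for an 'OPTIONS *' request.

    metadata_urls maps a metadata media type to a cacheable resource
    that holds it; both the mapping and the 303 status are assumptions.
    """
    for item in accept.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        q = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat a malformed q-value as a refusal
        if q > 0 and media_type in metadata_urls:
            # Redirect to a resource that can be fetched with GET
            # and cached.
            return 303, {"Location": metadata_urls[media_type]}
    # No known metadata type was requested: plain OPTIONS response.
    return 200, {"Content-Length": "0"}


status, headers = respond_to_options(
    "application/p3p-prf+xml;q=1, */*;q=0",
    {"application/p3p-prf+xml": "/w3c/p3p.xml"},
)
```

A User Agent following the redirect issues an ordinary GET, so the
metadata representation can then be cached like any other resource.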
5. Security Considerations
Statements made in site metadata representations may be modified in
transit, and are subject to the same risks as other Web resources.
Statements made in site metadata may or may not reflect the authority
of the author of a resource; there is no inherent way to determine
the relationship between a resource author and a site's authority.
References
[1] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L.,
Leach, P. and T. Berners-Lee, "Hypertext Transfer Protocol -
HTTP/1.1", RFC 2616, June 1999.
[2]
[3]
[4]
Author's Address
Mark Nottingham
EMail: mnot@pobox.com
URI: http://www.mnot.net/
Full Copyright Statement
Copyright (C) The Internet Society (2001). All Rights Reserved.
This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.
The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Acknowledgement
Funding for the RFC Editor function is currently provided by the
Internet Society.