Sunday, 3 April 2005
A Call to OPTIONS
Web metadata discovery is not a new topic, and the final word on it has not yet been spoken. However, one of the most basic means of discovering something about a resource, the HTTP OPTIONS method, is not widely enabled by current implementations.
I’m immediately interested in this because it would be nice to use it in conjunction with POE, but this isn’t the only use case; things like privacy policies, robot policies, configurations and site descriptions need to be discovered as well.
I have a suspicion that whatever happens, OPTIONS will be part of the solution, so the fact that there’s no safe way to walk up to a Web site or resource and ask what it can do isn’t good.
I made a proposal in this space a while back, but abandoned it when it became clear that there weren’t good controls for OPTIONS in Web servers at the time. Here are some of my notes about the particulars, updated for recent versions of Apache and IIS.
There are a few different aspects of OPTIONS that would be useful to control.
OPTIONS has a special mode, “OPTIONS *”, that lets clients ask a server for server-wide metadata, such as the extensions it supports. This could be a key way of discovering “site” metadata. Unfortunately, as far as I know, no widely-used Web server allows you to control the response to OPTIONS * in any meaningful way without writing your own extension, so I won’t cover it further below.
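For reference, here is a rough sketch of what an “OPTIONS *” exchange looks like, using only Python’s standard library. The toy server, the SITE_HEADERS table and the “Foo: bar” header are all made up for illustration; this is not how any of the servers discussed here behave, just a picture of the request a client would send and the kind of server-wide answer it might hope for.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical server-wide metadata; the "Foo" header is illustrative only.
SITE_HEADERS = {"Allow": "GET, HEAD, OPTIONS", "Foo": "bar"}


class StarHandler(BaseHTTPRequestHandler):
    """Toy server that treats "OPTIONS *" separately from per-resource OPTIONS."""

    def do_OPTIONS(self):
        if self.path == "*":
            headers = SITE_HEADERS
        else:
            headers = {"Allow": "GET, HEAD, OPTIONS"}
        self.send_response(200)
        for name, value in headers.items():
            self.send_header(name, value)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def options_star(host, port):
    """Send an OPTIONS request with "*" as the target; return (status, headers)."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("OPTIONS", "*")
        response = conn.getresponse()
        response.read()
        return response.status, dict(response.getheaders())
    finally:
        conn.close()


def demo():
    """Run the toy server on an ephemeral local port and query it once."""
    server = HTTPServer(("127.0.0.1", 0), StarHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        return options_star("127.0.0.1", server.server_address[1])
    finally:
        server.shutdown()
        server.server_close()
```

Calling demo() returns the status code and a dictionary of response headers, which should include the made-up site-wide ones.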
OPTIONS can also be used against individual resources to discover things about them; e.g., “OPTIONS /example/index.html”. Things that it would be useful to control here include:
- Adding an arbitrary HTTP response header (e.g., “Foo: bar”) so that you can convey appropriate metadata (as RFC2616 suggests).
- Controlling the content of the Allow HTTP response header (e.g., “Allow: GET, HEAD”), so that you can advertise what methods the resource supports. This one is critical for POE.
- Accessing the request body and controlling the response body’s content, so that you can implement negotiation for site and resource descriptions like URISpace.
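To make those three controls concrete, here is a minimal sketch of a function that builds a per-resource OPTIONS response. Everything in it is an assumption for illustration: the RESOURCES table, the “Foo: bar” header, the one-field-per-line request body format and the very crude Accept handling are stand-ins, not a proposal and not URISpace.

```python
import json

# Hypothetical per-resource metadata; paths, methods and headers are made up.
RESOURCES = {
    "/example/index.html": {
        "allow": ["GET", "HEAD", "OPTIONS"],
        "extra_headers": [("Foo", "bar")],
        "description": {"title": "Example page", "language": "en"},
    },
}


def build_options_response(path, accept="application/json", request_body=b""):
    """Return (status, headers, body) for an OPTIONS request on `path`."""
    meta = RESOURCES.get(path)
    if meta is None:
        return 404, [("Content-Length", "0")], b""

    # Request body (assumed format): one wanted description field per line.
    lines = request_body.decode("utf-8").splitlines()
    wanted = [line.strip() for line in lines if line.strip()]
    description = meta["description"]
    if wanted:
        description = {k: v for k, v in description.items() if k in wanted}

    # Crude stand-in for real content negotiation on the response body.
    if "text/plain" in accept:
        content_type = "text/plain; charset=utf-8"
        text = "\n".join(f"{k}: {v}" for k, v in description.items())
    else:
        content_type = "application/json"
        text = json.dumps(description, sort_keys=True)
    body = text.encode("utf-8")

    headers = [("Allow", ", ".join(meta["allow"]))]   # advertised methods
    headers += meta["extra_headers"]                  # arbitrary metadata header
    headers += [("Content-Type", content_type), ("Content-Length", str(len(body)))]
    return 200, headers, body
```

The point of keeping it a plain function is that the resource itself decides all three things; the rest of this post is about how hard it is to get a server to let you do that.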
Apache has a number of OPTIONS-related problems. Although it’s possible to set an arbitrary header on OPTIONS responses in most conditions (with mod_headers), that’s about all it can do reliably, and even doing that is inconvenient, requiring developers to muck around with server configuration.
They would be able to do it (and more) in CGI scripts if this bug were addressed. Basically, mod_cgi makes an arbitrary decision about what methods CGI scripts can handle, regardless of what they can actually do. Unfortunately, the bug has been around for a while, with patches, and it still hasn’t made it into the Apache mainline, so I’m not holding my breath.
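If the server did pass OPTIONS through, a CGI script could answer it with very little code. The sketch below is written in Python rather than tied to any particular setup, and it assumes the server hands the script every method via the standard REQUEST_METHOD variable; the ALLOWED tuple and the “Foo: bar” header are made up for illustration.

```python
import os
import sys

# Methods this hypothetical script knows how to handle.
ALLOWED = ("GET", "HEAD", "OPTIONS")


def cgi_response(environ):
    """Return a complete CGI response (headers, blank line, body) as a string."""
    method = environ.get("REQUEST_METHOD", "GET").upper()
    allow = ", ".join(ALLOWED)
    if method == "OPTIONS":
        # The script, not the server, decides what Allow says.
        return (
            "Status: 200 OK\r\n"
            f"Allow: {allow}\r\n"
            "Foo: bar\r\n"
            "Content-Length: 0\r\n"
            "\r\n"
        )
    if method in ("GET", "HEAD"):
        body = "" if method == "HEAD" else "Hello\n"
        return f"Status: 200 OK\r\nContent-Type: text/plain\r\n\r\n{body}"
    # Anything else is refused, again with an accurate Allow header.
    return (
        "Status: 405 Method Not Allowed\r\n"
        f"Allow: {allow}\r\n"
        "Content-Type: text/plain\r\n"
        "\r\n"
        "Method not allowed\n"
    )


if __name__ == "__main__":
    sys.stdout.write(cgi_response(os.environ))
```

Whether the OPTIONS branch is ever reached depends entirely on the server; with the mod_cgi behaviour described above, it would not be.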
Even less is possible for the Allow header. There seem to be bugs in both mod_php and mod_dav which make them over-aggressive about populating the Allow header. While it’s good that there’s infrastructure in Apache that allows different modules to say what methods they’ll handle, both PHP and DAV say that they can handle any method, including those that they don’t, such as POST.
As an experiment, turn on mod_php4 on your server and send an OPTIONS request to something that gets the text/html handler (such as a file ending in .html). You’ll get a very expansive OPTIONS back. I think the cause is this code in both the PHP 4 and 5 Apache modules, which basically tells the Apache server that they can handle any method on text/html, even if it’s not processed by PHP!
This means that if you have PHP installed (it doesn’t even need to be enabled by turning the engine on!), you’re not going to be able to control the Allow header’s content, and if you have DAV enabled on a resource, you won’t be able to affect OPTIONS responses at all.
On the bright(-ish) side, mod_php4 passes everything — including OPTIONS — to the PHP script, which means that you have full control over all aspects of the response. This is a double-edged sword; although it makes it easy to handle OPTIONS in your script, if you write a script that isn’t OPTIONS-aware, it’s likely to treat OPTIONS like GET, which can be problematic. It would be nice if PHP made you explicitly tell it you were handling OPTIONS before it trusted you to do so.
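As a sketch of what that kind of explicit opt-in could look like, here is a small Python WSGI example (not PHP, and not anything PHP offers). The handles and guard helpers are hypothetical names invented for this illustration: a script declares the methods it understands, and the wrapper answers OPTIONS on its behalf unless the script has claimed it.

```python
def handles(*methods):
    """Mark a WSGI app with the HTTP methods it explicitly claims to handle."""
    def mark(app):
        app.handled_methods = tuple(methods)
        return app
    return mark


def guard(app):
    """Wrap `app` so that it only ever sees methods it declared via @handles."""
    declared = getattr(app, "handled_methods", ("GET", "HEAD"))

    def guarded(environ, start_response):
        method = environ.get("REQUEST_METHOD", "GET")
        if method in declared:
            return app(environ, start_response)
        allow = ", ".join(sorted(set(declared) | {"OPTIONS"}))
        headers = [("Allow", allow), ("Content-Length", "0")]
        if method == "OPTIONS":
            # The script did not opt in, so answer here instead of
            # letting it run as if this were a GET.
            start_response("200 OK", headers)
        else:
            start_response("405 Method Not Allowed", headers)
        return [b""]

    return guarded


@handles("GET", "HEAD")
def page(environ, start_response):
    """An OPTIONS-unaware script; imagine it does real work on every call."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"side effects would happen here\n"]


safe_page = guard(page)
```

A script that does want full control would simply add "OPTIONS" to its @handles list and take responsibility for the whole response.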
I explored a few other options for OPTIONS in Apache 1.3, including Script in mod_actions, which seemed promising, but because it only sets a default, it doesn’t really do anything for OPTIONS, which already has a default handler. Playing around with mod_rewrite, redirection and similar mechanisms hasn’t done anything yet.
IIS seems to be more promising, although I haven’t dug as deeply as I did with Apache. The server configuration allows you to specify what HTTP methods individual handlers can work with. As a result, I can configure Python CGI (for example) to take all methods, gaining control of both the headers and the response body.
It also allows you to add a “custom HTTP header” to all responses, including those to OPTIONS, on a server-wide or more constrained basis.
I haven’t tested PHP on IIS, but ASP behaves much like PHP on Apache; assuming the server is configured to send OPTIONS to the ASP script, the script will be able to handle it just like any other request. This has the same benefits and risks as PHP on Apache.
So, it appears that IIS does give some reasonable control of OPTIONS response headers and bodies, although it does once again require some fiddling with the server’s configuration. The problem is that it’s all-or-nothing; if you delegate a method to a handler, that handler has to know everything it needs to respond. For example, if you delegate OPTIONS to ASP, you’ll need to send back the appropriate methods for supporting WebDAV in Allow; the server won’t, even though it’ll be the one handling the WebDAV requests.
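What’s missing is some way to combine what the handler knows with what the server knows. As a small sketch of the idea (in Python, with a hypothetical merge_allow helper and made-up method lists, not anything IIS or Apache provides), merging Allow values from several components might look like this:

```python
def merge_allow(*allow_values):
    """Combine several Allow header values into one, keeping first-seen order."""
    seen = []
    for value in allow_values:
        for method in value.split(","):
            method = method.strip()  # method names are case-sensitive, so no upper()
            if method and method not in seen:
                seen.append(method)
    return ", ".join(seen)


# Hypothetical inputs: what a script handles and what a DAV layer handles.
script_allow = "GET, HEAD, OPTIONS"
dav_allow = "OPTIONS, PROPFIND, PUT, DELETE"
combined = merge_allow(script_allow, dav_allow)
```

The hard part isn’t the merging, of course; it’s that a delegated handler currently has no way to ask the server for the other half.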
Conclusions and Further Work
From the standpoint of someone who wants to define protocols that use OPTIONS, the situation is pretty grim. The potentially most useful variant, OPTIONS *, can’t be affected without writing a server plug-in, and working with the body of requests and responses, as well as the Allow header, is touch-and-go. The only thing that you can really rely upon at this stage is being able to set an arbitrary (i.e., non-managed) HTTP response header, although this will often have to be done through server configuration interfaces, rather than directly by the resource of interest.
Improving this situation is going to require a multi-pronged approach. In the short term, a number of small-ish bugs need to be fixed in mod_cgi, mod_dav and mod_php and actually shipped. I just filed the latter two tonight, but the first one has been sitting around for more than two years.
I also think that we need some discussion about Web servers’ architecture with regard to the layering of features like WebDAV; ideally, I’d like to be able to have a module handle WebDAV, but still have local control over the Allow header, other headers, and the response body.
Finally, I’d really like to see support for customisation of OPTIONS * in Web servers; i.e., being able to do complex queries in request bodies, pass that off to a handler, content negotiate the response body, add headers and control Allow, all whilst integrating the right thing from mod_dav and other interested extensions. That’s a bigger job than a blog post, though.
P.S. For similar thoughts regarding server support for caching, see my survey of Web server capabilities. Although it’s quite old, I doubt much has changed, unfortunately.