mnot’s blog

“Design depends largely on constraints.” — Charles Eames

Sunday, 3 April 2005

A Call to OPTIONS

Web metadata discovery is not a new topic, nor one on which the final word has been spoken. However, one of the most basic means of discovering something about a resource, the HTTP OPTIONS method, is not widely enabled by current implementations.

I’m immediately interested in this because it would be nice to use it in conjunction with POE, but this isn’t the only use case; things like privacy policies, robot policies, configurations and site descriptions need to be discovered as well.

I have a suspicion that whatever happens, OPTIONS will be part of the solution, so the fact that there’s no safe way to walk up to a Web site or resource and ask what it can do isn’t good.

I made a proposal in this space a while back, but abandoned it when it became clear that there weren’t good controls for OPTIONS in Web servers at the time. Here are some of my notes about the particulars, updated for recent versions of Apache and IIS.

Requirements

There are a few different aspects of OPTIONS that would be useful to control.

OPTIONS has a special mode, “OPTIONS *”, that lets clients ask servers for server-wide metadata, such as the extensions it supports and so forth. This could be a key way of discovering “site” metadata. Unfortunately, as far as I know there is no widely-used Web server that allows you to control the response to OPTIONS * in any meaningful way without writing your own extension, so I won’t cover it further below.
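For readers who have not seen one, here is a minimal sketch of what an “OPTIONS *” request looks like on the wire, next to the per-resource form. The function names build_options_request and send_options are made up for illustration, and the host used in the usage note is only a placeholder.

```python
import socket


def build_options_request(host, target="*"):
    """Return the raw bytes of a body-less HTTP/1.1 OPTIONS request."""
    lines = [
        f"OPTIONS {target} HTTP/1.1",  # "*" asks about the server as a whole
        f"Host: {host}",
        "Connection: close",
        "",  # the two empty strings produce the blank line
        "",  # that ends the header block
    ]
    return "\r\n".join(lines).encode("ascii")


def send_options(host, port=80, target="*", timeout=5.0):
    """Send the request and return the raw response bytes (headers and any body)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_options_request(host, target))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling send_options("www.example.com") would ask the server about itself, while send_options("www.example.com", target="/example/index.html") asks about one resource. What comes back depends entirely on the server.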

OPTIONS can also be used against individual resources to discover things about them; e.g., “OPTIONS /example/index.html”. Things that it would be useful to control here include:

  1. Adding an arbitrary HTTP response header (e.g., “Foo: bar”) so that you can convey appropriate metadata (as RFC2616 suggests).
  2. Controlling the content of the Allow HTTP response header (e.g., “Allow: GET, HEAD”), so that you can advertise what methods the resource supports. This one is critical for POE.
  3. Accessing the request body and controlling the response body’s content, so that you can implement negotiation for site and resource descriptions like URISpace.
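The three kinds of control listed above can be sketched with Python's standard library alone. This is a toy, not a description of how Apache or IIS behave: OptionsDemoHandler, ask_options and demo are hypothetical names, and the Allow value and response body are invented for the example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class OptionsDemoHandler(BaseHTTPRequestHandler):
    """Toy resource that answers OPTIONS itself (illustration only)."""

    def do_OPTIONS(self):
        # Item 3: read whatever request body the client sent, if any.
        length = int(self.headers.get("Content-Length", 0))
        query = self.rfile.read(length) if length else b""
        text = f"description of {self.path} (query was {len(query)} bytes)\n"
        body = text.encode("utf-8")
        self.send_response(200)
        self.send_header("Allow", "GET, HEAD, OPTIONS")  # item 2
        self.send_header("Foo", "bar")  # item 1
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)  # item 3: the response body is ours to control

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def ask_options(host, port, path):
    """Send OPTIONS for one resource; return (status, headers, body)."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("OPTIONS", path)
        resp = conn.getresponse()
        return resp.status, dict(resp.getheaders()), resp.read()
    finally:
        conn.close()


def demo():
    """Start the toy server on a free local port and query it once."""
    server = HTTPServer(("127.0.0.1", 0), OptionsDemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return ask_options("127.0.0.1", server.server_port, "/example/index.html")
    finally:
        server.shutdown()
        server.server_close()
```

Running demo() returns the status, the headers (including the Allow and Foo values set above) and the body that the handler chose to send.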

Apache

Apache has a number of OPTIONS-related problems. Although it’s possible to set an arbitrary header on OPTIONS responses in most conditions (with mod_headers), that’s about all it can do reliably, and even doing that is inconvenient, requiring developers to muck around with server configuration.

They would be able to do it (and more) in CGI scripts if this bug were addressed. Basically, mod_cgi makes an arbitrary decision about what methods CGI scripts can handle, regardless of what they can actually do. Unfortunately, the bug has been around for a while, with patches, and it still hasn’t made it into the Apache mainline, so I’m not holding my breath.

Even less is possible for the Allow header. There seem to be bugs in both mod_php and mod_dav which make them over-aggressive about populating the Allow header. While it’s good that there’s infrastructure in Apache that allows different modules to say what methods they’ll handle, both PHP and DAV say that they can handle any method, including those that they don’t, such as POST.

As an experiment, turn on mod_php4 on your server and send an OPTIONS request to something that gets the text/html handler (such as a file ending in .html). You’ll get a very expansive OPTIONS back. I think the cause is this code in both the PHP 4 and 5 Apache modules, which basically tells the Apache server that they can handle any method on text/html, even if it’s not processed by PHP!

This means that if you have PHP installed (it doesn’t even need to be enabled by turning the engine on!), you’re not going to be able to control the Allow header’s content, and if you have DAV enabled on a resource, you won’t be able to affect OPTIONS responses at all.

On the bright(-ish) side, mod_php4 passes everything — including OPTIONS — to the PHP script, which means that you have full control over all aspects of the response. This is a double-edged sword; although it makes it easy to handle OPTIONS in your script, if you write a script that isn’t OPTIONS-aware, it’s likely to treat OPTIONS like GET, which can be problematic. It would be nice if PHP made you explicitly tell it you were handling OPTIONS before it trusted you to do so.

I explored a few other options for OPTIONS in Apache 1.3, including Script in mod_actions, which seemed promising, but because it only sets a default, it doesn’t really do anything for OPTIONS, which already has a default handler. Playing around with mod_rewrite, redirection and similar mechanisms hasn’t done anything yet.

IIS

IIS seems to be more promising, although I haven’t dug as deeply as I did with Apache. The server configuration allows you to specify what HTTP methods individual handlers can work with. As a result, I can configure Python CGI (for example) to take all methods, gaining control of both the headers and the response body.
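As a rough illustration of what such a script might look like once the server passes every method to it, here is a minimal sketch of a CGI program that dispatches on the request method explicitly, rather than treating everything as GET. It assumes the server supplies the standard REQUEST_METHOD variable; the SUPPORTED tuple, the respond function and the “Foo: bar” header are invented for the example.

```python
import os
import sys

# Hypothetical: the methods this particular script really implements.
SUPPORTED = ("GET", "HEAD", "OPTIONS")


def respond(environ):
    """Build a CGI response (headers, blank line, body) for one request."""
    method = environ.get("REQUEST_METHOD", "GET").upper()
    allow = ", ".join(SUPPORTED)
    if method == "OPTIONS":
        # Describe the resource instead of returning its representation.
        headers = ["Status: 200 OK", f"Allow: {allow}", "Foo: bar",
                   "Content-Type: text/plain"]
        body = f"This resource supports: {allow}\n"
    elif method in ("GET", "HEAD"):
        headers = ["Status: 200 OK", "Content-Type: text/plain"]
        body = "" if method == "HEAD" else "Hello from the resource itself.\n"
    else:
        # Refuse anything the script was not written to handle.
        headers = ["Status: 405 Method Not Allowed", f"Allow: {allow}",
                   "Content-Type: text/plain"]
        body = f"{method} is not supported here.\n"
    return "\n".join(headers) + "\n\n" + body


if __name__ == "__main__":
    sys.stdout.write(respond(os.environ))
```

The explicit 405 branch is the point of the design: a method the script does not know about gets refused rather than silently handled as if it were a GET.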

It also allows you to add a “custom HTTP header” to all responses, including those to OPTIONS, on a server-wide or more constrained basis.

I haven’t tested PHP on IIS, but ASP acts in a manner that’s similar to PHP on Apache; assuming the server is configured to send OPTIONS to the ASP script, the ASP script will be able to handle it just like any other request. This has the same benefits and risks as does this situation with PHP on Apache.

So, it appears that IIS does give some reasonable control of OPTIONS response headers and bodies, although it does once again require some fiddling with the server’s configuration. The problem is that it’s all-or-nothing; if you delegate a method to a handler, that handler has to know everything it needs to respond. For example, if you delegate OPTIONS to ASP, you’ll need to send back the appropriate methods for supporting WebDAV in Allow; the server won’t, even though it’ll be the one handling the WebDAV requests.
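To make the all-or-nothing problem concrete, here is a small sketch of the bookkeeping a delegated handler ends up doing by hand: merging the methods it implements with those the server handles on its behalf. The merged_allow function is hypothetical, and the WebDAV list is the set of methods defined in RFC 2518; which of them a given server really handles for a resource is an assumption the handler has to get right.

```python
# Methods defined by WebDAV (RFC 2518); assumed here to be handled by the server.
WEBDAV_METHODS = ("PROPFIND", "PROPPATCH", "MKCOL", "COPY", "MOVE", "LOCK", "UNLOCK")


def merged_allow(handler_methods, server_methods=WEBDAV_METHODS):
    """Return an Allow header value covering both sets, without duplicates."""
    seen = []
    for method in list(handler_methods) + list(server_methods):
        name = method.upper()
        if name not in seen:  # keep first-seen order, drop repeats
            seen.append(name)
    return ", ".join(seen)
```

A handler that implements GET, HEAD and OPTIONS would send merged_allow(["GET", "HEAD", "OPTIONS"]) as its Allow value, and would need updating whenever the server's own configuration changes.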

Conclusions and Further Work

From the standpoint of someone who wants to define protocols that use OPTIONS, the situation is pretty grim. The potentially most useful variant, OPTIONS *, can’t be affected without writing a server plug-in, and working with the body of requests and responses, as well as the Allow header, is touch-and-go. The only thing that you can really rely upon at this stage is being able to set an arbitrary (i.e., non-managed) HTTP response header, albeit often this will have to be through server configuration interfaces, rather than directly by the resource of interest.

Improving this situation is going to require a multi-pronged approach. In the short term, a number of small-ish bugs need to be fixed in mod_cgi, mod_dav and mod_php and actually shipped. I just filed the latter two tonight, but the first one has been sitting around for more than two years.

I also think that we need some discussion about Web servers’ architecture with regard to the layering of features like WebDAV; ideally, I’d like to be able to have a module handle WebDAV, but still have local control over the Allow header, other headers, and the response body.

Finally, I’d really like to see support for customisation of OPTIONS * in Web servers; i.e., being able to do complex queries in request bodies, pass that off to a handler, content negotiate the response body, add headers and control Allow, all whilst integrating the right thing from mod_dav and other interested extensions. That’s a bigger job than a blog post, though.

P.S. For similar thoughts regarding server support for caching, see my survey of Web server capabilities. Although it’s quite old, I doubt much has changed, unfortunately.


Filed under: Web

8 Comments

Ryan Tomayko said:

This is good. There's a whole range of HTTP functionality that is unused or misused (content negotiation, If-Modified-Since, 303s, authentication, PUT, DELETE, etc). I've always assumed that these bits were being neglected because people perceived them as not being valuable but I think you just hit a switch for me. The reason most of these features aren't being used is because the tools don't make them easy to use, or in this case, make them impossible to use.

Moving up the stack from Apache and IIS to frameworks like Java Servlets, ASP.NET, Webware, Quixote, Rails, etc. you just don't see a lot of inherent support for this type of functionality.

I lurked on the atom list and thought the discussion around whether to use POST or PUT for various tasks was interesting because in many cases PUT was obviously the correct thing to do, while POST was the only practical thing to do because the tools (servers *and* browsers) get all squirmy when you start talking about something as fundamental as PUT. Crazy.

The REST crowd should take your lead and start working to address the fundamental issues in our frameworks. Let's stay out of spec and theory land for a little bit and get our tools fixed up.

Sunday, April 3 2005 at 10:28 PM +10:00

Mark Nottingham said:

“Let’s stay out of spec and theory land for a little bit and get our tools fixed up.”

Amen. The tools are in a horrible state; between servers forcing people to use proprietary and user-unfriendly configuration interfaces, not supporting a lot of key features (like this), and encouraging bad practice (like servlets), there’s a lot of work to do.

Sunday, April 3 2005 at 10:35 PM +10:00

Jim Dabell said:

I think you are overestimating the importance of OPTIONS *. It's common to see multiple websites sharing the same hostname. If you use OPTIONS * to get site metadata, you are saying hostname == website, which would fall down frequently.

It's probably another implementation problem that websites share hostnames, but it's not one that's readily fixable, so given that OPTIONS * already has problems, I think it's only safe to assume OPTIONS /path is the only adequate method of obtaining metadata about a website.

Perhaps if there is a conflict between resource metadata and website metadata, we can use the arbitrary header approach - something like:

OPTIONS /
Host: www.example.com
Scope: website

...and:

OPTIONS /
Host: www.example.com
Scope: resource

Monday, April 4 2005 at 7:36 AM +10:00

Mark Nottingham said:

There are some cases where that may be true (e.g., geocities), but I’m not sure they should be accommodated; it would require that the infrastructure recognise the subdivisions of control inside a hostname (there’s a reason it’s called the ‘authority’ in the URI); until we get widespread trust on the Semantic Web, doing so isn’t really worth the returns.

OTOH, it's much easier to hide the boundaries inside the site; remember that it’s possible to aggregate a number of sub-site configurations into a master configuration, and then advertise that. I’ve seen this happen before with configurations advertised for the benefit of proprietary intermediaries.

Current practice supports this; /robots.txt is widely used, as is /w3c/p3p.xml. WSIL defines a similar, hostname-bound convention.

That said, I’d be thrilled if servers offered good support for per-resource OPTIONS; OPTIONS * would just be icing on the cake at this point.

Monday, April 4 2005 at 8:51 AM +10:00

Yves said:

Mark,
The concept of servlet is not broken, as it defines a common way to extend Web Servers.
The major issue is "just" that the servlet specification has been developed with a specific server and processing model in mind, and it just doesn't match well with servers that defined a completely different processing model.
Designing a common API should NEVER lock people down.

You know already that I would LOVE to have OPTIONS used more (and used well :) ).

Tuesday, April 5 2005 at 12:06 AM +10:00

msd said:

I ran onto your discussion whilst looking for pointers to the undocumented extensions Micro$oft has made to HTTP headers in order to intromit CIFS file sharing over an HTTP transport.

These additions use, among other things, the OPTIONS request.

BTW: any pointers to detailed info on these additions would be helpful (at least to me).

TIA,

Marc

Thursday, April 14 2005 at 2:30 AM +10:00

Jonathan said:

Thanks, Mark, for your insight into Apache's OPTIONS-related problems! I spent more than a day fiddling with Apache's Limit and LimitExcept directives all over my config file...with no luck on changing the OPTIONS output of the Allow header.

I can stop pulling my hair out over Apache config directives now. I know that php is to blame for this over-population.

I followed the link to your php bug report and read the comments there as well...now that I'm a little more informed, my question to you is:

Have you tried editing php's source code and applying Rasmus' "AAPI cleanup" method? (Sorry for the long link)

http://bonsai.php.net/bonsai/cvsview2.cgi?diff_mode=context&whitespace_mode=show&file=mod_php3.c&branch=&root=/repository&subdir=php3&command=DIFF_FRAMESET&rev1=1.50&rev2=1.51

Any followups on the effects of Rasmus' php patch after a recompile?

Thanks again for the post, you sure helped clear up some of the frustrations I was having!

Friday, June 17 2005 at 5:53 PM +10:00

Mark Nottingham said:

Jonathan,

Thanks for the kind words. I’m not sure what you mean by the “AAPI cleanup” method; the link you give seems to be the original patch that caused the problem. Perhaps you mean backing that out?

If so, I haven’t tried it; please report if you do!

Monday, June 20 2005 at 12:43 PM +10:00
