mnot’s blog

“Design depends largely on constraints.” — Charles Eames

Monday, 8 December 2003

Why Do Web Server APIs Suck So Much?

HTTP provides considerable benefits to Web applications that take advantage of it: everything from scalability (through caching) to client-integrated authentication, automated redirection, multiple format support and lots more.

I’ve been drafting some entries about how cool all of these things are; I’m going to try to get a few up in the coming weeks. As I’ve been writing, however, I’ve noticed a common thread — just about every one of them is difficult to realise using existing server-side technology.

Web Metadata

For example, one of the biggest problems we found in the caching world was the inability of content authors to effectively set appropriate caching metadata for their resources. Most servers provide some mechanism, of course, but none of them is always available, usable by the average person, and standardised beyond a single product.
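To make this concrete, here is a minimal sketch of the kind of caching metadata an author would want to attach to a response. The helper name (caching_headers) and the one-hour default are my own invention for illustration, not any server product's API:

```python
import email.utils
import time

def caching_headers(max_age=3600):
    # Response headers that tell caches this representation is
    # fresh for max_age seconds, and when it was last modified.
    now = time.time()
    return {
        "Cache-Control": "public, max-age=%d" % max_age,
        "Expires": email.utils.formatdate(now + max_age, usegmt=True),
        "Last-Modified": email.utils.formatdate(now, usegmt=True),
    }

for name, value in caching_headers().items():
    print("%s: %s" % (name, value))
```

Three short lines of metadata are all a cache needs; the problem is that most authoring setups give you no sane way to emit them.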

Folks writing RSS aggregators, to give another example, can’t rely on the media type being set for RSS files, because the process for associating a new media type with content is so Byzantine on most servers (if the person who needs to do it has access at all). As a result, they can’t rely on content negotiation working, and can’t fully leverage the Web infrastructure.
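For comparison, registering a media type is trivial when the platform lets you do it at all; here's a sketch using Python's standard mimetypes registry (the .rss extension mapping is my assumption for illustration):

```python
import mimetypes

# Associate the RSS media type with the .rss extension;
# without this, guess_type() has no idea what a .rss file is.
mimetypes.add_type("application/rss+xml", ".rss")

ctype, encoding = mimetypes.guess_type("feed.rss")
print(ctype)  # application/rss+xml
```

One line of configuration; yet on most deployed servers even this is out of reach for the person publishing the feed.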

The list goes on. Redirection is really simple, but I’d wager that 90% of intentional redirection on the Web happens through META refresh HTML elements, not HTTP redirection. The only clients that understand and follow them, then, are Web browsers, not the bulk of automated agents (where automated redirection does the most good).
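The difference is visible right at the wire level. A sketch of the two approaches (both functions are hand-rolled illustrations, not any real server's API):

```python
def http_redirect(location):
    # An HTTP-level redirect: any client, browser or robot,
    # sees the new location in the status line and headers.
    return ("HTTP/1.1 301 Moved Permanently\r\n"
            "Location: %s\r\n"
            "Content-Length: 0\r\n\r\n" % location)

def meta_refresh(location):
    # An HTML-level "redirect": only clients that parse HTML
    # and honour META refresh will ever follow it.
    return ('<html><head><meta http-equiv="refresh" '
            'content="0; url=%s"></head></html>' % location)

print(http_redirect("http://example.org/new").splitlines()[0])
```

An automated agent that never renders HTML still understands the first form; the second is invisible to it.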

All of this adds up to people not being able to count on the availability of mechanisms to set Web metadata, and therefore a failure to use what the Web provides. Take a look at Web applications like Wikis, Blog engines and commercial packages that you deploy on a Web server (I don’t want to pick on anyone in particular here, because everybody’s in the same boat, and it’s not their fault).

URIs

The problem isn’t limited to setting metadata, either. URIs are the lynchpin of the Web; to get the full value of the Web infrastructure, you need to be able to identify every interesting part of your Web application with a URI. Unfortunately, common Web APIs don’t encourage this, and some actively discourage it.

For example, one of the most prevalent server-side APIs for HTTP (and therefore REST, for most people) — the Java Servlet API — does things backwards. It dispatches requests first to the HTTP method, and then has the application handle the URI. For example, a Python BaseHTTPServer handler (which has roughly the same API) for an imaginary address book might look like this:

class ResourceHandler(BaseHTTPServer.BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == '/':
            # return the home page
            ...
        elif self.path == '/add':
            # return a form to add an entry
            ...
        elif self.path == '/search':
            # return a search form
            ...
        else:
            # return an entry's page
            ...
    def do_POST(self):
        ...

This stuffs a number of URIs (and therefore resources) into a single container, making it difficult to model an application as the transfer of state. Now imagine doing it the other way around; dispatching based upon URI to a different object, and then to a method based upon the HTTP method:

class FrontPage(Resource):
    def GET(self, request):
        ...
class AddForm(Resource):
    def GET(self, request):
        ...
    def POST(self, request):    
        ...
class SearchForm(Resource):
    def GET(self, request):
        ...
class EntryPage(Resource):
    def GET(self, request):
        ...
    def DELETE(self, request):
        ...
    def PUT(self, request):
        ...

Isn’t that a much more natural way of writing a Web application, leveraging both the Web infrastructure and the good practices surrounding object-oriented programming? Even better, it might just steer people from creating Web sites where everything interesting is stuffed behind a single URI with a bunch of query parameters and a POST.
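A dispatcher for this second style fits in a few lines. The following is a sketch under my own naming (Resource, routes and dispatch are all hypothetical, not any framework's API):

```python
class Resource:
    # Map the HTTP method name onto a same-named method of the
    # resource object; absent methods become 405 responses.
    def handle(self, method, request):
        handler = getattr(self, method, None)
        if handler is None:
            return 405, "Method Not Allowed"
        return handler(request)

class FrontPage(Resource):
    def GET(self, request):
        return 200, "home page"

# One URI per resource: the routing table is the site map.
routes = {"/": FrontPage()}

def dispatch(method, path, request=None):
    resource = routes.get(path)
    if resource is None:
        return 404, "Not Found"
    return resource.handle(method, request)

print(dispatch("GET", "/"))     # (200, 'home page')
print(dispatch("DELETE", "/"))  # (405, 'Method Not Allowed')
```

Note how 405 falls out for free: a resource that doesn't define DELETE simply doesn't respond to it, which is exactly the HTTP semantic.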

What Next

I do have ideas about how to fix this; this isn’t all whinge, like my rant about XML editors. I’ve started talking about some of it (e.g., Tarawa, which takes the approach to URIs outlined above), and will go into the rest over time, but more importantly I want to highlight the issues.


Filed under: Web

13 Comments

Leigh Dodds said:

While I don't disagree that there could be better server-side APIs, I think you're mis-characterising the Servlet API somewhat.

While many developers do use the idiom that you describe (handling multiple URIs from a single servlet impl) and in fact the idiom is often encouraged by a Front Controller based design, the API itself doesn't require this.

It's perfectly valid to bind different servlet implementations to different URIs using appropriate servlet-mapping entries in web.xml. This looks much more like your second example.

Tuesday, December 9 2003 at 2:00 AM +10:00

Mark Baker said:

Yes, I was just going to say the same thing, Leigh. When I use Servlets, I'm constantly extending HttpServlet to be a domain object and/or container. It's very natural. I didn't even consider that it could be used the other way, but perhaps that's just me.

Tuesday, December 9 2003 at 6:51 AM +10:00

Mark Nottingham said:

Interesting. I’m not surprised you guys do this, but I wonder how prevalent it is out in the wild? Looking at URIs I see on sites, it’s pretty common practice to abuse the Servlet API in this fashion; why not have tools that people are inclined to use the correct way? If we designed screwdrivers and hammers this way, they’d look like you needed to grab the wrong end to use them…

Tuesday, December 9 2003 at 8:28 AM +10:00

tom said:

Using a controller servlet is not abuse. It keeps coupling between pages to a minimum by controlling the direction of each request in a single place. Using the latter method, if a site's structure were to change, multiple other servlets would have to change to reference the new (or removed) pages. Using a single controller means all request routing is done from a central location, and therefore changes have minimal impact on the internal navigation logic.

Friday, April 22 2005 at 2:35 PM +10:00

Jeoff Wilks said:

One disadvantage of extending HttpServlet when writing a domain object is that your domain object is now coupled to a servlet container. I think that's why you see a proliferation of Controller-type servlets trying to match up requests with domain object methods.

Saturday, April 23 2005 at 2:50 PM +10:00

Mark Baker said:

Hmm, not following Jeoff. Extending HttpServlet ties me to HttpServlet only AFAICT. I've used the same servlet in both Jetty and Tomcat, for example. Perhaps there's some implementation details that may leak through in some cases, but I haven't seen them. Do you have a specific example of how I'd be tightly coupled to a container?

Sunday, April 24 2005 at 10:01 PM +10:00

Jeoff Wilks said:

What I meant is, you're coupled to servlet containers. If you later wanted to manipulate the same domain objects from a command line interface, or RMI, or wrap them as an IRC bot, you'd have to tear away the servlet-specific stuff. Developers often like to think of their domain objects as being separate from any specific implementation. So the servlet (and any associated controllers, action classes, etc. depending on the framework) become glue code to tie HTTP and OOP together. Tying domain objects to a servlet implementation can simplify code a lot, but you have to be willing to commit to it.

Monday, April 25 2005 at 6:16 AM +10:00

Mark Baker said:

Ok, understood. Well, I have no problem at all depending upon HttpServlet. I haven't had to build alternate interfaces to them, but I'd imagine that if I did it would be through HTTP, e.g. scripting the object with wget/curl, etc...

Monday, April 25 2005 at 10:10 AM +10:00

Donovan Preston said:

I have always wondered why so many web programmers structure their URL code in such a braindead way. I think it must be CGI's fault, the idea of treating each URL as a "file" which gets "executed".

I find URL traversal to be so fundamental that my own web framework, Nevow, spends a lot of time trying to make the API for traversal as flexible and transparent as possible. These are two opposing goals, but they are accomplished by making the traversal interface as thin as possible (a single method, locateChild) while at the same time layering more child location conventions on top of this interface in the form of a default implementation which looks in various other places (childFactory and child_* attributes). More information is located here:

http://srid.bsdnerds.org/hacking/nevowdoc/nevow-traversal.html

When trying to explain object traversal to people, I have found that there are those who instantly get it, and those who seem like they never will. I think the difference has to do almost entirely with programming background.

My own web programming background started on a web server that was implemented as part of a MOO (on E_MOO, in about 1995), and it included these ideas. The MOO is a natural environment for object publishing, since it uses an object database with a concept of containment. Later in my programming career, after staying away from the web for a while, I used Zope heavily for a while. Zope also has an object database and translates URLs into traversals over attributes, but I think Zope is both too thin (magical) and too heavyweight at the same time, so I moved on to creating my own framework. Zope also makes the mistake of attempting to do method publishing (exposing named methods of objects on URLs) which makes it difficult to cleanly separate the GET, POST, PUT, and DELETE, as you have done above.

Anyway, I'm glad more people realize the URL is an important structural tool.

Tuesday, May 3 2005 at 10:00 AM +10:00

Jimmy Jones said:

The real problem is Web developers using technology that they do not have a good understanding of. Many people have taken classes in a programming language and take on Web programming thinking "I know how to [pick favorite common web error], I programmed in [pick favorite language]."

Only a small percentage of websites use HTML the way it was intended to be used. Using a technology however you, the programmer, see fit is not by any means a good answer for anything.

If applied correctly, most technologies will give you some advantage over doing everything by hand. All technologies have their advantages and disadvantages. Web server APIs are no different from any other technology.

Wednesday, February 7 2007 at 7:25 AM +10:00

hassan said:

I need a program or some software (like Webserver Stress) for testing the server-client with 50 users; please, it's very important to me.

Tuesday, December 11 2007 at 5:12 AM +10:00

Justin Sheehy said:

Of course, a number of Web servers have come a long way since this was originally posted, but I'd claim that most of them still suck.

Our attempt at a Web server with an API that doesn't suck has worked out nicely for the people using it so far:

http://blog.therestfulway.com/2008/09/webmachine-is-resource-server-for-web.html

Sunday, October 5 2008 at 6:28 AM +10:00

Tim said:

Google App Engine (heavily inspired by web.py) has the mapping system you're describing. The thing I like most about this approach is that it allows for decoupling, since the object name doesn't need to match the URI. Of course, it also means that you cannot expose methods dynamically, but that's a small price to pay for all the benefits it provides.

Saturday, October 25 2008 at 9:30 PM +10:00

Creative Commons