Optimising Web Services with Intermediaries

Mark Nottingham <mnot@akamai.com>

Abstract

Intermediaries are often used to scale the Web infrastructure, offering improved performance, availability and scalability of services. This paper outlines an approach to scaling Web Services (services exposed by XML messaging solutions like SOAP [SOAP] and XML Protocol [XMLP]) through optimisation in intermediaries, and proposes further work which leverages XML Protocol's features to help scale them and improve performance.

Introduction

As the Web has grown, there has been an increasing need to address issues such as the scalability (the ability to handle growth in load efficiently), performance (as measured by the end users' perceived latency) and availability of Web sites. Intermediary solutions to these problems have been used for some time; first as caching proxies, and later in Content Delivery Networks, which use surrogates (intermediaries which act on behalf of the server, rather than the client) dispersed widely over the network to provide service.

HTTP intermediaries rely on certain aspects of requests and responses flowing through them to offer these benefits. For example, caching works because requests can be easily identified (through the Request-URI and a few headers), and the responses associated with them can be reused according to HTTP's cache coherence rules.

While Web Services use lower-layer intermediaries (such as HTTP proxies and surrogates or SMTP relays) for transit, those devices' optimisation mechanisms aren't aware of the semantics of a Web Services message, and therefore can't fully exploit them.

For example, while it's possible to describe cacheability for a Web Service's underlying HTTP response (e.g., with a Cache-Control or Expires header), this is only a basic means of providing hints about the message. In the best case, an HTTP cache would be indexed on the entire request as a string, meaning that a trivial change in the request's XML would cause it to be indexed as a separate entity.
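The indexing problem can be illustrated with a minimal sketch. It assumes a cache that keys on a hash of the request body, and compares that with keying on a canonical form of the XML. The `getQuote` request, its attributes and the function names are hypothetical; canonicalisation here is only one possible way for an XML-aware intermediary to recognise equivalent requests.

```python
import hashlib
from xml.etree.ElementTree import canonicalize  # Python 3.8+


def raw_key(body: str) -> str:
    """Index on the request body exactly as received (plain string identity)."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def canonical_key(body: str) -> str:
    """Index on the canonical XML form, so insignificant differences vanish."""
    canonical = canonicalize(body, strip_text=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two hypothetical requests that differ only in attribute order,
# whitespace inside the tag, and empty-element syntax.
request_a = '<getQuote symbol="EXMP" currency="USD"/>'
request_b = '<getQuote  currency="USD"   symbol="EXMP" ></getQuote>'

print(raw_key(request_a) == raw_key(request_b))              # False
print(canonical_key(request_a) == canonical_key(request_b))  # True
```

A string-indexed cache stores these two requests as separate entities, whereas an intermediary that understands XML could serve both from one entry.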

This means that, by default, only the simplest Web Services can leverage intermediaries for optimisation, and then only in very limited ways.

To allow intermediaries to more fully optimise Web Services, such optimisations need to be standardized, just as cacheability information for lower-layer HTTP entities is standardized in HTTP. This will allow intermediaries to optimise a broad range of Web Services without service-specific code being executed.

This paper outlines potential use cases for Web Service intermediary optimisation, proposes a framework for optimisation, and identifies potential optimisation techniques.

Optimisation Use Cases

Although tentative, these use cases help illustrate the potential of optimising Web Service intermediaries.

StockQuote Service
By caching response elements containing rapidly-changing financial data for a period, a Stock Quote Service could offer enhanced end-user perceived performance whilst reducing load on centralized servers. Furthermore, slowly-changing data in the same response could be given separate, longer-term cacheability, with different criteria for invalidation from the cache.
News (RSS) Service
An XML-based news 'channel' Service can take advantage of a regular publication schedule to cache article summaries with an absolute time until validation (with the possibility of using partial content updates). Additionally, because some servers may provide many such services, channel requests and responses may be aggregated into a single message interchange for efficiency.
Distributed Authentication Service
A centralized Web site user authentication Service can exploit geographic locality in client behaviour by allowing a distributed group of caches to keep authentication state, rendered as XML, at the 'edge' of the network. If the user changes their password, the request for change can act as a trigger for invalidating the cached entry.
File Store Service
An 'Internet hard drive' Service, where users write to and read from a Service as if it were a network-available disk, could be distributed to a number of 'edge' servers to improve end-user perceived latency by exploiting locality in their access patterns. This could be achieved through a combination of store-and-forward into a cache (i.e., write caching), reading from the cache, and invalidation events to stimulate synchronisation with a centralized server.
Order Queue Service
With store-and-forward techniques, service intermediaries can provide higher availability for a service than a centralized server alone, whilst offering the potential to manage load on the central server by aggregating messages to it.
Voting, Poll and Auction Services
'Interactive' Services can take advantage of 'best-guess' information in cache whilst updating critical information through message triggers and element invalidations.

Optimisation Framework Requirements and Proposal

The following sections describe a proposal for a Web Service optimisation framework. It consists of optimisation techniques, each of which describes a way to exploit a particular behaviour to improve service. Techniques can be targeted at particular parts of messages through optimisation scoping, and are often invoked by arbitrary events, which we term optimisation triggers.

This proposal is based on the following requirements:

Optimisation techniques must identify and leverage patterns in common Web Service models.
To introduce efficiencies, optimisation must find appropriate behaviours to exploit; there must be balance between being too application-specific and too general; either extreme can make standardization useless.
They should be able to be applied to messages at a number of granularities.
Because Web Service payloads are based on XML, mechanisms have the opportunity to operate on individual XML elements, as well as the entire message. This offers far greater flexibility when describing service semantics to an intermediary.
The framework must be easy for service developers to understand and use.
Automatically generating hints for optimisation is difficult; it requires knowledge about the service semantics and underlying application. Because of this, it must be possible for developers to understand and easily use optimisation techniques.
Techniques must be able to be invoked explicitly or implicitly.
While some services will be able to communicate optimisation hints in-message, others will need to be capable of external hinting, to allow for different deployment scenarios. Similarly, some services will be able to invoke efficiency mechanisms by use of an XML Protocol Module, but others will require out-of-band invocation.
The framework may assume a trust model between the service intermediary and service provider, but should not require it.
Experience with HTTP caching shows that useful intermediary services often need a trust relationship with the content provider. However, there may be situations where this relationship is not essential.

We envision using the techniques, scoping and triggers together to form an XML Protocol Optimisation Language. Here, techniques, triggers and scoping mechanisms are presented as modular concepts. While we hope to keep the language as modular, and therefore expressive, as possible, this goal must be balanced against the requirement for ease of use. As a result, the actual language might not allow direct access to them, but instead introduce them in 'packages' of functionality.

Optimisation Scoping

XML offers an ideal way to control the scope of optimisation to portions of a message, because there are a number of ways to associate hints with a particular XML element or hierarchical group of elements (up to the scope of the entire message).

The most obvious means is through use of attributes in a separate XML Namespace in the document itself. For example, if an element 'foo' and its children are cacheable, this could be expressed as

<foo opt:invalidate="yes" opt:delta="5m"> ... </foo>

or

<opt:cache invalidate="yes" delta="5m"><foo>...</foo></opt:cache>

However, this requires the intermediary to strip the elements and attributes targeted at it, and requires processors earlier in the message path to ignore the opt namespace. As a result, it may often be preferable to describe optimisation outside of the document. This may be done in a WSDL file, in the XML Schema or TREX description of the message, or in an XML Protocol header block in the message, using XPath to identify the elements concerned.
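External scoping might look something like the following sketch, in which hints are held apart from the message and attached to elements by path expression. The `HINTS` table, its `delta` and `invalidate` keys, and the sample message are all hypothetical; the paths use the limited XPath subset supported by Python's ElementTree rather than full XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical out-of-band description: path expression -> optimisation hints.
HINTS = {
    "./quote/price": {"delta": "5m", "invalidate": "yes"},
    "./quote/company": {"delta": "1d", "invalidate": "no"},
}

MESSAGE = """
<response>
  <quote>
    <company>Example Corp</company>
    <price>42.10</price>
  </quote>
</response>
"""


def scoped_hints(message: str, hints: dict) -> dict:
    """Return {element tag: hints} for every element selected by a path."""
    root = ET.fromstring(message)
    selected = {}
    for path, hint in hints.items():
        for element in root.findall(path):
            selected[element.tag] = hint
    return selected


result = scoped_hints(MESSAGE, HINTS)
print(result["price"]["delta"])    # 5m
print(result["company"]["delta"])  # 1d
```

Because the message itself carries no optimisation markup, nothing has to be stripped by the intermediary or ignored by other processors.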

The scoping mechanism used in a service depends on many aspects, including whether the technique's application is static (and can therefore be stated in advance) or dynamic (and therefore must be expressed in-message). Additionally, aspects of a service's deployment, including the nature of the optimising intermediaries, may influence the scoping mechanism used.

Optimisation Triggers

Many techniques require a description of when to apply them; for example, a cached object needs some event to invalidate it; similarly, a message store needs to be told when it is appropriate to forward a message. To accommodate this, a variety of trigger mechanisms could be defined.

Some applications may find it useful to combine triggers; for example, 'five minutes after a message containing the "action" element arrives'. As a result, the syntax should support arbitrary combinations of triggers as well as simple trigger events.
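As a rough sketch of how such combinations might compose, the following models a trigger as a function over events. The event shape (a dict with a `time` in seconds and an optional set of `elements`), and the names `contains_element`, `After` and `any_of`, are assumptions made for illustration; no trigger syntax is defined by this paper.

```python
from typing import Callable

# A trigger inspects one event and reports whether it has fired.
Trigger = Callable[[dict], bool]


def contains_element(name: str) -> Trigger:
    """Simple trigger: fires when a message carries the named element."""
    return lambda event: name in event.get("elements", ())


class After:
    """Combined trigger: fires once `delay` seconds have passed since
    the inner trigger first fired."""

    def __init__(self, delay: float, inner: Trigger):
        self.delay = delay
        self.inner = inner
        self.armed_at = None

    def __call__(self, event: dict) -> bool:
        if self.armed_at is None and self.inner(event):
            self.armed_at = event["time"]
        return (self.armed_at is not None
                and event["time"] - self.armed_at >= self.delay)


def any_of(*triggers: Trigger) -> Trigger:
    """Fires when at least one trigger fires. A list is built so that
    every (possibly stateful) trigger sees every event."""
    return lambda event: any([t(event) for t in triggers])


# 'Five minutes after a message containing the "action" element arrives.'
trigger = After(300, contains_element("action"))
events = [
    {"time": 0, "elements": {"login"}},
    {"time": 10, "elements": {"action"}},
    {"time": 200},
    {"time": 310},
]
print([trigger(e) for e in events])  # [False, False, False, True]
```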

Optimisation Techniques

There is a rich history of optimisation techniques in protocol design and computer science in general that we can draw from. Here, we attempt to separate them into general mechanisms that may be combined to allow services to control more powerfully and precisely how service intermediaries handle their messages. This list draws primarily from techniques used in HTTP, which in turn benefited from experience in distributed filesystems [DFSScale].

Message Reuse (Caching)

By allowing clients to keep and reuse copies of entities, efficiencies are realised by either the avoidance of data transfer, or the avoidance of a round-trip to the server altogether. Caching techniques rely on locality in usage patterns; that is, the likelihood that portions of messages can be reused.

To be able to reuse an entity, a cache must understand the conditions under which it is appropriate to do so. Cache indexing defines the profile of request semantics in which a particular response may be reused. The most obvious way to index a cache is based upon Services' URIs, as HTTP does. This provides a namespace for cache lookups to be performed in.

For more complex applications, it may be necessary to modify the cache index depending on other attributes. For example, HTTP allows the 'Vary' response header to specify which additional request headers should be used to index the cache, allowing objects with separate language attributes to be stored under the same URI. This content negotiation feature is crude in HTTP, but could be much more expressive using XML.

Conversely, there may be situations where a Service URI-based cache index may be too restrictive; it may be useful to expand the scope of the cache index to include multiple resources, to allow entities to be reused across services. To accommodate these situations, it should be possible to declare a 'virtual' cache index that different resources can interact with.

Furthermore, a Service must have some control over the entities stored in a caching service intermediary. Cache coherence mechanisms provide this, typically through the use of validation (actively checking to see whether an entity should be reused) and invalidation (marking the content as 'stale' based on some trigger event).
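The indexing and coherence ideas above can be combined in a small sketch. It assumes a cache indexed on the Service URI plus a Vary-like list of request fields, with a freshness lifetime and an explicit invalidation call. The class name `ElementCache`, its methods, and the dict-shaped requests are hypothetical, and validation against the origin is reduced to treating an expired entry as a miss.

```python
class ElementCache:
    """Toy cache indexed on URI plus selected request fields."""

    def __init__(self):
        self._vary = {}     # uri -> names of request fields in the index
        self._entries = {}  # (uri, field values) -> (response, expiry time)

    @staticmethod
    def _key(uri, request, vary):
        # Only the fields named in `vary` contribute to the index.
        return (uri, tuple((name, request.get(name)) for name in sorted(vary)))

    def put(self, uri, request, vary, response, now, max_age):
        self._vary[uri] = tuple(vary)
        key = self._key(uri, request, vary)
        self._entries[key] = (response, now + max_age)

    def get(self, uri, request, now):
        vary = self._vary.get(uri)
        if vary is None:
            return None
        entry = self._entries.get(self._key(uri, request, vary))
        if entry is None:
            return None
        response, expires = entry
        if now >= expires:
            return None  # stale: would need validation with the Service
        return response

    def invalidate(self, uri):
        """Drop every entry for a URI, e.g. in response to a trigger."""
        for key in [k for k in self._entries if k[0] == uri]:
            del self._entries[key]
        self._vary.pop(uri, None)


cache = ElementCache()
cache.put("/quote", {"lang": "en", "session": "a"}, ("lang",),
          "<quote/>", now=0, max_age=300)
print(cache.get("/quote", {"lang": "en", "session": "b"}, now=10))  # <quote/>
print(cache.get("/quote", {"lang": "fr"}, now=10))                  # None
```

The second request is reused despite a different `session` value because only `lang` was declared as part of the index.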

Message Storage (Store-and-Forward)

Some Services consist of the submission of a message as the request, and a brief acknowledgement as a response, in a manner similar to SMTP's store-and-forward pattern. Standardization of an acknowledgement message would allow intermediaries to take responsibility for handling requests whilst immediately acknowledging them. In combination with caching and other techniques, store-and-forward allows intermediaries to improve service reliability substantially, by making it possible to have multiple, redundant points of contact for message submission, with the possibility for performance improvement through client/intermediary locality.
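A minimal sketch of this pattern follows, assuming an intermediary that queues each request, acknowledges it at once, and forwards the queue when the origin is reachable. The `StoreAndForward` class, the shape of the acknowledgement, and the `forward` callable (which returns True on successful delivery) are hypothetical; durable storage and retry policy are left out.

```python
from collections import deque


class StoreAndForward:
    """Toy store-and-forward intermediary."""

    def __init__(self, forward):
        self._forward = forward  # delivers one message; True on success
        self._queue = deque()

    def submit(self, message):
        """Take responsibility for the message and acknowledge immediately."""
        self._queue.append(message)
        return {"status": "accepted", "queued": len(self._queue)}

    def flush(self):
        """Forward queued messages in order; stop at the first failure."""
        delivered = 0
        while self._queue:
            if not self._forward(self._queue[0]):
                break  # origin unavailable: keep the message for later
            self._queue.popleft()
            delivered += 1
        return delivered


received = []
origin_up = {"value": False}


def forward(message):
    """Stand-in for delivery to the central server."""
    if not origin_up["value"]:
        return False
    received.append(message)
    return True


relay = StoreAndForward(forward)
print(relay.submit("<order id='1'/>"))  # {'status': 'accepted', 'queued': 1}
print(relay.flush())                    # 0 (origin is down, message is kept)
origin_up["value"] = True
print(relay.flush())                    # 1
```

The client receives its acknowledgement even while the central server is unavailable, which is where the availability gain comes from.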

Partial Content

Often, it is only necessary to transmit part of a message. For example, a server may only need to update part of a cache's stored message [Delta], or it might be desirable to store the bulk of a message, while forwarding a smaller part immediately.

To accommodate this, partial content techniques allow specification of what parts of a message should be sent.
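One way to picture a partial update is the sketch below, which sends only the child elements whose text has changed and applies them to the stored copy. It assumes a flat message whose children have unique names and are never added or removed; the function names and the `quote` message are hypothetical, and this is not the delta encoding described in [Delta].

```python
import xml.etree.ElementTree as ET


def changed_children(old_xml: str, new_xml: str) -> dict:
    """Return {tag: new text} for children whose text differs."""
    old = {child.tag: child.text for child in ET.fromstring(old_xml)}
    return {child.tag: child.text
            for child in ET.fromstring(new_xml)
            if old.get(child.tag) != child.text}


def apply_update(stored_xml: str, update: dict) -> str:
    """Apply a partial update to the stored message and reserialise it."""
    root = ET.fromstring(stored_xml)
    for child in root:
        if child.tag in update:
            child.text = update[child.tag]
    return ET.tostring(root, encoding="unicode")


stored = "<quote><company>Example Corp</company><price>42.10</price></quote>"
latest = "<quote><company>Example Corp</company><price>42.35</price></quote>"

update = changed_children(stored, latest)
print(update)                                  # {'price': '42.35'}
print(apply_update(stored, update) == latest)  # True
```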

Aggregation

In some situations, intermediaries need to send or receive a number of separate messages to or from a particular device. Although some transport bindings may make it possible to reuse a network connection for these messages, further processing efficiencies might be realised by their combination into a single message. For example, it might be desirable to send all store-and-forward messages for a Service at once, wrapping all of them in a master message that uses an encryption module to protect them. If used across an HTTP binding, this approach avoids the overhead of separately encrypting the messages and then submitting each one and waiting for a response to indicate success.

Similarly, there may be situations where it is advantageous to 'piggyback' responses to give additional information to the intermediary. Previously, piggyback validation techniques have been examined in the HTTP [Piggyback], and such techniques could also be used with service intermediaries to pre-fill the cache, bundle invalidations, and perform other tasks.
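Aggregation can be sketched as wrapping several messages in a single master message and unpacking them at the other end. The `batch` wrapper element and the function names are hypothetical, and the encryption step mentioned above is omitted; a real design would presumably carry the messages in an XML Protocol envelope.

```python
import xml.etree.ElementTree as ET


def aggregate(messages):
    """Wrap several XML messages in one master message."""
    batch = ET.Element("batch")  # hypothetical wrapper element
    for message in messages:
        batch.append(ET.fromstring(message))
    return ET.tostring(batch, encoding="unicode")


def split(batch_xml):
    """Recover the individual messages from a master message."""
    return [ET.tostring(child, encoding="unicode")
            for child in ET.fromstring(batch_xml)]


orders = ['<order id="1"><item>book</item></order>',
          '<order id="2"><item>pen</item></order>']

master = aggregate(orders)
print(len(split(master)))       # 2
print(split(master) == orders)  # True
```

A single interchange then replaces one request and response pair per message, and piggybacked information such as invalidations could travel in the same wrapper.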

Further Work

This paper has outlined areas of research regarding optimising mechanisms in service intermediaries; it is intended as a discussion point only. Hopefully, it will generate interest in standardization of such techniques, development of a framework for their use, and integration into Web Service toolkits and products.

References

SOAP - D. Box et al. "Simple Object Access Protocol (SOAP) 1.1". May 2000.

XMLP - W3C XML Protocol Working Group

DFSScale - M. Satyanarayanan. "The Influence of Scale on Distributed File System Design". In IEEE Transactions on Software Engineering, January 1992.

Piggyback - Balachander Krishnamurthy and Craig E. Wills. "Piggyback Server Invalidation for Proxy Cache Coherency". In Proceedings of the Seventh International World Wide Web Conference, Brisbane, Australia, April 1998.

Delta - J. Mogul et al. "Delta encoding in HTTP". October 2000.

Version: 1.2 - April 17, 2001