cgi_buffer libraries
Version 0.3
(c) 2000 Mark Nottingham

INTRODUCTION
------------

cgi_buffer is a group of libraries used to improve the performance of
CGI scripts (and other content generation engines) in some
circumstances, by applying performance-enhancing HTTP mechanisms that
they typically do not support.

Currently, Perl, Python and PHP are supported. The Python library may
also be used as a wrapper around another CGI script.

If you use cgi_buffer in your application or server, please tell me -
any feedback is much appreciated.

WHY BUFFER GENERATED CONTENT?
-----------------------------

There are several mechanisms in HTTP that can be used to improve
performance. These include:

  * HTTP/1.0 Persistent Connections - declaring the object length, so
    that more than one request may be made on a connection.
  * Content Encoding - object body compression.
  * ETag Validation - validating a cached copy of an object to avoid
    unnecessary transfers.

For more information on the benefits of persistent connections and
compression, see:
  http://www.w3.org/Protocols/HTTP/Performance/

All of these mechanisms have the potential to improve end-to-end
performance, but are seldom available with generated content, because
buffering responses (in order to effect them) might increase
'first-byte latency', or the amount of time to deliver the beginning of
the object. Buffering also increases processing overhead on a Web
server.

However, the benefits of buffering, and thereby effecting these
mechanisms, can outweigh its costs. First-byte latency is not the only
measure of Web performance, and is far from the most important in many
cases.

This is especially true when one realizes that the costs of buffering
are applied only at the Web server, while the benefits are seen across
the network.
Since it is much easier to properly size a Web server than to
control end-to-end performance on the Internet, it makes sense to do as
much as possible on the Web server to improve end-to-end performance.

DECIDING WHEN TO USE CGI_BUFFER
-------------------------------

The library buffers the entire response before sending *any* data to
the client. As a result, cgi_buffer should not be used for applications
which need to stream data gradually to the client (such pages are very
uncommon). If in doubt, try your page with the library to see how it
works.

It's worth noting that some Web servers may not make the connection
persistent, even if cgi_buffer supplies a Content-Length (IIS 4.0 seems
to have this problem). For more information, see
  http://www.mnot.net/papers/capabilities.html

Finally, using cgi_buffer does add some processing overhead on the Web
server. Servers which handle large amounts of traffic should ensure
that they have spare capacity before putting cgi_buffer into
production.

Ideal conditions for cgi_buffer include:

  * pages that have a lot of included objects (which will benefit from
    persistent connections),
  * large HTML and text pages (which will benefit from compression),
    and
  * objects which don't change every time they are generated (which
    will benefit from ETag validation, although this isn't yet
    supported by many clients).

In particular, cgi_buffer tends to help with SSL sites, bad
connectivity (for instance, if you have a lot of international users)
and highly interactive sites.

If in doubt, the best thing is to test the whole page (not just one
object) as your typical client would use it. For instance, if most of
your clients will be accessing from far away on the network, or with
low-speed modems, it's difficult to gauge the relative benefits by
testing from your local LAN.

See http://www.mnot.net/cgi_buffer/ for demonstrations of the effects
of cgi_buffer.

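As a rough illustration of why buffering is needed for the three
mechanisms listed under WHY BUFFER GENERATED CONTENT?, the Python
sketch below takes a complete response body and derives a
Content-Length, an optional gzip encoding and an ETag from it. It is
not the cgi_buffer code or its interface: the function name
buffer_response, the use of an MD5 digest as the validator, and the
choice to compress only text/* types are all assumptions made for the
example. The environ argument is assumed to be a dictionary of CGI
variables such as HTTP_ACCEPT_ENCODING and HTTP_IF_NONE_MATCH.

```python
import gzip
import hashlib


def buffer_response(body, environ, content_type="text/html"):
    """Return (status, headers, payload) for a fully buffered body.

    Hypothetical helper for illustration only; not the cgi_buffer API.
    """
    headers = [("Content-Type", content_type)]

    # Content Encoding: use gzip only if the client lists it
    # (assumption: only text/* types are worth compressing).
    accept = environ.get("HTTP_ACCEPT_ENCODING", "")
    offered = [e.split(";")[0].strip().lower() for e in accept.split(",")]
    use_gzip = "gzip" in offered and content_type.startswith("text/")

    # ETag Validation: a digest of the whole body, which is only
    # possible because the body is buffered. The suffix keeps the
    # validators for the plain and gzip forms distinct.
    digest = hashlib.md5(body).hexdigest()
    etag = '"%s%s"' % (digest, "-gzip" if use_gzip else "")
    headers.append(("ETag", etag))

    candidates = [
        c.strip() for c in environ.get("HTTP_IF_NONE_MATCH", "").split(",")
    ]
    if etag in candidates:
        # The client's cached copy is current; send no body.
        return "304 Not Modified", headers, b""

    if use_gzip:
        body = gzip.compress(body)
        headers.append(("Content-Encoding", "gzip"))
        headers.append(("Vary", "Accept-Encoding"))

    # Persistent Connections: the length of what is actually sent.
    headers.append(("Content-Length", str(len(body))))
    return "200 OK", headers, body
```

A CGI script using something like this would collect all of its output
into one byte string, call the function once, and only then write the
headers and payload. That is also why streaming pages are a poor fit:
none of the three header values is known until the last byte has been
generated.
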
INSTALLATION AND USE
--------------------

For instructions, see the README file in the directory appropriate to
the content generation engine you use.

Note that the Python library may also be used as a wrapper around CGI
scripts, if you use a language (Tcl, C/C++, etc.) that is not currently
supported.

IN THE FUTURE
-------------

I have a number of plans for cgi_buffer. Eventually, I'd like it to be
a one-stop, easy-to-use library for HTTP performance, including more
automated cacheability information. More advanced functionality, like
automatic range handling, may also be considered.

More generation engines will be added as time and experience with them
permit. This includes server scripting modules like mod_perl and
mod_python.

Shorter-term TODOs include:

  * optionally generate Last-Modified headers
  * only compress on text/* in PHP
  * better error handling
  * wrapper interface for Perl

QUESTIONS?
----------

If you have any problems, questions or comments, please check the Web
page first, and then try mailing me:
  http://www.mnot.net/cgi_buffer/
  mailto:mnot@pobox.com