Follow 2.04
(c) Copyright 1998 Mark Nottingham

Follow processes Combined Logfile Format Web logs to reveal usage
patterns of the site's pages.

For more information, see http://www.pobox.com/~mnot/follow2/

** As of Follow 2.04, this product is free, open and unsupported.
** That's right, it's abandonware; I don't have time to develop
** or think about Follow, but some people have expressed interest
** in how Follow works. Here it is.
**
** I wrote almost all of this code in 1998, and am no longer familiar
** with it. Indeed, looking at it, I cringe at parts. Hopefully,
** it will be useful to someone.

-----------------------------------------------------------------------
Introduction

Instead of giving hit counts (like most Web log analysers), Follow
tracks 'User Sessions'. A User Session is one complete path that a
user takes through your site; Follow reconstructs it from the logfiles.

You then have a choice of how to view the data. It can be viewed
directly, on a page-by-page basis to reveal how a particular page is
used, or in a summary to highlight outstanding pages.

By using the Page-by-Page and Summary views, you can determine how long
users spend on a page, how they reach it, where they go from it, and
other statistics. This information can help you design a more usable
Web site.

Follow works by analysing the Web log file and summarising what it
finds in a cache directory; this is the job of the 'follow-gather'
program, which should ideally be run by a cron job (that is,
automatically at a pre-set interval).

When you wish to view the results, you can browse the contents of the
databases in the cache directory from a Web browser, using the CGI
program 'follow.cgi'.

Because the same cache directory must be accessed by both the
summarisation program and the browsing program, it is very helpful if
both are run by the same user, to avoid permission conflicts.

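To make the idea of a User Session concrete, here is a minimal sketch
of rebuilding sessions from Combined-format log lines. It is not
Follow's own code, and it is written for a modern Python 3 rather than
the Python 1.5.2 that Follow targets. The assumptions are mine: a
visitor is identified by host plus user agent, a session ends after 30
minutes of inactivity, and the names LOG_LINE, reconstruct_sessions and
time_on_pages are hypothetical.

```python
import re
from datetime import datetime, timedelta

# One Combined Log Format line: host, ident, user, [time], "request",
# status, bytes, "referer", "user agent".
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Assumed inactivity timeout; Follow's real rule may differ.
SESSION_TIMEOUT = timedelta(minutes=30)


def reconstruct_sessions(lines, timeout=SESSION_TIMEOUT):
    """Group log lines into sessions: lists of (time, path) per visitor."""
    sessions = []        # every session, in order of its first request
    open_sessions = {}   # (host, agent) -> that visitor's latest session
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue     # skip lines that are not in Combined format
        when = datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z")
        visitor = (match["host"], match["agent"])
        current = open_sessions.get(visitor)
        # Start a new session for a new visitor or after a long pause.
        if current is None or when - current[-1][0] > timeout:
            current = []
            open_sessions[visitor] = current
            sessions.append(current)
        current.append((when, match["path"]))
    return sessions


def time_on_pages(session):
    """Seconds spent on each page, taken as the gap to the next request.

    The last page of a session has no following request, so it is omitted.
    """
    return [
        (path, (next_time - when).total_seconds())
        for (when, path), (next_time, _) in zip(session, session[1:])
    ]
```

Calling reconstruct_sessions(open("access_log")) on a log in time order
yields one list per session; time_on_pages then gives a rough figure for
how long each page was viewed before the visitor moved on.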
-----------------------------------------------------------------------
Requirements

- A Web site that does not use Frames; this limitation may be overcome
  in a future release.

- Access to Web log files for a single site in Combined format (see the
  FAQ for information on the combined format). Other formats can be
  supported; write for details.

- The ability to periodically run programs on the same host as the logs
  ('cron'). Alternatively, the logs can be copied to a local machine
  and the analysis run manually.

- The ability to run CGI programs on the Web server, to display
  follow.cgi's output. This may also be done on a local Web server.

- Python 1.5.2 or greater installed.

-----------------------------------------------------------------------
Configuration

1.) Edit the follow.conf file to suit your site; instructions can be
    found in the file itself.

2.) Place the 'follow.cgi' program in your Web server's CGI directory,
    along with a copy of the follow.conf file. There MUST be a copy of
    follow.conf in the CGI directory. Make sure that both files are
    readable by the user that runs the Web server, and that follow.cgi
    is executable by that user.

3.) Place a copy of the same follow.conf file in the same directory as
    the follow-gather program.

4.) Make sure that all of the libraries are available to both programs,
    either by installing them in both directories, or in your Python's
    site-packages directory.

5.) Create the directory that you specified as the cachedir in
    follow.conf; this directory MUST be readable and writable by the
    user who will run follow-gather, as well as readable by the user
    that runs the Web server.

-----------------------------------------------------------------------
Use

For help running follow-gather, try

    follow-gather -h

The first time you wish to use Follow, you'll probably have an old
logfile, or a number of them, to analyse.
To do this manually, pass the '-i' flag (to read the log from STDIN)
and the '-v' flag (verbose output) to follow-gather, like this:

    cat /httpd/logs/access_log | follow-gather -i -v

or, to decompress a number of logs and feed them to Follow:

    zcat /httpd/logs/access_log.*.gz | follow-gather -i -v

After this, it's best to automate the analysis process. Ideally, this
should be done once a day, although it can be done less often on a
low-traffic site. To automate this, you should use cron. See the FAQ
for details.

Finally, to view the stats, point your browser at the file 'follow.cgi'
in your Web site's CGI directory. Help is available from within the
CGI.

-----------------------------------------------------------------------
Hints

For best performance, it is recommended that you rotate your logfiles
often; extremely large logfiles take longer to process.

For best results, make sure that follow-gather runs right before the
logs are rotated.
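
As an alternative to the cat and zcat pipelines shown under Use, the
same feeding step can be sketched in Python, which may help on a host
without zcat or when plain and compressed logs are mixed. This is only
an illustration in modern Python 3: copy_logs and feed_follow_gather
are hypothetical helper names, and the only things taken from Follow
itself are the follow-gather command and its '-i' and '-v' flags.

```python
import gzip
import shutil
import subprocess


def copy_logs(paths, out):
    """Write each log to `out` in the order given, decompressing .gz files."""
    for path in paths:
        opener = gzip.open if str(path).endswith(".gz") else open
        with opener(path, "rb") as log:
            shutil.copyfileobj(log, out)


def feed_follow_gather(paths, command=("follow-gather", "-i", "-v")):
    """Pipe the logs into the command's standard input; return its exit status."""
    proc = subprocess.Popen(command, stdin=subprocess.PIPE)
    try:
        copy_logs(paths, proc.stdin)
    finally:
        proc.stdin.close()   # signal end of input so the command can finish
    return proc.wait()
```

For example, feed_follow_gather(sorted(glob.glob("/httpd/logs/access_log.*.gz")))
after an "import glob" passes the compressed logs in sorted filename
order, much as the shell expands the wildcard in the zcat example.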