
Re: [syndication] General purpose scrapers and bandwidth



Julian Bond wrote:
> RSSify stats
> 130991 requests
> 2711949 Kb
> 77% of my bandwidth
> 10 requests per minute spread evenly through the 24 hours
>
> Good grief. Looks like I'll have to do some diggin' n' codin'.

The first thing I'd do is add two lines: ob_start(); at the very start,
and require('cgi_buffer.php'); at the very end. mnot's CGI Buffer [1] buys
you immediate ETag support, so even though you'll still have to fetch the
HTML and parse it, at least you can return a 304 instead of the whole RSS
file when nothing's changed. If you have PHP compiled --with-zlib, you'll
also get gzip encoding for those aggregators that support it. Or, if your
server has mod_gzip, add

mod_gzip_item_include mime xml$

to your .htaccess (or your Apache config, if you control it) to have Apache
gzip */xml. Assuming your users' readers are using more modern software to
read than your users are using to produce, that ought to cut down the
outgoing bandwidth quite a bit. Saving the ETags from the HTML you read, and
sending them back on the next fetch, would save even more, but that's going
to require a bit more effort than just adding a couple-three lines.
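
For anyone who wants to see the moving parts, here is a rough sketch of the
ETag-plus-gzip logic in Python rather than PHP. It is not cgi_buffer's code:
the function name build_response, the MD5-based ETag, and the plain dict of
request headers are all assumptions made for illustration, and the
If-None-Match and Accept-Encoding checks are deliberately simplified.

```python
import gzip
import hashlib


def build_response(body: bytes, request_headers: dict) -> tuple:
    """Return (status, headers, payload) for a freshly generated feed.

    Hypothetical helper: request_headers is assumed to be a dict keyed by
    canonical header names such as "If-None-Match".
    """
    # Derive a validator from the generated bytes (assumption: an MD5 hex
    # digest in quotes; any stable hash of the body would do).
    etag = '"' + hashlib.md5(body).hexdigest() + '"'
    headers = {
        "ETag": etag,
        "Content-Type": "text/xml",
        "Vary": "Accept-Encoding",
    }

    # Simplified conditional GET: exact match only, no list or "*" handling.
    if request_headers.get("If-None-Match") == etag:
        return 304, headers, b""

    # Simplified negotiation: ignores q-values such as "gzip;q=0".
    if "gzip" in request_headers.get("Accept-Encoding", ""):
        payload = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    else:
        payload = body

    headers["Content-Length"] = str(len(payload))
    return 200, headers, payload
```

The feed still has to be generated on every request in order to compute the
ETag, so this saves outgoing bandwidth and not the scraping work.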

Phil Ringnalda

[1] http://www.mnot.net/cgi_buffer/