
Re: [syndication] Re: Being kind to clients



In article <a6ttm6+d0qf@eGroups.com>, muir_woods <JohnMunsch@zwave.com>
writes

>A better solution is simply to build the functionality into the 
>reader. The RSS library in HotSheet uses an itemHistory to keep an 
>MD5 hash of the title/link of each news item retrieved. When a new 
>retrieval is done, each item pulled has its hash generated and it is 
>checked against the history to see if it is new or just a repeat. 
>This basically never fails. I see items reappear very rarely, and 
>when one does, I can often spot the reason: the title of the item 
>was changed after its initial appearance in the RSS feed.
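
For readers following the thread, the check described above can be sketched roughly as below. This is not HotSheet's code: the names item_hash and ItemHistory, and the NUL separator between title and link, are assumptions made for illustration only.

```python
import hashlib


def item_hash(title: str, link: str) -> str:
    """MD5 hex digest over an item's title and link.

    The NUL separator is an assumption; it keeps ("ab", "c") and
    ("a", "bc") from producing the same input bytes.
    """
    data = f"{title}\x00{link}".encode("utf-8")
    return hashlib.md5(data).hexdigest()


class ItemHistory:
    """Hypothetical stand-in for the itemHistory mentioned in the quote."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def is_new(self, title: str, link: str) -> bool:
        """Return True the first time an item is seen, False on repeats."""
        h = item_hash(title, link)
        if h in self._seen:
            return False
        self._seen.add(h)
        return True


history = ItemHistory()
first = history.is_new("Being kind to clients", "http://example.com/1")
repeat = history.is_new("Being kind to clients", "http://example.com/1")
retitled = history.is_new("Being kind to clients (updated)", "http://example.com/1")
```

Because the title is part of the hashed input, a retitled item hashes differently and shows up as new, which matches the rare repeats described in the quote.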

The reason I only keep 4 weeks of history in the database is performance
and disk space. I suppose I could keep a separate table of MD5 hashes
and just let it grow, while still expiring the table that holds the
item data. What do you use as the source for the MD5?
Feedid+Link+Title+Description?
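
A minimal sketch of the two-table idea, assuming SQLite and made-up table and function names (items, seen_hashes, store_if_new, expire_items). Which fields go into the hash is exactly the open question above; this version uses feed id, link and title and leaves the description out, which is only a guess and not anyone's confirmed scheme.

```python
import hashlib
import sqlite3
import time

FOUR_WEEKS = 28 * 24 * 60 * 60  # retention for item data, in seconds


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    # Item data, expired after four weeks.
    db.execute(
        "CREATE TABLE IF NOT EXISTS items ("
        "hash TEXT PRIMARY KEY, feed_id TEXT, link TEXT, "
        "title TEXT, description TEXT, fetched REAL)"
    )
    # Hashes only; this table is never expired and just grows.
    db.execute("CREATE TABLE IF NOT EXISTS seen_hashes (hash TEXT PRIMARY KEY)")
    return db


def item_key(feed_id: str, link: str, title: str) -> str:
    # Field choice is an assumption; NUL keeps the fields unambiguous.
    data = "\x00".join((feed_id, link, title)).encode("utf-8")
    return hashlib.md5(data).hexdigest()


def store_if_new(db, feed_id, link, title, description, now=None) -> bool:
    """Store the item and return True unless its hash was seen before."""
    now = time.time() if now is None else now
    key = item_key(feed_id, link, title)
    with db:  # commits on success, rolls back on error
        cur = db.execute(
            "INSERT OR IGNORE INTO seen_hashes (hash) VALUES (?)", (key,)
        )
        if cur.rowcount == 0:  # hash already present: a repeat
            return False
        db.execute(
            "INSERT INTO items VALUES (?, ?, ?, ?, ?, ?)",
            (key, feed_id, link, title, description, now),
        )
    return True


def expire_items(db, now=None) -> None:
    """Delete item data older than four weeks; hashes are kept."""
    now = time.time() if now is None else now
    with db:
        db.execute("DELETE FROM items WHERE fetched < ?", (now - FOUR_WEEKS,))
```

With this layout an item whose data has already been expired is still recognised as a repeat, at a cost of one short hash row per item ever seen.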

-- 
Julian Bond    email: julian_bond@voidstar.com
CV/Resume:         http://www.voidstar.com/cv/
WebLog:               http://www.voidstar.com/
M: +44 (0)77 5907 2173  T: +44 (0)192 0412 433
ICQ:33679568 tag:So many words, so little time