
Re: [syndication] One publishing problem keyword filtering might address



Bill Kearney <ml_yahoo@ideaspace.net> wrote:
Ultimately what I want is 15 or so headlines on my web page that
refresh every 30-60 minutes based on automated searches of 100 (or
maybe 50, or 25) news sources. Basically, it's what you get from
Moreover.com or Yellowbrix, but I don't want to pay $6000/year for a
custom feed for my nonprofit.

This cuts right to the issue.

Content doesn't just miraculously leap into digital form. Someone's got to collect and manage the delivery of it. Moreover and the like do this at pricing /dramatically/ lower than it's ever been before. At some point the pricing will ratchet to another level. We are not at that point yet.

Another take on this. My understanding is that Moreover filtered the incoming news into buckets automatically using a fairly simple algorithm, then used a team of people to eyeball the results and make manual adjustments. Now look at Google News. That system is completely automatic and *almost* as good. Now look at Google News search. With a couple of simple search terms, you can get results that are *good enough*, e.g.:
http://news.google.com/news?q=hospital+medical+services&scoring=d
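That search URL is just the query terms joined with + plus a scoring=d parameter (which, as far as I can tell, asks for date ordering; treat that as an assumption). A minimal Python sketch for building such URLs from a list of domain terms, so each one can be polled as a synthetic feed. The helper name news_search_url is hypothetical, and the endpoint may no longer behave as it did here:

```python
from urllib.parse import urlencode

def news_search_url(terms, base="http://news.google.com/news"):
    # Hypothetical helper: the base URL and the scoring=d parameter are copied
    # from the example URL; their current behaviour is not guaranteed.
    # urlencode() quotes spaces as '+', matching the example's query string.
    query = urlencode({"q": " ".join(terms), "scoring": "d"})
    return f"{base}?{query}"

print(news_search_url(["hospital", "medical", "services"]))
# http://news.google.com/news?q=hospital+medical+services&scoring=d
```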

So my approach to this for a particular domain area is to assemble a collection of RSS news feeds in the domain. Add in a couple of synthetic feeds such as Google News search (or, ahem, Moreover search). Then group them, by feed, into a very few categories: something like News, Commentary, Manufacturers. If some of the key sources for that domain don't have RSS, then nag them to produce a feed and/or resort to a screen-scraping service. After a day or two's work you can generate a composite feed that is surprisingly effective. Take a look at http://wifi.ecademy.com/module.php?mod=import as an example aimed at the Wifi, 802.11, wlan industry.
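A rough Python sketch of the grouping step, assuming the feeds have already been fetched as RSS 2.0 text. The feed URLs, the FEED_CATEGORIES table and the function names are all hypothetical; the point is that each item takes its category from the feed it came from, with no keyword matching, and only the newest 15 or so are kept for display:

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical mapping from feed URL to one of a very few categories.
# Grouping is by source, not by keyword: every item inherits its feed's category.
FEED_CATEGORIES = {
    "http://example.org/news.rss": "News",
    "http://example.net/blog.rss": "Commentary",
}

def parse_items(rss_xml, category):
    """Extract items from an RSS 2.0 document, tagging each with its feed's category."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        pub = item.findtext("pubDate")
        items.append({
            "category": category,
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "published": parsedate_to_datetime(pub) if pub else None,
        })
    return items

def _sort_key(item):
    # Undated items get -inf so they end up after everything else (newest first).
    return item["published"].timestamp() if item["published"] else float("-inf")

def build_composite(feeds, limit=15):
    """Merge already-fetched feeds (url -> XML text); group the newest `limit` items by category."""
    merged = []
    for url, xml_text in feeds.items():
        merged.extend(parse_items(xml_text, FEED_CATEGORIES.get(url, "Other")))
    merged.sort(key=_sort_key, reverse=True)
    groups = {}
    for item in merged[:limit]:
        groups.setdefault(item["category"], []).append(item)
    return groups

news_xml = """<rss version="2.0"><channel><title>News</title>
<item><title>Access point shipped</title><link>http://example.org/1</link>
<pubDate>Tue, 10 Jun 2003 09:00:00 GMT</pubDate></item>
</channel></rss>"""
blog_xml = """<rss version="2.0"><channel><title>Blog</title>
<item><title>Why hotspots matter</title><link>http://example.net/a</link>
<pubDate>Tue, 10 Jun 2003 12:00:00 GMT</pubDate></item>
</channel></rss>"""

groups = build_composite({"http://example.org/news.rss": news_xml,
                          "http://example.net/blog.rss": blog_xml})
print({cat: [i["title"] for i in items] for cat, items in groups.items()})
# {'Commentary': ['Why hotspots matter'], 'News': ['Access point shipped']}
```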

Now, the backend of all this is a MySQL database, and I keep four weeks' worth of RSS items, which are full-text searchable. Nowhere in this is there any real keyword searching or filtering. Effectively I'm using other people's expertise in maintaining their own sites, and keeping them on topic, to keep my collection of news on topic.
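The storage side can be sketched the same way. This uses Python's built-in sqlite3 in place of MySQL (an assumption, purely so the example is self-contained), keeps one row per item link, prunes anything older than four weeks, and does a naive LIKE search where MySQL would use a FULLTEXT index with MATCH ... AGAINST. Table and function names are made up for illustration:

```python
import sqlite3
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(weeks=4)  # the four-week window described above

def open_store():
    # In-memory SQLite stands in for the MySQL database (illustrative assumption).
    db = sqlite3.connect(":memory:")
    db.execute("""CREATE TABLE items (
        link TEXT PRIMARY KEY,
        title TEXT,
        body TEXT,
        published REAL)""")  # Unix timestamp, so comparisons are numeric
    return db

def store(db, link, title, body, published):
    # INSERT OR REPLACE keeps one row per link, so re-polling a feed
    # every 30-60 minutes updates rows instead of duplicating them.
    db.execute("INSERT OR REPLACE INTO items VALUES (?, ?, ?, ?)",
               (link, title, body, published.timestamp()))

def prune(db, now):
    # Drop anything older than the retention window.
    db.execute("DELETE FROM items WHERE published < ?",
               ((now - RETENTION).timestamp(),))

def search(db, word):
    # Naive substring search; MySQL would use MATCH ... AGAINST on a FULLTEXT index.
    pattern = f"%{word}%"
    rows = db.execute(
        "SELECT title FROM items WHERE title LIKE ? OR body LIKE ? "
        "ORDER BY published DESC", (pattern, pattern))
    return [r[0] for r in rows]

now = datetime(2003, 6, 10, tzinfo=timezone.utc)
db = open_store()
store(db, "http://example.org/1", "New 802.11g access point", "Shipping next month",
      now - timedelta(days=2))
store(db, "http://example.org/2", "Old wlan story", "802.11b roundup",
      now - timedelta(days=40))
prune(db, now)
print(search(db, "802.11"))  # ['New 802.11g access point']
```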

I re-publish the composite feed as another RSS source, so with one of the desktop tools I can then view it locally as a ticker or whatever.
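Republishing is then just serialising the merged items back out. A minimal sketch of writing an RSS 2.0 channel with Python's standard library; to_rss and the channel values are hypothetical, and it assumes item dates are UTC datetimes (format_datetime with usegmt=True rejects anything else):

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime

def to_rss(title, link, description, items):
    """Serialize (title, link, published) tuples as a minimal RSS 2.0 document."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the required channel elements in RSS 2.0.
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for item_title, item_link, published in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = item_title
        ET.SubElement(item, "link").text = item_link
        # RSS 2.0 uses RFC 822-style dates; usegmt=True emits "GMT" for UTC datetimes.
        ET.SubElement(item, "pubDate").text = format_datetime(published, usegmt=True)
    return ET.tostring(rss, encoding="unicode")

xml = to_rss("Wifi composite", "http://example.org/", "Merged wlan news",
             [("Access point shipped", "http://example.org/1",
               datetime(2003, 6, 10, 9, 0, tzinfo=timezone.utc))])
print(xml)
```

Any desktop reader or ticker that understands RSS 2.0 should then be able to subscribe to the output like any other feed.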

--
Julian Bond Email&MSM: julian.bond@voidstar.com
Webmaster:              http://www.ecademy.com/
Personal WebLog:       http://www.voidstar.com/
CV/Resume:          http://www.voidstar.com/cv/
M: +44 (0)77 5907 2173   T: +44 (0)192 0412 433