
One publishing problem keyword filtering might address



Re: yesterday's discussion about keywords, here's the actual 
publishing problem that prompted my message to Carl about the 
keywords issue and his subsequent post to the list... 

Here's my particular scenario: Let's say I want to filter 100 feeds 
using keywords. In my case, "keywords" isn't a meta thing -- I 
actually mean "search terms." The 100 feeds, in my scenario, are 
mostly online versions of print newspapers around the country. (We 
can ignore for now whether 100 such outlets actually provide RSS 
feeds). In my case, I'm interested in environmental news and at any 
given time I have enough of a sense of what MIGHT be written about to 
set useful search terms like "superfund," "new source review," "clean 
water act," "mountaintop mining," "mountaintop removal 
mining," "asthma AND pollution," "acid rain," etc. Many such terms 
would be required, but there's no reason why I couldn't cast a wide 
enough net with, say, 150 search phrases. Of course, my net will 
still miss some articles, and my net will occasionally grab stuff 
that isn't relevant, but it's still a very useful net. 
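As a rough sketch of what such a keyword net might look like in code: the function below treats each phrase as a literal substring match, with a special case for "AND" phrases like "asthma AND pollution" where every word must appear. All names and sample articles here are invented for illustration; a real tool would run this over feed items rather than a hard-coded list.

```python
def matches(text, phrase):
    """Return True if the article text matches one search phrase."""
    text = text.lower()
    if " and " in phrase.lower():
        # "asthma AND pollution": every part must appear somewhere
        return all(part.strip() in text
                   for part in phrase.lower().split(" and "))
    # otherwise treat the phrase as a literal substring
    return phrase.lower() in text

def filter_articles(articles, phrases):
    """Keep any article that matches any of the search phrases."""
    return [a for a in articles if any(matches(a, p) for p in phrases)]

phrases = ["superfund", "mountaintop removal mining", "asthma AND pollution"]
articles = [
    "EPA adds three sites to Superfund priority list",
    "City council debates parking rules",
    "Study links asthma rates to diesel pollution downtown",
]
print(filter_articles(articles, phrases))
# prints the first and third headlines; the parking story is dropped
```

The net behaves as described above: it misses nothing that contains a phrase verbatim, but it will also grab the occasional irrelevant story that happens to contain one.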

Ultimately what I want is 15 or so headlines on my web page that 
refresh every 30-60 minutes based on automated searches of 100 (or 
maybe 50, or 25) news sources. Basically, it's what you get from 
Moreover.com or YellowBrix, but I don't want to pay $6000/year for a 
custom feed for my nonprofit. 

I could also do this with Nexis -- almost. I could have Nexis email 
me search results, but then I'd have to go and get URLs for those 
articles every hour and update my site accordingly -- an impossible 
thing to do manually unless you do absolutely nothing else.

I'm not sure, from a technical standpoint, what actually would be 
happening to "filter" RSS feeds with my search terms. Would a spider 
be visiting the sites supplying the feeds (which might be bad 
etiquette -- that's a lot of spidering), or would those feeds be 
sending stuff to something on my end, which could then be filtered 
according to my search terms, thus avoiding any ethical dilemmas 
about bandwidth? As a non-programmer, I don't know what actually 
would be happening.
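For what it's worth, an RSS feed is a pull arrangement rather than a push: the site publishes one small XML file, and the reader's own software fetches just that file every so often -- no spidering of article pages, so the bandwidth worry is modest. A sketch of the filtering step, using Python's standard XML parser on a made-up feed (a real program would first download the file, e.g. with urllib.request.urlopen):

```python
import xml.etree.ElementTree as ET

# Invented sample feed, standing in for a downloaded RSS file.
SAMPLE_FEED = """<rss version="2.0"><channel>
  <title>Example Daily News</title>
  <item><title>Acid rain damages forests in the Northeast</title>
        <link>http://example.com/acid-rain</link></item>
  <item><title>Local team wins championship</title>
        <link>http://example.com/sports</link></item>
</channel></rss>"""

def matching_items(feed_xml, search_terms):
    """Parse one RSS feed; return (title, link) pairs matching any term."""
    root = ET.fromstring(feed_xml)
    hits = []
    for item in root.iter("item"):
        title = item.findtext("title", "")
        link = item.findtext("link", "")
        if any(term.lower() in title.lower() for term in search_terms):
            hits.append((title, link))
    return hits

print(matching_items(SAMPLE_FEED, ["acid rain", "superfund"]))
# prints only the acid-rain headline with its link
```

Run once per feed every 30-60 minutes, this yields exactly the headline-plus-URL pairs needed to refresh a web page -- which is the piece the Nexis email route was missing.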

Another way to pose my question: with so many people developing 
software that almost does what Moreover and YellowBrix do for 
$6000/year, why hasn't anyone gone the extra yard to actually provide 
something that can be downloaded for free or for $100 or so, that 
does the same thing? I'm not the person to do it -- I'm too busy 
being a website editor. And as a non-programmer, I don't know how 
difficult it is to go that extra yard.

Such a program would not solve every problem faced by every person 
who has ever asked about filtering RSS feeds using keywords -- but I 
think automated searches of the type I'm describing can work, 
provided you're familiar enough with the behavior of the publications 
you're filtering and you're smart about use of search terms.

P.S. Since I wrote this, Carl tells me Nexis might have the 
functionality I seek, but I'd have to make like a programmer 
and find out how XML works... sheesh...

Ryan Walker
Website Editor
Environmental Media Services
www.ems.org
ryan@ems.org