Re: [syndication] Aggregating and displaying feeds
on 5/24/01 3:07 AM, Julian Bond at julian@netmarketseurope.com wrote:
> I started a discussion on another list but it seems appropriate here as
> well.
>
> Let's say you're writing YAFA (Yet Another Feed Aggregator!). I think
> there's some conceptual design thought about how you display the results
> from multiple feeds. I want to try and get away from displaying the
> results sorted by feed and sort them by relevance, age and subject
> instead.
<snip>
> If you allow Admins and Users to define their own bundles, you start to
> get into a many to many to many situation which is not hard in SQL but
> produces lots of linking tables and Joins.
>
> So far, so do-able. It's a SMOP! To put some meat on this, I could
> define a bundle on Motorcycle Racing which was a Moreover search + a
> feed from Motorcycle News, motorcycle.com, amasuperbike.com, MotoGP.com
> etc.
This is basically the way our aggregator/portal engine ("WireHose") works.
We identify resources (video, audio, individual syndicated headlines,
syndicated feeds, traffic cams, etc.) by a collection of tags; any resource
can have any number of tags so it can live in lots of different categories.
(My favorite example is Ken Griffey, Jr., getting traded: it's a Seattle
story, a baseball story, a Mariners story, and dagnabbit, it's a Cincinnati
story :-)
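
To make the tagging idea concrete, here is a minimal sketch in Python. It is
not WireHose code: the TagIndex class, its method names and the resource ids
are all made up for illustration, and a real system would keep this
many-to-many relation in database tables rather than in memory.

```python
from collections import defaultdict


class TagIndex:
    """Hypothetical in-memory many-to-many map between tags and resources."""

    def __init__(self):
        self._by_tag = defaultdict(set)       # tag -> resource ids
        self._by_resource = defaultdict(set)  # resource id -> tags

    def tag(self, resource_id, *tags):
        # Tags are case-folded so "Seattle" and "seattle" are the same tag.
        for t in tags:
            key = t.lower()
            self._by_tag[key].add(resource_id)
            self._by_resource[resource_id].add(key)

    def resources_for(self, tag):
        return set(self._by_tag.get(tag.lower(), set()))

    def tags_for(self, resource_id):
        return set(self._by_resource.get(resource_id, set()))


index = TagIndex()
# One story, four categories (ids and tags are illustrative only).
index.tag("griffey-trade", "Seattle", "baseball", "Mariners", "Cincinnati")
index.tag("downtown-traffic-cam", "Seattle", "traffic")
```

Asking for index.resources_for("Seattle") returns both resources, while
"Cincinnati" returns only the trade story.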
Site editors and users can then set up "fetchers", which specify the set of
tags associated with a particular subject, plus some other search criteria
like keywords, number of items to return, what to do if your topic matches
video or audio, sort ordering, etc.
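
A fetcher along these lines could be sketched roughly as below. This is only
an illustration under stated assumptions: the Resource and Fetcher classes,
their field names, and the "match any tag" rule are hypothetical, and the
sample items are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    title: str
    kind: str          # e.g. "headline", "video", "audio"
    tags: frozenset    # lower-case tag names
    age_hours: float


@dataclass
class Fetcher:
    tags: set                    # assumption: a resource matches on ANY of these
    keywords: tuple = ()         # if non-empty, the title must contain one
    limit: int = 10              # number of items to return
    include_media: bool = True   # False drops video and audio matches
    sort_key: str = "age_hours"  # Resource attribute to sort by, ascending

    def fetch(self, resources):
        wanted = {t.lower() for t in self.tags}
        hits = []
        for r in resources:
            if not wanted & r.tags:
                continue
            if not self.include_media and r.kind in ("video", "audio"):
                continue
            title = r.title.lower()
            if self.keywords and not any(k.lower() in title for k in self.keywords):
                continue
            hits.append(r)
        hits.sort(key=lambda r: getattr(r, self.sort_key))
        return hits[: self.limit]


# Invented sample data, newest items have the smallest age.
items = [
    Resource("MotoGP race report", "headline", frozenset({"motogp", "racing"}), 2.0),
    Resource("Superbike highlights", "video", frozenset({"superbike", "racing"}), 1.0),
    Resource("Mariners trade rumors", "headline", frozenset({"baseball"}), 0.5),
    Resource("Superbike points table", "headline", frozenset({"superbike"}), 5.0),
]
racing = Fetcher(tags={"MotoGP", "Superbike"}, include_media=False, limit=5)
results = racing.fetch(items)
```

Here results holds the two text headlines, newest first; the video is
skipped because include_media is off, and the baseball story has no matching
tag.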
You're right, it does hit the database server hard, but we get around that
with aggressive caching and other tricks. As you say, it's a SMOP (albeit
one that's taken a few years to solve properly :-)
> Where it gets interesting is when you start to think about categories
> and rating. And how the rating affects the visibility and priority of a
> particular <item>. It might be possible to get users to rate a
> particular feed and possibly even a source.
I've been thinking of doing something similar to Slashdot's jury duty, err,
moderation, especially at the feed level. But I'm not sure that approach is
right for all audiences; I think it works well at Slashdot because there's a
coherent culture there (and slashdotters appear to actually care about karma
:-).
> But individual items is too much work for them. And source is often not easily
> available. I think the ideal is to just use emergent behaviour to drive the
> system rather than relying too heavily on active input. If you can aggregate
> the click throughs and cross match it with the user, feed and source, there's
> some powerful rating to be done behind the scenes.
Absolutely. One feature I've been meaning to add to WireHose for the longest
time is a log spider that would roam through the clickthrough data and
resource tables and create new tags for the resources based on what people
liked and didn't like. Sort of like Amazon's "people who bought this also
bought" recommendations.
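
The log spider doesn't exist yet, so the following is just one way it might
start: count how many users clicked on both of two resources, then propose
tags for a resource from the things it was co-clicked with. The function
names, the (user, resource) log format and the min_users cutoff are all
assumptions for the sake of the sketch.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_click_counts(click_log):
    """Count, for each pair of resources, how many users clicked both.

    click_log is an iterable of (user_id, resource_id) pairs.
    """
    seen = defaultdict(set)
    for user, resource in click_log:
        seen[user].add(resource)
    pairs = Counter()
    for resources in seen.values():
        for a, b in combinations(sorted(resources), 2):
            pairs[(a, b)] += 1
    return pairs


def also_clicked(pairs, resource, min_users=2):
    """Resources co-clicked with `resource` by at least min_users users."""
    related = Counter()
    for (a, b), n in pairs.items():
        if n < min_users:
            continue
        if a == resource:
            related[b] = n
        elif b == resource:
            related[a] = n
    return [r for r, _ in related.most_common()]


def suggest_tags(pairs, tags_by_resource, resource, min_users=2):
    """Tags carried by co-clicked resources that `resource` lacks."""
    own = tags_by_resource.get(resource, set())
    suggested = set()
    for other in also_clicked(pairs, resource, min_users):
        suggested |= tags_by_resource.get(other, set())
    return suggested - own


# Invented clickthrough data.
log = [
    ("u1", "motogp-report"), ("u1", "superbike-recap"),
    ("u2", "motogp-report"), ("u2", "superbike-recap"), ("u2", "mariners-trade"),
    ("u3", "mariners-trade"),
]
tags = {
    "motogp-report": {"motogp"},
    "superbike-recap": {"superbike", "racing"},
    "mariners-trade": {"baseball"},
}
pairs = co_click_counts(log)
new_tags = suggest_tags(pairs, tags, "motogp-report")
```

With this data, two users clicked both racing stories, so the MotoGP report
picks up "superbike" and "racing" as suggested tags; the single user who
also read the Mariners story falls below the cutoff.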
The problem for us with trying to rank individual items is that WireHose is
designed to deal with a LOT of items. For example, our standard "small" data
set for testing is the complete Moreover.com feeds plus several thousand
items a day from the Associated Press newsfeeds. With that kind of resource
turnover, even with a large audience (say millions of users), many items
will never get ranked even via clickthrough data.
So the problem becomes one of identifying "more like this," where no human
has ever evaluated how much "like" any one thing is to another. We have a
fuzzy logic weighted rules-based thingamajig for extracting items from the
wirefeeds (editors always lie about metadata when putting stuff on the
wires!), and I've been exploring ways to apply this concept to trolling
through the resource databases and clickthrough data, but no real
breakthroughs yet.
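
For readers wondering what a weighted rules-based categorizer can look like,
here is a deliberately small sketch. It is not the WireHose implementation,
whose details aren't described here. It assumes each rule is a (regex, tag,
weight) triple, combines the weights of matching rules per tag with a
probabilistic sum (1 minus the product of the misses, one common fuzzy OR),
and keeps tags whose score clears a threshold. The rules, weights and
threshold are invented.

```python
import re
from collections import defaultdict

# Hypothetical rules: (pattern, tag, weight in 0..1).
RULES = [
    (r"\bmariners\b", "baseball", 0.6),
    (r"\bmariners\b", "seattle", 0.55),
    (r"\b(?:inning|home run|pitcher)\b", "baseball", 0.4),
    (r"\bcincinnati\b", "cincinnati", 0.7),
    (r"\b(?:stock|shares)\b", "business", 0.8),
]


def categorize(text, rules=RULES, threshold=0.5):
    """Return {tag: score} for tags whose combined score >= threshold.

    The item's own metadata is ignored on purpose; only the text is scored.
    """
    miss = defaultdict(lambda: 1.0)  # tag -> product of (1 - weight)
    for pattern, tag, weight in rules:
        if re.search(pattern, text, re.IGNORECASE):
            miss[tag] *= 1.0 - weight
    scores = {tag: 1.0 - m for tag, m in miss.items()}
    return {tag: round(s, 3) for tag, s in scores.items() if s >= threshold}


scores = categorize("Mariners pitcher traded to Cincinnati")
```

Two weak baseball rules firing together score higher than either alone,
which is the point of combining weights instead of using the first match.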
> If either the feeds or the
> original web pages had a bit more metadata in them (like Dublin core),
> there's another set of data to be mined.
Yes, and I would love to see either automagic or voluntary categorization of
feeds into dmoz categories. There's >something< there in that idea but I
haven't quite put my finger on it yet. We designed WireHose to handle the
situation where items are already neatly categorized, and our fuzzy rule
thingie does OK for categorizing new stuff, but it's still too labor
intensive for one organization to do it by hand if you want human input
(unless you're Yahoo, I suppose).
> Where I'm going with this is of course, "The Daily Me". I'm curious to
> know just how close we can get to this, using the data that's available
> right now and without introducing new elements in the standards or a
> huge push to get people to produce richer feeds.
I really think it's doable now, with the existing simple syndication
standards we already have. Expecting thousands of sites to add on new
metadata (and to not lie about it, accidentally or deliberately) is, I
think, unrealistic. It may take the creation of a new standard for
communication of metadata between aggregators, but there are relatively
fewer players in that space than there are providers of content, so it might
be doable.
However, I'm not going to wait for any standard to emerge from any sort of
public discussion; it seems every time people talk about standards they end
up adding all sorts of "neat" features that are too hard to use in practice.
What I expect to see happen is some sort of "isThisFeedHotOrNot" site spring
up, and we'll all end up using whatever methods they use. Maybe Google will
do it; they seem to have the appropriate technology and resources.
Or maybe I'll make an extra big pot of coffee one weekend and wake up Monday
morning in a blur and find that it must have been built for me by the
gnomes.... :-)
--
Gary Teter, Big Dog
Bulldog Beach Interactive http://www.bulldogbeach.com
WireHose: The WebObjects multimedia portal framework http://www.wirehose.com