
RE: [syndication] Aggregating and displaying feeds



Gary says,

> I've been thinking of doing something similar to Slashdot's jury duty, err,
> moderation, especially at the feed level. But I'm not sure that approach is
> right for all audiences; I think it works well at Slashdot because there's a
> coherent culture there (and slashdotters appear to actually care about karma
> :-).

Many bulk news consumers (the "news junkies") use news to get
an edge over their competition: they can be better informed, and thus
make a better banking or investment decision, if they have more news,
faster. These people are not very likely to want to democratize the
process by drawing more attention to their sources. They will all want
to benefit from the "clicks" of others, yet would prefer to opt out of
sharing their own. One solution is to allow only those who vote to see
the results.

That said, it would be very interesting if several of us were to aggregate
"click through" data to come up with a list of top stories. If there
was a server somewhere that would take raw clicks and aggregate them, I
would definitely be interested in participating.
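
A minimal sketch of what such a click server might do, combined with the
"only those who vote see the results" rule above. Everything here is an
assumption made for illustration: the ClickAggregator class, its method
names, and the choice to count each participant at most once per story are
hypothetical, not part of any existing product.

```python
from collections import defaultdict


class ClickAggregator:
    """Collects raw (participant, story URL) clicks and ranks stories.

    Hypothetical design: a participant counts once per story, and only
    participants who have contributed clicks may read the ranking.
    """

    def __init__(self):
        # story URL -> set of participants who clicked through to it
        self._clickers = defaultdict(set)
        # everyone who has contributed at least one click
        self._contributors = set()

    def record_click(self, participant, story_url):
        self._clickers[story_url].add(participant)
        self._contributors.add(participant)

    def top_stories(self, participant, n=10):
        # Free riders are refused: no clicks contributed, no results.
        if participant not in self._contributors:
            raise PermissionError("only contributors may see the results")
        # Most distinct clickers first; ties broken by URL for stable output.
        ranked = sorted(self._clickers.items(),
                        key=lambda kv: (-len(kv[1]), kv[0]))
        return [(url, len(who)) for url, who in ranked[:n]]
```

Counting distinct participants rather than raw clicks is one way to keep a
single enthusiastic clicker from dominating the list; a real server would
also need to decide how quickly old clicks should age out.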

> Yes, and I would love to see either automagic or voluntary categorization of
> feeds into dmoz categories. There's >something< there in that idea but I
> haven't quite put my finger on it yet.

My feeling, based on the fact that I have been shipping/building a product
for the past two years, is that users are not going to be very helpful with
this. For the most part they just want news. They do not want to change
settings, adjust preferences, and so forth. I don't see people clipping
newspaper stories from the Sports section and pasting them into the Business
section just because some guy got a $100M contract. They take what they get.

Headline Viewer (www.headlineviewer.com) has a very simple category
hierarchy. I built it by looking at all of the providers and trying to come
up with a nice structure that would represent them all. I did not use any
formal tools and I did not hire any Ph.D. ontologists (I know a company that
did this and their hierarchy was not nearly as good as mine). Here is the
current Headline Viewer hierarchy. Note that this is a categorization of
news providers, not of articles.

Dow Jones 30

Celebrities
  Female
  Male

Consumer
  Food
  Health
  Hobbies
  Home
  Gardening
  Travel

Culture
  Film
  Humor
  Music
  Politics
  Religion

Finance
  Venture Capital

General

Industry

Internet
  Regional

Regional
  Europe
  Asia
  North America
    United States
  South America

Science

Sports
  Professional
    Baseball
    Football
    Hockey
    Basketball
  Collegiate

Technology
  Software
    BEOS
    Games
    Linux
    Macintosh
    Security
    Windows
    XML
    Programming
  Hardware

Web Log

Jeff;

-----Original Message-----
From: Gary Teter [mailto:bigdog@bulldogbeach.com]
Sent: Thursday, May 24, 2001 10:21 AM
To: syndication@yahoogroups.com; Julian Bond
Subject: Re: [syndication] Aggregating and displaying feeds


on 5/24/01 3:07 AM, Julian Bond at julian@netmarketseurope.com wrote:

> I started a discussion on another list but it seems appropriate here as
> well.
>
> Let's say you're writing YAFA (Yet Another Feed Aggregator!). I think
> there's some conceptual design thought about how you display the results
> from multiple feeds. I want to try and get away from displaying the
> results sorted by feed and sort them by relevance, age and subject
> instead.
<snip>
> If you allow Admins and Users to define their own bundles, you start to
> get into a many to many to many situation which is not hard in SQL but
> produces lots of linking tables and Joins.
>
> So far, so do-able. It's a SMOP! To put some meat on this, I could
> define a bundle on Motorcycle Racing which was a Moreover search + a
> feed from Motorcycle News, motorcycle.com, amasuperbike.com, MotoGP.com
> etc.

This is basically the way our aggregator/portal engine ("WireHose") works.
We identify resources (video, audio, individual syndicated headlines,
syndicated feeds, traffic cams, etc.) by a collection of tags; any resource
can have any number of tags so it can live in lots of different categories.
(My favorite example is Ken Griffey, Jr., getting traded: it's a Seattle
story, a baseball story, a Mariners story, and dagnabbit, it's a Cincinnati
story :-)

Site editors and users can then set up "fetchers", which specify the set of
tags associated with a particular subject, plus some other search criteria
like keywords, number of items to return, what to do if your topic matches
video or audio, sort ordering, etc.
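
The tag-and-fetcher idea described above can be sketched roughly as follows.
This is not WireHose code: the Resource and Fetcher classes, their field
names, and the rule that a resource matches only when it carries every tag
the fetcher asks for are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """One item (headline, video, traffic cam, ...) with any number of tags."""
    title: str
    tags: frozenset
    kind: str = "headline"
    timestamp: int = 0


@dataclass(frozen=True)
class Fetcher:
    """A saved query: tags for a subject plus other search criteria."""
    tags: frozenset
    keywords: tuple = ()          # optional title keywords, any one may match
    kinds: tuple = ("headline",)  # which media kinds this fetcher accepts
    limit: int = 10               # number of items to return

    def fetch(self, resources):
        hits = []
        for r in resources:
            if r.kind not in self.kinds:
                continue
            # Assumed rule: the resource must carry all of the fetcher's tags.
            if not self.tags <= r.tags:
                continue
            title = r.title.lower()
            if self.keywords and not any(k.lower() in title
                                         for k in self.keywords):
                continue
            hits.append(r)
        # Assumed sort order: newest first.
        hits.sort(key=lambda r: r.timestamp, reverse=True)
        return hits[:self.limit]
```

With this shape, the Griffey story tagged "seattle", "baseball", "mariners"
and "cincinnati" turns up in a Seattle fetcher and in a baseball fetcher
without being stored twice.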

You're right, it does hit the database server hard, but we get around that
with aggressive caching and other tricks. As you say, it's a SMOP (albeit
one that's taken a few years to solve properly :-)

> Where it gets interesting is when you start to think about categories
> and rating. And how the rating affects the visibility and priority of a
> particular <item>. It might be possible to get users to rate a
> particular feed and possibly even a source.

I've been thinking of doing something similar to Slashdot's jury duty, err,
moderation, especially at the feed level. But I'm not sure that approach is
right for all audiences; I think it works well at Slashdot because there's a
coherent culture there (and slashdotters appear to actually care about karma
:-).

> But individual items is too much work for them. And source is often not
> easily available. I think the ideal is to just use emergent behaviour to
> drive the system rather than relying too heavily on active input. If you can
> aggregate the click throughs and cross match it with the user, feed and
> source, there's some powerful rating to be done behind the scenes.

Absolutely. One feature I've been meaning to add to WireHose for the longest
time is a log spider that would roam through the clickthrough data and
resource tables and create new tags for the resources based on what people
liked and didn't like. Sort of like Amazon's "people who bought this also
bought" recommendations.
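
One simple way to get "people who clicked this also clicked" out of a
clickthrough log is to count how often two items were clicked by the same
user. The sketch below assumes the log is available as (user, item) pairs;
the function names and the log format are hypothetical, and this is plain
co-occurrence counting rather than anything Amazon or WireHose is known to
use.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_click_counts(click_log):
    """Build item -> Counter of other items clicked by the same users.

    click_log is an iterable of (user, item) pairs (an assumed format).
    """
    items_by_user = defaultdict(set)
    for user, item in click_log:
        items_by_user[user].add(item)

    related = defaultdict(Counter)
    for items in items_by_user.values():
        # Every pair of items one user clicked counts as one co-click.
        for a, b in combinations(sorted(items), 2):
            related[a][b] += 1
            related[b][a] += 1
    return related


def also_clicked(related, item, n=3):
    """Return up to n items most often co-clicked with the given item."""
    counts = related.get(item, Counter())
    # Highest co-click count first; ties broken by name for stable output.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:n]]
```

This only helps for items that have been clicked at all, so it does not by
itself solve the problem of items that nobody ever evaluates.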

The problem for us with trying to rank individual items is that WireHose is
designed to deal with a LOT of items. For example our standard "small" data
set for testing is the complete Moreover.com feeds plus several thousand
items a day from the Associated Press newsfeeds. With that kind of resource
turnover, even with a large audience (say millions of users), many items
will never get ranked even via clickthrough data.

So the problem becomes one of identifying "more like this," where no human
has ever evaluated how much "like" any one thing is to another. We have a
fuzzy logic weighted rules-based thingamajig for extracting items from the
wirefeeds (editors always lie about metadata when putting stuff on the
wires!), and I've been exploring ways to apply this concept to trolling
through the resource databases and clickthrough data, but no real
breakthroughs yet.
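
For readers wondering what a weighted rules-based categorizer can look like
in its simplest form, here is a sketch. It is an assumption from top to
bottom: the (keyword, tag, weight) rule format, the categorize function, the
threshold, and the fuzzy-OR combination of weights are all hypothetical and
say nothing about how the WireHose rules actually work.

```python
def categorize(text, rules, threshold=0.7):
    """Assign tags to text from weighted keyword rules.

    rules is a list of (keyword, tag, weight) with weight in [0, 1].
    Matching rules for the same tag are combined with a fuzzy OR:
    score = 1 - product(1 - weight). Tags scoring at or above the
    threshold are returned with their scores.
    """
    words = set(text.lower().split())
    # tag -> running product of (1 - weight) over matching rules
    miss = {}
    for keyword, tag, weight in rules:
        if keyword in words:
            miss[tag] = miss.get(tag, 1.0) * (1.0 - weight)
    scores = {tag: 1.0 - m for tag, m in miss.items()}
    return {tag: s for tag, s in scores.items() if s >= threshold}
```

The point of weighting is that no single keyword has to be trusted: two
moderately suggestive words together can push a tag over the threshold,
while one of them alone does not, and the editor-supplied metadata never has
to be believed at all.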

> If either the feeds or the
> original web pages had a bit more metadata in them (like Dublin core),
> there's another set of data to be mined.

Yes, and I would love to see either automagic or voluntary categorization of
feeds into dmoz categories. There's >something< there in that idea but I
haven't quite put my finger on it yet. We designed WireHose to handle the
situation where items are already neatly categorized, and our fuzzy rule
thingie does OK for categorizing new stuff, but it's still too labor
intensive for one organization to do it by hand if you want human input
(unless you're Yahoo, I suppose).

> Where I'm going with this is of course, "The Daily Me". I'm curious to
> know just how close we can get to this, using the data that's available
> right now and without introducing new elements in the standards or a
> huge push to get people to produce richer feeds.

I really think it's doable now, with the existing simple syndication
standards we already have. Expecting thousands of sites to add on new
metadata (and to not lie about it, accidentally or deliberately) is, I
think, unrealistic. It may take the creation of a new standard for
communication of metadata between aggregators, but there are relatively
fewer players in that space than there are providers of content, so it might
be doable.

However, I'm not going to wait for any standard to emerge from any sort of
public discussion; it seems every time people talk about standards they end
up adding all sorts of "neat" features that are too hard to use in practice.
What I expect to see happen is some sort of "isThisFeedHotOrNot" site spring
up, and we'll all end up using whatever methods they use. Maybe Google will
do it, they seem to have the appropriate technology and resources.

Or maybe I'll make an extra big pot of coffee one weekend and wake up Monday
morning in a blur and find that it must have been built for me by the
gnomes.... :-)

--
Gary Teter, Big Dog
Bulldog Beach Interactive http://www.bulldogbeach.com
WireHose: The WebObjects multimedia portal framework http://www.wirehose.com



