Re: [syndication] Google News, Syndication and stuff
On Wed, Sep 25, 2002 at 09:51:33AM +0100, Julian Bond wrote:
> - I find the Google categories a bit broad. I could well imagine them
> using the same technology to produce 500 channels instead of 5 and
> moreover making other aggregators and clipping services redundant in the
> process.
I don't think so.
A lot of the value that news.google.com provides is scouring the net
for related stories on a small number of large categories. When
examining thousands of news sources, clusters begin to appear, and what
news.google.com is doing is showing the most current clusters in seven
areas.
Because it needs a large number of sources to identify the clusters of
news stories ("There's something going on in the Ivory Coast today",
"A vaccine for West Nile Virus is 3 years away..."), it doesn't make
sense to broaden that to 50 or 500 channels that can each cluster from
only 3 or 5 sources. I see no need to produce a channel for news from
Windsor, Ont., for example; the local newspaper and TV station are already
doing the aggregation at the local level.
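To make the point about source counts concrete, here is a rough sketch
of clustering headlines by word overlap. It is only an illustration: I
have no knowledge of how news.google.com actually works, and the
stopword list, the 0.3 similarity threshold, the `min_sources` cutoff
and the sample headlines are all assumptions made up for this example.

```python
import re

# Assumed stopword list, purely illustrative.
STOPWORDS = {"a", "an", "the", "in", "on", "of", "for", "to", "is", "as", "and"}

def tokens(headline):
    """Lowercase a headline and return its non-stopword words as a set."""
    return {w for w in re.findall(r"[a-z]+", headline.lower()) if w not in STOPWORDS}

def jaccard(a, b):
    """Jaccard similarity of two word sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster(headlines, threshold=0.3):
    """Greedy single pass: a headline joins the first cluster whose seed
    headline is similar enough, otherwise it starts a new cluster."""
    clusters = []  # list of (seed_tokens, member_headlines)
    for headline in headlines:
        words = tokens(headline)
        for seed, members in clusters:
            if jaccard(seed, words) >= threshold:
                members.append(headline)
                break
        else:
            clusters.append((words, [headline]))
    return [members for _, members in clusters]

def top_stories(headlines, min_sources=3):
    """Keep only clusters reported by at least min_sources headlines."""
    return [c for c in cluster(headlines) if len(c) >= min_sources]

# Hypothetical headlines, one per source.
many = [
    "Fighting spreads in Ivory Coast",
    "Ivory Coast fighting continues",
    "Rebels advance as Ivory Coast fighting spreads",
    "West Nile vaccine three years away",
    "Windsor bakery wins award",
]
few = [many[0], many[3], many[4]]

print(top_stories(many))  # the three Ivory Coast headlines form one cluster
print(top_stories(few))   # no cluster reaches three sources
```

With only a handful of sources per channel, every story looks like a
one-off and nothing stands out as "the" news of the day.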
Another way to look at it would be to create 500 focused channels,
like "the NBA channel", "the World Cup channel" or "the Cancer
channel". This is similar to a product I worked on 10+ years
ago, and it's a much harder nut to crack. After all, it's
easier to identify that the Washington Bullets belong in "Sports"
and a news story from the NIH belongs in "Health" than it is to
identify that a random story about Michael Jordan opening a new restaurant
belongs in "NBA", or that a photography benefit show supporting breast cancer
research doesn't belong in the Cancer Channel, even though it mentions
recent advancements in mammography.
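A naive keyword matcher shows why the focused channels are harder. This
is a toy sketch, not the product I worked on; the channel names come
from the examples above, but the keyword lists and the two sample
stories are invented for illustration.

```python
import re

# Hypothetical keyword lists for two focused channels.
CHANNEL_KEYWORDS = {
    "NBA": {"nba", "basketball", "playoffs"},
    "Cancer": {"cancer", "mammography", "chemotherapy"},
}

def naive_channels(text):
    """Assign a story to every channel whose keywords appear in it."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(ch for ch, kws in CHANNEL_KEYWORDS.items() if words & kws)

jordan = "Michael Jordan opens a new restaurant in Chicago"
benefit = ("Photography benefit show supports breast cancer research, "
           "cites recent advances in mammography")

print(naive_channels(jordan))   # matches nothing: no NBA keyword is present
print(naive_channels(benefit))  # lands in "Cancer" on keywords alone
```

The Jordan story is missed because nothing in the text says "NBA", and
the benefit show is pulled into the Cancer channel purely on
vocabulary. Getting both right takes knowledge about people and
context, not just more keywords.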
If I were a betting man, I'd say that Google is looking into this
for a for-pay service they're planning. Creating these kinds of
automated and targeted news feeds is both very valuable and very
difficult to get right.
> - If they've currently got 4000 sources, is there any reason why it
> shouldn't work with 40,000? Is there an alternative to Blogdex in here?
> A Google Blogs to match Google News.
I'm not entirely sure this would work.
First, many blogging tools scroll entries together on the same page.
Some display comments on the same page. It's a much more difficult
problem to identify the blog entry in these circumstances, especially
compared to scraping out the news+photos from a news story found at the
BBC, New York Times or somesuch.
Also, blog entries are a lot more random. An entry about really
important features for tuning MySQL occasionally follows an entry about
a recent experience with LL Bean. At least on an online news site,
there are some clues as to what is "business" and what is "sports". If
I were to start a news.google.com clone, I'd certainly incorporate that
information into my heuristic models.
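One such clue is the section name many news sites put in the URL path.
A minimal sketch of reading it, assuming a hand-built hint table and
made-up example.com URLs (real sites vary widely in how they lay out
their paths):

```python
from urllib.parse import urlparse

# Hypothetical mapping from URL path segments to broad categories.
SECTION_HINTS = {
    "business": "Business",
    "sport": "Sports",
    "sports": "Sports",
    "health": "Health",
}

def section_from_url(url):
    """Return a category if any path segment is a known section name."""
    for segment in urlparse(url).path.lower().split("/"):
        if segment in SECTION_HINTS:
            return SECTION_HINTS[segment]
    return None  # no clue in the URL

print(section_from_url("http://news.example.com/sport/football/story.html"))
print(section_from_url("http://blog.example.com/2002/09/25.html"))
```

The news URL yields "Sports"; the blog URL carries only a date, which
is the problem with blogs in a nutshell.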
One final note: this kind of automated aggregation and categorization
has been a holy grail for more than 20 years. Projects to create "Your
Personal Newspaper" go back to the 1980s. news.google.com is just the
most recent (and possibly the best) approach to solving a portion of
this problem.
Z.