
Re: [syndication] Automatically Transforming Blog or HTML Content into XML



> > absolutely true.  :)  our paper is very clear about the methodology
> > we're using - the statistical numbers must be interpreted in line with
> > the methods being used.
>
> Do you have a publish date?

the thing is due on sunday... actually past due.  i'm not sure when the
proceedings volume it'll show up in will make it to press.  and i'm not
sure whether i can do anything with the content if it gets printed in the
volume - i have no clue what license the press is going to apply to it.
probably a Very Bad Restrictive One.


> > i'd hazard a guess that the # of blogger blogs is substantially more
> > than 57% of what's out there, but maybe not the 80% we're getting as a
> > result. again, it depends entirely on what you call a blog, and that's
> > a shifty question in itself.  the reified answers that most people are
> > willing to provide will not suffice for research purposes.  :)
>
> Again, "out there" is a vague concept.  That they 'exist' is one thing.
> That they provide content that has value, over time, to a measurable
> audience, would be a MUCH more useful number.  I'm not sure I'd argue
> that any one tool is dominant at this point in time.  Nor would I go
> without trending migrations from one tool to another.  I've seen quite a
> range of transition away from tools like Blogger and Radio (mainly to
> Movable Type).  But I'm sure the accuracy of my sampling as it exists
> thus far is crude, at best.

frankly i think the sampling we did is pretty crude too - i'd be much
happier with a sample size of around 50k.  but we did a whole hell of a
lot of this by hand, rather than trying to automate the un-automatable.
(screen scrapers that can deal with the content that's out there would be
EVIL...)


> So beyond what "you'd call a blog" I'd wonder if the REAL question isn't
> "does it provide anything worthwhile"?  This is like asking "what kind
> of grass grows best in a minefield?"

i think this is a Really Good Question - a lot of the content that's out
there is pretty fecal.  i'm not so sure, though, that throwing away
content that you don't think is 'worthwhile' is really a good way to
determine what the *form* should be.  there's an implied value judgment
there that makes me a little queasy.

> I just don't see, from the current range of content, that there's
> enough data to distill anything even closely resembling "an answer".
> Sure, one could posit any number of clever positions but this is why I
> quoted Twain.

hehehehheheheh


> > > As we build out larger lists of feeds we stand the chance of
> > > building out larger lists that can be cross-referenced.  Likewise,
> > > as more content comes online the users will demand more effective
> > > ways to refine the list of what's presented to them.  But at this
> > > stage of the game there's frankly TOO LITTLE content online to start
> > > thinking about using exclusionary filters.
> >
> > i'm very interested in ways that this piece of the game might develop.
> >
> > it is pretty interesting that you say there's "too little content";
> > what're your criteria for "enough content" to start filtering in
> > earnest?
>
> Consider that we've a world population of over 5 billion.  Contrast that
> with the extant number of weblogs cited thus far (arguably hardly more
> than a million).  I'd say we have a way to go here before we start
> thinking there's 'enough' content online.

i guess we're at 1/5000th, then.  you're right, that's not a real huge
percentage when compared to world population.  i'm not sure that that's
what the basis of comparison should be, but it certainly does clamp down
on what the domain/range of interest might be - might be more interesting
to only think about it in comparison to the number of people who're LIKELY
to produce content on their own.  i think there's certainly a large group
of people who would never, ever, ever produce content for fun, even if the
tools are free, 100% available, etc etc etc.  :)


> My caution against filtering comes from observing the biased behaviors that
> usually accompany it.  The risk being that filtering is applied by "peoples in
> authority" for the "good of the people".  Consider bigotry, racism, religion,
> xenophobia, class warfare, nationalism and the like.

all great evils, yes.

> At this point the number
> of voices participating in the overall delivery of content does not strike me as
> being anywhere near large enough to justify investing efforts into excluding
> things.

oh, now i'm starting to follow.

> Basically, the existing number of content providers doesn't speak for a
> wide enough percentage of the public to start thinking that it's
> anywhere near accurate enough to justify filtering.  But
> then again, those that would provide or control the filters would
> certainly argue against such an idea. Fortunately the public outnumbers
> them.

on the one hand there's the possibility of producing FILTERS that exclude
content.  that's a bad idea, as you say.  on the other hand, though,
there's room to create tools that allow users to collect and manage lists
of resources that they LIKE - inclusive content rather than exclusive.


i'm really enjoying this conversation - good to get to hash through some
of this .....


elijah