Re: Finding Feeds
> (and enough metadata, like title, etc. can
> usually be extracted from the target if necessary).
Yes, let's eliminate the need to scrape at all. If you want the
data, just ask the source for it. Yes, this would require an
interactive source to query. For static blogs this would not work
without some help.
In theory, a site that stores its data in XML and simply applies
browser-compatible XSLT to it might allow this to happen rather
easily. However, the CPU load from all that dynamic processing
really doesn't seem worth it. Leaving it static and waiting for the
request seems like a lot less load on the system.
> However, I can see how this would be useful for a lot of other uses
> of RSS, such as those where you're actually shoving the content
> around (e.g., WebLogs, etc.). Of course, if the metadata were in the
> target, the link would be a good means of identification, but that
> assumes that the target and the metadata are authored by the same
> person; often not the case with WebLogs, etc.
I strongly disagree about the reliability of links. They're not
reliable, as many sites have shown, and not only because the content
is no longer online. A site dying is a problem, of course, but
developmental changes to a site often break hierarchical web links
merely on the whims of developers. Let's not require them to keep
anything other than an item ID and a single URI with a known set of
parameters. That way they can move it around to their heart's content.
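A rough sketch of what "an item ID and a single URI with a known set
of parameters" could look like from the client side. The endpoint
address and the parameter name `id` are made up for illustration;
nothing here is an agreed convention.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit

def item_uri(endpoint: str, item_id: str) -> str:
    """Build the lookup URI for one item from a site's single endpoint.

    The parameter name "id" is an assumption, not part of any spec.
    """
    scheme, netloc, path, _query, _fragment = urlsplit(endpoint)
    query = urlencode({"id": item_id})  # percent-encodes the ID safely
    return urlunsplit((scheme, netloc, path, query, ""))

# Hypothetical endpoint: the site may reorganize its pages freely,
# as long as this one address keeps answering.
print(item_uri("http://example.org/item", "item-42"))
```

The client only ever stores the ID and the endpoint, so nothing it
holds depends on where the site happens to file the page this month.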
This might also be a way to persist data even after it's gone from
the web. If the content provider dies but someone else has maintained
a database of its items, then requests could be redirected to that
repository. This is something the client interface could be told to do.
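Something like the following is what I have in mind for the client
side, as a sketch only. The sources are plain callables standing in
for "ask the origin" and "ask an archive"; both names and the sample
data are invented for the example.

```python
from typing import Callable, Iterable, Optional

# A source takes an item ID and returns the item, or None if unknown.
Fetcher = Callable[[str], Optional[str]]

def resolve_item(item_id: str, sources: Iterable[Fetcher]) -> Optional[str]:
    """Try each source in order and return the first item found."""
    for fetch in sources:
        try:
            found = fetch(item_id)
        except OSError:
            continue  # source unreachable, fall through to the next one
        if found is not None:
            return found
    return None

def dead_origin(item_id: str) -> Optional[str]:
    # Stand-in for a content provider that has gone offline.
    raise ConnectionError("origin is gone")

# Stand-in for a third-party repository that kept a copy of the items.
archive = {"item-42": "<item><title>Finding Feeds</title></item>"}

print(resolve_item("item-42", [dead_origin, archive.get]))
```

The order of the list is the policy: origin first, repository second.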
> (/me still wonders if this kind of confusion could be avoided if we
> used different terms for the different uses (linking and content) of
> RSS...)
Terminology confusion in RSS? Say it ain't so! <grin>
Try reading some of the other formats. I understand what they're up
to but it's a lot to wrap your head around. These so-called formats
all have sixteen different names for the same thing.
Anyway, it would be a good start to be able to find a single item
based on some uniquely identifying bit of info and extract its
greater structure.
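As a minimal sketch of that lookup, assuming the feed is RSS 2.0 and
that the `guid` element is what serves as the item ID (the sample feed
and the `find_item` helper are made up for illustration):

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Invented sample feed; a real client would fetch this from the source.
FEED = """<rss version="2.0"><channel>
  <title>Example</title>
  <item><guid>item-41</guid><title>Older post</title>
        <link>http://example.org/old</link></item>
  <item><guid>item-42</guid><title>Finding Feeds</title>
        <link>http://example.org/moved/here</link></item>
</channel></rss>"""

def find_item(feed_xml: str, item_id: str) -> Optional[Dict[str, str]]:
    """Return the child elements of the item whose guid matches, if any."""
    root = ET.fromstring(feed_xml)
    for item in root.iter("item"):
        if item.findtext("guid") == item_id:
            # Flatten the item's direct children into tag -> text.
            return {child.tag: (child.text or "") for child in item}
    return None

print(find_item(FEED, "item-42"))
```

The link comes back as just another field of the item, not as the
thing that identifies it.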
-Bill