RE: [syndication] The RSShovah Witnesses...
>> - knock on people's door, educating about the RSS format,
>> showing tools to produce RSS, and helping to add some
>> mention of the RSS feed into the site.
>
>> - have a group of people who just randomly sit together
>> and wait anxiously for requests for RSS feeds. these
>> people would scrape content and host the feeds themselves.
>
>We need a list of sites, and then some state indicators:
>
>(1) New (2) Need to be approached
>(3) Approached (4) Awaiting reply
>(5) Replied yes and feed exists (6) Replied yes and feed will soon exist
>(7) Replied no (8) Native feed in hand
>(9) Scraped feed in hand
>
>We could keep this list in XML form (under CVS control) on a server
>somewhere, and format it with XSL to produce a "status report".
Ok. How about this for an XML format (I hate attributes, so I've used none):
<site>
<requester_name>Joe Nascar</requester_name>
  <requester_email>joe@home.com</requester_email>
<requested_url>http://www.nascar.com/</requested_url>
<requested_date>some GMT ISO date</requested_date>
<status>one of the entries above</status>
<status_date>when the status was arrived at</status_date>
<implementor_name>Morbus Iff</implementor_name>
<implementor_email>morbus@disobey.com</implementor_email>
<notes></notes>
<last_updated>some GMT ISO date</last_updated>
</site>
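To show what the "format it with XSL to produce a status report" idea would look like in practice, here's a minimal sketch in Python instead of XSL (the sample record and the function name are mine, not part of the proposal; element names follow the format above):

```python
# Sketch: turn <site> records into a plain-text status report.
# Element names follow the proposed format above; the sample data
# and function name are hypothetical.
import xml.etree.ElementTree as ET

SAMPLE = """<sites>
  <site>
    <requester_name>Joe Nascar</requester_name>
    <requester_email>joe@home.com</requester_email>
    <requested_url>http://www.nascar.com/</requested_url>
    <requested_date>2001-07-18T00:00:00Z</requested_date>
    <status>Approached</status>
    <status_date>2001-07-18T00:00:00Z</status_date>
    <implementor_name>Morbus Iff</implementor_name>
    <implementor_email>morbus@disobey.com</implementor_email>
    <notes></notes>
    <last_updated>2001-07-18T00:00:00Z</last_updated>
  </site>
</sites>"""

def status_report(xml_text):
    """One line per site: URL, current status, implementor."""
    root = ET.fromstring(xml_text)
    lines = []
    for site in root.iter("site"):
        lines.append("%-30s %-20s %s" % (site.findtext("requested_url"),
                                         site.findtext("status"),
                                         site.findtext("implementor_name")))
    return "\n".join(lines)

print(status_report(SAMPLE))
```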
At this point, we also need to answer some questions:
1. What is the process we take when someone requests a feed?
Possible answer: contact the provider with a prewritten
email, wait a week for a response, send another email
if no response, wait a week, if no response, scrape and
report.
2. Should we announce the scraped feeds to the site in question?
      I worry that some sites will say "hey! stop that! take it
      down!". On one route, we respect their wishes at the
      expense of wasted time and disgruntled requestors; on the
      other route, we're being "sneaky".
3. If we do this off a CVS, the code for custom scrapes should
also be thrown on the CVS, along with any libraries and
required code.
4. Should the XML for fulfilled requests be kept in a legacy file
   for safekeeping?
5. Will there be a definitive announce source for our scraped
   feeds? I recommend Jeff's Manila site.
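The contact-and-wait process in question 1 can be sketched as a small state machine; a rough Python version (state names loosely mirror the status list quoted above, and the 7-day thresholds are the proposed week-long waits; everything here is illustrative):

```python
# Sketch of the question-1 workflow: advance a request's state based
# on days elapsed since the last contact. State names loosely mirror
# the status list; the 7-day thresholds are the proposed waits.
def next_state(state, days_since_contact, replied):
    if replied:
        return "Replied"
    if state == "New":
        return "Approached"            # send the prewritten email
    if state == "Approached" and days_since_contact >= 7:
        return "Approached again"      # second email after a week
    if state == "Approached again" and days_since_contact >= 7:
        return "Scraped feed in hand"  # no response: scrape and report
    return state                       # keep waiting
```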
At 9:50 AM +0200 7/18/01, Mike Krus wrote:
>But I have another offer: would both of you be willing to become
>NewsIsFree editors? If you know a little PHP and a lot of Regex
>you can help writing new parsers to build RSS news feeds...
Mike, I'm remotely familiar with PHP, but no master. If you gave me a few
samples of scraped feeds you've done, I'd be able to learn pretty
quickly (same with Aaron Swartz and whatever language he's using. Tcl, I
think?)...
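For what it's worth, the regex-heavy parser approach Mike describes can be sketched in any language; here's a tiny Python version (the sample HTML, the pattern, and the base URL are invented for illustration, not taken from NewsIsFree):

```python
# Sketch of a regex-based scraper: pull headline links out of HTML
# and wrap them in a minimal RSS 0.91 skeleton. The sample HTML and
# the pattern are invented for illustration.
import re

SAMPLE_HTML = """
<div class="headlines">
  <a href="/news/1.html">First story</a>
  <a href="/news/2.html">Second story</a>
</div>
"""

ITEM_RE = re.compile(r'<a href="([^"]+)">([^<]+)</a>')

def scrape_to_rss(html, base="http://example.com"):
    """Match every headline link and emit it as an RSS <item>."""
    items = []
    for href, title in ITEM_RE.findall(html):
        items.append("<item><title>%s</title><link>%s%s</link></item>"
                     % (title, base, href))
    return ('<rss version="0.91"><channel>%s</channel></rss>'
            % "".join(items))

print(scrape_to_rss(SAMPLE_HTML))
```

A real per-site scraper is mostly a matter of tuning that one pattern to the site's markup, which is why "a lot of Regex" is the main requirement.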
>Combining that and the list of feeds you mentioned maybe
>we can build a comprehensive list of RSS feeds and expand
>it with scrapers...
Jeff and I have talked about creating a "comprehensive list", but the
discussion has kind of stalled. I've a very rough list (and documentation)
at:
http://www.disobey.com/amphetadesk/xml_services_lists.htm
http://www.disobey.com/amphetadesk/lists/services-channels-complete.xml
The format, however, needs revision. Primarily:
a) There shouldn't be three lists ("complete", "recent", and "failure").
They should be integrated together, and anything dead after a
certain period of time should be removed.
b) Some of the elements are just stupid and need to be removed.
c) It's impossible to gauge a channel's freshness based on HTTP
   headers or internal data. I wanted to add some file-size-based
   entries, so that we could compare filesize from check to check
   as a secondary indicator of how often the channel has been updated.
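The file-size idea in (c) boils down to recording each feed's byte size at every check and flagging any change; a minimal sketch, assuming sizes are kept in plain dicts keyed by URL (the function name and data shape are mine):

```python
# Sketch of point (c): compare each feed's byte size against the size
# recorded at the previous check, as a rough "has it changed?" signal.
# A new feed with no previous size is reported as unchanged.
def freshness_check(previous_sizes, current_sizes):
    changed = {}
    for url, size in current_sizes.items():
        old = previous_sizes.get(url)
        changed[url] = (old is not None and old != size)
    return changed
```

It's only a secondary indicator: a feed can change content without changing size, and vice versa, which is why it would supplement rather than replace the other checks.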
--
Morbus Iff ( i am your scary godmother )
http://www.disobey.com/ && http://www.gamegrene.com/
please me: http://www.amazon.com/exec/obidos/wishlist/25USVJDH68554
icq: 2927491 / aim: akaMorbus / yahoo: morbus_iff / jabber.org: morbus