RE: [syndication] The RSShovah Witnesses...
>> - knock on people's door, educating about the RSS format,
>> showing tools to produce RSS, and helping to add some
>> mention of the RSS feed into the site.
>
>> - have a group of people who just randomly sit together
>> and wait anxiously for requests for RSS feeds. these
>> people would scrape content and host the feeds themselves.
>
>We need a list of sites, and then some state indicators:
>
>(1) New (2) Need to be approached
>(3) Approached (4) Awaiting reply
>(5) Replied yes and feed exists (6) Replied yes and feed will soon exist
>(7) Replied no (8) Native feed in hand
>(9) Scraped feed in hand
>
>We could keep this list in XML form (under CVS control) on a server
>somewhere, and format it with XSL to produce a "status report".
Ok. How about this for an XML format (I hate attributes, so I've used none):
<site>
<requester_name>Joe Nascar</requester_name>
  <requester_email>joe@home.com</requester_email>
<requested_url>http://www.nascar.com/</requested_url>
<requested_date>some GMT ISO date</requested_date>
<status>one of the entries above</status>
<status_date>when the status was arrived at</status_date>
<implementor_name>Morbus Iff</implementor_name>
<implementor_email>morbus@disobey.com</implementor_email>
<notes></notes>
<last_updated>some GMT ISO date</last_updated>
</site>
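To show what the "format it with XSL to produce a status report" idea would look like in practice, here's a minimal sketch in Python instead of XSL (the sample record and the function name are mine, not part of the proposal; element names follow the format above):

```python
# Sketch: turn <site> records into a plain-text status report.
# Element names follow the proposed format above; the sample data
# and function name are hypothetical.
import xml.etree.ElementTree as ET

SAMPLE = """<sites>
  <site>
    <requester_name>Joe Nascar</requester_name>
    <requester_email>joe@home.com</requester_email>
    <requested_url>http://www.nascar.com/</requested_url>
    <requested_date>2001-07-18T00:00:00Z</requested_date>
    <status>Approached</status>
    <status_date>2001-07-18T00:00:00Z</status_date>
    <implementor_name>Morbus Iff</implementor_name>
    <implementor_email>morbus@disobey.com</implementor_email>
    <notes></notes>
    <last_updated>2001-07-18T00:00:00Z</last_updated>
  </site>
</sites>"""

def status_report(xml_text):
    """One line per site: URL, current status, implementor."""
    root = ET.fromstring(xml_text)
    lines = []
    for site in root.iter("site"):
        lines.append("%-30s %-20s %s" % (site.findtext("requested_url"),
                                         site.findtext("status"),
                                         site.findtext("implementor_name")))
    return "\n".join(lines)

print(status_report(SAMPLE))
```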
At this point, we also need to answer some questions:
1. What is the process we take when someone requests a feed?
Possible answer: contact the provider with a prewritten
email, wait a week for a response, send another email
if no response, wait a week, if no response, scrape and
report.
2. Should we announce the scraped feeds to the site in question?
      I worry that some sites will say "hey! stop that! take it
      down!". On one route, we respect their wishes at the
      expense of wasted time and disgruntled requestors; on the
      other route, we're being "sneaky".
3. If we do this off a CVS, the code for custom scrapes should
also be thrown on the CVS, along with any libraries and
required code.
4. Should the XML for fulfilled requests be kept in a legacy file
   for safekeeping?
5. Will there be a definitive announce source for our scraped
   feeds? I recommend Jeff's Manila site.
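The contact-and-wait process in question 1 can be sketched as a small state machine; a rough Python version (state names loosely mirror the status list quoted above, and the 7-day thresholds are the proposed week-long waits; everything here is illustrative):

```python
# Sketch of the question-1 workflow: advance a request's state based
# on days elapsed since the last contact. State names loosely mirror
# the status list; the 7-day thresholds are the proposed waits.
def next_state(state, days_since_contact, replied):
    if replied:
        return "Replied"
    if state == "New":
        return "Approached"            # send the prewritten email
    if state == "Approached" and days_since_contact >= 7:
        return "Approached again"      # second email after a week
    if state == "Approached again" and days_since_contact >= 7:
        return "Scraped feed in hand"  # no response: scrape and report
    return state                       # keep waiting
```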
At 9:50 AM +0200 7/18/01, Mike Krus wrote:
>But I have another offer: would both of you be willing to become
>NewsIsFree editors? If you know a little PHP and a lot of Regex
>you can help writing new parsers to build RSS news feeds...
Mike, I'm remotely familiar with PHP, but no master. If you gave me a few
samples of scraped feeds you've done, I'd be able to learn pretty
quickly (same with Aaron Swartz and whatever language he's using. Tcl, I
think?)...
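For what it's worth, the regex-heavy parser approach Mike describes can be sketched in any language; here's a tiny Python version (the sample HTML, the pattern, and the base URL are invented for illustration, not taken from NewsIsFree):

```python
# Sketch of a regex-based scraper: pull headline links out of HTML
# and wrap them in a minimal RSS 0.91 skeleton. The sample HTML and
# the pattern are invented for illustration.
import re

SAMPLE_HTML = """
<div class="headlines">
  <a href="/news/1.html">First story</a>
  <a href="/news/2.html">Second story</a>
</div>
"""

ITEM_RE = re.compile(r'<a href="([^"]+)">([^<]+)</a>')

def scrape_to_rss(html, base="http://example.com"):
    """Match every headline link and emit it as an RSS <item>."""
    items = []
    for href, title in ITEM_RE.findall(html):
        items.append("<item><title>%s</title><link>%s%s</link></item>"
                     % (title, base, href))
    return ('<rss version="0.91"><channel>%s</channel></rss>'
            % "".join(items))

print(scrape_to_rss(SAMPLE_HTML))
```

A real per-site scraper is mostly a matter of tuning that one pattern to the site's markup, which is why "a lot of Regex" is the main requirement.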
>Combining that and the list of feeds you mentioned maybe
>we can build a comprehensive list of RSS feeds and expand
>it with scrapers...
Jeff and I have talked about creating a "comprehensive list", but the
discussion has kind of stalled. I've a very rough list (and documentation)
at:
http://www.disobey.com/amphetadesk/xml_services_lists.htm
http://www.disobey.com/amphetadesk/lists/services-channels-complete.xml
The format, however, needs revision. Primarily:
a) There shouldn't be three lists ("complete", "recent", and "failure").
They should be integrated together, and anything dead after a
certain period of time should be removed.
b) Some of the elements are just stupid and need to be removed.
c) It's impossible to gauge a channel's freshness based on HTTP
   headers or internal data. I wanted to add some file-size-based
   entries, so that we could compare filesize from check to check
   as a secondary indicator of how often the channel has been updated.
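The file-size idea in (c) boils down to recording each feed's byte size at every check and flagging any change; a minimal sketch, assuming sizes are kept in plain dicts keyed by URL (the function name and data shape are mine):

```python
# Sketch of point (c): compare each feed's byte size against the size
# recorded at the previous check, as a rough "has it changed?" signal.
# A new feed with no previous size is reported as unchanged.
def freshness_check(previous_sizes, current_sizes):
    changed = {}
    for url, size in current_sizes.items():
        old = previous_sizes.get(url)
        changed[url] = (old is not None and old != size)
    return changed
```

It's only a secondary indicator: a feed can change content without changing size, and vice versa, which is why it would supplement rather than replace the other checks.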
--
Morbus Iff ( i am your scary godmother )
http://www.disobey.com/ && http://www.gamegrene.com/
please me: http://www.amazon.com/exec/obidos/wishlist/25USVJDH68554
icq: 2927491 / aim: akaMorbus / yahoo: morbus_iff / jabber.org: morbus