Re: [syndication] Re: PHP tool for parsing RSS1.0?
> Not at all. But it's cool anyway. What are you using the metaphone
> and soundex fields for?
Comparisons for duplicate data, etc. The Snewp will soon use them for
some new features too.
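One way such a comparison could work, as a rough sketch and not the Snewp's actual code: reduce each title to a phonetic key and treat items with equal keys as likely duplicates. The sketch is in Python, which has no built-in Soundex, so `soundex` is hand-written here (PHP provides `soundex()` and `metaphone()` natively). `title_key` and `find_duplicates` are hypothetical helper names.

```python
# Consonant groups of the classic (American) Soundex algorithm.
SOUNDEX_CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def soundex(word):
    """Return the 4-character Soundex code of a word ('' if it has no letters)."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    prev = SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "HW":
            continue  # H and W do not separate letters with the same code
        digit = SOUNDEX_CODES.get(c, "")
        if digit and digit != prev:
            code += digit
        prev = digit  # vowels reset prev, so a repeated code is kept after them
    return (code + "000")[:4]


def title_key(title):
    """Hypothetical helper: phonetic key for a whole title, word by word."""
    codes = (soundex(word) for word in title.split())
    return " ".join(code for code in codes if code)


def find_duplicates(titles):
    """Return (earlier, later) pairs of titles that share a phonetic key."""
    seen = {}
    duplicates = []
    for title in titles:
        key = title_key(title)
        if key in seen:
            duplicates.append((seen[key], title))
        else:
            seen[key] = title
    return duplicates
```

With this, "Color of the day" and "Colour of the day" get the same key and are reported as a pair. Phonetic keys are deliberately fuzzy, so a real setup would likely confirm a match with a second check (link, date) before discarding anything.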
> This doesn't work for me because:
>
> * it seems to be discarding any fields associated with an item
> besides title, description, and link
The current version only looks for title, description, link, and
category - the most common tags. Adding new elements is easy - I can add 25
more in a matter of seconds. I do plan to add the extra elements, but this
parser was written for a specific purpose, so I haven't bothered yet.
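To illustrate why extending the element list is cheap, here is a minimal sketch in Python (not the stn-reader source): the wanted item elements live in one tuple, and the parser keeps whatever child elements match it. `ITEM_FIELDS`, `local_name`, and `parse_items` are names made up for this example. Matching on the local name alone is an assumption that keeps the sketch short; it would also pick up, say, a Dublin Core `title` element inside an item.

```python
import xml.etree.ElementTree as ET

# Item elements to keep; extending the parser means extending this tuple.
ITEM_FIELDS = ("title", "description", "link", "category")


def local_name(tag):
    """Strip an ElementTree '{namespace-uri}' prefix from a tag name."""
    return tag.rsplit("}", 1)[-1]


def parse_items(xml_text, fields=ITEM_FIELDS):
    """Return one dict per <item>, for RSS 0.9x and RSS 1.0 alike."""
    root = ET.fromstring(xml_text)
    items = []
    for elem in root.iter():
        if local_name(elem.tag) != "item":
            continue
        record = {}
        for child in elem:
            name = local_name(child.tag)
            if name in fields:
                record[name] = (child.text or "").strip()
        items.append(record)
    return items
```

Picking up an extra element is then a one-line change at the call site, for example `parse_items(feed, fields=ITEM_FIELDS + ("pubDate",))`.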
> * it's a remote service, where I want something local (perhaps the
> source is available?)
It is not a remote service. That page is simply the only place where I
have that particular code available to look at (output-wise, anyway). I plan
on releasing a version of the code when it is more filled out - as you
noted, it is missing some possible item elements, etc.
> * looks like it's more focused on the .9x RSS formats.
Not sure what you mean by that. <pick-a-fight>In a parsing sense, there
is no difference except how much BS the code has to deal with (the BS gets
thicker as the versions progress).</pick-a-fight>
I run this parser over more than 8500 different RSS (0.9x and 1.0) and
true RDF documents, several times a day. I have normalized the output for its
intended use, but that changes nothing except identifiers. For example,
DocType for RSS 0.9x is the URL from the <!DOCTYPE .. > tag, if it exists,
but DocType for RSS 1.0 returns the URL of the primary Schema used.
Sometimes RSS 1.0 and RDF elements are missed because the developer (or
author) of a feed isn't using standard namespace identifiers, but as I find
them, I normalize them so they are parsed properly.
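The DocType rule described above could be sketched roughly as follows. This is a Python illustration under two assumptions that the message does not confirm: the "primary Schema" is read as the document's default XML namespace, and the DOCTYPE URL is taken to be the quoted http(s) system identifier. `doc_type` and `DOCTYPE_URL` are hypothetical names.

```python
import io
import re
import xml.etree.ElementTree as ET

# Quoted http(s) URL inside a <!DOCTYPE ...> declaration (the system identifier).
DOCTYPE_URL = re.compile(r'<!DOCTYPE\s[^>]*?"(https?://[^"]+)"', re.IGNORECASE)


def doc_type(xml_text):
    """Return one identifying URL for a feed, or None if none is found."""
    # RSS 0.9x style: use the DTD URL from the DOCTYPE, if there is one.
    match = DOCTYPE_URL.search(xml_text)
    if match:
        return match.group(1)
    # RSS 1.0 style (assumption): use the default namespace URI.
    events = ET.iterparse(io.StringIO(xml_text), events=("start-ns",))
    for _event, (prefix, uri) in events:
        if prefix == "":
            return uri
    return None
```

For a typical RSS 0.91 feed this yields the Netscape DTD URL, and for an RSS 1.0 feed it yields http://purl.org/rss/1.0/, so both families end up with a single comparable identifier.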
If you want to peek at the source - feel free -
http://syndicatethe.net/dev/reader/stn-reader.txt
This version excludes a rather complex schema recognition function that
is still having some issues. The function parses the schema includes so the
parser can recognize namespace specific elements on the fly -- like I said,
a bit buggy right now.
James