Re: [syndication] html parsing as a horror story
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Morten Frederiksen <mof-syndication@mfd-consult.dk> writes:
> Hi there,
>
> On Saturday 20 July 2002 00:00, Kevin wrote:
> <snip context="RSS 0.92 description encoding"/>
> > I won't bring up the security issues present with the possible syndication
> > of encoded <script> elements...
> I don't suppose this problem will go away with mod_content?
It does make it easier to deal with in some situations. If you are using a
Literal parseType then you can deal with the data directly as XML (instead of
CDATA).
Then you can do:
<!-- assumes the xhtml prefix is bound to http://www.w3.org/1999/xhtml -->
<xsl:template match="xhtml:script">
  <!-- empty template: script elements produce no output -->
</xsl:template>
With SAX you can drop the script elements as you copy the document. You could
also write an XSLT extension to handle this, or do a search/replace for script
sections, but having the data as XML makes this a lot more elegant.
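For the SAX route, here is a minimal sketch in Python (the language, the class
name ScriptStripper and the helper strip_scripts are all my own illustration,
not part of any module discussed here). It assumes the content is well-formed
XML with script elements in the XHTML namespace, and it copies every SAX event
to the output except those inside xhtml:script:

```python
import io
import xml.sax
from xml.sax.handler import feature_namespaces
from xml.sax.saxutils import XMLFilterBase, XMLGenerator

XHTML_NS = "http://www.w3.org/1999/xhtml"


class ScriptStripper(XMLFilterBase):
    """Forward every SAX event except those inside an xhtml:script element."""

    def __init__(self, parent=None):
        super().__init__(parent)
        self._depth = 0  # greater than 0 while inside a script element

    def startElementNS(self, name, qname, attrs):
        # name is a (namespace URI, local name) pair in namespace mode.
        if self._depth or name == (XHTML_NS, "script"):
            self._depth += 1
            return
        super().startElementNS(name, qname, attrs)

    def endElementNS(self, name, qname):
        if self._depth:
            self._depth -= 1
            return
        super().endElementNS(name, qname)

    def characters(self, content):
        # Text inside a script element is dropped along with the element.
        if not self._depth:
            super().characters(content)


def strip_scripts(xml_text: str) -> str:
    """Return a copy of xml_text with all xhtml:script elements removed."""
    reader = xml.sax.make_parser()
    reader.setFeature(feature_namespaces, True)
    out = io.StringIO()
    stripper = ScriptStripper(reader)
    stripper.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    stripper.parse(io.BytesIO(xml_text.encode("utf-8")))
    return out.getvalue()
```

This only removes script elements. Other script-bearing constructs, such as
event-handler attributes, would need their own handling.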
> > On the peerfear link, notice the use of images for each <item>. This is
> > done with a mod_itemimage RSS 1.0 module I am about to propose.
> This does look interesting. Suggestion: It looks as if there's just the one
> image element defined in this module, how about adding at least a link element
> as well, this way it could be used for category links, /.-style. Or maybe
> this belongs in a categorization module? /thinking out loud/... maybe a full
> rss:image container could be used in some way, instead of several separate
> elements?
Right now I am thinking of building a more generic mod_image module for channel
images of multiple sizes, item images, etc. I hope to have something published
by next week or so.
The one on peerfear.org was just a quick hack so that it could work on my site.
That code will be upgraded to the proposal I make to the rss-dev team.
> > RSS 0.92 feeds (notice the lack of title with all structure encoded within a
> > <description> element as HTML)
> This is indeed ugly and close to unusable, but your point pushes me to point
> out an issue I have with your feed: The item description contains the entire
> item content, although HTML is stripped, but what is the point of this when
> you use mod_content? Isn't the description element - in any case - supposed to
> contain a *description* of the item, an abstract of sorts, not the item
> itself?
This is just a bug in records-mode:
http://www.peerfear.org/records-mode
records-mode does not yet support building descriptions out of the body of a
record, so right now we are just using the whole content. This will be fixed
before we go 1.0 (and soon), and since I am the only one using the code base
right now it isn't too big of a deal.
I do agree this isn't very elegant but it will be fixed.
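As a rough illustration of what building a description out of the body could
look like, here is a sketch in Python. It is not how records-mode does or will
do it; the function name make_description and the 200-character default are
assumptions made up for the example. It collects the text of the body, skips
script and style content, and cuts at a word boundary:

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect text content, ignoring anything inside script or style."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip = 0  # greater than 0 while inside script or style

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def make_description(body_html: str, max_chars: int = 200) -> str:
    """Return a plain-text abstract of body_html, at most max_chars long
    before the trailing ellipsis (hypothetical helper)."""
    collector = _TextCollector()
    collector.feed(body_html)
    collector.close()
    # Collapse runs of whitespace left behind by removed markup.
    text = " ".join("".join(collector.parts).split())
    if len(text) <= max_chars:
        return text
    # Cut at the last space that fits, or hard-cut if there is none.
    cut = text.rfind(" ", 0, max_chars + 1)
    if cut <= 0:
        cut = max_chars
    return text[:cut] + "..."
```

The full content would then stay in mod_content, with only the abstract going
into the description element.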
> I realize this is not against any formal rules or specs, but semantically I
> think it's wrong - and a waste of good bits, it currently can be derived from
> the mod_content content.
yup...
> BTW: Kevin, I noticed you complained that Gordon Mohr doesn't have a weblog.
> As far as I can tell, he has two [1] [2]!
<snip/>
Ah... yes. He pointed them out to me ;)
"Gordon pointed out that he does have a weblog (I was really just giving him a
hard time!)."
http://www.peerfear.org/rss/permalink/1027129043.shtml
Kevin
- --
Kevin A. Burton ( burton@apache.org, burton@openprivacy.org, burton@peerfear.org )
Location - San Francisco, CA, Cell - 415.595.9965
Jabber - burtonator@jabber.org, Web - http://www.peerfear.org/
GPG fingerprint: 4D20 40A0 C734 307E C7B4 DCAA 0303 3AC5 BD9D 7C4D
IRC - openprojects.net #infoanarchy | #p2p-hackers | #reptile
All the great empires of the future will be the empires of the mind.
-- Winston Churchill
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.7 (GNU/Linux)
Comment: Get my public key at: http://relativity.yi.org/pgpkey.txt
iD8DBQE9OeTYAwM6xb2dfE0RAgJRAJwKptp5Jhfr20cRuuKb7bMvtJN9kwCfQFzR
d3gW/JMXV/gKoD4IIQmVAWc=
=er1U
-----END PGP SIGNATURE-----