
Re: [syndication] Bad entities



In article <15295.31308.902619.22610@flutterby.com>, Dan Lyke
<danlyke@flutterby.com> writes
>Given that My Netscape has allegedly gone away (although they still
>remind me when my feed is non-conforming), 

The Netscape DTD is still up there at
http://my.netscape.com/publish/formats/rss-0.91.dtd
I just don't think we should rely on it.

>the right way to do this
>for the syndicators which are currently out there seems to be
>double-escaping, ie:
>
>  HTML: Gonz&aacute;les
>
>Would become:
>
>  XML RSS: Gonz&amp;aacute;les

Funnily enough, the O'Reilly Meerkat feeds are currently doing this. And
I thought it was a bug! Maybe it's deliberate? They seem to be doing it
on all their own internally generated feeds and in all XML formats, RSS
1.0 included.
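
For illustration, here is a minimal Python sketch of the writer side of
the double-escaping quoted above. It assumes the standard library's
xml.sax.saxutils.escape and ElementTree; the variable names are made up
for the example and have nothing to do with how Meerkat actually
generates its feeds.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# HTML text as the author wrote it (hypothetical sample value).
html_text = "Gonz&aacute;les"

# XML predefines only amp, lt, gt, apos and quot, so with no DTD in
# play the bare HTML entity is rejected by the parser.
raw_item = "<description>" + html_text + "</description>"
try:
    ET.fromstring(raw_item)
    raw_parses = True
except ET.ParseError:
    raw_parses = False

# Escaping the ampersand gives the double-escaped form, which parses.
escaped_text = escape(html_text)
escaped_item = "<description>" + escaped_text + "</description>"

# The XML parser undoes one level, handing back the original HTML text.
parsed_text = ET.fromstring(escaped_item).text
```

After parsing, the reader is left holding the HTML-escaped text, which
still needs one more round of decoding before display.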

>Until a better thought out standard appears, telling people to fix
>their feeds to the latter seems like the reasonable solution.

Which would then mean readers double-decoding the result as well. I was
going to do this as a kludge to be able to read the O'Reilly weblogs
RSS feed. But I can't see any major harm in doing it to every feed. If
there aren't any encoded entities left after the first round, then the
second round won't do anything.
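
A rough sketch of that double-decoding in Python, assuming
html.unescape from the standard library is an acceptable stand-in for
whatever entity decoder a reader really uses. The function name and the
sample strings are hypothetical.

```python
from html import unescape

def decode_description(text, rounds=2):
    """Decode entities `rounds` times; extra rounds are no-ops
    once no encoded entities remain."""
    for _ in range(rounds):
        text = unescape(text)
    return text

# Double-escaped, as in the feeds discussed above.
from_double = decode_description("Gonz&amp;aacute;les")

# Single-escaped: the first round decodes it, the second does nothing.
from_single = decode_description("Gonz&aacute;les")

# No entities at all: both rounds leave the text alone.
from_plain = decode_description("Gonzales")
```

Both escaped inputs come out as the same accented text, and the plain
one is returned unchanged.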

The downside is that both writers and readers will need to change:
writers so that they produce valid XML, readers so that they can get
valid HTML out of the <description>. So I think a DTD declaration is
cleaner.
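
As a sketch of the DTD route, the example below declares the entity in
an internal DTD subset and lets the XML parser expand it. This is an
assumption made for the sake of a self-contained example: the Netscape
approach points at an external DTD, which, as far as I know, Python's
ElementTree does not fetch. The feed content is invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed: the internal subset declares aacute, so the
# single-escaped HTML entity is legal XML.
feed = """<?xml version="1.0"?>
<!DOCTYPE rss [
  <!ENTITY aacute "&#225;">
]>
<rss version="0.91">
  <channel>
    <item>
      <description>Gonz&aacute;les</description>
    </item>
  </channel>
</rss>"""

root = ET.fromstring(feed)

# The parser expands the declared entity, so no second decode is needed.
description = root.find("channel/item/description").text
```

With the declaration in place the writer emits the entity once and the
reader gets the accented character straight from the parser.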

-- 
Julian Bond    email: julian_bond@voidstar.com
CV/Resume:         http://www.voidstar.com/cv/
WebLog:               http://www.voidstar.com/
HomeURL:      http://www.shockwav.demon.co.uk/ 
M: +44 (0)77 5907 2173  T: +44 (0)192 0412 433
ICQ:33679568 tag:So many words, so little time