RE: [syndication] Bad entities
I'm getting complaints that the syndic8 feed at
http://www.syndic8.com/genfeed.php?format=ocs does not properly declare
or handle entities.
Here is the first offending line:
<dc:description>Linux Preview - Noticias de Linux en
Español</dc:description>
Do I need to add a DOCTYPE to my feed? What should it look like?
Thanks,
Jeff;
-----Original Message-----
From: Julian Bond [mailto:julian_bond@voidstar.com]
Sent: Saturday, October 06, 2001 1:21 PM
To: syndication@yahoogroups.com
Subject: Re: [syndication] Bad entities
In article <Pine.SOL.4.21.0110061409580.15777-100000@ic-unix.ic.utoronto.ca>,
Ian Graham <ian.graham@utoronto.ca> writes
>1) include the needed single-character entity definitions from the
>xhtml-lat1.ent file _directly_ inside the DTD at the start of an XML (RSS)
>message, as in:
>2) include an external entity declaration in the DTD (one that references
>the complete xhtml-lat1.ent resource) and then include that entire entity
>into the DTD, as in:
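A quick way to see what Ian's first option buys you is to parse the same feed with and without the declaration. This is only a sketch: the one-item feed and the helper name description_or_error are made up for illustration, and it assumes Python's standard xml.etree.ElementTree parser (expat underneath), which expands entities declared in the internal DTD subset.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-item feed; &ntilde; is declared in the internal DTD subset.
INTERNAL_SUBSET = '<!DOCTYPE rss [\n  <!ENTITY ntilde "&#241;">\n]>\n'

DECLARED = (
    '<?xml version="1.0"?>\n'
    + INTERNAL_SUBSET
    + '<rss version="0.91"><channel>'
    '<description>Noticias de Linux en Espa&ntilde;ol</description>'
    '</channel></rss>'
)

# Same feed with the DOCTYPE removed: &ntilde; is now undeclared.
UNDECLARED = DECLARED.replace(INTERNAL_SUBSET, '')


def description_or_error(feed_text):
    """Return the channel description, or the parser's error message."""
    try:
        root = ET.fromstring(feed_text)
    except ET.ParseError as exc:
        return "ParseError: %s" % exc
    return root.findtext("channel/description")


if __name__ == "__main__":
    print(description_or_error(DECLARED))
    print(description_or_error(UNDECLARED))
```

With the declaration in place the description comes back with the character expanded; without it the parser rejects the document rather than passing the reference through.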
Ok. I've done a bit more digging and this is what I think is happening.
1) The RSS 1.0 spec[1] gives an example:-
<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [
<!ENTITY % HTMLlat1 PUBLIC
"-//W3C//ENTITIES Latin 1 for XHTML//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent">
%HTMLlat1;
]>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns="http://purl.org/rss/1.0/"
>
etc ...
Any RSS (RDF) 1.0 feed should include this if it might have HTML
entities in the <item><description> element. I don't think many of
them do.
2) When the Netscape RSS 0.91 DTD disappeared off the net temporarily, a
lot of us just removed the <!DOCTYPE entry. But this DTD contains the
HTML entity definitions. So removing it is fine only as long as the reader
doesn't validate the XML and/or we don't allow HTML entities in our
feeds. But of course we do, and then the feed is no longer valid XML. The
short-term solution is to put the entry back in. The Netscape spec for
0.91[2] suggests using this.
<?xml version="1.0"?>
<!DOCTYPE rss SYSTEM "http://my.netscape.com/publish/formats/rss-0.91.dtd">
<rss version="0.91">
<channel>
etc...
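To find out whether a feed is exposed to the problem in point 2, something like the following could be run over the feed text. It is a rough sketch, not a validator: the function name undeclared_risk is hypothetical, the check is a plain regex scan, and it simply assumes that any DOCTYPE present declares what the feed uses.

```python
import re

# The five entities every XML parser knows without a DTD.
PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches named references such as &ntilde; but not numeric ones like &#241;.
NAMED_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def undeclared_risk(feed_text):
    """Return named entities that need a DTD, or [] if a DOCTYPE is present."""
    if "<!DOCTYPE" in feed_text:
        # Assumption: the DOCTYPE declares them; this sketch does not verify it.
        return []
    names = {m.group(1) for m in NAMED_REF.finditer(feed_text)}
    return sorted(names - PREDEFINED)


if __name__ == "__main__":
    sample = "<rss><d>Espa&ntilde;ol &amp; caf&eacute; &#241;</d></rss>"
    print(undeclared_risk(sample))
```

An empty list means either that the feed uses only the predefined entities and numeric references, or that it carries a DOCTYPE that still has to be checked by hand.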
3) Manila RSS (at /xml/rss.xml) seems to use this.
<?xml version="1.0"?>
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN"
"http://my.netscape.com/publish/formats/rss-0.91.dtd">
<rss version="0.91">
<channel>
etc ...
This appears to have the same effect. However, Radio's 0.92 [3] (pretty
much the only source of 0.92 apart from Drupal[6] and RSSify[7]) doesn't
have a <!DOCTYPE entry at all. Neither the Userland specs for 0.91[4]
and 0.92[5] nor their example files mention it. From the 0.92 spec:
"Further, 0.92 allows entity-encoded HTML in the <description> of an
item, to reflect actual practice ... " This is dangerous if HTML named
entities are included without a DTD that declares them, as we've
discovered.
4) So for RSS 0.9x I'm uncomfortable with depending on the Netscape DTD.
The best solution I can see is to depend on w3.org instead, as the
entity definitions there are less likely to disappear. So (assuming I've
got it right) we need to add these lines to the top of each file, after
the <?xml ...?> line.
<!DOCTYPE rss [<!ENTITY % HTMLlat1 PUBLIC
"-//W3C//ENTITIES Latin 1 for XHTML//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent">
%HTMLlat1;]>
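For anyone generating feeds from a script, the DOCTYPE above could be patched in along these lines. This is a sketch under assumptions: add_lat1_doctype is a made-up helper, it works on the feed as a string, and it leaves any feed that already has a DOCTYPE alone. Note that a non-validating reader is not obliged to fetch the external entity file, so whether the references then resolve still depends on the reader.

```python
import re

# Internal subset from the message above, pulling in the W3C Latin-1 entity set.
LAT1_SUBSET = (
    '<!ENTITY % HTMLlat1 PUBLIC\n'
    ' "-//W3C//ENTITIES Latin 1 for XHTML//EN"\n'
    ' "http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent">\n'
    '%HTMLlat1;'
)

# An XML declaration at the very start of the text, plus trailing whitespace.
XML_DECL = re.compile(r"<\?xml[^>]*\?>\s*")


def add_lat1_doctype(feed_text, root="rss"):
    """Insert the DOCTYPE after the XML declaration unless one already exists."""
    if "<!DOCTYPE" in feed_text:
        return feed_text
    doctype = "<!DOCTYPE %s [%s]>\n" % (root, LAT1_SUBSET)
    match = XML_DECL.match(feed_text)
    split = match.end() if match else 0
    head = feed_text[:split]
    if head and not head.endswith("\n"):
        head += "\n"
    return head + doctype + feed_text[split:]


if __name__ == "__main__":
    print(add_lat1_doctype('<?xml version="1.0"?>\n<rss version="0.91"><channel/></rss>'))
```

The root argument has to match the document element, so an RSS 1.0 feed would pass root="rdf:RDF" instead of the default.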
Phew!
[1] http://groups.yahoo.com/group/rss-dev/files/specification.html
[2] http://my.netscape.com/publish/formats/rss-spec-0.91.html
[3] eg http://wolk.datashed.net/users/adam@curry.com/curryCom.xml or
http://www.ourfavoritesongs.com/users/dave@userland.com/rss/xml.xml
[4] http://backend.userland.com/rss091
[5] http://backend.userland.com/rss092
[6] http://www.drupal.org
[7] http://www.voidstar.com/rssify.php
--
Julian Bond email: julian_bond@voidstar.com
CV/Resume: http://www.voidstar.com/cv/
WebLog: http://www.voidstar.com/
HomeURL: http://www.shockwav.demon.co.uk/
M: +44 (0)77 5907 2173 T: +44 (0)192 0412 433
ICQ:33679568 tag:So many words, so little time