Re: [syndication] Proper use of DOCTYPE?
Hi,
On Friday 22 March 2002 20:35, you wrote:
> Changing their header to read:
> <?xml version="1.0" encoding="ISO-8859-1"?>
> <!DOCTYPE rss SYSTEM "http://my.netscape.com/publish/formats/rss-0.91.dtd">
> This assumes, of course, that they're going to be using that particular
> form of encoding. The other choices for korean, russian, japanese, etc.
> Or leaving it out entirely to indicate use of UTF-8 encoding. And to avoid
> using entities at ALL by using the UTF-8 numeric encodings.
Actually, when they reference and use the entities, it doesn't matter which
encoding they use, as there are no characters to encode (only non-ASCII
characters are affected by the choice of encoding). The entity definitions in
the DTD just map to numeric character references, which are valid in all
encodings.
Also, they don't even need the entities, as UTF-8 can always be used, and
most western characters exist in iso-8859-1 and can thus be used directly
with that encoding, which must then of course be specified.
There's a whole lot of combinations that will work...
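
As a rough illustration of a few of those combinations, here is a minimal
sketch in Python (my choice for the example, nothing in the feed depends on
it). The bare <title> element and the word used are made up; the point is only
that the parser hands back the same text in each case.

```python
import xml.etree.ElementTree as ET

E_ACUTE = "\u00e9"  # e with acute accent, code point 233

# 1. Numeric character reference: pure ASCII bytes, so the declared encoding is irrelevant.
numeric_ref = b'<?xml version="1.0" encoding="ISO-8859-1"?><title>caf&#233;</title>'

# 2. The character used directly, encoded as iso-8859-1 and declared as such.
latin1_raw = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><title>caf' + E_ACUTE + "</title>"
).encode("iso-8859-1")

# 3. The character used directly in UTF-8, with no encoding declaration at all.
utf8_raw = ('<?xml version="1.0"?><title>caf' + E_ACUTE + "</title>").encode("utf-8")

titles = [ET.fromstring(doc).text for doc in (numeric_ref, latin1_raw, utf8_raw)]
print(titles)
```

All three documents are different byte sequences, but they should parse to the
same title text.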
> As you point out, they could put their own entity definitions in the
> header. I doubt many would really want to be doing this.
It depends on the RSS format in question, and I believe it's something like
the following:
RSS 0.9 is really ugly: it states [1] that the character set must be
iso-8859-1 and that utf-8 is not allowed, but the XML declaration omits an
encoding specification, which in XML terms means it defaults to utf-8!
Also, more importantly in this regard, it states that decimal and HTML
entities are allowed, but entities must be declared before use, and there's
no reference to a DTD or other entity declaration. In my opinion it's
impossible to create a well-formed international RSS 0.9 feed without
violating something.
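
To make the RSS 0.9 problem concrete, here is a small Python sketch of what a
conforming XML parser does with such input. The helper name is_well_formed and
the bare <title> documents are hypothetical, and I'm assuming the standard
library's expat-based parser is representative of other parsers.

```python
import xml.etree.ElementTree as ET


def is_well_formed(doc: bytes) -> bool:
    """Return True if the parser accepts the document, False on a parse error."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# No DOCTYPE and no encoding declaration, as in the RSS 0.9 situation above.
undeclared_entity = b'<?xml version="1.0"?><title>caf&eacute;</title>'
latin1_undeclared = b'<?xml version="1.0"?><title>caf\xe9</title>'  # raw iso-8859-1 byte
decimal_reference = b'<?xml version="1.0"?><title>caf&#233;</title>'

for name, doc in [
    ("undeclared entity", undeclared_entity),
    ("iso-8859-1 byte without declaration", latin1_undeclared),
    ("decimal reference", decimal_reference),
]:
    print(name, is_well_formed(doc))
```

Of the three, only the decimal reference should get through: the HTML entity
is undeclared, and the lone iso-8859-1 byte is not valid in the default utf-8.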
RSS 0.91 states [2] that there must be a reference to the DTD (this is
missing from this feed, which is somewhat understandable, since the DTD was
removed by Netscape at one point, causing a lot of grief). The DTD contains
the most common entity declarations, e.g. &eacute;, so if the DTD reference
is present, use of the entities defined there is legal.
RSS 0.92 [3] doesn't say anything on the subject, except that it's
'upward-compatible' with 0.91.
RSS 1.0 [4] specifically states that:
"Since RSS 1.0 does not require a DTD, be sure to include inline declarations
of entities used aside from the aforementioned five."
(The five mentioned entities are the always-valid &lt;, &gt;, &amp;, &apos;
and &quot;.) It also includes a widely used example.
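
For reference, an inline declaration of this kind can be sketched as follows,
again in Python. This is not the example from the RSS 1.0 spec; the <item>
document and its content are made up, and only the technique (an internal DTD
subset declaring the entity) is what the quote describes.

```python
import xml.etree.ElementTree as ET

# The internal subset declares &eacute; itself, so no external DTD is needed.
# &amp;, &lt; and &gt; are predefined by XML and need no declaration.
doc = b"""<?xml version="1.0"?>
<!DOCTYPE item [
  <!ENTITY eacute "&#233;">
]>
<item><title>Caf&eacute; &amp; &lt;bar&gt;</title></item>"""

title = ET.fromstring(doc).find("title").text
print(title)
```

The declared entity and the predefined ones should all be expanded by the
parser, so the feed consumer only ever sees the resulting characters.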
Morten Frederiksen
[1] http://www.purplepages.ie/RSS/netscape/rss0.90.html
[2] http://my.netscape.com/publish/formats/rss-spec-0.91.html
[3] http://backend.userland.com/rss092
[4] http://purl.org/rss/