Re: [syndication] Proper use of DOCTYPE?



> Strictly speaking, the entity has to be declared. So something of the form:

<!DOCTYPE rss [
<!ENTITY % HTMLlat1 PUBLIC
    "-//W3C//ENTITIES Latin 1 for XHTML//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent">
%HTMLlat1;
]>

> is legal. Here we're not referencing a DTD (a specification of the
> document schema), but the declarations of the Latin 1 entities used
> by HTML.

Ah, so this would allow someone to use the entities from HTML without using a
specific DTD for the rest of the document, correct?
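
To make that concrete for newcomers, here is a minimal sketch using Python's
standard library parser. One loud assumption: that parser does not fetch
external entity files, so instead of pulling in xhtml-lat1.ent the sketch
declares a single entity (eacute) directly in the internal subset. The feed
content and the channel_title helper are made up for illustration. The point
is the same either way: the entity is declared, yet no DTD describes the RSS
elements.

```python
import xml.etree.ElementTree as ET

# No DOCTYPE at all: &eacute; is not one of XML's five predefined entities,
# so the parser rejects the document.
UNDECLARED = "<rss><channel><title>Caf&eacute; News</title></channel></rss>"

# Internal subset only: the entity is declared inline, and no DTD for the
# RSS elements is referenced anywhere.
DECLARED = (
    '<!DOCTYPE rss [ <!ENTITY eacute "&#233;"> ]>'
    "<rss><channel><title>Caf&eacute; News</title></channel></rss>"
)


def channel_title(xml_text):
    """Return the channel title, or None if the feed is not well-formed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return None
    return root.findtext("channel/title")


if __name__ == "__main__":
    print(channel_title(UNDECLARED))
    print(channel_title(DECLARED))
```

The first feed fails to parse because the entity is never declared; the
second parses even though nothing constrains the rss, channel, or title
elements.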

So a DTD delineating the RSS elements and attributes could then stand
independently of the HTML entities?  Granted, the existing DTDs for RSS have
these entities already in them.  But who's to say this will stay that way?

> Separate issues.
> You need to set the encoding attribute on the XML declaration to
> signal to the parser the character encoding of the XML file. In the
> majority of cases the default of UTF-8 is just fine -- because you
> can still include other characters using entities or character
> references.

Are there significant issues between UTF-8 and ISO-8859-1 that might crop up?
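
One issue I'm fairly sure does crop up is a feed saved as ISO-8859-1 but left
with no encoding declaration, so the parser assumes UTF-8. A rough sketch of
the cases, assuming Python's standard library parser; the sample title and
the parse_title helper are made up for illustration:

```python
import xml.etree.ElementTree as ET

TITLE = "Caf\u00e9"  # "Café", written with an escape to keep this source ASCII

# The same title delivered three legitimate ways.
utf8_feed = (
    '<?xml version="1.0" encoding="UTF-8"?><title>' + TITLE + "</title>"
).encode("utf-8")
latin1_feed = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><title>' + TITLE + "</title>"
).encode("iso-8859-1")
charref_feed = b'<?xml version="1.0"?><title>Caf&#233;</title>'

# The problem case: ISO-8859-1 bytes with no encoding declaration, so the
# parser falls back to UTF-8 and hits an invalid byte sequence.
mislabeled_feed = (
    '<?xml version="1.0"?><title>' + TITLE + "</title>"
).encode("iso-8859-1")


def parse_title(raw_bytes):
    """Return the title text, or None if the bytes are not well-formed XML."""
    try:
        return ET.fromstring(raw_bytes).text
    except ET.ParseError:
        return None


if __name__ == "__main__":
    for feed in (utf8_feed, latin1_feed, charref_feed, mislabeled_feed):
        print(parse_title(feed))
```

The first three feeds yield the same title; the mislabeled one is rejected
outright rather than silently garbled.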

> For truly global syndication, aggregators should expect to receive feeds in
> any number of markup encodings.

> However, as I said for the majority of occasions when authoring in English,
> UTF-8 encoding is fine. So I'd personally recommend referencing the HTML
> entity set to allow for those cases when I need additional Latin characters,
> but leave the XML encoding alone.
>
> It doesn't really hurt to include a DOCTYPE in the way I gave above (called
> the 'internal subset'). An aggregator worried about the performance of their
> parser downloading that entity file repeatedly can use a local catalog to
> maintain a local copy (but that's a different thread...)
>
> I'm not advocating that they write their own entity definitions, just that
> they reuse those already defined for HTML.
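
For the local catalog idea mentioned above, a minimal sketch of the lookup
using the SAX EntityResolver interface from Python's standard library. The
catalog dictionary, the local file path, and the CatalogResolver name are all
hypothetical, and whether a given parser actually consults the resolver for
external parameter entities depends on the parser and how it is configured.

```python
from xml.sax.handler import EntityResolver

# Hypothetical catalog: public identifier -> local copy of the entity file.
# The path is an assumption for illustration, not a standard location.
LOCAL_CATALOG = {
    "-//W3C//ENTITIES Latin 1 for XHTML//EN":
        "/usr/local/share/xml/xhtml-lat1.ent",
}


class CatalogResolver(EntityResolver):
    """Answer entity lookups from a local catalog when the public ID is known."""

    def resolveEntity(self, publicId, systemId):
        # Fall back to the original system identifier for unknown entities.
        return LOCAL_CATALOG.get(publicId, systemId)


if __name__ == "__main__":
    resolver = CatalogResolver()
    print(resolver.resolveEntity(
        "-//W3C//ENTITIES Latin 1 for XHTML//EN",
        "http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent",
    ))
```

An aggregator would hand an instance to its SAX parser with
setEntityResolver(), so repeated feeds never trigger a download of the
entity file.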

Right, I agree with you here.  Use the ones known to exist before making up
something new.

> The big benefit there is that those authors who need
> extended character support should already be familiar with those entities (but
> perhaps not the DOCTYPE).

Sure, and I'm attempting to distill this into some sort of reasonably simple
explanation for people new to dealing with syndication, let alone XML.

Thanks Leigh, this helps.

-Bill Kearney