
Re: [syndication] Re: syndication and i18n

To: syndication@yahoogroups.com
Subject: Re: [syndication] Re: syndication and i18n
From: hpyle@agora.co.uk
Date: Tue, 22 May 2001 17:01:31 +0100

> > - How does one deal with creating an HTML page from XML feeds which
> > have potentially radically different charsets (i.e., ASCII and
> > double-byte chinese on the same page)?
>
> Pick a superset of all the encodings (UCS-2?).

UTF-8 will handle everything, mixed languages and such. That's one of the reasons every conforming XML parser is required to support UTF-8 (along with UTF-16).

Since HTML doesn't have XML's <?xml?> declaration, I think you probably have to say it's UTF-8 in the HTTP headers, i.e. "Content-Type: text/html; charset=utf-8". (Is that right?)
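To make that concrete, here is a minimal sketch of building a mixed-language page as UTF-8 and labelling it in the headers. The function name render_page and the page layout are made up for illustration; the only real point is that the body is encoded as UTF-8 and the Content-Type header says so. The <meta> tag is a belt-and-braces fallback for when the page is saved to disk or served without headers.

```python
from html import escape


def render_page(titles):
    """Return (headers, body) for an HTML page listing the given titles.

    render_page is a hypothetical helper, not part of any library.
    `titles` is a list of Unicode strings in any mix of languages.
    """
    items = "\n".join(f"<li>{escape(t)}</li>" for t in titles)
    html = (
        "<!DOCTYPE html>\n"
        "<html><head>"
        # Fallback declaration inside the document itself.
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        "<title>Aggregated feeds</title></head>\n"
        f"<body><ul>\n{items}\n</ul></body></html>\n"
    )
    # Encode once, at the output boundary; everything before this is Unicode.
    body = html.encode("utf-8")
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    return headers, body


headers, body = render_page(["caf\u00e9", "\u4e2d\u6587", "plain ASCII"])
```

The (headers, body) pair is in the shape a WSGI application or CGI script would send, but how you actually serve it is up to you.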

My take: use a decent XML parser and you'll have all the parse-side encoding issues completely handled for you, and your Python code will just see Unicode. It might mean you end up with a stricter aggregator than some (e.g. you won't be able to accept <item>stuff<img src="" because it's not well-formed), but IMHO that's not a bad thing.
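A rough sketch of what I mean, using the xml.etree.ElementTree module from the Python standard library as a stand-in for "a decent XML parser". The two sample feeds and the helper names item_titles and is_well_formed are invented for illustration. The parser reads each feed's own encoding declaration, so the calling code only ever sees Unicode strings, and a malformed fragment is rejected outright.

```python
import xml.etree.ElementTree as ET


def item_titles(feed_bytes):
    """Parse raw feed bytes and return the <title> texts as Unicode strings.

    Hypothetical helper: the parser decodes according to the feed's
    <?xml ... encoding="..."?> declaration, so no manual decoding is needed.
    """
    root = ET.fromstring(feed_bytes)
    return [title.text for title in root.iter("title")]


def is_well_formed(feed_bytes):
    """Return True if the bytes parse as XML, False otherwise."""
    try:
        ET.fromstring(feed_bytes)
    except ET.ParseError:
        return False
    return True


# Two made-up feeds in different encodings.
latin1_feed = (
    '<?xml version="1.0" encoding="iso-8859-1"?>'
    "<rss><item><title>caf\u00e9</title></item></rss>"
).encode("iso-8859-1")

utf8_feed = (
    '<?xml version="1.0" encoding="utf-8"?>'
    "<rss><item><title>\u4e2d\u6587</title></item></rss>"
).encode("utf-8")

# Both come back as ordinary Unicode strings, ready to mix on one page.
titles = item_titles(latin1_feed) + item_titles(utf8_feed)

# The strict side of the bargain: the broken fragment is refused.
broken_ok = is_well_formed(b'<item>stuff<img src=""</item>')
```

Here titles ends up holding both strings regardless of how each feed was encoded on the wire, and broken_ok is False for the broken fragment.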

-Hugh

hpyle@agora.co.uk | +44 (0)20 8783 3592
http://www.agora.co.uk/ | http://groovelog.agora.co.uk/ | http://rendezvoo.net/
