Re: [syndication] Re: syndication and i18n
> > - How does one deal with creating an HTML page from XML feeds which
> > have potentially radically different charsets (i.e., ASCII and
> > double-byte Chinese on the same page)?
>
> Pick a superset of all the encodings (UCS-2?).
UTF-8 will handle everything, including mixed languages on one page. That's one of the reasons every XML parser is required to handle UTF-8 properly.
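To make that concrete, here is a minimal Python sketch. The sample string and the helper name `fits_in` are made up for illustration; the point is only that a single UTF-8 byte stream can carry ASCII and Chinese text together, where a narrower charset such as ASCII cannot.

```python
# Hypothetical sample text: ASCII plus two Chinese characters
# (written as escapes so the source file itself stays pure ASCII).
mixed = "Syndication news: \u65b0\u95fb"

# Encoding to UTF-8 works for any Unicode text, and decoding restores it.
utf8_bytes = mixed.encode("utf-8")
round_trip = utf8_bytes.decode("utf-8")


def fits_in(text, encoding):
    """Return True if every character of `text` can be encoded in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False
```

`fits_in(mixed, "utf-8")` is true while `fits_in(mixed, "ascii")` is false, which is why the output page needs a single encoding that covers everything the feeds contain.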
Since HTML doesn't have XML's <?xml?> declaration, I think you probably have to say it's UTF-8 in the headers. (Is that right?)
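One way this might look, as a sketch rather than a definitive answer: send the charset in the HTTP Content-Type header, and optionally repeat it in a `<meta http-equiv>` element inside the page. The function names `html_page` and `html_response` are hypothetical, and the title is assumed to be already HTML-escaped.

```python
def html_page(title, body_html):
    """Wrap an HTML fragment in a page that also declares UTF-8 in a <meta> tag."""
    return (
        "<html><head>"
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        "<title>" + title + "</title>"
        "</head><body>" + body_html + "</body></html>"
    )


def html_response(page_text):
    """Return (headers, body_bytes) for serving `page_text` as UTF-8 HTML."""
    body = page_text.encode("utf-8")
    headers = [
        # The charset parameter tells the browser how to decode the bytes.
        ("Content-Type", "text/html; charset=utf-8"),
        # Length is counted in bytes, not characters.
        ("Content-Length", str(len(body))),
    ]
    return headers, body
```

The header and the bytes have to agree: whatever charset is declared must be the one actually used to encode the body.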
My take: use a decent XML parser and you'll have all the parse-side encoding issues completely handled for you, and your Python code will just see Unicode. It might mean you end up with a stricter aggregator than some (e.g. you won't be able to accept <item>stuff<img src="" because it's not well-formed), but IMHO that's not a bad thing.
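A rough sketch of what I mean, using `xml.etree.ElementTree` from the Python standard library as the parser (that choice is an assumption; any conforming parser should behave the same way). The two sample feeds and the helper names `item_titles` and `is_well_formed` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Two hypothetical feeds with the same structure but different declared encodings.
latin1_feed = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>'
    "<rss><item><title>Caf\u00e9 news</title></item></rss>"
).encode("iso-8859-1")

utf8_feed = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    "<rss><item><title>\u65b0\u95fb</title></item></rss>"
).encode("utf-8")


def item_titles(feed_bytes):
    """Parse raw feed bytes and return the titles as Unicode strings.

    The parser reads the encoding from the XML declaration, so the caller
    never has to decode anything by hand.
    """
    root = ET.fromstring(feed_bytes)
    return [title.text for title in root.iter("title")]


def is_well_formed(feed_bytes):
    """Return False when the parser rejects the input as not well-formed."""
    try:
        ET.fromstring(feed_bytes)
        return True
    except ET.ParseError:
        return False
```

Both feeds come back as ordinary Unicode strings that can be written into the same UTF-8 page, while `is_well_formed(b'<item>stuff<img src=""')` is false, which is the stricter behaviour mentioned above.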
-Hugh
hpyle@agora.co.uk | +44 (0)20 8783 3592
http://www.agora.co.uk/ | http://groovelog.agora.co.uk/ | http://rendezvoo.net/