
syndication and i18n



So, I've been spending spare moments here and there for a while now,
putting together a feed aggregator in Python. I've never written an
internationalized app before, and, wanting to do The Right Thing, I
thought I'd give it a try, especially seeing as how Python 2.x
supports Unicode.

I may have bitten off more than I can chew.

It seems that the permutations of:
 - source XML charset declaration,
 - actual character content of the XML, and
 - browser's desired charset
are overwhelming. 

Many feeds occasionally have characters that slip through unescaped,
such as Windows (cp1252) smart quotes.

Currently, my strategy is to .encode('utf-8') EVERYTHING that comes
in, and write that out (mix byte strings and Unicode strings
carelessly and Python raises UnicodeError). This works, but it
doesn't seem too friendly to double-byte feeds or their users, who I
assume would be out of luck.
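Concretely, the normalization step looks something like this (the
fallback chain is my own guess; windows-1252 is in there to catch
those stray smart quotes):

```python
def to_utf8(raw_bytes, declared_charset=None):
    """Decode feed bytes as best we can, then re-encode as UTF-8.

    The fallback order is an assumption: declared charset first,
    then UTF-8, then windows-1252 (which catches Windows quotes).
    """
    for charset in (declared_charset, "utf-8", "windows-1252"):
        if charset is None:
            continue
        try:
            return raw_bytes.decode(charset).encode("utf-8")
        except (UnicodeDecodeError, LookupError):
            continue
    # Last resort: replace undecodable bytes rather than blow up.
    return raw_bytes.decode("utf-8", "replace").encode("utf-8")
```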

Questions:
 - should I emit 'utf-8' in the appropriate HTTP headers to make
   browsers do the right thing?

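   To make that concrete, here's the sort of thing I have in mind
   (function name and framing are mine, just a sketch):

```python
def render_response(body):
    """Wrap a Unicode page body in a header that advertises UTF-8.

    A minimal sketch -- real code would go through CGI or a server
    framework, but the charset parameter is the part in question.
    """
    header = b"Content-Type: text/html; charset=utf-8\r\n\r\n"
    return header + body.encode("utf-8")
```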
 - In Python, are there ways to:
   - determine what encoding an XML document uses (from SAX)
   - determine what encoding an arbitrary string is in

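   For the first of those, the best I've come up with so far is
   sniffing the XML declaration by hand, since SAX doesn't seem to
   hand the encoding to you (regex and default are mine):

```python
import re

# Matches the encoding= pseudo-attribute of an XML declaration.
XML_DECL = re.compile(br'^<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)["\']')

def sniff_xml_encoding(raw_bytes, default="utf-8"):
    """Pull the declared encoding out of the XML declaration, if any.

    Per the XML spec, an absent declaration means UTF-8 (or UTF-16
    with a BOM, which this sketch ignores).
    """
    m = XML_DECL.match(raw_bytes)
    return m.group(1).decode("ascii") if m else default
```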
 - Does the above strategy doom double-byte users?

 - How does one deal with creating an HTML page from XML feeds which 
   have potentially radically different charsets (e.g., ASCII and
   double-byte Chinese on the same page)?

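   My current theory on that one: if each feed is decoded to Unicode
   on the way in, mixing is just string concatenation, and a single
   .encode('utf-8') at the end covers both (titles below are made up):

```python
# Content from two feeds, already decoded to Unicode on the way in.
ascii_title = "Python News"          # from a plain-ASCII feed
chinese_title = "\u4e2d\u6587"       # from a Big5 feed, post-decode

# Once everything is Unicode, one page and one output encoding do.
page = "<li>%s</li><li>%s</li>" % (ascii_title, chinese_title)
html_bytes = page.encode("utf-8")
```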
 - Does anybody know of some Cantonese RSS feeds for testing? ;)

 - How does one catch and deal with illegal characters in the XML
   source (SAX2)?

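For what it's worth, the best I've managed on that last one is
trapping SAXParseException around the parse, so one bad feed doesn't
take the whole run down (the handler and the log-and-skip policy are
my own guesses at what "deal with" should mean):

```python
import xml.sax

class FeedHandler(xml.sax.handler.ContentHandler):
    """Toy handler that just collects character data."""
    def __init__(self):
        xml.sax.handler.ContentHandler.__init__(self)
        self.text = []

    def characters(self, content):
        self.text.append(content)

def parse_feed(raw_bytes):
    """Return the feed's character data, or None if the XML is bad."""
    handler = FeedHandler()
    try:
        xml.sax.parseString(raw_bytes, handler)
    except xml.sax.SAXParseException:
        return None      # log-and-skip; this feed is broken
    return "".join(handler.text)
```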
Regards,

-- 
Mark Nottingham
http://www.mnot.net/