
Re: [syndication] Entities and Char sets



In article <9u0nr1.3vu9pc5.1@ID-99504.news.dfncis.de>, Karl Ove
Hufthammer <huftis@tiscali.no> writes
>Just curious, could you name a few toolkits that barf on
>whitespace between tags? (I'm not sure what you mean by 'name-
>value' missing.)

I'd rather not just now, as I've only just notified the author and I'd
prefer to let them sort it out first. "name-value missing" meant asking
the toolkit for the contents of <title> and being told that there was no
<title>. There was; it was just empty, e.g. <title></title>. Just a normal
software SNAFU ;-(
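
To make the empty-versus-missing distinction concrete, here is a minimal
sketch. It assumes Python's standard xml.etree.ElementTree rather than the
toolkit in question, and title_text is a hypothetical helper name used
only for illustration:

```python
import xml.etree.ElementTree as ET


def title_text(item):
    """Return None if <title> is absent, "" if it is present but empty."""
    title = item.find("title")
    # Compare with None explicitly: an element with no children can test
    # as false, which would make an empty <title> look like a missing one.
    if title is None:
        return None
    # ElementTree reports the text of <title></title> as None, so
    # normalise that to an empty string.
    return title.text or ""


empty = ET.fromstring("<item><title></title></item>")
missing = ET.fromstring("<item></item>")
print(repr(title_text(empty)))    # present but empty
print(repr(title_text(missing)))  # genuinely absent
```

A reader that collapses these two cases into "no title" would produce the
symptom described above.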

>> So what is the *correct* way of dealing with reserved and
>> high order characters in RSS content, such that the maximum
>> number of clients can read it and get the content to the
>> reader?
>
>Use the 'UTF-8' encoding. Write all ASCII characters directly,
>and write all other characters (i.e. all characters at
>codepoint > 127) as decimal character references. Then I
>believe you should be pretty safe. Simplified example:
>
><?xml version="1.0" encoding="UTF-8"?>
>
><title>Trademark symbol: &#8482;</title>

So as the author of a toolkit to generate this, I'd need to translate to
UTF-8 from whatever encoding the content was stored in. And as the author
of the toolkit to read this, I'd have to pass it on to a browser as...
...what?

When you say "all" ASCII, that'll be apart from &, ', < and > of course.
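
For the generating side, the approach suggested above (ASCII written
directly, reserved characters escaped, everything above codepoint 127 as a
decimal character reference) might look like the sketch below. It assumes
Python and that the content has already been decoded to a Unicode string;
to_rss_text is a hypothetical name, not part of any existing toolkit:

```python
from xml.sax.saxutils import escape


def to_rss_text(text):
    """Escape reserved characters, then turn non-ASCII into &#NNNN; refs."""
    # escape() handles &, < and > (with & done first, so the entities it
    # produces are not escaped again); the extra mapping covers '.
    escaped = escape(text, {"'": "&apos;"})
    # The xmlcharrefreplace error handler replaces every character that
    # does not fit in ASCII with a decimal character reference.
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(to_rss_text("Trademark symbol: \u2122"))
print(to_rss_text("R&D <b> 'x'"))
```

The result is pure ASCII, so it is also valid under an encoding="UTF-8"
declaration.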

-- 
Julian Bond    email: julian_bond@voidstar.com
CV/Resume:         http://www.voidstar.com/cv/
WebLog:               http://www.voidstar.com/
M: +44 (0)77 5907 2173  T: +44 (0)192 0412 433
ICQ:33679568 tag:So many words, so little time