
Re: [syndication] html parsing as a horror story



> 1. Your stats page had nothing to do with it. Like many developers with a
> big product and a small team, we have a queue of bugs and features. That it
> took a few weeks to get it out says it was a *high* priority, Bill.

And the larger consuming RSS community thanks you for it.

> 2. I did the work, not Jake.

Indeed, the comments in the script seem to reflect that.  So, now that you've
owned up to it, why does it mangle UTF-8 characters?  Applying a blanket
string.replace to every ampersand double-encodes any text that already contains
character references.  It's incorrect to express &#999 as &amp;#999, yet that's
precisely what xml.entityEncode does.  Please, really and truly, please fix this.
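To make the double-encoding concrete, here is a rough sketch in Python. It is not UserTalk and not Radio's actual code, and the names naive_encode and encode_once are made up for illustration.  The sketch assumes that anything already shaped like a character or entity reference was encoded on purpose and should be left alone.  It also writes non-ASCII characters out as decimal references such as &#999; instead of letting them get mangled.

```python
import re
from xml.sax.saxutils import escape

# Matches a reference that is already present: decimal (&#999;), hex (&#x3E7;)
# or named (&amp;).  Assumption: such text has already been encoded once.
_EXISTING_REF = re.compile(r"&(?:#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);")


def naive_encode(text: str) -> str:
    """Hypothetical blanket replace: every '&' becomes '&amp;', even one
    that starts an existing reference, so '&#999;' turns into '&amp;#999;'."""
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def encode_once(text: str) -> str:
    """Escape &, < and > only in the plain-text runs, keep existing
    references untouched, then write non-ASCII characters as &#NNN;."""
    pieces = []
    last = 0
    for match in _EXISTING_REF.finditer(text):
        pieces.append(escape(text[last:match.start()]))  # plain text: escape it
        pieces.append(match.group(0))                     # existing reference: keep as-is
        last = match.end()
    pieces.append(escape(text[last:]))
    # xmlcharrefreplace turns each non-ASCII character into a decimal
    # character reference, e.g. U+03E7 becomes "&#999;".
    return "".join(pieces).encode("ascii", "xmlcharrefreplace").decode("ascii")
```

With this, naive_encode("&#999;") gives "&amp;#999;", while encode_once("&#999;") stays "&#999;" and encode_once("Tom & Jerry") gives "Tom &amp; Jerry".  The heuristic has a blind spot: it can't tell an intended reference from literal text that merely looks like one (say, a post about the string &copy;).  When the caller knows whether its input is raw text or already-escaped markup, escaping raw text exactly once avoids the guessing altogether.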

The xml.entityDecode routine would also benefit from some strengthening.  Its
handling of entities is incomplete; in particular, it does not handle HTML
entities (named references such as &eacute; or &nbsp;) at all.
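For the decoding side, Python's standard library shows the gap neatly.  It is used here only as an analogy for what xml.entityDecode could do, and the wrapper names decode_xml_only and decode_html are hypothetical.  xml.sax.saxutils.unescape only knows the XML set (&amp;, &lt;, &gt;), whereas html.unescape also resolves HTML's named entities and both decimal and hex character references.

```python
import html
from xml.sax.saxutils import unescape as xml_unescape


def decode_xml_only(text: str) -> str:
    """Decodes only &lt;, &gt; and &amp;; named HTML entities and numeric
    references such as &#999; pass through unchanged."""
    return xml_unescape(text)


def decode_html(text: str) -> str:
    """Decodes HTML named entities (e.g. &eacute;, &nbsp;) as well as
    decimal (&#999;) and hex (&#x3E7;) character references."""
    return html.unescape(text)


sample = "caf&eacute; &amp; cr&egrave;me &#999;"
xml_only = decode_xml_only(sample)  # 'caf&eacute; & cr&egrave;me &#999;'
full = decode_html(sample)          # 'café & crème ϧ'
```

A decoder that only covers the XML set leaves feed text full of literal "&eacute;" strings, which is exactly what non-English posts tend to contain.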

There's a tremendous audience of non-English-speaking users out there.  Many
tools available to them already know how to encode characters properly and
express language tags.  It would be great to see Radio follow their lead.

-Bill Kearney