Re: [syndication] HTML Encoding BooHoo...
>>Now, the "reversion of entities" code in my RSS reader doesn't know about
>>HTML - it just blindly reverts &lt; to < and so forth. Is the only solution
>>to my problem to make the code understand all the possible HTML entities?
>>Or is there something else?
>
>There's a fair bit of code around that removes all tags except a subset
>of "Allowable html". PHP even has this as a function built into the
>scripting language.
Yes, but that wouldn't solve my above problem (** and see earlier message).
In this case, <XML> wasn't a tag, it was part of the actual <title>.
Removing all HTML tags wouldn't affect the <XML>, cos that's not a valid
HTML tag anyways... Right now, my reader:
- loads in an XML file.
- converts any encoded &lt;/&gt;'s to </> (to cover encoded HTML).
this is a mass replacement, which causes the above problem.
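
To make the failure concrete, here is a minimal Python sketch of that mass replacement (the reader itself is Perl; the function name and the sample field values below are made up for illustration):

```python
def blind_revert(text):
    """Mass replacement with no knowledge of HTML: every &lt;/&gt; becomes </>."""
    return text.replace("&lt;", "<").replace("&gt;", ">")

# Hypothetical field values as they might sit in a feed file.
title = "Learning &lt;XML&gt; in a weekend"           # literal text "<XML>", escaped
description = "This is &lt;b&gt;bold&lt;/b&gt; news"  # escaped HTML markup

print(blind_revert(title))        # Learning <XML> in a weekend
print(blind_revert(description))  # This is <b>bold</b> news
```

After the replacement both strings contain bare angle brackets, and nothing records which ones were meant as text and which as markup. Anything that later renders the title as HTML would most likely treat <XML> as an unknown tag and not display it.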
Ultimately, I don't want to remove tags (that's not a decision I'm willing
to make for the users of my program, but it will be an option that they can
choose from).
In this case, it's not even an issue of allowable tags or not - it's an
issue of preparing for people correctly encoding HTML (&lt;b&gt;) and not
encoding HTML (<b>).
** I eventually tracked the culprit to nothing in my code, but rather the
XML::Simple Perl module, which seems to magick &lt;XML&gt; into <XML> all
by itself. I'm still investigating, but seeing the file encoded, and then
loading it through XML::Simple and Data::Dump[ing] it shows that it's
autoconverted. Why, I'm not sure...
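
A likely explanation, as far as I can tell: any conforming XML parser expands the predefined entities (&lt;, &gt;, &amp; and so on) in character data while it parses, so the conversion happens before my own code ever sees the string. The sketch below shows the same behaviour using Python's standard xml.etree.ElementTree rather than XML::Simple; the sample documents are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical item whose title contains the literal text "<XML>", escaped once.
RAW = "<item><title>Learning &lt;XML&gt; in a weekend</title></item>"
title = ET.fromstring(RAW).findtext("title")
print(title)  # Learning <XML> in a weekend

# The parser undoes exactly one level of escaping: doubly escaped markup
# comes back as singly escaped text, not as a live tag.
RAW2 = "<item><title>&amp;lt;b&amp;gt; is the bold tag</title></item>"
title2 = ET.fromstring(RAW2).findtext("title")
print(title2)  # &lt;b&gt; is the bold tag
```

If that is what XML::Simple is doing, then the parser has already done one round of reverting, and a second blanket replacement on top of it is what turns escaped text into something that looks like a tag.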
Morbus Iff
.sig on other machine.
http://www.disobey.com/
http://www.gamegrene.com/