Htmlspecialchars Causing Text To Disappear
Solution 1:
I understand now why it's returning a zero-length string. Sorry for asking this question. I should have researched more before posting. Anyway, the answer is the following:
On the PHP manual page for htmlspecialchars:
If the input string contains an invalid code unit sequence within the given encoding an empty string will be returned, unless either the ENT_IGNORE or ENT_SUBSTITUTE flags are set.
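A minimal sketch of that behaviour, assuming a made-up input string (the string from the original question is not shown here). The flags are passed explicitly because, as of PHP 8.1, the default flags already include ENT_SUBSTITUTE, so a call that relies on the defaults would not reproduce the empty result:

```php
<?php
// Hypothetical input: "café" encoded as ISO-8859-1, so the é is the single
// byte 0xE9, which is not a valid UTF-8 sequence on its own.
$input = "caf\xE9 <b>";

// Neither ENT_IGNORE nor ENT_SUBSTITUTE is set, so the invalid byte makes
// htmlspecialchars() return an empty string instead of the escaped text.
$escaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

var_dump($escaped === ''); // bool(true)
```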
Then I asked myself what is "invalid" about this string. The Wikipedia page for UTF-8 has a good diagram of the encoding. Every code point in plain ASCII is 0-127 and is stored in a single byte whose most significant bit (MSB) is 0.
If a byte's MSB is 1 (decimal 128 to 255), it tells a UTF-8 compliant parser that the byte is part of a multi-byte sequence. The first byte of the sequence begins with the bits 110, 1110, or 11110 (for two, three, or four bytes in total), and every following byte must begin with the bits 10.
Evidently this string contains a byte over 127 that breaks that pattern, for example a lead byte whose following byte does not begin with 10. Therefore it is invalid UTF-8.
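To see where a string breaks the pattern, the bit checks can be sketched as below. The function name firstBadUtf8Offset and the sample strings are made up for illustration, and the check is deliberately simplified: it only looks at the lead/continuation bit patterns and does not reject overlong forms, surrogates, or code points above U+10FFFF, so it is not a full validator.

```php
<?php
// Hypothetical helper: returns the offset of the first byte that breaks the
// lead/continuation bit pattern, or -1 if the whole string fits the pattern.
function firstBadUtf8Offset(string $s): int
{
    $len = strlen($s);
    for ($i = 0; $i < $len; ) {
        $b = ord($s[$i]);
        if ($b < 0x80) {                    // 0xxxxxxx: plain ASCII
            $need = 0;
        } elseif (($b & 0xE0) === 0xC0) {   // 110xxxxx: one continuation byte
            $need = 1;
        } elseif (($b & 0xF0) === 0xE0) {   // 1110xxxx: two continuation bytes
            $need = 2;
        } elseif (($b & 0xF8) === 0xF0) {   // 11110xxx: three continuation bytes
            $need = 3;
        } else {
            return $i;                      // stray continuation or invalid lead
        }
        // Every continuation byte must look like 10xxxxxx.
        for ($k = 1; $k <= $need; $k++) {
            if ($i + $k >= $len || (ord($s[$i + $k]) & 0xC0) !== 0x80) {
                return $i;
            }
        }
        $i += $need + 1;
    }
    return -1;
}

var_dump(firstBadUtf8Offset("caf\xC3\xA9")); // int(-1): é as valid UTF-8
var_dump(firstBadUtf8Offset("caf\xE9 x"));   // int(3): 0xE9 followed by a space
```

In the second call, 0xE9 announces a three-byte sequence, but the next byte is a space (0x20), which does not begin with the bits 10.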
Thanks to this SO post for the resolution, which, in my opinion, is to use the ENT_SUBSTITUTE flag, which replaces each invalid sequence with the Unicode replacement character U+FFFD (or, I suppose, ENT_IGNORE if you are sure that silently deleting these non-conforming bytes won't be a security issue).
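A short sketch of the ENT_SUBSTITUTE fix, again using a made-up input string rather than the one from the question:

```php
<?php
// Hypothetical input containing the lone byte 0xE9, which is invalid UTF-8.
$input = "caf\xE9 <b>";

// ENT_SUBSTITUTE swaps the invalid byte for U+FFFD instead of discarding
// the whole string, so the rest of the text is still escaped and returned.
$safe = htmlspecialchars($input, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

var_dump($safe !== '');                         // bool(true)
var_dump(strpos($safe, '&lt;b&gt;') !== false); // bool(true)
var_dump(strpos($safe, "\u{FFFD}") !== false);  // bool(true)
```

If the bytes are really text in another encoding (ISO-8859-1 in this made-up case), the replacement character hides the original letter, so passing the correct encoding name as the third argument may be the better fix.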