
Htmlspecialchars Causing Text To Disappear

I encountered a particular string (it's not completely printable, but you can see it below) that causes htmlspecialchars() to return a zero-length string. Is there any way this

Solution 1:

I understand now why it's returning a zero-length string. Sorry for asking this question. I should have researched more before posting. Anyway, the answer is the following:

On the PHP manual page for htmlspecialchars:

If the input string contains an invalid code unit sequence within the given encoding an empty string will be returned, unless either the ENT_IGNORE or ENT_SUBSTITUTE flags are set.
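To make the quoted behavior concrete, here is a minimal sketch. The input string is a made-up stand-in for the original (which isn't reproduced here): "caf" followed by a lone 0xE9 byte, which is "é" in Latin-1 but not a complete sequence in UTF-8. The flags are passed explicitly because, as far as I know, PHP 8.1 and later include ENT_SUBSTITUTE in the default flags, so relying on the default would hide the problem there.

```php
<?php
// Hypothetical input: a lone 0xE9 byte followed by a space is not valid UTF-8.
$bad = "caf\xE9 <b>";

// Flags given explicitly, with neither ENT_SUBSTITUTE nor ENT_IGNORE set.
$escaped = htmlspecialchars($bad, ENT_QUOTES, 'UTF-8');

// The whole result is empty, not just the offending byte.
var_dump($escaped === ''); // bool(true)
```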

So what is "invalid" about this string? The Wikipedia page for UTF-8 has a good diagram of the encoding. Every code point in plain-text ASCII is 0-127 and is stored in a single byte whose most significant bit (MSB) is always 0.

If a byte's MSB is 1 (decimal 128 to 255), it tells a UTF-8 compliant parser that the byte is part of a multi-byte sequence. The lead byte of such a sequence begins with the bits 11, and each byte that follows it in the sequence must begin with a 1 followed by a 0.

Evidently, somewhere in this string there is a byte over 127 whose following byte does not begin with a 1 followed by a 0. Therefore it is not valid UTF-8.
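The bit rule described above can be sketched as a small scanner that reports where the first offending byte sits. This is a simplified illustration, not what PHP does internally: the function name firstBadByteOffset is made up for this example, and it only checks the lead-byte and continuation-byte bit patterns (it does not reject overlong forms or surrogates). For a complete validity check, mb_check_encoding($s, 'UTF-8') from the mbstring extension is probably the better tool.

```php
<?php
// Hypothetical helper: returns the byte offset of the first sequence that
// breaks the lead/continuation bit pattern, or null if none is found.
function firstBadByteOffset(string $s): ?int
{
    $len = strlen($s);
    for ($i = 0; $i < $len; ) {
        $byte = ord($s[$i]);
        if ($byte < 0x80) {
            $need = 0;                          // 0xxxxxxx: plain ASCII
        } elseif (($byte & 0xE0) === 0xC0) {
            $need = 1;                          // 110xxxxx: one continuation byte
        } elseif (($byte & 0xF0) === 0xE0) {
            $need = 2;                          // 1110xxxx: two continuation bytes
        } elseif (($byte & 0xF8) === 0xF0) {
            $need = 3;                          // 11110xxx: three continuation bytes
        } else {
            return $i;                          // stray continuation or invalid lead byte
        }
        // Every continuation byte must look like 10xxxxxx.
        for ($k = 1; $k <= $need; $k++) {
            if ($i + $k >= $len || (ord($s[$i + $k]) & 0xC0) !== 0x80) {
                return $i;
            }
        }
        $i += $need + 1;
    }
    return null;
}

var_dump(firstBadByteOffset("caf\xE9 ok"));   // int(3): 0xE9 is followed by a space
var_dump(firstBadByteOffset("caf\xC3\xA9"));  // NULL: 0xC3 0xA9 is a well-formed "é"
```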

Thanks to this SO post for the resolution, which, in my opinion, is to use the ENT_SUBSTITUTE flag (or, I suppose, ENT_IGNORE if you are sure that deleting these non-conforming bytes won't be a security issue).
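A rough sketch of the two flags side by side, again using a made-up input with a lone 0xE9 byte rather than the original string. As I understand it, ENT_SUBSTITUTE replaces the invalid sequence with the Unicode replacement character U+FFFD, while ENT_IGNORE silently drops it; in both cases the rest of the string is still escaped.

```php
<?php
// Hypothetical input containing one byte that is not valid UTF-8.
$bad = "caf\xE9 <b>";

// Invalid sequence replaced with U+FFFD (bytes EF BF BD in UTF-8).
$substituted = htmlspecialchars($bad, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

// Invalid sequence discarded; only do this if dropping bytes is safe for your data.
$ignored = htmlspecialchars($bad, ENT_QUOTES | ENT_IGNORE, 'UTF-8');

var_dump($substituted !== '', $ignored !== ''); // neither result is empty
```

ENT_SUBSTITUTE seems like the safer default because the replacement character leaves a visible marker where the input was damaged.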
