Fixed Invalid UTF8 sequence in truncated message(?)

Kent

Active member
Someone had the bright idea to make their "about" field a giant blob of stacking diacritics, which went over the hard-coded limit of 65535 characters.

Stacking diacritics look like this, and can be posted fine when under the character limit:
Code:
ก็็็็็็็็็็็็็็็็็็็็กิิิิิิิิิิิิิิิิิิิิก้้้้้้้้้้้้้้้

When submitting a message of only those characters repeated beyond the character limit, this error occurs:
Code:
Zend_Db_Statement_Mysqli_Exception: Mysqli statement execute error : Data too long for column 'about' at row 1 - library/Zend/Db/Statement/Mysqli.php:214

When submitting the same message prefixed by a single-byte character, this error occurs:
Code:
Zend_Db_Statement_Mysqli_Exception: Mysqli statement execute error : Incorrect string value: '\xE0\xB8\x81\xE0\xB9\x87...' for column 'about' at row 1 - library/Zend/Db/Statement/Mysqli.php:214

After poking around, it seems the TEXT max length is 65535 bytes, but XenForo is splitting the string by characters.
 
So part of this was a miscalculation on our part, though it is a bit of an inconsistency within MySQL. When you say VARCHAR(255) on a UTF-8 column, it actually means 255 characters. However, when you have a TEXT column, the limit (65KB, 16MB, etc.) is actually a byte limit and is thus affected by the variable length of UTF-8. Our count/length checks are done as characters across the board.
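To make the character/byte mismatch concrete, here is a minimal Python sketch (not XenForo code; the function names are made up for illustration). It counts characters versus UTF-8 bytes for the Thai sample, and checks whether a cut at 65535 bytes lands on a character boundary. One plausible reading of the two different errors reported above is that the one-byte prefix shifts that cut into the middle of a 3-byte sequence, but that part is an assumption about how MySQL reacts, not something confirmed here.

```python
TEXT_MAX_BYTES = 65535  # byte limit of a MySQL TEXT column


def utf8_len(value: str) -> int:
    """Number of bytes the string occupies once encoded as UTF-8."""
    return len(value.encode("utf-8"))


def cut_is_on_boundary(value: str, limit: int = TEXT_MAX_BYTES) -> bool:
    """True if cutting the UTF-8 bytes at `limit` leaves only whole characters."""
    data = value.encode("utf-8")[:limit]
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        # The cut fell inside a multi-byte sequence.
        return False


# U+0E01 (ก) followed by 20 stacking marks (U+0E47); each is 3 bytes in UTF-8.
sample = "\u0e01" + "\u0e47" * 20
print(len(sample), utf8_len(sample))  # characters vs. bytes

# 65535 characters passes a character-based check but needs 3x the bytes.
blob = "\u0e01" * 65535
print(utf8_len(blob))
print(cut_is_on_boundary(blob))        # 65535 is a multiple of 3
print(cut_is_on_boundary("a" + blob))  # one extra byte shifts the cut
```

With only 3-byte characters the byte cut at 65535 falls between characters, while a single-byte prefix leaves two stray bytes of a partial character at the end.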

The safest thing here, with respect to about and signatures, is to enforce a much lower limit. MySQL's UTF-8 normally only supports 3-byte characters, so that puts the worst-case limit at ~21000 characters (65535 / 3), but I've changed it to limit to 20000. Furthermore, I'm actually applying limits to the length of these fields to fit with the maximum message length as well.
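The worst-case arithmetic can be sketched as follows. This is an illustrative Python check, not the actual XenForo change: the helper names are hypothetical, and it assumes the column stores at most 3 bytes per character as described above.

```python
TEXT_MAX_BYTES = 65535   # byte limit of a MySQL TEXT column
MAX_BYTES_PER_CHAR = 3   # assumption: MySQL's 3-byte utf8 charset
CHAR_LIMIT = 20000       # the lower limit chosen in the post above


def worst_case_char_limit(byte_limit: int, bytes_per_char: int) -> int:
    """Largest character count that always fits, even if every char is max width."""
    return byte_limit // bytes_per_char


def fits_in_text_column(value: str,
                        char_limit: int = CHAR_LIMIT,
                        byte_limit: int = TEXT_MAX_BYTES) -> bool:
    """Check both the character limit and the encoded byte size."""
    return len(value) <= char_limit and len(value.encode("utf-8")) <= byte_limit


print(worst_case_char_limit(TEXT_MAX_BYTES, MAX_BYTES_PER_CHAR))
print(fits_in_text_column("\u0e01" * 20000))  # 60000 bytes, within the limit
print(fits_in_text_column("\u0e01" * 20001))  # one character over the limit
```

Checking the encoded byte length as well as the character count is a belt-and-braces choice in this sketch; with a 20000-character cap and 3-byte characters, the character check alone already keeps the value under 65535 bytes.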
 