Fixed Invalid UTF8 sequence in truncated message(?)

Kent

Active member
#1
Someone had the bright idea to make their "about" field a giant blob of stacking diacritics, which went over the hard-coded limit of 65535 characters.

Stacking diacritics look like this, and can be posted fine when under the character limit:
Code:
ก็็็็็็็็็็็็็็็็็็็็กิิิิิิิิิิิิิิิิิิิิก้้้้้้้้้้้้้้้
When submitting a message of only those characters repeated beyond the character limit, this error occurs:
Code:
Zend_Db_Statement_Mysqli_Exception: Mysqli statement execute error : Data too long for column 'about' at row 1 - library/Zend/Db/Statement/Mysqli.php:214
When submitting the same message prefixed by a single-byte character, this error occurs:
Code:
Zend_Db_Statement_Mysqli_Exception: Mysqli statement execute error : Incorrect string value: '\xE0\xB8\x81\xE0\xB9\x87...' for column 'about' at row 1 - library/Zend/Db/Statement/Mysqli.php:214
After poking around, it seems the TEXT max length is 65535 bytes, but XenForo is splitting the string by characters.
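To illustrate the character/byte mismatch, here is a minimal Python sketch (not XenForo's actual code, which is PHP). The helper names `utf8_len` and `truncate_utf8` are made up for this example. It shows that a string of 65535 Thai characters needs about three times that many bytes, and that a single-byte prefix shifts the 65535-byte boundary into the middle of a character, which may be why the second error reports an incorrect string value.

```python
TEXT_MAX_BYTES = 65535  # MySQL TEXT column limit, counted in bytes


def utf8_len(s: str) -> int:
    """Number of bytes the string occupies when encoded as UTF-8."""
    return len(s.encode("utf-8"))


def truncate_utf8(s: str, max_bytes: int = TEXT_MAX_BYTES) -> str:
    """Cut s to at most max_bytes of UTF-8 without splitting a character."""
    encoded = s.encode("utf-8")
    if len(encoded) <= max_bytes:
        return s
    # errors="ignore" drops the incomplete trailing sequence left by the cut.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


# One Thai base letter followed by stacking diacritics: 65535 characters.
blob = "\u0e01" + "\u0e47" * 65534

print(len(blob))       # characters
print(utf8_len(blob))  # bytes (each of these characters takes 3 bytes)

# A one-byte prefix moves the byte boundary into the middle of a character,
# so a naive cut at 65535 bytes would leave an invalid partial sequence.
prefixed = "a" + blob
naive_cut = prefixed.encode("utf-8")[:TEXT_MAX_BYTES]
safe_cut = truncate_utf8(prefixed)
print(len(naive_cut) - utf8_len(safe_cut))  # dangling bytes a naive cut leaves
```

Truncating on the encoded bytes and then discarding any partial trailing sequence keeps the stored value valid UTF-8 regardless of where the limit falls.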
 

Mike

XenForo developer
Staff member
#2
So part of this was a miscalculation on our part, though it is a bit of an inconsistency within MySQL. When you say VARCHAR(255) on a UTF-8 column, it actually means 255 characters. However, when you have a text column, the limit (65KB, 16MB, etc.) is actually a byte limit and thus is affected by the variable length of UTF-8. Our count/length checks are done as characters across the board.

The safest thing here, with respect to about and signatures, is to enforce a much lower limit. MySQL's UTF-8 normally only supports 3-byte characters, so that puts the worst-case limit at ~21000 characters, but I've changed the limit to 20000. Furthermore, I'm actually applying limits to the length of these fields to fit with the maximum message length as well.
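The worst-case arithmetic can be sketched as follows. This is an illustrative Python snippet, not the actual XenForo change; `max_safe_chars` and `fits_text_column` are hypothetical names, and the 20000 default simply mirrors the limit mentioned above.

```python
TEXT_MAX_BYTES = 65535  # byte limit of a MySQL TEXT column


def max_safe_chars(byte_limit: int, max_bytes_per_char: int) -> int:
    """Largest character count guaranteed to fit even if every character
    uses the maximum number of bytes."""
    return byte_limit // max_bytes_per_char


def fits_text_column(s: str, char_limit: int = 20000) -> bool:
    """Character-based check that is safe as long as char_limit does not
    exceed the worst-case figure for the column's character set."""
    return len(s) <= char_limit


# 3 bytes per character is the maximum for MySQL's 3-byte UTF-8.
print(max_safe_chars(TEXT_MAX_BYTES, 3))

# Assumption for comparison: a 4-byte-per-character set such as utf8mb4
# would lower the worst case further.
print(max_safe_chars(TEXT_MAX_BYTES, 4))
```

With 3-byte characters the exact worst case is 65535 // 3 = 21845, so a 20000-character limit leaves some headroom below the byte limit.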