Minor: \XF\Str\Formatter::wholeWordTrimAroundTerm length calculation bug with Turkish

PaulB

Affected version
2.2.11
The following line exists in wholeWordTrimAroundTerm:
PHP:
$termPosition = utf8_strpos(utf8_strtolower($string), utf8_strtolower($term));
However, changing the case of a string isn't guaranteed to preserve the length in many languages. A popular example is German, where ß has historically had no uppercase equivalent, so it's commonly converted to SS:
Code:
php > $s = 'ß';
php > echo mb_strlen($s);
1
php > echo mb_strlen(mb_strtoupper($s));
2
php > echo mb_strlen(mb_strtolower(mb_strtoupper($s)));
2
Although this particular example only causes issues with strtoupper, there is at least one common situation in which the string length changes on strtolower: the uppercase Turkish dotted I ("İ", U+0130). To preserve the difference between dotted and dotless I's, mb_strtolower lowercases it to "i" followed by a combining dot above (U+0307), which is two code points instead of one:
Code:
php > $s = 'İ';
php > echo mb_strlen($s), PHP_EOL, mb_strlen(mb_strtolower($s));
1
2
php > echo mb_strtolower($s);
i̇
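
To make the consequence concrete, here is a minimal sketch of how the offset drifts. It assumes the lookup behaves like the quoted line with mb_* functions standing in for utf8_*; naiveTermPosition and the sample string are hypothetical and not part of XenForo.

```php
<?php
// Hypothetical helper mirroring the quoted lookup, with mb_* in place of utf8_*.
function naiveTermPosition(string $string, string $term)
{
    return mb_strpos(
        mb_strtolower($string, 'UTF-8'),
        mb_strtolower($term, 'UTF-8'),
        0,
        'UTF-8'
    );
}

$string = "\u{0130}stanbul forum"; // "İstanbul forum"
$term = 'forum';

// The offset is computed on the lowercased copy...
$position = naiveTermPosition($string, $term);
echo $position, PHP_EOL;

// ...but applied to the original string, where the term starts one character earlier.
echo mb_substr($string, $position, mb_strlen($term, 'UTF-8'), 'UTF-8'), PHP_EOL;
```

On a PHP build that lowercases "İ" to two code points, as in the session above, this should print 10 and "orum" rather than 9 and "forum", because every character after the "İ" sits one position later in the lowercased copy.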

This is a very minor bug that shouldn't have any serious consequences, but this code will probably be modified to support PHP 8.2 anyway (utf8_* → mb_*).
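
One possible direction, sketched here as an assumption rather than as XenForo's actual fix, is to search the original string case-insensitively instead of lowercasing both sides first. mb_stripos is a real mbstring function; caseInsensitiveTermPosition and the sample string are hypothetical.

```php
<?php
// Hypothetical replacement for the lookup; not taken from XenForo.
function caseInsensitiveTermPosition(string $string, string $term)
{
    // mb_stripos takes the unmodified haystack and returns a character offset
    // (or false when the term is not found).
    return mb_stripos($string, $term, 0, 'UTF-8');
}

$string = "\u{0130}stanbul forum"; // "İstanbul forum"
$position = caseInsensitiveTermPosition($string, 'FORUM');

if ($position !== false) {
    // Slice the original string at the reported offset.
    echo mb_substr($string, $position, 5, 'UTF-8'), PHP_EOL;
}
```

The trade-off is that mb_stripos applies its own case-insensitive comparison, so what counts as a match may differ slightly from lowercasing both strings; that would need checking against the existing behaviour before swapping it in.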