Minor: \XF\Str\Formatter::wholeWordTrimAroundTerm length calculation bug with Turkish

PaulB · Nov 4, 2022

The following line exists in wholeWordTrimAroundTerm:

PHP:

$termPosition = utf8_strpos(utf8_strtolower($string), utf8_strtolower($term));

However, changing the case of a string isn't guaranteed to preserve the length in many languages. A popular example is German, where ß has historically had no uppercase equivalent, so it's commonly converted to SS:

Code:

php > $s = 'ß';
php > echo mb_strlen($s);
1
php > echo mb_strlen(mb_strtoupper($s));
2
php > echo mb_strlen(mb_strtolower(mb_strtoupper($s)));
2

Although this particular example only causes issues with strtoupper, there is at least one common situation in which the string length will change on strtolower: an uppercase Turkish dotted I ("İ", U+0130). In order to preserve the difference between dotted and non-dotted I's, mb_strtolower keeps the dot when lowercasing, emitting a plain "i" followed by a combining dot above (U+0307), so one code point becomes two:

Code:

php > $s = 'İ';
php > echo mb_strlen($s), PHP_EOL, mb_strlen(mb_strtolower($s));
1
2
php > echo mb_strtolower($s);
i̇
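
To make the consequence concrete, here is a minimal sketch of how the offset drifts. It uses mb_* functions as stand-ins for the utf8_* helpers (an assumption on my part; the two may not behave identically), and the sample string and term are made up for illustration:

```php
<?php
// Hypothetical sample data, not taken from XenForo.
$string = 'İstanbul is big';
$term = 'big';

// Same shape as the line in wholeWordTrimAroundTerm, with mb_* stand-ins.
$loweredPosition = mb_strpos(
    mb_strtolower($string, 'UTF-8'),
    mb_strtolower($term, 'UTF-8'),
    0,
    'UTF-8'
);

// Position of the term in the untouched string, for comparison.
$actualPosition = mb_strpos($string, $term, 0, 'UTF-8');

// The lowercased copy gains one code point (the combining dot),
// so every offset after the İ is shifted by one.
echo $loweredPosition - $actualPosition, PHP_EOL;
```

Any offset taken from the lowercased copy and then applied to the original string lands one character too far for each "İ" that precedes the term.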

This is a very minor bug that shouldn't have any serious consequences, but this code will probably be modified to support PHP 8.2 anyway (utf8_* → mb_*).
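
One possible way to avoid the mismatch, sketched here rather than proposed as the actual fix, is to skip the lowercased copy and search case-insensitively with mb_stripos. The helper name findTermPosition is hypothetical, and this assumes PHP 7.3 or later, where (as far as I understand) mb_stripos folds case one code point at a time, so the offset it returns refers to the original string:

```php
<?php
// Hypothetical helper, not part of XenForo.
function findTermPosition(string $string, string $term)
{
    // mb_stripos compares case-insensitively and reports the offset
    // within the original $string, so no lowercased copy is needed.
    // Returns an int offset, or false when the term is not found.
    return mb_stripos($string, $term, 0, 'UTF-8');
}

// Hypothetical sample data.
$string = 'İstanbul is big';
$position = findTermPosition($string, 'BIG');

// Slicing the original string at the returned offset yields the term.
echo mb_substr($string, $position, 3, 'UTF-8'), PHP_EOL;
```

Whether a search for a plain "i" should also match "İ" is a separate question that this sketch doesn't try to answer.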
