UTF-8 URL encoding with PHP 8.2 is broken

Recep Baltaş

Affected version: 2.2.13
After trying to figure it out for weeks, I finally found a solution to this URL problem: downgrading from PHP 8.2 (8.2.12) to PHP 8.1 (8.1.26) fixed the issue.

At the moment, I don't know whether it's a PHP issue or a XenForo issue, but I'm betting on XenForo, hence this thread.
 
Hello,
I was able to solve the issue for PHP 8.2.13. The problem comes from the character set declaration. After examining the XenForo source code for a while, I came across a function named "Unfurl" (I reached this function through the code in the message).
  • Unfurl: a function that detects links in the text and, with the help of auxiliary functions, generates HTML that presents them as a readable preview on the site.
  • Examining that function led me to another function called metadataFetcher, which leads to getTitle() and finally to cleanMetadataString().

Looking at the code there, it appears to strip harmful content, unnecessary whitespace, and so on. The conversion seems to be written with English text in mind, so the only thing I added was a UTF-8 conversion.
I added a check of the string's character set before the existing $string assignments, so the function now reads as follows.

File location: src/XF/Http/Metadata.php:262 (XenForo 2.2.13)

PHP:
public function cleanMetadataString($string, $isUrl = false)
{
    if (!$string)
    {
        return '';
    }

    // Added Code Start
    if (mb_check_encoding($string, 'UTF-8') === false)
    {
        // Not valid UTF-8: treat the bytes as ISO-8859-1 and convert them to UTF-8
        $string = mb_convert_encoding($string, 'UTF-8', 'ISO-8859-1');
    }
    else
    {
        // Valid UTF-8: map each character back to a single ISO-8859-1 byte.
        // This undoes double-encoded text (UTF-8 bytes misread as ISO-8859-1),
        // but a correctly encoded character outside ISO-8859-1 becomes "?"
        $string = mb_convert_encoding($string, 'ISO-8859-1', 'UTF-8');
    }
    // Added Code End

    $string = \XF::cleanString($string);
    $string = utf8_unhtml($string, true);
    $string = html_entity_decode($string, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    $string = utf8_unhtml($string);
    $string = str_replace("\n", ' ', trim($string));
    $string = \XF::cleanString($string);

    if ($isUrl)
    {
        /** @var \XF\Validator\Url $validator */
        $validator = $this->app->validator('Url');
        $string = $validator->coerceValue($string);
        if (!$validator->isValid($string))
        {
            $string = '';
        }
    }

    return $string;
}
Source of the conversion code: PHP: utf8_encode - Manual
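The workaround above converts every valid UTF-8 string back to ISO-8859-1, even when the string was never double-encoded. Below is a minimal sketch of a more guarded variant, assuming the broken previews come from one layer of UTF-8 bytes being misread as ISO-8859-1 (the fix above points that way, but the thread doesn't confirm it). `repairDoubleEncodedUtf8` is a hypothetical helper, not XenForo code, and the check is a heuristic: it only applies the repair when the round trip is lossless and the recovered bytes are valid UTF-8.

```php
<?php
// Sketch only: repairDoubleEncodedUtf8 is a hypothetical helper, not part of XenForo.
// Assumption: broken text is UTF-8 that was misread once as ISO-8859-1 and re-encoded.
function repairDoubleEncodedUtf8(string $string): string
{
    if (!mb_check_encoding($string, 'UTF-8')) {
        // Not UTF-8 at all: assume ISO-8859-1, as the workaround above does
        return mb_convert_encoding($string, 'UTF-8', 'ISO-8859-1');
    }

    // Map each code point back to one byte (only exact if all are <= U+00FF)
    $bytes = mb_convert_encoding($string, 'ISO-8859-1', 'UTF-8');

    // If a character had to be substituted, or the recovered bytes are not valid
    // UTF-8, the input was not double-encoded, so return it unchanged
    $lossless = mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-1') === $string;
    if (!$lossless || !mb_check_encoding($bytes, 'UTF-8')) {
        return $string;
    }

    return $bytes;
}

// Simulate a broken preview: UTF-8 bytes decoded as ISO-8859-1, then re-encoded
$original = 'Baltaş çalışma';
$mojibake = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');

echo $mojibake, PHP_EOL;                           // garbled text
echo repairDoubleEncodedUtf8($mojibake), PHP_EOL;  // Baltaş çalışma
echo repairDoubleEncodedUtf8($original), PHP_EOL;  // Baltaş çalışma (left unchanged)
```

Dropped into cleanMetadataString(), a call like this would take the place of the added if/else block.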

[Screenshots showing the issue resolved were attached to the original post.]
 
This forum also has this error, and it's not PHP's fault.
 
That's because it looked like THIS forum didn't have the issue, and neither did many other forums or the demo. So it was categorized as a PHP issue.

PHP 7.x is still fine for XenForo.
 
We did not see this bug on our forum when running XenForo 2.2.11 with PHP 8.0 or 8.1.

Now, with 2.2.15, we see this bug, and switching PHP between 8.0, 8.1, and 8.2 makes absolutely no difference. We don't have any 7.x versions available now, and I really don't think they should be needed.
 
They even accepted it and provided a patch back then:

Is this the solution?
 
It was the solution for that problem, but this is not the same bug. The earlier bug threw nasty errors at the user if the URL itself contained special characters.

This time the URL itself causes no errors, but special characters in the unfurl preview are broken.
 
I found one difference between XenForo versions that affects this.

/src/vendor/symfony/dom-crawler/Crawler.php line 196

XenForo 2.2.11:
$content = mb_convert_encoding($content, 'HTML-ENTITIES', $charset);

XenForo 2.2.15:
$content = htmlspecialchars_decode(iconv('UTF-8', 'ISO-8859-1', htmlentities($content, ENT_COMPAT, 'UTF-8')), ENT_QUOTES);

Reverting this line to the previous version seems to cure the problem, but I need to do more research to see whether it breaks anything else.
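For context, PHP 8.2 deprecated the 'HTML-ENTITIES' encoding in mbstring, which is presumably why the bundled Crawler.php no longer uses the 2.2.11 line. As far as I can tell, the purpose of that line is to make the markup pure ASCII before DOMDocument parses it, since DOMDocument::loadHTML() assumes ISO-8859-1 when the document declares no charset. One non-deprecated way to get the same effect is mb_encode_numericentity(), which respects the page's $charset instead of hard-coding UTF-8 and ISO-8859-1 the way the 2.2.15 line does. The sketch below is an assumption, not Symfony or XenForo code: `toNumericEntities` is a hypothetical helper, and it emits numeric entities (`&#351;`) where 'HTML-ENTITIES' may emit named ones, so the output should parse the same without being byte-identical.

```php
<?php
// Sketch only: toNumericEntities is a hypothetical helper, not Symfony's or XenForo's code.
// Encodes every non-ASCII code point as a decimal numeric entity, leaving ASCII
// (including markup like "<" and ">") untouched.
function toNumericEntities(string $content, string $charset = 'UTF-8'): string
{
    // Convert map [start, end, offset, mask]: code points U+0080..U+10FFFF become &#N;
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];

    // $charset is the page's declared encoding, as in the 2.2.11 line
    return mb_encode_numericentity($content, $map, $charset);
}

echo toNumericEntities('<p>Baltaş ç</p>'), PHP_EOL;        // <p>Balta&#351; &#231;</p>
echo toNumericEntities("caf\xE9", 'ISO-8859-1'), PHP_EOL;  // caf&#233;
```

Because the result is plain ASCII, the parser's default charset no longer matters; whether this is safe to patch into a vendor file is a separate question.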
 