UTF-8 URL encoding with PHP 8.2 is broken

Well, that worked with PHP 8.1 but not so well with 8.2. After some more research, my solution would be this:

/src/vendor/symfony/dom-crawler/Crawler.php line 196
$content = htmlspecialchars_decode(htmlentities($content, ENT_COMPAT, 'UTF-8'), ENT_QUOTES);

The older approach, mb_convert_encoding with the 'HTML-ENTITIES' target, is deprecated as of PHP 8.2, which is probably the reason behind this change. However, in the new version the iconv call that converts the string from UTF-8 to ISO-8859-1 seems to be causing problems. I don't actually know why it is there, as things seem to work fine (actually better) without it. Maybe someone who knows more about these things can share some knowledge.
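One possible explanation, offered as a guess rather than something confirmed from the XenForo or Symfony source: htmlentities only produces named entities for characters that have one in HTML 4.01, so anything else (Cyrillic, CJK and so on) is still raw UTF-8 afterwards, and ISO-8859-1 has no way to represent those characters. A minimal sketch with a made-up sample string ($sample and the other variable names are just for illustration):

```php
<?php
// Made-up sample: "é" exists in ISO-8859-1, the Cyrillic "ж" does not.
$sample = 'é ж';

// Same htmlentities call as on line 196, without the outer decode.
$entities = htmlentities($sample, ENT_COMPAT, 'UTF-8');

// "é" has a named entity in HTML 4.01, so it becomes "&eacute;".
$hasNamedEntity = str_contains($entities, '&eacute;');

// "ж" has no named entity, so it is left as raw UTF-8 in the string.
$stillHasCyrillic = str_contains($entities, 'ж');

var_dump($hasNamedEntity, $stillHasCyrillic);
```

If that is right, a UTF-8 to ISO-8859-1 conversion after this step only has a chance of working on pages whose characters all fall inside Latin-1 or have a named entity.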
 
Final solution (until I find yet another problem and change my mind about this):

/src/vendor/symfony/dom-crawler/Crawler.php

line 196
$content = htmlspecialchars_decode(htmlentities($content, ENT_COMPAT, 'UTF-8'), ENT_QUOTES);

line 204
@$dom->loadHTML(mb_encode_numericentity($content, [0x80, 0x10FFFF, 0, ~0], 'UTF-8'));

The first modification removes the (in my opinion) unnecessary conversion to ISO-8859-1, and the second one makes sure that loadHTML handles UTF-8 correctly by converting the non-ASCII characters in $content to HTML numeric character references.
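To show what the two modified lines do together outside of Crawler.php, here is a minimal standalone sketch. It assumes the mbstring and dom extensions are loaded; the function name loadUtf8Html and the sample HTML are made up for this example and are not part of Symfony or XenForo.

```php
<?php
// Hypothetical helper that mirrors the two modified lines (196 and 204).
function loadUtf8Html(string $content): DOMDocument
{
    // Line 196: named entities where available, then restore the markup
    // characters (<, >, &, quotes) so the tags stay intact.
    $content = htmlspecialchars_decode(htmlentities($content, ENT_COMPAT, 'UTF-8'), ENT_QUOTES);

    // Line 204: every remaining character from U+0080 upwards becomes a
    // numeric character reference, so the string passed on is plain ASCII.
    $encoded = mb_encode_numericentity($content, [0x80, 0x10FFFF, 0, ~0], 'UTF-8');

    $dom = new DOMDocument('1.0', 'UTF-8');
    // Errors suppressed as in the original line; libxml warns on sloppy HTML.
    @$dom->loadHTML($encoded);

    return $dom;
}

// Made-up sample page with Latin-1, Cyrillic and CJK characters in the title.
$html = '<html><head><title>é Привет 日本語</title></head><body></body></html>';
$dom = loadUtf8Html($html);
$title = $dom->getElementsByTagName('title')->item(0)->textContent;

echo $title, PHP_EOL;
```

Because loadHTML only ever sees ASCII here, it no longer has to guess the input charset, which seems to be where the garbled titles came from.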

Reference for this: https://stackoverflow.com/questions/8218230/php-domdocument-loadhtml-not-encoding-utf-8-correctly

In my testing, unfurl now works correctly with XenForo 2.2.15 and PHP 8.2.8 when these two lines are modified:

[Attachment: 1708072464136.webp]
 