Help URLs with invalid UTF-8 cause exceptions in MySQL 8

PaulB

Well-known member
Affected version
2.2.8.0
The following URL results in a 404 with MariaDB but an exception with MySQL 8.0.x:

https://xenforo.com/community/help/%c0a

Furthermore, the exception may fail to be logged to the database and to third-party monitoring services (in our case, Datadog).

This tends to be triggered often by vulnerability scanners such as Acunetix. Because it produces a lot of 500 errors, people can end up being woken by automated systems at ungodly hours.

Note that %c0a is invalid UTF-8. Replace <invalid UTF-8> in the following error message with the raw bytes from the URL.

Code:
[error] 139#139: *136378 FastCGI sent in stderr: "PHP message: XenForo unexpected DB error
MySQL statement prepare error [1267]: Illegal mix of collations (utf8_general_ci,IMPLICIT) and (utf8mb4_0900_ai_ci,COERCIBLE) for operation '=' in src/XF/Db/AbstractStatement.php on line 230

            SELECT `xf_help_page`.*
            FROM `xf_help_page`
            LEFT JOIN `xf_addon` AS `xf_addon_AddOn_1` ON (`xf_addon_AddOn_1`.`addon_id` = `xf_help_page`.`addon_id`)
            WHERE (`xf_help_page`.`page_name` = '<invalid UTF-8>') AND (`xf_help_page`.`active` = 1) AND ((`xf_addon_AddOn_1`.`active` = 1) OR (`xf_help_page`.`addon_id` = ''))
            
        
LIMIT 1
------------

#0 src/XF/Db/Mysqli/Statement.php(198): XF\Db\AbstractStatement->getException('MySQL statement...', 1267, 'HY000')
#1 src/XF/Db/Mysqli/Statement.php(41): XF\Db\Mysqli\Statement->getException('MySQL statement...', 1267, 'HY000')
#2 src/XF/Db/Mysqli/Statement.php(56): XF\Db\Mysqli\Statement->prepare()
 
If you're behind Cloudflare, you can try removing that invalid character via CF Transform URL rewrite rules.
You're going to have a hard time doing that. The example I gave is just something I made up; Acunetix likes to generate random bytes that aren't UTF-8. I chose /%c0a when providing an example because it's very obviously invalid: in UTF-8, there can never be a byte with its most significant bit set surrounded on either side by bytes with their most significant bit unset. It's not that the characters are invalid; it just can't be decoded. You can't get codepoints from it, so there aren't any characters in the first place.

/%c0a results in three bytes; in hexadecimal, they're 2f c0 61. Notice that only the middle byte is >= 0x80. There's no way to decode that as UTF-8.
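
To make that concrete, here is a minimal Python sketch (an illustration only, not XenForo code) that percent-decodes the path and then tries to decode the bytes as UTF-8. The helper name `decode_path` is made up for this example.

```python
from urllib.parse import unquote_to_bytes


def decode_path(raw_path: str):
    """Percent-decode a URL path and try to read the result as UTF-8.

    Returns (raw_bytes, text); text is None when the bytes are not valid UTF-8.
    `decode_path` is a hypothetical helper, not part of any framework.
    """
    raw = unquote_to_bytes(raw_path)
    try:
        return raw, raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw, None


raw, text = decode_path("/%c0a")
print(raw.hex(" "))  # 2f c0 61
print(text)          # None
```

The lone 0xc0 byte has no continuation byte after it, so the decoder has nothing it can turn into a codepoint.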
 

I see what you mean! Arggh! You can use Transform rules to match Acunetix user agent and/or ASN and rewrite URLs there somehow or just block them in the first place.
 
You could, and we do, but we get hit with a lot of scanners from all over, and some people are smart enough to change the user agent. We also allow bug bounty hunters to search for vulnerabilities on our site as long as they're not disruptive, so we don't want to sweep bugs under the carpet--we'd rather fix them or report them upstream.
 
If you're using Cloudflare, you could use CF Transform Request header modifications (https://developers.cloudflare.com/rules/transform/request-header-modification/examples) to tag only the Acunetix user agent/ASN scan requests with a custom injected HTTP request header. Then change your alert system to check for a 500 status code plus that custom header when deciding whether to send an alert.
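
A rough sketch of what that alert check could look like on the monitoring side, in Python. The header name `x-scanner-tagged` and the function `should_alert` are assumptions made up for illustration; the real names would depend on your Transform Rule and your alerting tool.

```python
# Hypothetical header name injected by the Cloudflare Transform Rule.
SCANNER_HEADER = "x-scanner-tagged"


def should_alert(status: int, request_headers: dict) -> bool:
    """Return True when a response should page someone.

    Only 5xx responses are candidates, and those carrying the scanner tag
    header are suppressed. Header names are compared case-insensitively.
    """
    if status < 500:
        return False
    names = {name.lower() for name in request_headers}
    return SCANNER_HEADER not in names


print(should_alert(500, {"X-Scanner-Tagged": "1"}))  # False
print(should_alert(500, {"User-Agent": "Mozilla/5.0"}))  # True
```

This only quiets the alerts; the underlying 500 still happens and still needs fixing.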
 
XF\Mvc\Router::routeToController should reject routes with invalid UTF-8, IMO, before it even tries to do matching.

Maybe even earlier before it is passed to the router?
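
As a language-neutral illustration of that idea (written in Python, not XenForo's PHP, and not how XenForo actually handles it), a guard in front of the router might look like this. `is_valid_utf8_path`, `handle_request`, and the `route` callback are all hypothetical names.

```python
from urllib.parse import unquote_to_bytes


def is_valid_utf8_path(raw_path: str) -> bool:
    """True when the percent-decoded path is a valid UTF-8 byte sequence."""
    try:
        unquote_to_bytes(raw_path).decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def handle_request(raw_path: str, route) -> int:
    """Hypothetical front controller; `route` stands in for the real router.

    Paths that are not valid UTF-8 get a 404 before any route matching
    or database lookup is attempted.
    """
    if not is_valid_utf8_path(raw_path):
        return 404
    return route(raw_path)


print(handle_request("/help/%c0a", lambda path: 200))   # 404
print(handle_request("/help/terms", lambda path: 200))  # 200
```

With a check like this, the invalid bytes never reach the query against xf_help_page, so the collation error cannot be triggered from the URL.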
 