XF 2.1 Received invalid UTF-8 for string column [message]

Earl

Well-known member
Importing from vb 3.8 to XF v2.1.1
Server Error log 1:

InvalidArgumentException: Received invalid UTF-8 for string column [message] . src/XF/Import/Data/EntityEmulator.php:155

Stack trace
Code:
#0 src/XF/Import/Data/AbstractEmulatedData.php(35): XF\Import\Data\EntityEmulator->set('message', '[QUOTE="ravinat...', Array)
#1 src/addons/XFI/Import/Importer/vBulletin.php(3197): XF\Import\Data\AbstractEmulatedData->set('message', '[QUOTE="ravinat...', Array)
#2 src/XF/Import/Runner.php(231): XFI\Import\Importer\vBulletin->stepPosts(Object(XF\Import\StepState), Array, 8)
#3 src/XF/Import/ParallelRunner.php(212): XF\Import\Runner->runStep('posts', Object(XF\Import\StepState), 8)
#4 src/XF/Cli/Command/ImportChildProcess.php(78): XF\Import\ParallelRunner->runChildProcess('posts', 1293000, 1296000, Object(Closure))
#5 src/vendor/symfony/console/Command/Command.php(255): XF\Cli\Command\ImportChildProcess->execute(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#6 src/vendor/symfony/console/Application.php(953): Symfony\Component\Console\Command\Command->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#7 src/vendor/symfony/console/Application.php(248): Symfony\Component\Console\Application->doRunCommand(Object(XF\Cli\Command\ImportChildProcess), Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#8 src/vendor/symfony/console/Application.php(148): Symfony\Component\Console\Application->doRun(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#9 src/XF/Cli/Runner.php(63): Symfony\Component\Console\Application->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#10 cmd.php(15): XF\Cli\Runner->run()
#11 {main}

----
Code:
Previous InvalidArgumentException: Received invalid UTF-8 for string column - src/XF/Mvc/Entity/ValueFormatter.php:126
#0 src/XF/Import/Data/EntityEmulator.php(151): XF\Mvc\Entity\ValueFormatter->castValueToType('[QUOTE="ravinat...', 5, Array)
#1 src/XF/Import/Data/AbstractEmulatedData.php(35): XF\Import\Data\EntityEmulator->set('message', '[QUOTE="ravinat...', Array)
#2 src/addons/XFI/Import/Importer/vBulletin.php(3197): XF\Import\Data\AbstractEmulatedData->set('message', '[QUOTE="ravinat...', Array)
#3 src/XF/Import/Runner.php(231): XFI\Import\Importer\vBulletin->stepPosts(Object(XF\Import\StepState), Array, 8)
#4 src/XF/Import/ParallelRunner.php(212): XF\Import\Runner->runStep('posts', Object(XF\Import\StepState), 8)
#5 src/XF/Cli/Command/ImportChildProcess.php(78): XF\Import\ParallelRunner->runChildProcess('posts', 1293000, 1296000, Object(Closure))
#6 src/vendor/symfony/console/Command/Command.php(255): XF\Cli\Command\ImportChildProcess->execute(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#7 src/vendor/symfony/console/Application.php(953): Symfony\Component\Console\Command\Command->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#8 src/vendor/symfony/console/Application.php(248): Symfony\Component\Console\Application->doRunCommand(Object(XF\Cli\Command\ImportChildProcess), Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#9 src/vendor/symfony/console/Application.php(148): Symfony\Component\Console\Application->doRun(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#10 src/XF/Cli/Runner.php(63): Symfony\Component\Console\Application->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#11 cmd.php(15): XF\Cli\Runner->run()
#12 {main}

Request state
Code:
array(1) {
  ["cli"] => string(89) "/usr/local/lsws/site/public_html/cmd.php xf:import-child-process posts 1293000 1296000"
}

Server Error log 2:
RuntimeException: Child process exited with code 1. See control panel error error logs for details. src/XF/Import/ParallelProcessManager.php:170
Stack trace
Code:
#0 src/XF/Import/ParallelRunner.php(55): XF\Import\ParallelProcessManager->execute(Object(XF\Import\Manager), Object(Closure))
#1 src/XF/Import/Runner.php(128): XF\Import\ParallelRunner->runUntilCompleteInternal(Object(XF\Import\Manager), Object(Closure))
#2 src/XF/Cli/Command/Import.php(144): XF\Import\Runner->runUntilComplete(Object(XF\Import\Manager), Object(Closure))
#3 src/vendor/symfony/console/Command/Command.php(255): XF\Cli\Command\Import->execute(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#4 src/vendor/symfony/console/Application.php(953): Symfony\Component\Console\Command\Command->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#5 src/vendor/symfony/console/Application.php(248): Symfony\Component\Console\Application->doRunCommand(Object(XF\Cli\Command\Import), Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#6 src/vendor/symfony/console/Application.php(148): Symfony\Component\Console\Application->doRun(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#7 src/XF/Cli/Runner.php(63): Symfony\Component\Console\Application->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#8 cmd.php(15): XF\Cli\Runner->run()
#9 {main}

Request state
Code:
array(1) {
  ["cli"] => string(31) "cmd.php xf:import --processes=7"
}

[Attached image: mI3puMTl.png]
 
I know the devs can't reproduce the bug because they don't have the database that I'm trying to import. What can I do for them? I have a dedicated server with a XenForo test install and this database. Please help.
I've been trying to import this for almost 2 years.
 
Here is the patch for this bug:
vBulletin.php.patch
Code:
Index: src/addons/XFI/Import/Importer/vBulletin.php
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
--- src/addons/XFI/Import/Importer/vBulletin.php        (date 1552471156000)
+++ src/addons/XFI/Import/Importer/vBulletin.php        (date 1552471156000)
@@ -3177,6 +3177,10 @@
                        {
                                $state->extra['postDateStart'] = $post['dateline'];
 
+                $post['title'] = $this->convertToUtf8($post['title']);
+                $post['pagetext'] = $this->convertToUtf8($post['pagetext']);
+                $post['threadtitle'] = $this->convertToUtf8($post['threadtitle']);
+
                                $message = $this->getPostMessage($post['title'], $post['pagetext'], $post['threadtitle']);
 
                                /** @var \XF\Import\Data\Post $import */

Create a file named vBulletin.php.patch in the XenForo base folder and put the contents above in it.
Then run this command: patch --verbose -p0 < vBulletin.php.patch
 
Do you have any examples of the post content where this was falling over?

The patch you have provided shouldn't be necessary because we automatically convert $message to UTF-8 already. Essentially here you are converting the message content to UTF-8 twice and that may yield unexpected results.
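
To illustrate the double-conversion concern, here is a minimal standalone sketch. It does not use the importer's convertToUtf8() (its internals are not shown in this thread); it assumes a Latin-1 source charset and uses PHP's mb_convert_encoding(), with toUtf8() as a hypothetical stand-in, to show what can happen when already-converted text is converted a second time.

```php
<?php
// Sketch only: assumes the source data is ISO-8859-1 (Latin-1).
// toUtf8() is a hypothetical stand-in, not the importer's convertToUtf8().
function toUtf8(string $text, string $sourceCharset = 'ISO-8859-1'): string
{
    return mb_convert_encoding($text, 'UTF-8', $sourceCharset);
}

$latin1 = "caf\xE9";        // "café" stored as Latin-1 bytes
$once   = toUtf8($latin1);  // correct UTF-8: "café"
$twice  = toUtf8($once);    // UTF-8 bytes re-read as Latin-1: "cafÃ©"

var_dump(bin2hex($once));   // string(10) "636166c3a9"
var_dump(bin2hex($twice));  // string(14) "636166c383c2a9"
```

Note that the doubly converted string is still valid UTF-8, so nothing throws; the text is just silently garbled.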
 
I don't know how to export specific parts of the contents of the post table. All I can do is provide you with database access, which I'm fine with.
The process didn't continue until I added these lines:
Code:
$post['title'] = $this->convertToUtf8($post['title']);
$post['pagetext'] = $this->convertToUtf8($post['pagetext']);
$post['threadtitle'] = $this->convertToUtf8($post['threadtitle']);

I removed the patch and pruned the XenForo installation with the --clear option again.
I'm going to run the import process once more, for the sake of helping you fix this bug.
PM me your SSH public key.
 
The sample content you provided is similar to:
Code:
&#xDC3;&#xDD4;&#xD9A;&#xDB8;&#xDB4;&#xDD2;&#xDA7;&#xDA9;&#xDCA;&#xDAD;&zwj;&#xDCA;&zwj;&#xDBB;&#xDD9;&#xDB8;&#xDCA;&#xD8B;&#xDBD;&#xDCF;&#xDB6;
These are HTML entities which we decode back to their normal value:
Code:
සුකමපිටඩ්ත‍්‍රෙම්උලාබ
This is valid UTF-8.

So I can only assume it wasn't actually that post which is causing the problem.
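
As a rough illustration of that decoding step, the sketch below takes the first three numeric entities from the sample and decodes them with PHP's html_entity_decode(). This is not the importer's actual decoding code, just a standalone check that entities like these decode to valid UTF-8.

```php
<?php
// First three numeric entities from the sample above (Sinhala characters).
$encoded = '&#xDC3;&#xDD4;&#xD9A;';

// Decode the entities into real characters, encoded as UTF-8.
$decoded = html_entity_decode($encoded, ENT_QUOTES | ENT_HTML5, 'UTF-8');

var_dump($decoded);                              // string(9) "සුක"
var_dump(mb_check_encoding($decoded, 'UTF-8'));  // bool(true)
```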

To be more certain, you may need to run the import again but output the post ID for each post being handled. So after:
PHP:
$message = $this->getPostMessage($post['title'], $post['pagetext'], $post['threadtitle']);

Add:
PHP:
\XF::dump($post['postid']);

The very last ID before it failed is the problematic post. If it does end up being the same post, then I'm really not sure why you would be getting invalid UTF-8 errors.
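
The reasoning behind "the last ID printed is the failing one" can be sketched with made-up data. Here processPost() is a hypothetical stand-in for the importer's per-post work and the post IDs are invented; only the ordering of "print the ID, then do the work that may throw" mirrors the suggestion above.

```php
<?php
// Sketch with made-up data: processPost() is a hypothetical stand-in for the
// importer's per-post work and fails on one post to mimic the import error.
function processPost(array $post): void
{
    if (!mb_check_encoding($post['pagetext'], 'UTF-8')) {
        throw new InvalidArgumentException('Received invalid UTF-8');
    }
}

$posts = [
    ['postid' => 101, 'pagetext' => 'fine'],
    ['postid' => 102, 'pagetext' => "broken \xFF byte"],
    ['postid' => 103, 'pagetext' => 'never reached'],
];

$lastId = null;
try {
    foreach ($posts as $post) {
        $lastId = $post['postid'];  // same role as \XF::dump($post['postid'])
        echo $lastId, PHP_EOL;
        processPost($post);         // throws for the bad post
    }
} catch (InvalidArgumentException $e) {
    echo "Failed on post {$lastId}: {$e->getMessage()}", PHP_EOL;
}
```

Because the ID is printed before the step that throws, the final ID in the output belongs to the post that triggered the exception.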
 
[Attached screenshot: sAQHuIv.png]

 
That's a significantly higher postid than you assumed was the problematic post previously.

If I'm looking at the correct database, the post it failed on contains this text:
cute couple :D

...followed by a series of IMG tags (with broken images).

Again, there's nothing there that wouldn't be valid UTF-8.

Now, below the dump line you added to get the postid, add:
PHP:
\XF::dump($message);
And run it again, what's the output of the last entry? The dumps happen before the message is set on the data handler so the very last entry should be the one it fails on.
 
Following your instructions, I found that the problematic post ID is 2572226 (as in the previous screenshot).
Then, running this:
SQL:
SELECT * FROM `post` WHERE postid = 12572226
I found different contents. I have also sent you a PM; I think you will find something interesting in it.
 
[Attached screenshot: hcmy74T.png]

 
Right, so the screenshot was cutting off the 1...!

I can see a weird Unicode character in there, but I'm not sure it is enough to cause problems.

At this point we'd likely need SSH access to the server so I've sent you my SSH key.
 
&#65534; and &#65535; are invalid characters (https://www.charbase.com/fffe-unicode-invalid-character)
So members can use Unicode converters to post those invalid Unicode characters in many areas and mess up the database (in visitor messages, usernames, etc.)...
(ugfhhh I hate those guys... and unicodes)
But adding this line $this->convertToUtf8($post['pagetext']); fixes the issue? What's the solution for this?
Is it a bug?
 
I've been able to look into this and I've narrowed it down to what is very likely an outdated version of the PCRE regex library, which is generating a false negative when checking for UTF-8 validity.

On your server if you run .../php test.php you will see this output:
Code:
bool(false)
array(0) {
}
string(15) "8.32 2012-11-30"
PHP Fatal error:  Uncaught InvalidArgumentException: Received invalid UTF-8 for string column in .../public_html/test.php:13
Stack trace:
#0 {main}
  thrown in .../public_html/test.php on line 13

bool(false) indicates that our regular expression to test UTF-8 validity failed. The 8.32 2012-11-30 bit is your PCRE version. And the exception is a reproduction of the same exception you are getting in the importer.
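
The test.php script itself was not posted in this thread. A minimal sketch that would produce output of this shape might look like the following. The test string (U+FFFE, one of the characters mentioned earlier in the thread) and the exact regular expression are assumptions, not XenForo's actual code.

```php
<?php
// Assumed test input: U+FFFE encoded as UTF-8 (3 bytes).
$value = "\xEF\xBF\xBE";

// With the /u modifier, PCRE validates the subject as UTF-8 before matching.
// preg_match() returns 1 on a match and false if that validation fails.
$result = preg_match('/./u', $value, $matches);

var_dump($result);
var_dump($matches);
var_dump(PCRE_VERSION);

if ($result !== 1) {
    throw new InvalidArgumentException('Received invalid UTF-8 for string column');
}
```

Running the same file under different PHP builds and comparing the first line of output against the reported PCRE version is the comparison described in this post.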

However, I cannot reproduce the same result on any other server I tested.

There is a way I've been able to test a bunch of PHP versions with the same code, and the results are interesting:

You will notice that the majority of PHP versions are giving an expected output:
Code:
int(1)
array(1) {
  [0]=>
  string(3) "￾"
}
string(15) "8.38 2015-11-23"
(Notably it isn't followed by any exceptions, which means the UTF-8 is valid).

But further down you will see:

[Attachment 200263]

The notable thing here is that when it fails, the PCRE version is the same as yours.

You will also note that it is quite an old PCRE version - modern versions of PHP should be using PCRE2 (version 10.x).

So, long story short: you need to make sure an up-to-date version of PCRE is available in PHP. You can check the PCRE version in phpinfo too:

[Attachment 200264]

I think as long as you can get the PCRE version to be slightly higher, that should get around the issue.
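
Besides phpinfo, the PCRE version can also be read from PHP's built-in PCRE_VERSION constant. The sketch below is only a convenience check: the helper names are hypothetical, and treating 8.32 as the cut-off is an assumption based on the one failing version reported in this thread, not an established boundary.

```php
<?php
// Extract the numeric part of a PCRE version string such as "8.32 2012-11-30".
function pcreVersionNumber(string $pcreVersion): string
{
    return explode(' ', $pcreVersion, 2)[0];
}

// Assumption: 8.32 is treated as affected only because it is the version
// reported in this thread; the exact cut-off version is not established here.
function isAtMostReportedBadVersion(string $pcreVersion): bool
{
    return version_compare(pcreVersionNumber($pcreVersion), '8.32', '<=');
}

var_dump(PCRE_VERSION);
var_dump(isAtMostReportedBadVersion(PCRE_VERSION));
```

Run it with the same PHP binary that the importer uses, since the CLI and web server builds of PHP can be linked against different PCRE versions.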
 
Well, I really appreciate the amazing help, @Chris D. Hey @lsmichael, I installed PCRE version 10.23-2.el7; the package name is now pcre2-devel (CentOS).

yum install lsphp71 is not detecting the latest version of PCRE.


I'm using OpenLiteSpeed 1.4.46
I used to follow @Slavik's guide, but after PHP 7.1 was released, simply running yum install lsphp71 became much easier.

How do I reconfigure and recompile lsphp71 so that it detects the new pcre2-devel package?
 
That question sounds like it might be better suited to the LiteSpeed forum.

I don't personally have a lot of experience with it, and we can really only provide support with the software itself so hopefully posting on their forums will yield some advice.
 