Fixed vBulletin 4.x Import: $config['fullUnicode'] is ignored

Steffen

Affected version
2.0.0
The importer ignores the $config['fullUnicode'] setting and never initializes the "XF\Import\DataManager" class with $fullUnicode = true. As a result, the "DataManager::convertToUtf8" method strips all 4-byte UTF-8 characters from private messages, posts, etc.
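
To illustrate the effect, here is a minimal sketch of what stripping 4-byte characters does to a message. The function name stripFourByteUtf8 is hypothetical and this is not XenForo's actual implementation; it only assumes that everything above U+FFFF is removed when the flag is false.

```php
<?php
// Hypothetical helper for illustration only; not the real DataManager code.
// Code points above U+FFFF are exactly the ones that need 4 bytes in UTF-8.
function stripFourByteUtf8(string $text, bool $fullUnicode): string
{
    if ($fullUnicode) {
        // Target database can store 4-byte characters: keep the text as-is.
        return $text;
    }

    $stripped = preg_replace('/[\x{10000}-\x{10FFFF}]/u', '', $text);

    // preg_replace returns null on failure (e.g. invalid UTF-8 input);
    // fall back to the unmodified text in that case.
    return $stripped === null ? $text : $stripped;
}

// Two "loudly crying face" emojis (U+1F62D), written as escapes so the
// example does not depend on the encoding of this file.
$message = "\u{1F62D}\u{1F62D}";

var_dump(stripFourByteUtf8($message, false) === '');       // message becomes empty
var_dump(stripFourByteUtf8($message, true) === $message);  // message is preserved
```

With the flag left at false, a message that consists only of emojis ends up empty, which is the case that matters for imports.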

I think the fix is as follows:

Code:
diff --git a/xenforo/src/XF/Import/Manager.php b/htdocs/xenforo/src/XF/Import/Manager.php
--- a/xenforo/src/XF/Import/Manager.php
+++ b/xenforo/src/XF/Import/Manager.php
@@ -240,7 +240,7 @@ class Manager
         }
 
         $log = $this->getLog($session->logTable);
-        $dataManager = new DataManager($this->app, $log, $session->retainIds);
+        $dataManager = new DataManager($this->app, $log, $session->retainIds, $this->app->config('fullUnicode'));
 
         $importer->initialize($session, $dataManager, $session->baseConfig);

(I'm not sure whether vB4 officially supports utf8mb4; we might have patched that in ourselves in the past. But I think the patch cannot do any harm.)
 
I've implemented this change, though I'm waiting for @Mike to review the change to ensure that it won't have any unexpected consequences.

FWIW, all the test imports I've performed with vBulletin versions below 5 have used ISO-8859-1 or Windows-1252, which was the default character set for the installation. If anyone has a vBulletin database that uses something other than that, it would be useful to have a copy for internal testing purposes.

On a side note, thanks to @Steffen for your suggestions on the importer, it's great to get such useful feedback.
 
Thanks! :)

Our database is 30 GiB, the vBulletin installation is customized (e.g. utf8mb4 support), and for privacy reasons (private messages, internal forums) we could only submit a subset anyway. We triggered the bug as follows: there is a private message that consists of only two emojis, "😭😭". The importer stripped them because $fullUnicode was false and then complained about an empty message :) So I guess you can duplicate one of your existing test databases, convert all tables and columns to utf8mb4 (IIRC this works everywhere except for vb_session because of index length limits; just use ascii_bin there), and then insert a post or private message with the text "😭😭". This should trigger the importer issue.
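
For the conversion step, something like the following could generate the statements for a throwaway copy of a test database. This is only a rough sketch: the function name buildUtf8mb4ConversionSql, the table names vb_post and vb_pmtext, and the utf8mb4_unicode_ci collation are assumptions for illustration; only vb_session and ascii_bin come from the setup described above.

```php
<?php
// Hypothetical helper: builds one ALTER TABLE statement per table.
// Tables listed in $asciiTables get ascii/ascii_bin instead of utf8mb4,
// to stay clear of index length limits.
function buildUtf8mb4ConversionSql(array $tables, array $asciiTables = ['vb_session']): array
{
    $statements = [];

    foreach ($tables as $table) {
        if (in_array($table, $asciiTables, true)) {
            $charset = 'ascii';
            $collation = 'ascii_bin';
        } else {
            $charset = 'utf8mb4';
            $collation = 'utf8mb4_unicode_ci'; // assumed collation
        }

        $statements[] = sprintf(
            'ALTER TABLE `%s` CONVERT TO CHARACTER SET %s COLLATE %s;',
            $table,
            $charset,
            $collation
        );
    }

    return $statements;
}

// Example table list (assumed names); print the statements for review
// before running them against a copy of the database.
foreach (buildUtf8mb4ConversionSql(['vb_post', 'vb_pmtext', 'vb_session']) as $sql) {
    echo $sql, PHP_EOL;
}
```

The statements are only printed here, so they can be checked by hand before being run against a copy of the database.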
 