
Fixed  Import fails on posts with special characters from MS Word.

An import I ran last night with Beta 2 failed about three-quarters of the way through. After checking the server error log, I was able to track down the post that was causing this. It appears that the user copied and pasted a research report originally written in MS Word, and the importer crashes when it encounters the code for a square-looking bullet point character. When I deleted this character, the import continued without a problem. Here's an excerpt from the source of the post:

􀂙 Company reported a $0.01 for 1Q07, one cent lower than our
estimates and consensus of $0.02. We are maintaining a Buy Rating,
and lowering our FY07 estimates from $0.25 to $0.21 and FY08
estimates from $0.30 to $0.27. We are maintaining a target price of $3 or
11x 2008 projections.


XenForo developer
Staff member
I can guess, but do you know the specific error message? It should be logged in the server error log part of the admin CP. The importer should be stripping out 4-byte UTF-8 characters, though maybe that is bugged.
Zend_Db_Statement_Mysqli_Exception: Mysqli statement execute error : Incorrect string value: '\xF4\x80\x82\x99 C...' for column 'message' at row 1 - library/Zend/Db/Statement/Mysqli.php:214


XenForo developer
Staff member
Looks like the fix for stripping out 4-byte UTF-8 characters was incorrect (my fault!). It appears to actually be stripping 5-byte UTF-8 characters (with a slight mistake), which aren't allowed by the RFC anyway. (As a note, MySQL only supports 3-byte UTF-8 characters, which covers the BMP.)
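
To make the byte ranges concrete, here is a small PHP sketch that classifies a UTF-8 sequence by its lead byte and decodes the four bytes from the error message into a code point. The helper names utf8SequenceLength and decodeFourByteUtf8 are made up for illustration and are not part of XenForo.

```php
<?php
// Hypothetical helper: length of a UTF-8 sequence, judged by its lead byte.
function utf8SequenceLength($leadByte)
{
    $b = ord($leadByte);
    if ($b < 0x80) return 1;                 // ASCII
    if ($b >= 0xC2 && $b <= 0xDF) return 2;
    if ($b >= 0xE0 && $b <= 0xEF) return 3;  // up to U+FFFF, i.e. the BMP
    if ($b >= 0xF0 && $b <= 0xF4) return 4;  // U+10000 to U+10FFFF
    return 0; // continuation byte, or a lead byte not valid under RFC 3629 (e.g. 0xF8-0xFB)
}

// Hypothetical helper: decode one 4-byte UTF-8 sequence into its code point.
function decodeFourByteUtf8($bytes)
{
    return ((ord($bytes[0]) & 0x07) << 18)
         | ((ord($bytes[1]) & 0x3F) << 12)
         | ((ord($bytes[2]) & 0x3F) << 6)
         |  (ord($bytes[3]) & 0x3F);
}

// The bytes reported in the "Incorrect string value" error.
$bytes = "\xF4\x80\x82\x99";
printf("length: %d\n", utf8SequenceLength($bytes[0]));        // length: 4
printf("code point: U+%06X\n", decodeFourByteUtf8($bytes));   // code point: U+100099
```

U+100099 is above U+FFFF, so it sits outside the BMP, which is consistent with MySQL rejecting it.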

To anyone having this issue, try this as a fix. In library/XenForo/Importer/vBulletin.php, change the following line:
return preg_replace('/[\xF8-\xFB].../', '', $string);
to:
return preg_replace('/[\xF0-\xF4].../', '', $string);
This code is effectively the last line in the file. I believe that should prevent this error.
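
As a rough sanity check, the [\xF0-\xF4] pattern can be tried on the bytes from the error message. The wrapper name stripFourByteUtf8 is hypothetical; only the preg_replace call itself comes from the importer.

```php
<?php
// Hypothetical wrapper for illustration; only the preg_replace call is from the thread.
function stripFourByteUtf8($string)
{
    // 0xF0-0xF4 are the lead bytes of 4-byte UTF-8 sequences; the three dots
    // consume the continuation bytes that follow.
    return preg_replace('/[\xF0-\xF4].../', '', $string);
}

// Bytes from the logged error, followed by the start of the post text.
$input = "\xF4\x80\x82\x99 Company reported";
var_dump(stripFourByteUtf8($input) === " Company reported"); // bool(true)

// 2- and 3-byte characters (here "é" and "€") are left alone.
$bmp = "caf\xC3\xA9 \xE2\x82\xAC";
var_dump(stripFourByteUtf8($bmp) === $bmp); // bool(true)
```

The pattern is applied without the /u modifier, so it works on raw bytes, which is what allows the \xF0-\xF4 range to match lead bytes directly.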