XF 1.1 Character encoding issues after vBulletin 3.8 import

Discussion in 'Installation, Upgrade, and Import Support' started by Sidane, May 13, 2013.

  1. Sidane

    Sidane Active Member

    Apologies if there is an obvious answer to this elsewhere, didn't find one after a brief search.

    I'm in the process of prepping my site to migrate to XenForo from vBulletin 3.8. On my local test server (OS X 10.8.3, Apache 2.2, MySQL 5.1) I've done a full import of the data but am having character encoding issues all over the place.


    Text in vBulletin

    After importing to XenForo

    My vBulletin installation is a standard one. Some character set queries on the VB database:

    show variables like "character_set_database";
    show variables like "collation_database";
    SELECT charset FROM language; 
    On the new XenForo database:

    show variables like "character_set_database";
    show variables like "collation_database";
    When I ran the XenForo importer, I didn't specify a value in the Force Character Set field as there is no encoding specified in my VB config.php.

    Any help on why this is happening, and a possible solution? I will be doing a full reimport again, but it takes about 16 hours and I want to make sure that there will be no encoding issues.

    Thanks in advance! :)
  2. Jake Bunce

    Jake Bunce XenForo Moderator Staff Member

    Try reinstalling XF and specifying utf8 for the charset during the import.
  3. Sidane

    Sidane Active Member

    Thanks, but does that value not represent the charset for the existing vBulletin database, i.e. latin1?
  4. Jake Bunce

    Jake Bunce XenForo Moderator Staff Member

    It looks like the data is already utf8. The collations in your database may be incorrect.

    If the import works with utf8 then you know that was the problem.
  5. Sidane

    Sidane Active Member

    I've set up a fresh instance of XenForo and imported all users with 'Force Character Set' set to 'utf8'.

    The following user on the live vBulletin site has a À in his username, see http://www.redcafe.net/members/privateserve%C0%3F/

    After this fresh import, the À is appearing as Ã:


    So no joy there :( Any other ideas what could be wrong?
  6. Sidane

    Sidane Active Member

  7. Jake Bunce

    Jake Bunce XenForo Moderator Staff Member

    I was about to ask for a copy of your db but then I saw 12 million posts. :eek:

    The conversion function does rely on certain PHP extensions:


    	/**
    	 * Convert the given text to valid UTF-8
    	 *
    	 * @param string $string
    	 * @param boolean $entities Convert &lt; (and other) entities back to < characters
    	 *
    	 * @return string
    	 */
    	protected function _convertToUtf8($string, $entities = null)
    	{
    		// note: assumes charset is ascii compatible
    		if (preg_match('/[\x80-\xff]/', $string))
    		{
    			if (function_exists('iconv'))
    			{
    				$string = @iconv($this->_charset, 'utf-8//IGNORE', $string);
    			}
    			else if (function_exists('mb_convert_encoding'))
    			{
    				$string = mb_convert_encoding($string, 'utf-8', $this->_charset);
    			}
    		}
    		$string = utf8_unhtml($string, $entities);
    		$string = preg_replace('/[\xF0-\xF7].../', '', $string);
    		$string = preg_replace('/[\xF8-\xFB]..../', '', $string);
    		return $string;
    	}
    Those two functions come from the iconv and mbstring PHP extensions, respectively.

    If both are missing then it would fail to convert. This is something you can check in your PHP configuration, or debug those functions to make sure they are working on your server.

    Otherwise I can take a look if you give me access to your server.
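
For readers who want to experiment with the conversion steps quoted in this post outside PHP, here is a rough Python analogue. It is a sketch under assumptions, not the importer's code: the function name `convert_to_utf8` is hypothetical, the `utf8_unhtml` entity handling is left out, and Python's `errors="ignore"` is only similar in spirit to iconv's `//IGNORE`.

```python
import re


def convert_to_utf8(raw: bytes, charset: str) -> bytes:
    """Rough analogue of the quoted _convertToUtf8 (entity handling omitted)."""
    # Only convert when the input contains bytes outside 7-bit ASCII.
    if re.search(rb"[\x80-\xff]", raw):
        # Decode from the source charset, dropping undecodable bytes,
        # then re-encode the result as UTF-8.
        raw = raw.decode(charset, errors="ignore").encode("utf-8")
    # Strip byte sequences starting with F0-F7 (4 bytes) and F8-FB (5 bytes),
    # mirroring the two preg_replace calls in the PHP code.
    raw = re.sub(rb"[\xF0-\xF7]...", b"", raw)
    raw = re.sub(rb"[\xF8-\xFB]....", b"", raw)
    return raw


# A latin1 byte 0xC0 ("À") becomes the two-byte UTF-8 sequence C3 80.
print(convert_to_utf8(b"privateserve\xc0", "latin-1"))
# Pure ASCII input passes through unchanged.
print(convert_to_utf8(b"plain ascii", "latin-1"))
```

The result depends entirely on `charset` matching the bytes that actually arrive from the database, which is the setting the rest of this thread is about.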
  8. AlexT

    AlexT Well-Known Member

    FWIW, the code Jake cited is a good place for throwing an exception if both functions don't exist.
  9. Mike

    Mike XenForo Developer Staff Member

    I would actually try forcing the connection character set to latin1. It's actually being "double converted". The data being given to XF is already in UTF-8, but because the settings in the DB think it's coming from latin1, it's converting that to UTF-8. Whenever you see "simple" accented characters going to 2 bytes, it's almost always this.

    If you're doing everything on the same server as vB, you shouldn't have to force the character set unless you are in vB's config.php, but if you're doing it on a different server, then your MySQL config may be different so bets are off and you may need to add (or remove) something there.
    Jake Bunce likes this.
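
The "double conversion" described in this post can be reproduced in a few lines of Python. This is only an illustration of the byte-level effect, assuming the À from the username earlier in the thread; it does not go through MySQL or the XenForo importer.

```python
# Illustration only: reproduces the double-conversion effect on one character.
original = "À"

# The text is already encoded as UTF-8, which takes two bytes for this character.
utf8_bytes = original.encode("utf-8")

# Mistaken second conversion: the UTF-8 bytes are read as if they were latin1,
# so each byte turns into its own character.
double_converted = utf8_bytes.decode("latin-1")

# Correct path: latin1 bytes are interpreted as latin1 exactly once.
latin1_bytes = original.encode("latin-1")
converted_once = latin1_bytes.decode("latin-1")

print(utf8_bytes)              # b'\xc3\x80'
print(repr(double_converted))  # 'Ã\x80'
print(converted_once)          # À
```

The second character of the double-converted string is an invisible control character, which is consistent with the À showing up as a lone Ã after the import.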
  10. Sidane

    Sidane Active Member

    Both functions exist:

    php -r "if (function_exists('iconv')) { echo 'yes'; } else { echo 'no'; }"

    php -r "if (function_exists('mb_convert_encoding')) { echo 'yes'; } else { echo 'no'; }"
    Thanks, will give that a try.
  11. Sidane

    Sidane Active Member

    Success! Setting Force Character Set to latin1 did the trick.

    Thanks Mike!
    Mike and Jake Bunce like this.
  12. JoseFebus

    JoseFebus Member

    Hi Sidane,

    I hope you are doing great!

    How were you able to force the character set?

    Best Regards
  13. Jake Bunce

    Jake Bunce XenForo Moderator Staff Member

  14. JoseFebus

    JoseFebus Member

    I selected UTF-8 as defined in the VB config...

    I am using the same server...

    What else do I need to change to get my international characters displayed properly?

  15. JoseFebus

    JoseFebus Member

    Can I reimport or do I have to install everything again?
  16. Jeremy

    Jeremy Well-Known Member

    Re-importing without re-installing will cause duplicated content.
  17. JoseFebus

    JoseFebus Member

    I noticed the collation of the vB tables is latin1_swedish_ci. Should I use "latin1_swedish_ci" when importing?
  18. Jeremy

    Jeremy Well-Known Member

    If that's an option, yes.
  19. JoseFebus

    JoseFebus Member

    Worked perfectly!

    Thanks a lot!
    Jeremy likes this.