
Fixed SMF Importer doesn't swap encoding of special characters to UTF-8

Discussion in 'Resolved Bug Reports' started by jeffwidman, Jan 23, 2015.

  1. jeffwidman

    jeffwidman Active Member

    I imported an SMF forum a while back using the official importer, and my old data was not properly converted to UTF-8 during the import.

    This was basically my problem: http://stackoverflow.com/a/2477480/770425

    This can't be generically fixed across all SMF installations because it depends on what encoding they were using originally, and as of SMF 2 tables are allowed to be encoded as UTF-8.

    However, the majority of SMF forums are probably using latin1, and those are not hard to convert once the importer detects the encoding of the old tables.

    At first I was using a simple find-and-replace:
    â€“ --> –
    â€” --> —
    â€™ --> ’
    etc.
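
    For anyone wondering where pairs like these come from: the garbled form is what you get when the UTF-8 bytes of a character are read back as a single-byte Western encoding. Below is a minimal Python sketch of that, assuming Python's `cp1252` codec as a stand-in for MySQL's latin1 (an approximation). The names `garble`, `REPLACEMENTS`, and `find_and_replace` are made up for illustration.

```python
# Correct characters that commonly show up garbled after a latin1/UTF-8 mix-up.
CHARS = ["\u2013", "\u2014", "\u2019"]  # en dash, em dash, right single quote


def garble(text: str) -> str:
    """Encode as UTF-8, then misread the resulting bytes as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


# Map each garbled form back to the character it should have been.
REPLACEMENTS = {garble(ch): ch for ch in CHARS}


def find_and_replace(text: str) -> str:
    """Apply the hand-built table; only fixes the characters listed in CHARS."""
    for bad, good in REPLACEMENTS.items():
        text = text.replace(bad, good)
    return text
```

    The weakness of this approach is visible in the code: every affected character has to be listed by hand, so anything missing from `CHARS` stays garbled.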

    But then I found this generic solution: http://jonisalonen.com/2012/fixing-doubly-utf-8-encoded-text-in-mysql/

    Code:
    CONVERT(BINARY CONVERT(message USING latin1) USING utf8)
    So maybe the SMF importer (and probably other importers too) should check whether any source tables are latin1-encoded and, if so, run this conversion on the appropriate columns as they are imported into XenForo's UTF-8 tables.
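
    To make the two nested conversions concrete, here is a rough Python analogue of the SQL expression above. It is a sketch, not what MySQL or the importer actually runs: `cp1252` stands in for MySQL's latin1, the function name `fix_double_encoded` is hypothetical, and the fallback on failure is an addition of this sketch.

```python
def fix_double_encoded(text: str) -> str:
    """Undo one round of 'UTF-8 bytes misread as Windows-1252'.

    Mirrors the two steps of the SQL expression:
      1. CONVERT(... USING latin1): turn the characters back into single bytes
      2. CONVERT(BINARY ... USING utf8): reinterpret those bytes as UTF-8
    """
    try:
        raw = text.encode("cp1252")   # step 1: characters -> original bytes
        return raw.decode("utf-8")    # step 2: bytes -> intended characters
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Assumption of this sketch: if the text is not valid double-encoded
        # data, leave it untouched rather than damage it.
        return text
```

    Unlike a replacement table, this handles any character that was garbled the same way. Text that is already correct, such as `café`, fails the byte reinterpretation and is returned unchanged here; the bare SQL expression has no such guard built in.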

    Note to anyone else applying this to old data: be careful to *only* apply it to content created under SMF. If this were built into the importer, that restriction shouldn't be needed, since everything passing through the importer comes from SMF.

    I applied this on xf_post.message, xf_conversation_message.message, xf_user_profile.signature, xf_user_profile.about, xf_conversation_master.title, xf_node.title, xf_node.description, xf_poll.question, xf_poll_response.response, xf_thread.title
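
    As a rough illustration of applying the conversion across a column list like this, the sketch below builds one UPDATE statement per column. It only produces SQL strings and does not touch a database. `build_update` is a hypothetical helper, and the WHERE condition in the usage note is a placeholder assumption: each table needs its own filter that selects only rows created under SMF. Back up first and test on a copy.

```python
# (table, column) pairs taken from the list above; adjust to your own schema.
COLUMNS = [
    ("xf_post", "message"),
    ("xf_conversation_message", "message"),
    ("xf_user_profile", "signature"),
    ("xf_user_profile", "about"),
    ("xf_conversation_master", "title"),
    ("xf_node", "title"),
    ("xf_node", "description"),
    ("xf_poll", "question"),
    ("xf_poll_response", "response"),
    ("xf_thread", "title"),
]


def build_update(table: str, column: str, where: str) -> str:
    """Return one UPDATE that re-encodes `column`, limited to rows matching `where`."""
    expr = f"CONVERT(BINARY CONVERT({column} USING latin1) USING utf8)"
    return f"UPDATE {table} SET {column} = {expr} WHERE {where};"
```

    For example, `build_update("xf_post", "message", "post_date < 1400000000")` yields a statement restricted to posts before a cutoff timestamp. The cutoff value is made up and would have to be replaced with the actual date of the import.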

    Any other columns that likely need to be updated?

    Thankfully none of my usernames seemed to be affected. Probably SMF didn't allow special characters there.

    For reference, here was another very useful article: https://www.bluebox.net/insight/blog-article/getting-out-of-mysql-character-set-hell
     
    Last edited: Jan 23, 2015
  2. Chris D

    Chris D XenForo Developer Staff Member

    It's now possible to specify the DB character set during the config of the importer. If it isn't already UTF-8, specify the correct character set and that should yield a better conversion.
     
