Fixed SMF Importer doesn't swap encoding of special characters to UTF 8

jeffwidman

Active member
I imported a SMF forum awhile back using the official importer and ran into some issues with my old data not being properly converted to UTF-8 during the conversion.

This was basically my problem: http://stackoverflow.com/a/2477480/770425

This can't be generically fixed across all SMF installations because it really depends what encoding they were using originally, plus as of SMF 2 tables are allowed to be encoded as UTF-8.

However, the majority of SMF forums are probably using latin 1, and not hard to convert those by detecting the encoding of the old tables.

At first I was using a simple find-and-replace:
– --> –
— --> —
’ --> ’
etc

But then I found this generic solution: http://jonisalonen.com/2012/fixing-doubly-utf-8-encoded-text-in-mysql/

Code:
CONVERT(BINARY CONVERT(message USING latin1) USING utf8)

So maybe the SMF importer (and probably other importers too) should check if any tables are Latin 1 encoded, and if so run this conversion on the appropriate table columns when they are imported into Xenforo's UTF-8 tables.

Note to anyone else applying this on old data--be careful to *only* apply this on content created under SMF. If this were built into the importer shouldn't need to restrict to this.

I applied this on xf_post.message, xf_conversation_message.mssage, xf_user_profile.signature, xf_user_profile.about, xf_conversation_master.title, xf_node.title, xf_node.description, xf_poll.question, xf_poll_response.response.response, xf_thread.title

Any other columns that likely need to be updated?

Thankfully none of my usernames seemed to be affected. Probably SMF didn't allow special characters there.

For reference, here was another very useful article: https://www.bluebox.net/insight/blog-article/getting-out-of-mysql-character-set-hell
 
Last edited:
It's now possible to specify the DB character set during the config of the importer. If it isn't already UTF-8, specify the correct character set and that should yield a better conversion.
 
Back
Top Bottom