jeffwidman
Active member
I imported a SMF forum awhile back using the official importer and ran into some issues with my old data not being properly converted to UTF-8 during the conversion.
This was basically my problem: http://stackoverflow.com/a/2477480/770425
This can't be generically fixed across all SMF installations because it really depends what encoding they were using originally, plus as of SMF 2 tables are allowed to be encoded as UTF-8.
However, the majority of SMF forums are probably using latin 1, and not hard to convert those by detecting the encoding of the old tables.
At first I was using a simple find-and-replace:
– --> –
— --> —
’ --> ’
etc
But then I found this generic solution: http://jonisalonen.com/2012/fixing-doubly-utf-8-encoded-text-in-mysql/
So maybe the SMF importer (and probably other importers too) should check if any tables are Latin 1 encoded, and if so run this conversion on the appropriate table columns when they are imported into Xenforo's UTF-8 tables.
Note to anyone else applying this on old data--be careful to *only* apply this on content created under SMF. If this were built into the importer shouldn't need to restrict to this.
I applied this on xf_post.message, xf_conversation_message.mssage, xf_user_profile.signature, xf_user_profile.about, xf_conversation_master.title, xf_node.title, xf_node.description, xf_poll.question, xf_poll_response.response.response, xf_thread.title
Any other columns that likely need to be updated?
Thankfully none of my usernames seemed to be affected. Probably SMF didn't allow special characters there.
For reference, here was another very useful article: https://www.bluebox.net/insight/blog-article/getting-out-of-mysql-character-set-hell
This was basically my problem: http://stackoverflow.com/a/2477480/770425
This can't be generically fixed across all SMF installations because it really depends what encoding they were using originally, plus as of SMF 2 tables are allowed to be encoded as UTF-8.
However, the majority of SMF forums are probably using latin 1, and not hard to convert those by detecting the encoding of the old tables.
At first I was using a simple find-and-replace:
– --> –
— --> —
’ --> ’
etc
But then I found this generic solution: http://jonisalonen.com/2012/fixing-doubly-utf-8-encoded-text-in-mysql/
Code:
CONVERT(BINARY CONVERT(message USING latin1) USING utf8)
So maybe the SMF importer (and probably other importers too) should check if any tables are Latin 1 encoded, and if so run this conversion on the appropriate table columns when they are imported into Xenforo's UTF-8 tables.
Note to anyone else applying this on old data--be careful to *only* apply this on content created under SMF. If this were built into the importer shouldn't need to restrict to this.
I applied this on xf_post.message, xf_conversation_message.mssage, xf_user_profile.signature, xf_user_profile.about, xf_conversation_master.title, xf_node.title, xf_node.description, xf_poll.question, xf_poll_response.response.response, xf_thread.title
Any other columns that likely need to be updated?
Thankfully none of my usernames seemed to be affected. Probably SMF didn't allow special characters there.
For reference, here was another very useful article: https://www.bluebox.net/insight/blog-article/getting-out-of-mysql-character-set-hell
Last edited: