CJK searching has long been a big problem for Chinese, Japanese, and Korean admins using vBB or IPB. I hope XenForo can work this out.
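Hmm, I don't think it's "cannot support Chinese" but rather "cannot support Chinese?" As far as I know, the Chinese search problem in vBB 4 was solved by a single person in a very short time. It is not as polished as the English search handling, but it fully meets our needs.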
These are limitations of MySQL full-text search and its approach to tokenizing words. CJK (Chinese, Japanese, Korean) searching is a challenging thing for any Western language-based search system to support.
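To make the tokenizing point concrete, here is a minimal sketch in Python. It is not MySQL's actual parser; `whitespace_tokens` is a hypothetical helper that only imitates a delimiter-based tokenizer, on the assumption that words are separated by spaces or punctuation.

```python
import re
from typing import List


def whitespace_tokens(text: str) -> List[str]:
    """Split on runs of non-word characters, as a delimiter-based parser would."""
    return [token for token in re.split(r"\W+", text) if token]


english = "forum search works well"
chinese = "论坛搜索功能很好用"  # no spaces between words

# English yields one token per word.
print(whitespace_tokens(english))
# The Chinese sentence comes back as one long token, so a search for
# a word inside it (for example "搜索") finds nothing to match against.
print(whitespace_tokens(chinese))
```

Because CJK text normally has no delimiters between words, the whole sentence ends up as a single indexed "word".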
Hmm, a good point. My main question is: if Discuz can get it to work, would it still be possible (and manageable) for the XF team to support CJK?

It's a lost cause trying to enter those markets with commercial discussion forum packages because of the well-rooted market for Discuz...
Discuz! forum (BBS), built with PHP and MySQL...
Such a headache indeed! I for one am glad I don't have to deal with this, but having such support would be another positive tick for XenForo.
It's 3am on a school night, so I shouldn't be replying, but @#$% fml. There are so many points to be made here, I don't even know where to start....
First: vBulletin 4's search was not done by one person, but by a team. However small it may have been, I was also involved to some minuscule extent. Additionally, it was not done in a short time; it was done over weeks and weeks of development time. Certain development team members were stuck on that project for several weeks straight.
Also, the said search system has its problems. Content is not searchable until it is indexed by a scheduled task, the indexer uses a fair bit of resources, and results have counting limitations; those are just a few problems I can remember. I don't have the luxury of going into details, but I think most can rest easy just taking my word for it.
Next: implementations of CJK search. vBulletin China's modified distribution has had it since at least 3.5; although not "official", I'm pretty sure the 3.0.x (I know I translated the hack and posted it on vb.org) and 2.x series were also covered to some extent. This was achieved by taking every single word in posts and bundling it into 2-CJK-character indexes. This method, while it may work, is limited by MySQL's index size limitation. As a result, busy forums and long posts often ran into errors or simply didn't work properly. My 2-minute scan of Discuz's code also suggests a similar implementation, but I'll need to read further when I have time to find out for sure.
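For anyone who hasn't seen the 2-character approach before, here is a rough sketch in Python. This is not the vB China or Discuz code; it only illustrates the general bigram idea, and every name in it (`is_cjk`, `cjk_bigrams`, `build_index`, `search`) is made up for the example. The character ranges are a simplification and skip the rarer CJK blocks.

```python
from collections import defaultdict
from typing import Dict, List, Set


def is_cjk(ch: str) -> bool:
    """Rough check covering the most common CJK blocks only."""
    cp = ord(ch)
    return (0x4E00 <= cp <= 0x9FFF       # CJK Unified Ideographs
            or 0x3040 <= cp <= 0x30FF    # Hiragana and Katakana
            or 0xAC00 <= cp <= 0xD7A3)   # Hangul syllables


def cjk_bigrams(text: str) -> List[str]:
    """Return overlapping 2-character tokens for each run of CJK characters."""
    tokens: List[str] = []
    run: List[str] = []
    for ch in text + " ":  # trailing space flushes the final run
        if is_cjk(ch):
            run.append(ch)
            continue
        if len(run) == 1:
            tokens.append(run[0])  # a lone character is kept as-is
        else:
            tokens.extend(run[i] + run[i + 1] for i in range(len(run) - 1))
        run = []
    return tokens


def build_index(posts: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map every bigram to the set of post ids that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for post_id, body in posts.items():
        for token in cjk_bigrams(body):
            index[token].add(post_id)
    return index


def search(index: Dict[str, Set[int]], query: str) -> Set[int]:
    """Return posts containing every bigram of the query."""
    tokens = cjk_bigrams(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


posts = {1: "论坛搜索功能", 2: "中文搜索问题", 3: "hello world"}
index = build_index(posts)
print(search(index, "搜索"))
print(search(index, "中文搜索"))
```

Note how a post of N CJK characters produces roughly N index entries, which is why long posts and busy forums make the index grow so quickly.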
Adding to the CJK search problem, this only works if we know the character encoding used on the forum (read: UTF-8). Neither vB (the Jelsoft version anyway) nor IPB has ever strictly forced everyone to use Unicode UTF-8. There are many valid reasons for this; I had several long posts on vb.com's forum about these issues. But what this does mean is that by the time you import to XenForo, you'll probably have to discard some content because the converter cannot convert multiple source encodings (i.e. a Chinese forum with BIG5 and GBK for Traditional and Simplified Chinese sub-forums). If we were to make some sort of indexer that indexes CJK text for search, it would only really work well on a freshly installed XenForo with no content, or if you're lucky and already have your source all sorted out in UTF-8.
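As a sketch of what a mixed-encoding import would have to do, assume (and this is a big assumption) that the source encoding of every row is already known, say per sub-forum. The helpers `to_utf8` and `convert_rows` and the sample rows are hypothetical, not part of any real importer. Rows that do not decode cleanly are the content that would have to be discarded or fixed by hand.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[int, str, bytes]  # (post id, assumed source encoding, raw bytes)


def to_utf8(raw: bytes, source_encoding: str) -> Optional[bytes]:
    """Re-encode raw bytes as UTF-8, or return None if they do not decode."""
    try:
        return raw.decode(source_encoding).encode("utf-8")
    except UnicodeDecodeError:
        return None


def convert_rows(rows: List[Row]) -> Tuple[Dict[int, bytes], List[int]]:
    """Convert each row with its own encoding; collect ids that fail."""
    converted: Dict[int, bytes] = {}
    rejected: List[int] = []
    for post_id, encoding, raw in rows:
        utf8 = to_utf8(raw, encoding)
        if utf8 is None:
            rejected.append(post_id)
        else:
            converted[post_id] = utf8
    return converted, rejected


rows = [
    (1, "big5", "繁體中文".encode("big5")),     # Traditional Chinese sub-forum
    (2, "gbk", "简体中文".encode("gbk")),       # Simplified Chinese sub-forum
    (3, "gbk", "简体中文".encode("gbk")[:3]),   # truncated on purpose to force a failure
]
converted, rejected = convert_rows(rows)
print(sorted(converted), rejected)
```

The hard part in practice is the assumption itself: if nobody recorded which encoding each post was written in, there is nothing reliable to pass as `source_encoding`.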
Database configuration is also a big issue. I don't remember the variable name, but long story short, there are several stages where things can go wrong:
- HTML's character encoding
- MySQL's connection encoding
- MySQL database's charset setting
- MySQL database's collation
Several combinations thereof can work together and present what the end user would call "Chinese". But they all mean different things and require different handling. I still recall that changing one variable in vBulletin's config.php could cause your database to spew out garbage and blank pages... And changing that variable at the wrong time, or making a backup inappropriately, can result in full, irreversible data loss.
Oh, also, for the record, having CJK search in vB China's modified distro did not particularly help penetrate the Chinese market. Interestingly, contrary to the popular voice, people frankly don't care. They have their Discuz, they're happy, and they're not planning to change. They have their pirated version running with our modified code and have no intention of purchasing a license.
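Here is a small Python sketch of how a mismatch between those layers turns good text into garbage. It does not touch a real database; it only replays the byte handling, and it uses Python's `latin-1` codec as a stand-in for a connection charset that does not match the data (MySQL's own `latin1` is closer to cp1252, so treat this as a simplification).

```python
original = "中文"

# The page submits UTF-8 bytes, which is what the forum intended to store.
utf8_bytes = original.encode("utf-8")

# A layer that believes the bytes are latin-1 reads each byte as one character.
garbled = utf8_bytes.decode("latin-1")

# If that garbled text is then written out as UTF-8, every original byte
# becomes two bytes: the data is now double-encoded.
double_encoded = garbled.encode("utf-8")

# While the wrong step is still known, it can be undone exactly.
repaired = garbled.encode("latin-1").decode("utf-8")

print(len(utf8_bytes), len(double_encoded))
print(repaired == original)
```

The repair only works because the exact wrong step is known. Switch the setting partway through, or take a backup through a connection with yet another charset, and the table ends up holding a mix of clean and double-encoded rows that no single conversion can fix.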
Anyways, 4am now... I've spent about 1 hour writing and deleting... I don't even know if this makes any sense. I'm just gonna hit post reply, get burned for any mistakes, and check it over again with a clear mind. So much for waking up at 6 to go in early and work out thesis stuff with my prof...