Service Issue Can not search Chinese

Discussion in 'Resolved Bug Reports' started by Lin, Aug 13, 2010.

Thread Status:
Not open for further replies.
  1. Lin

    Lin Member

    Although the forum uses UTF-8 encoding, I still cannot search in Chinese.
  2. Mike

    Mike XenForo Developer Staff Member

    These are limitations of MySQL full text search and its approach to tokenizing words. CJK (Chinese, Japanese, Korean) searching is a challenging thing for any Western language-based search system to support.
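A rough illustration of the tokenizing problem described above. This is a simplified sketch, not MySQL's actual parser: the hypothetical `western_tokenize` helper just splits on non-word characters, the way a delimiter-based full-text parser finds word boundaries. Chinese text has no spaces between words, so a whole phrase becomes one token and a search for a word inside it finds nothing.

```python
import re

def western_tokenize(text):
    # Split on runs of non-word characters (spaces, punctuation).
    # Illustrative stand-in for a delimiter-based full-text parser.
    return [t for t in re.split(r"\W+", text) if t]

english = "can not search Chinese"
chinese = "不能支持中文搜索"  # "cannot support Chinese search"

english_tokens = western_tokenize(english)
chinese_tokens = western_tokenize(chinese)

print(english_tokens)  # four separate, searchable words
print(chinese_tokens)  # the whole phrase as a single token

# The word 中文 ("Chinese") is inside the phrase but is not a token of its own.
print("中文" in chinese_tokens)
```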
  3. Lin

    Lin Member

    I hope you can solve this problem, so that we can use it just by translating the language file. :)
  4. bookmark

    bookmark Well-Known Member

    CJK searching has long been a big problem for Chinese, Japanese, and Korean admins using VBB or IPB. I hope XenForo can work this out.
    As far as I know, China would be a big potential market for a forum platform. Most of them would be willing to pay for the license if the forum supported CJK well.
  5. Andy Huang

    Andy Huang Well-Known Member

    Not really, no... Sadly, only a small handful would be willing to pay; the vast majority would rather stay with Discuz or pirate... Though, after the official version of XenForo gets released, if it still doesn't work then, I'll see what hacks I can try to devise in my spare time...
    itsblack, Luke F and bookmark like this.
  6. Lin

    Lin Member

    People all over the world use pirated software, but pirates cannot get official technical support, and that is the reason I am willing to pay.

    If I cannot use the search function properly, then XenForo loses a customer.
  7. wangyu1314

    wangyu1314 Member

    Yes, I need the Chinese search support too.
    gordy likes this.
  8. Andy Huang

    Andy Huang Well-Known Member

    The whole 'lose a customer' or 'miss out on a market' idea needs to go, really. The XenForo team and I have seen it first hand with vB China. It's a lost cause trying to enter those markets with commercial discussion forum packages because of the well-rooted market for Discuz and the general practice of piracy.

    Yes, there are a few potential customers lost, and yes, it is really sad that those who are willing to pay probably will not get everything they want. But regardless of whether or not search actually works, the tiny fraction of potential customers in that market simply does not justify the time, effort, and cost involved to have a point of presence in that market.

    As much as I hate to put it so bluntly, XenForo's team will most likely benefit more by not worrying about CJK searches and instead focusing their effort on addressing issues brought up by English users, where they know the majority of their customers are.

    But, that said, the XF team may tread the dangerous water again should they choose to do so. That's entirely up to them to decide.
  9. sniper756

    sniper756 Member

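    Can't support Chinese? As far as I know, the Chinese search problem in vbb4 was solved by a single person in a very short time. It isn't as polished as the English search handling, but it fully meets everyday needs.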
  10. chousho

    chousho Well-Known Member

    Hmm, I don't think it's "can't support Chinese" (不能支持中文) but
    So it just means it would take more time to figure out, as CJK is not native to the programmers and requires a lot of time to figure out (meanwhile, the software is still in alpha).

    Hmm, a good point. My main question is if Discuz can get it to work, would it still be possible (and manageable) for the XF team to allow support for CJK?
    I would think so, as I see:
    I remember what a hassle it was trying to get support for searching in VB, trying to use iconv on a large database, etc. It would be really, REALLY cool if Mike and Kier did somehow manage to nail this, as it seems it can be done [from the example of Discuz]. I'm just not sure if they'll see it as worth the effort :p
  11. sniper756

    sniper756 Member

    Please support Chinese search as soon as possible. It would be an advantage over vbb and ipb. Definitely a highlight!!!
  12. Andy Huang

    Andy Huang Well-Known Member

    It's 3am on a school night, so I shouldn't be replying, but @#$% fml. There are so many points to be made here, I don't even know where to start....

    First: vBulletin 4's search was not done by one person, but by a team. However small my part may have been, I was also involved to some minuscule extent. Additionally, it was not done in a short time; it was done over weeks and weeks of development time. Certain development team members were stuck on that project for several weeks straight.

    Also, the said search system has its problems. Content is not searchable until it is indexed by a scheduled task; the indexer uses a fair bit of resources; results have counting limitations. These are just a few problems I can remember. I don't have the luxury of going into details, but I think most can rest easy just taking my word for it.

    Next: implementations of CJK search. In vBulletin China's modified distribution, it goes back to at least 3.5; although not "official", I'm pretty sure the 3.0.x series (I know I translated the hack and posted it on vb.org) and the 2.x series were also covered to some extent. This was achieved by taking every single word in posts and bundling them together into 2-CJK-character index entries. This method, while it may work, is limited by MySQL's index size limitation. As a result, busy forums and long posts often encountered errors or simply didn't work properly. My 2-minute scan of Discuz's code also suggests a similar implementation, but I'll need to read further when I have time to find out for sure.
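The 2-character indexing idea can be sketched roughly as follows. This is only an illustration of the general bigram technique, not the vBulletin China or Discuz code; the names `is_cjk`, `bigrams`, `build_index` and `search` are hypothetical, and the in-memory dictionary stands in for whatever index table a real forum would use. It also hints at why the index grows quickly: a run of n characters produces n - 1 entries.

```python
from collections import defaultdict

def is_cjk(ch):
    # Assumption for brevity: only the basic CJK Unified Ideographs block.
    # Kana, hangul and the extension blocks are ignored here.
    return "\u4e00" <= ch <= "\u9fff"

def bigrams(text):
    """Return overlapping 2-character tokens for each run of CJK characters."""
    tokens, run = [], ""
    for ch in text + " ":  # the trailing space flushes the last run
        if is_cjk(ch):
            run += ch
        else:
            if len(run) == 1:
                tokens.append(run)  # a lone character is indexed as-is
            tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
            run = ""
    return tokens

def build_index(posts):
    """Map each bigram to the set of post ids that contain it."""
    index = defaultdict(set)
    for post_id, text in posts.items():
        for token in bigrams(text):
            index[token].add(post_id)
    return index

def search(index, query):
    """Return ids of posts containing every bigram of the query."""
    tokens = bigrams(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result

posts = {1: "不能支持中文搜索", 2: "中文论坛", 3: "English only"}
index = build_index(posts)
print(search(index, "中文"))
print(search(index, "中文搜索"))
```

Note that intersecting bigrams only proves each pair occurs somewhere in the post, not that they are adjacent, so a real implementation would still need to verify candidate matches against the post text.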

    On top of the CJK search problem, this only works if we know the character encoding used on the forum (read: UTF-8). Neither vB (the Jelsoft version, anyway) nor IPB has ever strictly forced everyone to use Unicode UTF-8. There are many valid reasons for this; I had several long posts on vb.com's forum about these issues. But what this does mean is that by the time you import to XenForo, you'll probably have to discard some content because the converter cannot convert multiple source encodings (i.e. a Chinese forum with BIG5 and GBK for Traditional and Simplified Chinese subforums). If we were to make some sort of indexer that indexes CJK text for search, it would only really work well for a freshly installed XenForo with no content, or if you're lucky and already have your source all sorted out in UTF-8.
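A small sketch of why mixed source encodings are a problem, assuming the stored posts are available as raw bytes. The helpers `to_utf8` and `try_decode` are hypothetical names for illustration. Conversion is easy when the source encoding of each piece of content is known, but applying one assumed encoding to everything either fails outright or silently produces different text for the content that used the other encoding.

```python
def to_utf8(raw, source_encoding):
    """Re-encode raw bytes from a known legacy encoding into UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")

def try_decode(raw, encoding):
    """Decode raw bytes, returning None if they are invalid in that encoding."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None

# Two subforums that stored their posts in different legacy encodings.
simplified_raw = "中文搜索".encode("gbk")    # Simplified Chinese subforum
traditional_raw = "中文搜尋".encode("big5")  # Traditional Chinese subforum

# With the correct encoding per subforum, both convert cleanly.
ok_simplified = to_utf8(simplified_raw, "gbk").decode("utf-8")
ok_traditional = to_utf8(traditional_raw, "big5").decode("utf-8")
print(ok_simplified, ok_traditional)

# Treating the whole database as GBK breaks the Big5 content:
# the result is either None or text that no longer matches the original.
wrong = try_decode(traditional_raw, "gbk")
print(repr(wrong))
```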

    Database configuration is also a big issue. I don't remember the variable name, but long story short, there are several layers where things can go wrong:
    - HTML's character encoding
    - MySQL's connection encoding
    - MySQL database's charset setting
    - MySQL database's collation

    Several combinations thereof can work together and present what the end user would call "Chinese". But they all mean different things, and would require different handling. I still recall that changing one variable in vBulletin's config.php can cause your database to spew out garbage and blank pages... And changing the said variable at the wrong time, or attempting to make a backup inappropriately, can result in full irreversible data loss.
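To make the layering concrete, here is a minimal sketch of one common failure mode, under the assumption that the application sends UTF-8 bytes while the connection is declared as latin1 and the table stores UTF-8. Python's `latin-1` codec is used as a stand-in for the database's latin1 handling, which is an approximation. The text gets double-encoded on the way in, looks fine again when read back through the same wrong setting, and turns to garbage the moment one layer is "fixed" on its own.

```python
original = "中文"

# 1. The application sends UTF-8 bytes.
sent_bytes = original.encode("utf-8")

# 2. The connection is wrongly declared latin1, so each byte is
#    taken to be one latin1 character.
misread = sent_bytes.decode("latin-1")

# 3. The table is UTF-8, so those characters are stored as UTF-8:
#    the text is now double-encoded on disk.
stored_bytes = misread.encode("utf-8")

# Reading back through the same wrong connection reverses step 2 and 3,
# so the page still shows Chinese and nobody notices the problem.
round_trip = stored_bytes.decode("utf-8").encode("latin-1").decode("utf-8")

# Correcting only the connection setting exposes the double encoding.
after_fix = stored_bytes.decode("utf-8")

print(round_trip == original)  # True: the wrong settings cancel out
print(after_fix == original)   # False: garbage after changing one layer
print(len(sent_bytes), len(stored_bytes))
```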

    Oh, also, for the record, having CJK search in vB China's modified distro did not particularly help penetrate the Chinese market. Interestingly, contrary to the popular voice, people frankly don't care. They have their Discuz, they're happy and not planning to change. They have their pirate version running with our modified code and have no intention of purchasing the license.

    Anyways, 4am now... I've spent about 1 hour writing and deleting... I don't even know if this makes any sense. I'm just gonna hit post reply, get burned for any mistakes, and check over again with a clear mind. so much for waking up at 6 to go in early and work out thesis stuff with my prof...
  13. p4guru

    p4guru Well-Known Member

    Such a headache indeed! I for one am glad I don't have to deal with this, but having such support would indeed be another positive tick for XenForo :)
  14. chousho

    chousho Well-Known Member

    Andy, thanks so much for providing all of the input (at the cost of your sleep, grades, and possible future plans to ever own your own house/car/clothing)~
    It seems that implementing search isn't just a drag-and-drop procedure (if only it were that easy), but a painstaking task that can also be resource intensive and prone to bugs. While it would be cool if the work were put in, with the limitations and the number of hoops to jump through, I can see where they would have more critical issues to deal with--even just making sure XF is up to their quality for shipping out the door.

    Hopefully, perhaps when XF is released, a community of those of us interested in CJK XF could try to work around the limitations of MySQL and even release something upstream. But that's a big hope, haha.

    Thanks again, Andy :D
  15. SneakyDave

    SneakyDave Well-Known Member

    Andy, thanks for the input. What kind of steps are there to take, if any, against the seemingly rampant piracy of licensed software in Asia?
  16. sniper756

    sniper756 Member

  17. yoching

    yoching Member

  18. yoching

    yoching Member

    So you've come over here too, Andy Huang.
  19. Denis

    Denis Member

    If you can provide an API or something similar, then XenForo's CJK fans may be able to build something related themselves. Let's see what happens then...
    chousho and sniper756 like this.
  20. sniper756

    sniper756 Member

    XenForo should support CJK searching.