Not a bug: Searching content returns no results for all members except certain early-joined admins (when NGRAM is enabled).

Affected version
XF 2.2

ShikiSuen

Well-known member
BTW, I disabled NGRAM to make sure the table structure is vanilla (the XF default).

I set the minimum search word length to 2.
However, for double-byte characters such as Chinese words, search terms shorter than 4 return no results.
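One factor worth ruling out: MySQL applies its own server-side minimum token length when it builds a FULLTEXT index, separately from XF's minimum search word length option. The sketch below is a simplified Python model, not MySQL's actual parser. It assumes plain whitespace tokenization and a server minimum of 4 characters (MyISAM's default `ft_min_word_len`; InnoDB uses `innodb_ft_min_token_size`, default 3). Tokens below the minimum are never indexed, so searching for them can never match.

```python
# Simplified model of a FULLTEXT index: tokens shorter than the server-side
# minimum are dropped at index time, so later searches for them find nothing.
# ASSUMPTION: whitespace tokenization and a minimum of 4 characters
# (MyISAM's default ft_min_word_len); MySQL's real parser is more involved.

SERVER_MIN_TOKEN = 4  # assumed server setting, independent of XF's option


def build_index(docs: dict[int, str], min_len: int = SERVER_MIN_TOKEN) -> dict[str, set[int]]:
    """Map each indexed token to the ids of the documents containing it."""
    index: dict[str, set[int]] = {}
    for doc_id, text in docs.items():
        for token in text.lower().split():
            if len(token) >= min_len:  # shorter tokens are silently discarded
                index.setdefault(token, set()).add(doc_id)
    return index


def search(index: dict[str, set[int]], term: str) -> set[int]:
    """Return matching document ids; a term that was never indexed yields nothing."""
    return index.get(term.lower(), set())


if __name__ == "__main__":
    idx = build_index({1: "laser engraving tips", 2: "cnc engraving for beginners"})
    print(search(idx, "engraving"))  # {1, 2}
    print(search(idx, "cnc"))        # set(): 3 chars, below the assumed minimum
```

Lowering the application-side minimum alone cannot bring back tokens the index never stored.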
 

Mike

XenForo developer
Staff member
I was going to ask in the other thread if there were any database changes involved, and this suggests that there are. Your last message seems to indicate that results are returned by default, without the alternative parsing, so that would point to the issue being related to the non-standard changes applied.

Unfortunately, this simply isn't something we support. If you are making changes to the database like this, there may be custom PHP changes required to change how the search strings are passed to MySQL. I'm not familiar with full text search using the ngram parser, so we can't really comment on that.

XFES is not "developed" for giant forums. It improves search results in all forums and provides better performance for large forums. CJK searching is, roughly, an entirely different problem from searching space-separated/word-based languages, so alternative approaches may need to be taken. This isn't something that XF officially provides or supports, as the default configurations (for both MySQL and Elasticsearch) don't use a tokenization approach that works in these situations.
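To illustrate why word-based tokenization breaks down for CJK, here is a simplified Python model (not MySQL's code) of two strategies. A word-based parser splits on whitespace, so an unspaced Chinese phrase becomes one long "word". An n-gram tokenizer, in the spirit of MySQL's ngram parser (MySQL 5.7.6+, with `ngram_token_size` defaulting to 2), emits overlapping n-character windows instead. The example phrase is an assumption chosen for illustration.

```python
# Contrast two tokenization strategies on text with no spaces between words.
# ASSUMPTION: simplified model only; MySQL's real parsers also handle
# punctuation, stopwords and other details not shown here.


def whitespace_tokens(text: str) -> list[str]:
    """Split on whitespace only, like a word-based parser."""
    return text.split()


def ngram_tokens(text: str, n: int = 2) -> list[str]:
    """Emit every contiguous n-character window of each whitespace-separated chunk."""
    tokens: list[str] = []
    for chunk in text.split():
        if len(chunk) < n:
            continue  # a chunk shorter than n produces no n-gram
        tokens.extend(chunk[i:i + n] for i in range(len(chunk) - n + 1))
    return tokens


if __name__ == "__main__":
    phrase = "激光雕刻机价格"  # "laser engraving machine price", written without spaces
    print(whitespace_tokens(phrase))  # ['激光雕刻机价格']: one long "word"
    print(ngram_tokens(phrase))       # ['激光', '光雕', '雕刻', '刻机', '机价', '价格']
```

With the word-based split, a search for `雕刻` has nothing to match, because the only token is the whole phrase. With bigrams, `雕刻` is an indexed token.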

It's worth mentioning that if you change the min word size in MySQL, you need to rebuild the table/indexes for it to be reflected.
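According to the MySQL manual, changing these server variables requires a server restart followed by a FULLTEXT index rebuild. For MyISAM (`ft_min_word_len`), a quick `REPAIR TABLE ... QUICK` is sufficient. For InnoDB (`innodb_ft_min_token_size`), the index is dropped and re-created. The helper below only builds those SQL strings. The table, index and column names passed to it are hypothetical placeholders, so check your own schema before running anything.

```python
# Build the SQL that rebuilds one FULLTEXT index after the server's minimum
# word/token size has been changed and the server restarted.
# Table, index and column names are HYPOTHETICAL; use your real schema.


def rebuild_fulltext_sql(engine: str, table: str, index: str, columns: list[str]) -> list[str]:
    """Return the SQL statements that rebuild one FULLTEXT index for the given engine."""
    cols = ", ".join(f"`{c}`" for c in columns)
    engine = engine.lower()
    if engine == "myisam":
        # MyISAM: a QUICK repair rebuilds the FULLTEXT index after ft_min_word_len changes.
        return [f"REPAIR TABLE `{table}` QUICK;"]
    if engine == "innodb":
        # InnoDB: drop and re-create the index after innodb_ft_min_token_size changes.
        return [
            f"ALTER TABLE `{table}` DROP INDEX `{index}`;",
            f"ALTER TABLE `{table}` ADD FULLTEXT INDEX `{index}` ({cols});",
        ]
    raise ValueError(f"unsupported engine: {engine}")


if __name__ == "__main__":
    for stmt in rebuild_fulltext_sql("InnoDB", "search_table", "ft_idx", ["title", "message"]):
        print(stmt)
```

Generating the statements rather than executing them keeps the sketch free of any database driver. It also lets you review the statements before applying them.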
 

ShikiSuen

Well-known member
Thanks for your response.

I did rebuild the index after changing the word length.
Based on what I've observed so far, it looks like XenForo treats two CJK characters as one unit of length by default.

It's not my site (the site owner is a member of this forum).
Unfortunately, he doesn't want our technical relationship made public, because he is worried about his commercial competitors.
Our site runs on a free RHEL-derived distro with the paid Chinese enterprise edition of a web panel, and that panel doesn't support installing Elasticsearch.
I will try every other option before concluding that we need to buy XFES (given the server computing power it requires).

Mike

XenForo developer
Staff member
Just to be clear, this is entirely an internal MySQL behavior and thus not something we have direct control over. MySQL has an internal parsing algorithm that breaks text into "words", which become the main component of its search system, and it then applies its constraints (such as minimum and maximum word length) to them. I would assume this is based on the character set used by the table, so one character should be considered to have a length of 1, but I can't really comment on the specific technical details.
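The unit of "length" matters here. Under the assumption above, length is counted in characters. In UTF-8 (MySQL's `utf8mb4`), however, each common CJK character occupies 3 bytes while an ASCII letter occupies 1. The snippet below only shows that arithmetic. It makes no claim about how MySQL itself counts.

```python
# Compare character length with UTF-8 byte length for CJK and Latin terms.
# Common CJK ideographs (Basic Multilingual Plane) take 3 bytes each in UTF-8,
# whereas ASCII letters take 1 byte each.


def lengths(term: str) -> tuple[int, int]:
    """Return (character count, UTF-8 byte count) for a search term."""
    return len(term), len(term.encode("utf-8"))


if __name__ == "__main__":
    for term in ["雕刻", "激光雕刻", "laser"]:
        chars, nbytes = lengths(term)
        print(f"{term}: {chars} chars, {nbytes} bytes")
    # 雕刻: 2 chars, 6 bytes
    # 激光雕刻: 4 chars, 12 bytes
    # laser: 5 chars, 5 bytes
```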
 

ShikiSuen

Well-known member
It occurred to me that my NGRAM solution requires changing the collation of all XenForo tables from utf8mb4_general_ci to utf8mb4_unicode_520_ci.

If that's really the case, it might explain why no one on my Engravers-China forum has complained about this: I applied utf8mb4_unicode_520_ci to the entire SQL database of that forum.
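Applying a collation across every table is usually done with one `ALTER TABLE ... CONVERT TO CHARACTER SET` statement per table. The sketch below only generates those statements. The table names in the usage example are assumptions for illustration. In practice you would list the real tables (for example with `SHOW TABLES`) and take a backup first.

```python
# Generate statements that convert tables to utf8mb4 with a given collation.
# Table names used below are HYPOTHETICAL examples, not a complete XF table list.

COLLATION = "utf8mb4_unicode_520_ci"


def convert_collation_sql(tables: list[str], collation: str = COLLATION) -> list[str]:
    """Return one CONVERT TO CHARACTER SET statement per table."""
    return [
        f"ALTER TABLE `{t}` CONVERT TO CHARACTER SET utf8mb4 COLLATE {collation};"
        for t in tables
    ]


if __name__ == "__main__":
    for stmt in convert_collation_sql(["xf_post", "xf_thread"]):
        print(stmt)
```

`CONVERT TO` rewrites the existing column data as well as the table default, so it can take a while on large tables.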
 

ShikiSuen

Well-known member
Update: the Engravers-China forum has the same issue: only the early-joined admins get search results.
utf8mb4_unicode_520_ci doesn't directly help in this case.

It looks like we'll have to use XFES for now. I'll ask the webmaster whether he has the budget.