Fixed Cyrillic keywords highlighting

#1
On forums with the XenForo Enhanced Search plugin installed, Elasticsearch-backed search works fully only for queries typed in Latin script. A query typed in Cyrillic still returns the right results, but keywords containing uppercase letters are not highlighted in them. In other words, highlighting is case-sensitive for Cyrillic keywords. This looks like a bug in the plugin, and it needs to be fixed.

2016-03-26_14-30-14.jpg
 
#2
+1
My forum with Cyrillic content has the same problem, but this is not an Elasticsearch bug.
It is a bug in the highlightSearchTerm function in library/XenForo/Helper/String.php: its regular expression lacks the u (UTF-8) modifier, so case-insensitive matching works only for Latin characters.

Unofficial fix:

In this function, the line:
PHP:
return preg_replace('/(' . preg_replace('#\s+#', '|', preg_quote(htmlspecialchars($term), '/')) . ')/si', '<em class="' . $emClass . '">\1</em>', htmlspecialchars($string));
should be changed to (note the u modifier added after si):
PHP:
return preg_replace('/(' . preg_replace('#\s+#', '|', preg_quote(htmlspecialchars($term), '/')) . ')/siu', '<em class="' . $emClass . '">\1</em>', htmlspecialchars($string));
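
To see the effect of the extra modifier in isolation, here is a minimal sketch. The function name highlightTerm and its $modifiers parameter are hypothetical, added only so both variants can be compared side by side; the regular expression itself is the one quoted above. It assumes a default PHP setup (UTF-8 source file, default locale).

PHP:
```php
<?php
// Hypothetical stand-in for the XenForo helper; only the regex mirrors the thread.
// $modifiers is an illustration-only parameter used to switch between 'si' and 'siu'.
function highlightTerm($string, $term, $emClass = 'highlight', $modifiers = 'siu')
{
    // Whitespace in the search term becomes "|" so each word is an alternative.
    $pattern = '/('
        . preg_replace('#\s+#', '|', preg_quote(htmlspecialchars($term), '/'))
        . ')/' . $modifiers;

    return preg_replace(
        $pattern,
        '<em class="' . $emClass . '">\1</em>',
        htmlspecialchars($string)
    );
}

$text = 'Привет, мир';

// Original modifiers: the subject is matched byte by byte, so the "i" flag
// does not fold the case of multi-byte Cyrillic letters.
$withoutU = highlightTerm($text, 'привет', 'highlight', 'si');

// Patched modifiers: "u" makes PCRE treat pattern and subject as UTF-8,
// so "привет" also matches "Привет".
$withU = highlightTerm($text, 'привет', 'highlight', 'siu');

echo $withoutU, "\n";
echo $withU, "\n";
```

With the original modifiers the capitalised word should come back without any em wrapper; with siu it is wrapped as expected.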
 
#4
Another bug has turned up in the XenForo Enhanced Search plugin: it does not handle morphological word forms in Cyrillic properly. With a stock Elasticsearch installation on any site, all word forms found in the search results are highlighted correctly. Here is the Elasticsearch documentation for this feature: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-highlighting.html

However, the XenForo Enhanced Search plugin does not highlight every word form it finds, only those that exactly match the search query.

2016-03-27_10-20-33.png

That is, highlighting of Cyrillic morphological forms does not work in the plugin.
 

Mike

XenForo developer
Staff member
#5
This has been fixed now (in the core XF code).

Regarding your second comment, by morphological forms, I assume you're referring to stemming. In which case, that's not really a bug in that we don't try to do it; the highlighting is a bonus for literal matches only. Similarly, it doesn't work if you use a wildcard.
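
The "literal matches only" behaviour can be illustrated with a short sketch. The function name highlightLiteral is hypothetical, and the regex is the patched one from post #2, assumed here to be close to what the core fix does. The search term "книга" is highlighted where it appears verbatim, while the inflected form "книги" is left alone, even though a stemming search engine would return both.

PHP:
```php
<?php
// Hypothetical helper; the regex is the patched one from earlier in the thread.
function highlightLiteral($string, $term, $emClass = 'highlight')
{
    $pattern = '/('
        . preg_replace('#\s+#', '|', preg_quote(htmlspecialchars($term), '/'))
        . ')/siu';

    return preg_replace(
        $pattern,
        '<em class="' . $emClass . '">\1</em>',
        htmlspecialchars($string)
    );
}

// "книги" and "книга" are two forms of the same word.
$text = 'Новые книги и одна книга';

// Only the exact string "книга" is wrapped; no stemming is attempted.
$result = highlightLiteral($text, 'книга');

echo $result, "\n";
```

Highlighting other word forms would require taking the highlighted fragments from the search engine rather than re-matching the query text on the PHP side.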