Fixed Cyrillic keywords highlighting

AlexDS

Member
On forums with the XenForo Enhanced Search plugin installed, Elasticsearch highlighting works correctly only for search queries typed in Latin characters. If a query is typed in Cyrillic, the search itself still returns the right results, but keywords containing uppercase letters are not highlighted in those results. In other words, highlighting of Cyrillic keywords is case-sensitive. This appears to be a bug in the plugin mentioned above, and it needs to be fixed.

2016-03-26_14-30-14.webp
 
+1
My forum with Cyrillic content has the same problem. But this is not an Elasticsearch bug.
It is a bug in the highlightSearchTerm function in library/XenForo/Helper/String.php: the regular expression lacks the u (UTF-8) modifier, so its case-insensitive matching works only with Latin characters.

Unofficial fix:

In this function, the line:
PHP:
return preg_replace('/(' . preg_replace('#\s+#', '|', preg_quote(htmlspecialchars($term), '/')) . ')/si', '<em class="' . $emClass . '">\1</em>', htmlspecialchars($string));

should be changed to the following (note the added u modifier after /si):
PHP:
return preg_replace('/(' . preg_replace('#\s+#', '|', preg_quote(htmlspecialchars($term), '/')) . ')/siu', '<em class="' . $emClass . '">\1</em>', htmlspecialchars($string));
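
To illustrate why the extra u modifier matters, here is a minimal sketch that runs the same pattern with and without it. The function name highlightTerm, its $modifiers parameter, and the sample text are assumptions made for this example; this is not XenForo's actual function.

```php
<?php
// Hypothetical stand-in for the highlighting call above; NOT XenForo's real function.
// $modifiers lets us compare the original flags ('si') with the patched ones ('siu').
function highlightTerm(string $string, string $term, string $modifiers, string $emClass = 'highlight'): ?string
{
    // Same construction as in the thread: escape the term, turn whitespace into alternation.
    $pattern = '/(' . preg_replace('#\s+#', '|', preg_quote(htmlspecialchars($term), '/')) . ')/' . $modifiers;

    // preg_replace returns null on error, e.g. when /u is used on invalid UTF-8 input.
    return preg_replace($pattern, '<em class="' . $emClass . '">\1</em>', htmlspecialchars($string));
}

$text = 'Москва и москва'; // sample text, assumed for the example

// With /u the subject is treated as UTF-8 and case-insensitive matching covers Cyrillic,
// so both "Москва" and "москва" are wrapped.
$withU = highlightTerm($text, 'москва', 'siu');

// Without /u the pattern is matched byte by byte, so case folding typically
// applies to ASCII letters only and the capitalized form is missed.
$withoutU = highlightTerm($text, 'москва', 'si');

echo $withU, "\n", $withoutU, "\n";
```

One thing to keep in mind with this approach: under /u, preg_replace returns null if the term or the text is not valid UTF-8, so callers may want to fall back to the unhighlighted string in that case.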
 
Another bug has been found in the XenForo Enhanced Search plugin: it does not handle morphological word forms in Cyrillic properly. If you use plain Elasticsearch on a site, all word forms found in the search results are highlighted correctly. Here is the documentation for this feature in Elasticsearch: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-highlighting.html

However, the XenForo Enhanced Search plugin does not highlight every word form found, only those that strictly match the search query.

2016-03-27_10-20-33.webp

That is, highlighting of morphological forms for Cyrillic does not work in the plugin.
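
For reference, the Elasticsearch feature linked above is requested through a highlight section in the search body, and the server then returns fragments with the matched terms marked. A minimal sketch of such a body, built in PHP, might look like this. The field name message and the query text are assumptions for illustration, and whether other word forms get highlighted depends on the analyzer configured for that field.

```php
<?php
// Sketch of a search body asking Elasticsearch to return highlighted fragments.
// The field name "message" and the query text are assumptions, not taken from the plugin.
$body = [
    'query' => [
        'match' => ['message' => 'книга'],
    ],
    'highlight' => [
        // An empty object means "use the default highlighter settings for this field".
        'fields' => ['message' => new stdClass()],
    ],
];

// JSON_UNESCAPED_UNICODE keeps the Cyrillic text readable instead of \uXXXX escapes.
$json = json_encode($body, JSON_UNESCAPED_UNICODE);

echo $json, "\n";
```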
 
This has been fixed now (in the core XF code).

Regarding your second comment, by morphological forms, I assume you're referring to stemming. In which case, that's not really a bug in that we don't try to do it; the highlighting is a bonus for literal matches only. Similarly, it doesn't work if you use a wildcard.
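
The difference between literal matching and stemming can be seen in a small sketch. The helper highlightLiteral and the sample words are assumptions made for this example and only mirror the pattern quoted earlier in the thread; this is not the actual XenForo code.

```php
<?php
// Hypothetical helper mirroring the literal-match pattern from the thread; NOT XenForo's code.
function highlightLiteral(string $string, string $term): ?string
{
    // preg_quote escapes regex metacharacters, so a wildcard such as "*" is matched literally.
    $pattern = '/(' . preg_replace('#\s+#', '|', preg_quote(htmlspecialchars($term), '/')) . ')/siu';
    return preg_replace($pattern, '<em>\1</em>', htmlspecialchars($string));
}

// Three forms of the same Russian word; a stemming search engine may match all of them.
$text = 'книга, книги, книгой';

// Only the exact form "книга" is wrapped. The other forms do not contain
// the literal term, so they are left unhighlighted.
$literal = highlightLiteral($text, 'книга');

// A wildcard query highlights nothing, because "книг*" never occurs literally in the text.
$wildcard = highlightLiteral($text, 'книг*');

echo $literal, "\n", $wildcard, "\n";
```

So a search engine can return a post because of a stemmed or wildcard match while a highlighter of this kind, which only looks for the typed term itself, marks nothing in it.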
 