[WMTech] How to configure Elasticsearch for non-English languages

wmtech

Well-known member
#1
wmtech submitted a new resource:

How to configure Elasticsearch for non-English languages - Optimize Elasticsearch for German, French and other languages with special letters

After digging deep into the Elasticsearch documentation and running several successful tests, I would like to share the settings that make Elasticsearch work properly for non-English language boards.

On your server, open the Elasticsearch config file, usually found at
/elasticsearch/config/elasticsearch.yml
in your text editor.

Please add the following lines at the end of this file (example for the German language):...
Read more about this resource...
 

Marcus

Well-known member
#4
What about the XenForo ACP settings for the Elasticsearch index? Are those settings overwritten by these lines?

/elasticsearch/config/elasticsearch.yml


index.analysis.analyzer.default.type: custom
index.analysis.analyzer.default.tokenizer: standard
index.analysis.analyzer.default.filter: ["standard", "lowercase", "stop", "snow", "length" ]
index.analysis.filter.snow.type: snowball
index.analysis.filter.snow.language: German2
index.analysis.filter.length.type: length
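
For anyone trying to read these lines: Elasticsearch treats the dotted keys as paths into a nested settings tree, so the six lines define one default analyzer plus two named filters ("snow" and "length"). A minimal Python sketch that only regroups the keys from this post to make the structure visible (the helper name `nest` is made up for illustration and is not part of Elasticsearch or XenForo):

```python
# The flat keys exactly as posted above.
flat_settings = {
    "index.analysis.analyzer.default.type": "custom",
    "index.analysis.analyzer.default.tokenizer": "standard",
    "index.analysis.analyzer.default.filter": ["standard", "lowercase", "stop", "snow", "length"],
    "index.analysis.filter.snow.type": "snowball",
    "index.analysis.filter.snow.language": "German2",
    "index.analysis.filter.length.type": "length",
}


def nest(flat):
    """Turn dotted keys into nested dicts (illustrative helper, not an ES API)."""
    nested = {}
    for dotted, value in flat.items():
        *parents, leaf = dotted.split(".")
        node = nested
        for part in parents:
            # Walk down the path, creating intermediate levels as needed.
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


nested_settings = nest(flat_settings)
analysis = nested_settings["index"]["analysis"]
print(sorted(analysis["filter"]))             # names of the custom filters
print(analysis["analyzer"]["default"]["filter"])  # filter chain of the default analyzer
```

The filter chain in `analyzer.default.filter` refers to the custom filters by the names defined under `filter`, which is why "snow" and "length" appear in both places.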
 

wmtech

Well-known member
#5
What about the XenForo ACP settings for the Elasticsearch index? Are those settings overwritten by these lines?
If you change the settings for the XenForo search index in the XenForo ACP (for example, by switching "stemming" on), they override the defaults you are setting with those lines and thus disable this modification.

You are safe as long as you DO NOT change the Elasticsearch settings in the XenForo ACP after you have added those lines to elasticsearch.yml.
 

wmtech

Well-known member
#8
It looks like the new ES release now supports these different languages directly from the ES Options in the AdminCP, correct?
With the most recent XFES add-on, you can choose the language setting from the ACP.
However, it does no harm to set this in the Elasticsearch config as well.
 

sinucello

Well-known member
#9
Hi,
thanks, this is really useful. But the German2 umlaut stemming does not work very well. It is a good thing that a search for "gruen" also finds "grün" (green), but a search for Kuchen (cake) also returns Küchen (kitchens), which is not relevant.

So we have two kinds of umlaut words:
  • 2 variants of a word with exactly the same meaning
  • 2 words with totally different meanings
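
To make the two cases concrete, here is a rough Python sketch of the umlaut handling only. It is a simplification and not the real Snowball German2 stemmer (which also rewrites ß and strips suffixes); the function name `fold_umlauts` is made up for illustration. The assumption is that ae/oe/ue are first read as ä/ö/ü (ue is left alone after q) and that the umlaut marks are dropped afterwards:

```python
import re


def fold_umlauts(word: str) -> str:
    """Toy approximation of the umlaut folding, not the full German2 stemmer."""
    w = word.lower()
    # Step 1: read the ASCII digraphs as umlauts ("ue" stays after "q", as in "Quelle").
    w = w.replace("ae", "ä").replace("oe", "ö")
    w = re.sub(r"(?<!q)ue", "ü", w)
    # Step 2: drop the umlaut marks, so ä/ö/ü end up as a/o/u.
    return w.translate(str.maketrans("äöü", "aou"))


# Case 1: two spellings of the same word end up identical (wanted).
print(fold_umlauts("gruen"), fold_umlauts("grün"))
# Case 2: two different words end up identical as well (not wanted).
print(fold_umlauts("Kuchen"), fold_umlauts("Küchen"))
```

Both pairs collapse to the same token, which is why the index can no longer tell Kuchen from Küchen once the filter has run.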
Google handles this very well. I don't know if this can be solved by scripting. It would already help if the word the user typed in were weighted higher than the variant that is found only because of the umlaut stemming.
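
One way the weighting idea might look in the Elasticsearch query DSL is a bool query with two should clauses: one on the stemmed field and one, with a boost, on a copy of the field that is analyzed without the snowball filter. This is only a sketch under assumptions: the field names `message` and `message.exact`, the boost value and the helper `build_boosted_query` are all hypothetical, the unstemmed sub-field would have to exist in the index mapping, and I do not know whether the XFES add-on lets you change its queries this way.

```python
import json


def build_boosted_query(term: str, exact_boost: float = 3.0) -> dict:
    """Build a query body that prefers the spelling the user actually typed.

    Assumes a stemmed field "message" and an unstemmed sub-field
    "message.exact" (both names are hypothetical).
    """
    return {
        "query": {
            "bool": {
                "should": [
                    # Matches every umlaut variant via the stemmed field.
                    {"match": {"message": {"query": term}}},
                    # Matches only the typed spelling and scores it higher.
                    {"match": {"message.exact": {"query": term, "boost": exact_boost}}},
                ]
            }
        }
    }


body = build_boosted_query("Kuchen")
print(json.dumps(body, ensure_ascii=False, indent=2))
```

With a query like this, posts containing "Küchen" would still be found through the stemmed field, but posts containing "Kuchen" would match both clauses and should rank above them.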

I would like to hear your experiences or solutions for this problem.

all the best,
Sacha