Adding a new language (Polish), and a few other questions

janslu

Active member
1. I am using Enhanced Search on a 13,000,000-post forum in Polish and it is working great. But I started looking into Polish stemming, expecting even better search results and accuracy. There is a plugin called Stempel: it seems to integrate Polish language rules from Lucene and is easy to install in Elasticsearch. But the list of stemmer languages in XenForo ES seems to be hardcoded. What should I do to add a new option there? Has anyone done this before? If I understand correctly, I would also need to reindex the site afterwards?

2. Elasticsearch seems to use a lot of memory: top shows 18 GB of virtual memory usage. I know most of it sits unused (part of it goes to swap), but even for a 6.3 GB index that seems like a lot. Are there any options I should look into?

3. What does the Optimize mappings button do? It doesn't seem to do anything on my forum...
 
This isn't just a matter of changing the language listed in XF. You would need to use a completely different analyzer from the one XF sets up itself. XF only knows about the Snowball analyzer and the languages it supports: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-snowball-analyzer.html

You'd need to configure this within ES via the command line.
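As a rough illustration of what configuring this within ES could involve, the sketch below builds index settings for a custom analyzer that uses the polish_stem token filter provided by the Stempel (analysis-stempel) plugin. The index name xenforo and the analyzer name polish_custom are hypothetical, and nothing here makes XFES use the analyzer: the mapping of the indexed fields would also have to reference it, and how XFES reacts to such changes is not covered in this thread. Because analysis settings are normally defined when an index is created (or applied while it is closed), existing posts would have to be reindexed to pick them up.

```python
import json

# Hypothetical index name; check the name your XFES install actually uses.
INDEX_NAME = "xenforo"


def polish_index_settings() -> dict:
    """Build index settings defining a custom analyzer based on the
    polish_stem token filter (requires the analysis-stempel plugin)."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    # Hypothetical analyzer name; XF itself knows nothing about it.
                    "polish_custom": {
                        "type": "custom",
                        "tokenizer": "standard",
                        # Lowercase first, then apply Stempel's Polish stemmer.
                        "filter": ["lowercase", "polish_stem"],
                    }
                }
            }
        }
    }


if __name__ == "__main__":
    # Only prints the request; nothing is sent to Elasticsearch.
    print(f"PUT /{INDEX_NAME}")
    print(json.dumps(polish_index_settings(), indent=2))
```

The printed body is what an index-creation request would carry; the field mappings pointing at the analyzer would still need to be added separately.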

Well, optimizing the mappings will help reduce memory usage. With optimized mappings, we've generally seen an index size of around 300 MB per million posts, and a smaller index should help with memory usage. The option generally shouldn't appear if it isn't needed. If you aren't running the latest XFES, make sure you update.
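The rule of thumb from the reply (around 300 MB of index per million posts) can be applied to the 13,000,000-post forum from the question. The sketch below does that arithmetic and compares it with the 6.3 GB index reported above, using 1 GB = 1000 MB. The gap between the two figures may be what the optimization targets, but that is an inference, not something stated in the thread.

```python
# Rule of thumb quoted in the reply: roughly 300 MB of index per million posts.
# It is an observed average, not a guarantee for any particular forum.
MB_PER_MILLION_POSTS = 300


def estimated_index_mb(post_count: int) -> float:
    """Estimate the index size in MB for a given number of posts."""
    return post_count / 1_000_000 * MB_PER_MILLION_POSTS


if __name__ == "__main__":
    posts = 13_000_000   # post count from the original question
    observed_gb = 6.3    # index size reported in the question
    est_gb = estimated_index_mb(posts) / 1000  # decimal GB
    print(f"estimated: {est_gb:.1f} GB, observed: {observed_gb} GB")
```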

Determining true memory usage can be difficult because of things like mmap. Is the memory usage causing you problems elsewhere? Of course, if you have a fast disk (SSD), you could consider reducing the amount of memory given to ES/Java and relying on fast SSD access instead. Generally, though, you do want to fit your data in memory if possible (both in ES and MySQL).
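If you do decide to give ES/Java less memory, the heap is set with the -Xms and -Xmx JVM options (for example in the jvm.options file, whose location depends on the install and ES version). The sketch below picks a value using Elastic's general guidance (the same value for both options, no more than half the memory available to ES, and staying below the compressed object pointer threshold) plus one assumption of its own: that MySQL's share of RAM is set aside first. The 26 GB cap, the 1 GB floor and the example figures are illustrative assumptions, not numbers from this thread.

```python
def suggest_heap_gb(total_ram_gb: float, reserved_for_mysql_gb: float) -> int:
    """Suggest an Elasticsearch heap size in whole GB.

    Assumptions: MySQL's share is reserved first; the heap gets at most half
    of what remains, leaving the rest to the OS page cache that Lucene's
    mmap'd index files rely on; the heap stays under a conservative 26 GB
    cap for compressed object pointers; and it never drops below 1 GB.
    """
    available = max(total_ram_gb - reserved_for_mysql_gb, 0)
    heap = min(available / 2, 26)
    return max(int(heap), 1)


def jvm_options(heap_gb: int) -> str:
    """Render matching -Xms/-Xmx lines (initial and maximum heap equal)."""
    return f"-Xms{heap_gb}g\n-Xmx{heap_gb}g"


if __name__ == "__main__":
    # Hypothetical server: 32 GB of RAM, 16 GB set aside for MySQL.
    print(jvm_options(suggest_heap_gb(32, 16)))
```

With the hypothetical figures above, half of the 16 GB left over goes to the heap and the other half stays available for the page cache.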
 
Oh my... This is much more complicated than I thought. I was happy to find a Polish stemmer, but it seems it predates Snowball and is considered "the" stemmer for Polish. I have it installed in Elasticsearch and I will try to look into actually using it for search.
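Once the plugin is installed, one way to check that it works before touching XF is the _analyze API, passing the polish analyzer that Stempel provides together with some sample text. The sketch below only builds the request path and JSON body; the sample sentence is arbitrary, and sending it (with curl or any HTTP client, to wherever your ES node listens) is left out.

```python
import json
from typing import Tuple


def analyze_request(text: str, analyzer: str = "polish") -> Tuple[str, str]:
    """Return the (path, JSON body) for an _analyze call that runs the
    given text through an analyzer, by default Stempel's "polish"."""
    body = {"analyzer": analyzer, "text": text}
    # ensure_ascii=False keeps Polish characters readable in the body.
    return "/_analyze", json.dumps(body, ensure_ascii=False)


if __name__ == "__main__":
    path, body = analyze_request("Wyszukiwanie postów na forum")
    print("GET", path)
    print(body)
```

The tokens in the response show which stems the analyzer produces, which is a quick way to judge whether it is worth wiring into search.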

I am running the latest XFES, but I'm also using the digitalpoint add-on. All in all, optimizing mappings doesn't seem to do anything. I will try to play with it later on.
As for the memory: I am using a single server for MySQL and Elasticsearch, and I am getting closer and closer to the physical memory limit. I was hoping to find a setting that would free up some of the ES memory. I will dig into optimizing mappings and see where it leads me.

Thanks for the support.