Curl query to return most indexed words from my forum in ElasticSearch?

Sim · Feb 14, 2020

I'm using XFES and I'd like to retrieve some stats of the most frequently indexed words (ignoring common words) from my forum.

Can someone suggest a curl query I can use for ElasticSearch?

Jeremy P · Feb 14, 2020

Generally speaking, you could accomplish this using a terms aggregation. However, content messages are indexed as text fields, and fielddata is disabled on text fields by default, so you can't use aggregations on them.
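For reference, a terms aggregation request might look roughly like the sketch below. It only builds the JSON body; the field name `message` and the bucket name `top_words` are assumptions for illustration, not names confirmed for an XFES index.

```python
import json


def build_terms_query(field="message", size=50):
    """Build a request body for a terms aggregation over one field.

    `field` is an assumed field name; check the real index mapping first.
    """
    return {
        "size": 0,  # skip the search hits, only the aggregation buckets matter
        "aggs": {
            "top_words": {  # hypothetical bucket name
                "terms": {"field": field, "size": size}
            }
        },
    }


if __name__ == "__main__":
    # Print the body so it can be pasted into a curl -d '...' argument.
    print(json.dumps(build_terms_query(), indent=2))
```

The printed JSON would go in the body of a `POST /<index>/_search` request, for example via `curl -H 'Content-Type: application/json' -d '...'`. Against a text field with fielddata disabled, Elasticsearch should be expected to reject it rather than return buckets.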

You'd have to create a separate index where you either enable fielddata for text fields (which is liable to chew through memory), or re-index message content in a separate keyword field.
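As a rough sketch of the first option, the body for creating a separate index with fielddata enabled could be built like this. It assumes Elasticsearch 7.x typeless mappings and a field named `message`; both are assumptions, and the index itself would still need to be populated separately.

```python
import json


def build_fielddata_mapping(field="message"):
    """Build an index-creation body with fielddata enabled on a text field.

    Assumes ES 7.x typeless mappings; `field` is an assumed field name.
    """
    return {
        "mappings": {
            "properties": {
                # fielddata loads the field's terms into heap memory on first use
                field: {"type": "text", "fielddata": True}
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_fielddata_mapping(), indent=2))
```

The output would be the body of a `PUT /<new_index>` request. For the keyword alternative, keep in mind that a keyword field stores each value as a single term, so the message would need to be split into individual words before indexing for per-word counts to come out.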

Jeremy P · Feb 14, 2020

Poking around the ES docs a bit, a significant text aggregation might be a better fit. It is designed to be used on text fields, and by its very nature should ignore common words (which the terms aggregation above won't do, so you're likely to get a lot of stop words). The caveat is that it requires a lot of time and memory if you can't filter down the result set first, but it looks like a sampler aggregation might be good for that.
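A minimal sketch of that combination, a sampler aggregation wrapping a significant text aggregation, is below. The field name `message`, the bucket names and the example search phrase are all assumptions for illustration.

```python
import json


def build_significant_text_query(query_text, field="message",
                                 shard_size=200, size=20):
    """Build a sampled significant_text request body.

    `field` is an assumed field name; `query_text` selects the foreground set.
    """
    return {
        "size": 0,  # only the aggregation result is needed
        "query": {"match": {field: query_text}},
        "aggs": {
            "sample": {  # hypothetical bucket name
                # only the top-scoring documents per shard are analysed
                "sampler": {"shard_size": shard_size},
                "aggs": {
                    "keywords": {  # hypothetical bucket name
                        "significant_text": {"field": field, "size": size}
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_significant_text_query("upgrade"), indent=2))
```

Note that this scores terms that are unusually common in the matched documents compared with the index as a whole, so it answers "which words stand out for this query" rather than "which words are most frequent overall".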

It looks as though significant text aggregations require _source to be enabled though (it's disabled in XFES indexes), so you'd still need to create a separate index for this with it enabled.
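To confirm how a given index is configured, the response from `GET /<index>/_mapping` can be inspected. The helper below is a sketch that assumes the ES 7.x typeless response shape; the index name `xf_example` is made up.

```python
def source_enabled(mapping_response, index):
    """Return True if _source is enabled for `index`.

    Assumes the ES 7.x typeless shape: {index: {"mappings": {...}}}.
    Elasticsearch enables _source unless the mapping says otherwise.
    """
    mappings = mapping_response[index]["mappings"]
    return mappings.get("_source", {}).get("enabled", True)


if __name__ == "__main__":
    # Hand-written sample response, not output captured from a real index.
    sample = {
        "xf_example": {
            "mappings": {
                "_source": {"enabled": False},
                "properties": {"message": {"type": "text"}},
            }
        }
    }
    print(source_enabled(sample, "xf_example"))
```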

Sim · Feb 18, 2020

Thanks @Jeremy P - I'm not especially interested in creating new indexes and so on, so it looks like it's not something I can achieve right now.

Jeremy P · Feb 18, 2020

Yeah. I suppose the next best thing is you could create a small add-on extending \XFES\Service\Optimizer::getBaseMapping() to flip ['_source']['enabled']. I don't think it should cause problems, but the resulting index size will likely be a fair bit larger. Unfortunately you'd still be looking at rebuilding the entire index afterwards, which you may (understandably) be looking to avoid.
