Curl query to return the most indexed words from my forum in Elasticsearch?

Sim

I'm using XFES and I'd like to retrieve some stats on the most frequently indexed words in my forum (ignoring common words).

Can someone suggest a curl query I can use for Elasticsearch?
 
Generally speaking, you could do this with a terms aggregation. However, message content is indexed as a text field, and fielddata is disabled on text fields by default, so you can't run aggregations on it.
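For reference, a plain terms aggregation would look something like the sketch below. It's written in Python and just prints the equivalent curl command. The host `http://localhost:9200`, the index name `xf_forum` and the field name `message` are assumptions, so substitute whatever your XFES setup actually uses. Against a stock XFES index this request should be rejected with an error about fielddata being disabled on the text field, which is the problem described above.

```python
import json
import shlex

# Hypothetical values: replace with your own ES host and XFES index name.
ES_URL = "http://localhost:9200"
INDEX = "xf_forum"


def terms_agg_body(field: str = "message", size: int = 50) -> dict:
    """Request body for a plain terms aggregation over `field`.

    "size": 0 suppresses the search hits so only the aggregation comes back.
    The field name "message" is an assumption about the XFES mapping.
    """
    return {
        "size": 0,
        "aggs": {
            "top_words": {
                "terms": {"field": field, "size": size}
            }
        },
    }


def to_curl(index: str, body: dict) -> str:
    """Render a _search request against `index` as a curl command line."""
    url = f"{ES_URL}/{index}/_search"
    return (
        f"curl -s -X POST {shlex.quote(url)} "
        "-H 'Content-Type: application/json' "
        f"-d {shlex.quote(json.dumps(body))}"
    )


if __name__ == "__main__":
    # Expected to fail on a stock XFES index: "message" is a text field
    # with fielddata disabled, so it can't be aggregated.
    print(to_curl(INDEX, terms_agg_body()))
```

Even on an index where this works, the top buckets will mostly be stop words such as "the" and "and".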

You'd have to create a separate index where you either enable fielddata for the text field (which is liable to chew through memory), or re-index the message content, split into individual words, into a separate keyword field.
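Roughly, the two options would need mappings along these lines for the new index. This assumes a typeless Elasticsearch 7+ mapping. The field names `message` and `words`, and the crude regex tokenizer used for the keyword option, are illustrative assumptions and not part of XFES.

```python
import re

# Option A: keep "message" as text but turn on fielddata so it can be
# aggregated. Fielddata is held in JVM heap, hence the memory concern.
FIELDDATA_MAPPING = {
    "mappings": {
        "properties": {
            "message": {"type": "text", "fielddata": True}
        }
    }
}

# Option B: keep "message" as normal text and add a keyword array field
# holding the individual words. "words" is a hypothetical field name.
KEYWORD_MAPPING = {
    "mappings": {
        "properties": {
            "message": {"type": "text"},
            "words": {"type": "keyword"},
        }
    }
}


def words_for(message: str) -> list[str]:
    """Crude tokenizer for Option B: lowercase and split on non-word chars.

    Duplicates are dropped because a terms aggregation counts documents
    per term, not occurrences; sorting just keeps the output stable.
    """
    return sorted(set(re.findall(r"\w+", message.lower())))
```

With Option B you'd send `{"words": words_for(post_text)}` alongside each post when indexing, then run the terms aggregation on `words`. Neither option filters out common words on its own.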
 
Poking around the ES docs a bit, a significant text aggregation might be a better fit. It is designed to be used on text fields, and by its very nature should ignore common words (which the terms aggregation won't do, so with that you're likely to get a lot of stop words). The caveat is that it requires a lot of time and memory if you can't filter down the result set first, but the sampler aggregation looks like it might help with that.
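A rough sketch of a significant text aggregation nested inside a sampler might look like this. It assumes the post text is in `message` and the post date is a numeric epoch in a `date` field; both field names are guesses, as is the idea of narrowing the foreground set by date. `significant_text` compares the matched documents against the whole index by default, so querying the entire forum would leave little to contrast against.

```python
import json


def significant_words_body(since_epoch: int, shard_size: int = 200,
                           size: int = 50) -> dict:
    """Significant-text aggregation over a filtered, sampled subset.

    Assumed (hypothetical) fields: "message" for post text and "date"
    for the post date stored as a numeric epoch.
    """
    return {
        "size": 0,
        # Foreground set: recent posts only. The background defaults to
        # the whole index, which is what the foreground is compared with.
        "query": {"range": {"date": {"gte": since_epoch}}},
        "aggs": {
            "sample": {
                # Only the top shard_size documents per shard get analysed,
                # which keeps time and memory use down.
                "sampler": {"shard_size": shard_size},
                "aggs": {
                    "keywords": {
                        "significant_text": {
                            "field": "message",
                            "size": size,
                            # Skip repeated passages, e.g. quoted posts.
                            "filter_duplicate_text": True,
                        }
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(significant_words_body(since_epoch=1_700_000_000), indent=2))
```

The printed JSON can be POSTed to the index's `_search` endpoint with curl in the same way as a terms aggregation.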

It looks as though significant text aggregations require _source to be enabled, though, and it's disabled in XFES indexes, so you'd still need to create a separate index with it enabled for this.
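As a sketch, the separate index would just need `_source` switched on in its mapping. The helper below takes an index-creation body (assumed to be the typeless Elasticsearch 7+ shape) and returns a copy with `_source` enabled. `XFES_LIKE` is a hypothetical stand-in, not the real XFES mapping. Note that the `_reindex` API also needs `_source` on the source index, so the posts can't simply be copied across from the existing XFES index and would have to be indexed again from XenForo.

```python
import copy


def with_source_enabled(mapping: dict) -> dict:
    """Return a copy of an index-creation body with _source switched on.

    Expects the typeless (ES 7+) shape {"mappings": {...}}; the input
    dict is left untouched.
    """
    result = copy.deepcopy(mapping)
    result.setdefault("mappings", {})["_source"] = {"enabled": True}
    return result


# Hypothetical stand-in for an XFES-style mapping with _source disabled.
XFES_LIKE = {
    "mappings": {
        "_source": {"enabled": False},
        "properties": {"message": {"type": "text"}},
    }
}

# Body to PUT when creating the new (e.g. hypothetical "xf_forum_stats") index.
STATS_INDEX_BODY = with_source_enabled(XFES_LIKE)
```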
 
Yeah. I suppose the next best thing would be to create a small add-on that extends \XFES\Service\Optimizer::getBaseMapping() to set ['_source']['enabled'] to true. I don't think it should cause problems, but the resulting index will likely be a fair bit larger. Unfortunately, you'd still be looking at rebuilding the entire index afterwards, which you may (understandably) want to avoid.
 