XFES doesn't warn user about stop words exclusion

Kirby · Mar 10, 2024

If stop words are configured, XFES doesn't warn the user that those stop words are excluded from search.

This differs from the default MySQL full-text search, which does warn the user if the query contains a word from its hard-coded stop-word list.

This can confuse and frustrate users, especially in the edge case where no results are returned at all (which happens if every search term is a stop word).

At the very least, if custom stop words are configured, XFES should warn the user about the stop words that were excluded from the search. Ideally it should always warn the user, whether predefined or custom stop words are in use.
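For the custom-stop-word case, a warning could be produced by checking the query terms against the configured list before searching. The sketch below is a minimal illustration, assuming the custom list is available as a set of words and that query terms can be approximated as lowercased runs of word characters; XFES's real analyzer may tokenize differently, and the function names and notice wording are hypothetical.

```python
import re
from typing import List, Optional, Set, Tuple


def find_excluded_terms(query: str, stop_words: Set[str]) -> Tuple[List[str], bool]:
    """Return (excluded_terms, all_excluded) for a search query.

    Assumption: terms are lowercased runs of word characters, which only
    approximates the tokenizer XFES actually uses.
    """
    terms = re.findall(r"\w+", query.lower())
    normalized = {w.lower() for w in stop_words}
    excluded: List[str] = []
    for term in terms:
        # Keep first-seen order and report each stop word only once.
        if term in normalized and term not in excluded:
            excluded.append(term)
    # If every term is a stop word, the search cannot match anything.
    all_excluded = bool(terms) and all(t in normalized for t in terms)
    return excluded, all_excluded


def stop_word_notice(query: str, stop_words: Set[str]) -> Optional[str]:
    """Hypothetical user-facing notice; None when nothing was dropped."""
    excluded, all_excluded = find_excluded_terms(query, stop_words)
    if not excluded:
        return None
    words = ", ".join(excluded)
    if all_excluded:
        return f"Your search only contained ignored words ({words}), so nothing could be matched."
    return f"The following words were ignored in your search: {words}"
```

For example, stop_word_notice("the best", {"the"}) returns "The following words were ignored in your search: the", while a query made up only of stop words gets the stronger notice that nothing could be matched.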

Xon · Mar 10, 2024

The better solution would be to use the quote_field_suffix option for the simple_query_string search query to support both a "near exact" analyser and "stemming/stop words" analyser, and then allow ElasticSearch to search both fields and return the best matches.

This has much better search accuracy for most users. I've implemented this in my paid ElasticSearch Essentials add-on:


ElasticSearch Essentials

Mar 29, 2019

Additional functionality for forums running XenForo Enhanced Search (ElasticSearch).
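A minimal sketch of the query side of that approach, assuming the index has an .exact subfield next to each stemmed field (the suffix and field names here are illustrative, not XFES's actual mapping). With quote_field_suffix set, Elasticsearch runs quoted phrases against the suffixed subfields (title.exact, message.exact) and the rest of the query against the base fields.

```python
from typing import List


def build_search_query(keywords: str, fields: List[str], exact_suffix: str = ".exact") -> dict:
    """Build a search body using simple_query_string.

    Unquoted terms are matched against `fields` (stemmed / stop-word analyzer);
    quoted phrases are matched against the same fields with `exact_suffix`
    appended, e.g. "message" -> "message.exact".
    """
    return {
        "query": {
            "simple_query_string": {
                "query": keywords,
                "fields": fields,
                "quote_field_suffix": exact_suffix,
            }
        }
    }


# '"running shoes"' is searched in title.exact / message.exact,
# 'review' in the stemmed title / message fields.
body = build_search_query('"running shoes" review', ["title", "message"])
```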

Jeremy P · Apr 20, 2024

This is easy enough to address for custom stop words, but I don't think the predefined stop words are exposed via an API, and maintaining our own lists seems burdensome, so I'm not sure we'll be able to do much with those. The only potential solution I can think of is running the keywords through a custom analyzer with only the stop-word filter, but I'm not sure how viable that is with the current design.

I do think exposing some support for exact(ish) matches is pretty cool though.

Kirby · Apr 20, 2024

Jeremy P said:
but I don't think the predefined stop words are exposed via an API and maintaining our own lists seems burdensome so I'm not sure we'll be able to do much with those.

For built-in stop-words the following idea might work:

Run _analyze once with just the lowercase filter and once with lowercase plus xf_stop, then diff the tokens: any token that appears in the first result but not in the second is a stop word.
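A minimal sketch of that diff, assuming both requests are sent as POST /<index>/_analyze against the XFES index (so the index-defined xf_stop filter can be referenced by name) and assuming the standard tokenizer. Only the request bodies and the comparison are shown; the HTTP call is left out, and the sample responses below are illustrative, not captured output.

```python
from typing import Dict, List


def analyze_request(text: str, filters: List[str]) -> Dict:
    # Body for POST /<index>/_analyze. The "standard" tokenizer is an assumption;
    # "xf_stop" must be a token filter defined in that index's settings.
    return {"tokenizer": "standard", "filter": filters, "text": text}


def stop_words_from_responses(plain: Dict, filtered: Dict) -> List[str]:
    """Tokens present in the lowercase-only response but missing from the
    lowercase + xf_stop response are the ones the stop filter removed."""
    kept = {t["token"] for t in filtered.get("tokens", [])}
    removed: List[str] = []
    for t in plain.get("tokens", []):
        tok = t["token"]
        if tok not in kept and tok not in removed:
            removed.append(tok)
    return removed


plain_req = analyze_request("The Quick fox", ["lowercase"])
stop_req = analyze_request("The Quick fox", ["lowercase", "xf_stop"])

# Illustrative response shapes (only the "token" key is used here).
plain = {"tokens": [{"token": "the"}, {"token": "quick"}, {"token": "fox"}]}
filtered = {"tokens": [{"token": "quick"}, {"token": "fox"}]}
print(stop_words_from_responses(plain, filtered))  # ['the']
```

Because both calls use the same tokenizer and differ only by the stop filter, the set difference isolates exactly the dropped words, regardless of whether they came from a predefined or a custom list.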

Jeremy P · Apr 20, 2024

I had edited my post with a similar idea.

I think that might work.

Xon · Apr 20, 2024

Jeremy P said:
This is easy enough to address for custom stop-words, but I don't think the predefined stop words are exposed via an API and maintaining our own lists seems burdensome so I'm not sure we'll be able to do much with those. The only potential solution I can think of is running the keywords through a custom analyzer with only the stop-word filter, but I'm not sure how viable that is with the current design.

I do think exposing some support for exact(ish) matches is pretty cool though.

It is actually fairly easy to bolt separate stop-word, no-stop-word and stemming analysis onto the XF Elasticsearch setup.

Just set up subfields (message, message.exact, etc.) with different analyzers, then add the additional fields at query time. If the additional fields don't exist, the query still works.
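A minimal sketch of such a mapping as Python dicts, assuming one stemmed analyzer for the base field and a lowercase-only analyzer for the .exact subfield. The analyzer names (stemmed, exact) and filter choices are hypothetical, not XFES's shipped configuration; lowercase, stop and porter_stem are built-in Elasticsearch token filters.

```python
# Hypothetical analysis settings; names are illustrative, not XFES's own.
SETTINGS = {
    "analysis": {
        "analyzer": {
            "stemmed": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["lowercase", "stop", "porter_stem"],
            },
            "exact": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["lowercase"],  # no stop words removed, no stemming
            },
        }
    }
}


def text_field_with_exact(main_analyzer: str = "stemmed") -> dict:
    """A text field whose .exact multi-field indexes the same content
    with the lowercase-only analyzer."""
    return {
        "type": "text",
        "analyzer": main_analyzer,
        "fields": {"exact": {"type": "text", "analyzer": "exact"}},
    }


MAPPINGS = {
    "properties": {
        "title": text_field_with_exact(),
        "message": text_field_with_exact(),
    }
}
```

These dicts would go in the settings and mappings sections of the index-creation request; queries can then list message and message.exact side by side.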

As an added bonus, per-subfield boosting to bias "exact" matches over stemmed ones is trivial when building the fields list: message.exact^2 is just a ^<number> modifier appended to the field string!

Note: any field boosting needs to be exposed as a tunable! I've had good experience with field boosts like title.exact^2 and title^1.4.
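A minimal sketch of turning admin-tunable boosts into the fields list, using the values mentioned above as hypothetical defaults (the message weight of 1 is an assumption). The option names are illustrative, not existing XF settings.

```python
from typing import Dict, List

# Hypothetical tunable defaults; an admin option would override these.
DEFAULT_BOOSTS: Dict[str, float] = {
    "title.exact": 2,
    "title": 1.4,
    "message.exact": 2,
    "message": 1,
}


def boosted_fields(boosts: Dict[str, float]) -> List[str]:
    """Turn {field: boost} into Elasticsearch field strings like "title^1.4".

    A boost of 1 is written without a modifier; a boost of 0 or less drops
    the field entirely (a hypothetical convention for "don't search this").
    """
    fields: List[str] = []
    for field, boost in boosts.items():
        if boost <= 0:
            continue
        fields.append(field if boost == 1 else f"{field}^{boost:g}")
    return fields


query = {
    "query": {
        "simple_query_string": {
            "query": "running shoes",
            "fields": boosted_fields(DEFAULT_BOOSTS),
        }
    }
}
```

Here boosted_fields(DEFAULT_BOOSTS) yields ["title.exact^2", "title^1.4", "message.exact^2", "message"], so matches on the unstemmed subfields outrank purely stemmed matches.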

This is frankly a core part of how ElasticSearch Essentials provides better search than stock (exact matching, with support for quoted phrases, is really useful).
