Third party Stop Word Problem

rfc0001

Hi, I have several common words added as "stop words", which Enhanced Search describes as words not included in the index.

I assumed these words were also ignored in searches. For example, if I search for "in the world" and "in" and "the" are stop words, XenForo would just search for "world".

However, I've found that, in the above example, "in the world" returns 0 results, "the world" returns 0 results, and "world" returns several results.

What I assume is happening is that XFES is actually searching for all three terms, including the stop words, and since two of them are stop words (and therefore not indexed), it gets 0 results. I think this behavior is a bug. I would expect XF to exclude stop words both from the index and from searches. As it is, all stop words do is literally stop the search and return 0 results, which I don't think is very useful or the expected behavior.
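A toy Python model of this hypothesis may help. It assumes a whitespace tokenizer, a hypothetical stop word list of `in` and `the`, and AND semantics (every query term required); none of this is taken from XFES itself. Stop words are dropped when building the index, and the same query is run once without and once with dropping them at query time:

```python
# Toy model only: hypothetical stop list, whitespace tokenizer, AND semantics.
STOP_WORDS = {"in", "the"}

def analyze(text, remove_stop_words=True):
    """Lowercase, split on whitespace, and optionally drop stop words."""
    tokens = text.lower().split()
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens

def build_index(docs):
    """Map each indexed term to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for term in analyze(text):  # stop words never reach the index
            index.setdefault(term, set()).add(doc_id)
    return index

def search_all_terms(index, query, filter_query_stop_words):
    """Return the ids of documents containing every query term."""
    terms = analyze(query, remove_stop_words=filter_query_stop_words)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

docs = {1: "Hello world", 2: "Travel in the world", 3: "Nothing here"}
index = build_index(docs)

# Stop words removed at index time only: "in" and "the" are absent from the
# index, so requiring them makes the query unsatisfiable.
print(sorted(search_all_terms(index, "in the world", filter_query_stop_words=False)))  # → []
# Stop words removed at index and query time: only "world" is searched.
print(sorted(search_all_terms(index, "in the world", filter_query_stop_words=True)))  # → [1, 2]
```

The first query mirrors the reported symptom; the second is the behavior the post expected.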
 
After you updated the stop words, did you rebuild the index? If you don't mind, can you try restarting Elasticsearch?

Stop words should be handled entirely within Elasticsearch, which ignores them during analysis, both when searching and when indexing.
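For reference, a stop word list is usually attached to a custom analyzer in the index settings. The sketch below builds such a request body in Python; the index, filter, analyzer, and field names (`forum_index`, `forum_stop`, `forum_text`, `message`) are hypothetical and are not XFES's actual mapping. Because the field sets `analyzer` but no `search_analyzer`, Elasticsearch applies the same analyzer, and therefore the same stop filter, to queries against that field:

```python
import json

# Hypothetical names throughout; this is not XFES's real index configuration.
STOP_WORDS = ["in", "the"]

index_body = {
    "settings": {
        "analysis": {
            "filter": {
                # Built-in "stop" token filter with a custom word list.
                "forum_stop": {"type": "stop", "stopwords": STOP_WORDS}
            },
            "analyzer": {
                "forum_text": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "forum_stop"],
                }
            },
        }
    },
    "mappings": {
        # Elasticsearch 6.x expects a single mapping type here; 7.x drops this level.
        "_doc": {
            "properties": {
                # No "search_analyzer" is set, so queries on this field are
                # analyzed with "forum_text" too.
                "message": {"type": "text", "analyzer": "forum_text"}
            }
        }
    },
}

# Request body for e.g. `PUT /forum_index`.
print(json.dumps(index_body, indent=2))
```

Calling the `_analyze` API on the created index with `"analyzer": "forum_text"` and `"text": "in the world"` should return only the `world` token, which is a quick way to confirm the stop filter is active.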

That said, are you doing a phrase search (with actual quotes)? If so, this may be how Elasticsearch actually works. It has to do with how it stores positional data, and with stop words reducing the precision/information available.
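The positional point can be illustrated with a toy model. Lucene's stop filter, which Elasticsearch uses, removes stop words but by default keeps their position increments, so the remaining tokens keep their original positions. The whitespace tokenizer and stop list below are assumptions for illustration:

```python
# Hypothetical stop word list for illustration only.
STOP_WORDS = {"a", "in", "of", "the"}

def analyze_with_positions(text):
    """Return (term, position) pairs, dropping stop words but keeping the
    original positions, so each removed word leaves a gap."""
    return [
        (token, position)
        for position, token in enumerate(text.lower().split())
        if token not in STOP_WORDS
    ]

print(analyze_with_positions("in the world"))         # → [('world', 2)]
print(analyze_with_positions("of a world"))           # → [('world', 2)]
print(analyze_with_positions("travel in the world"))  # → [('travel', 0), ('world', 3)]
```

The first two phrases analyze identically: the analyzed form records that two positions preceded `world`, but not which words filled them. That is the loss of precision a phrase query over stop words has to live with.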
 
I did rebuild the index, deleting it first. I'll try restarting Elasticsearch. I just uninstalled/reinstalled it on Wednesday. I'm not doing a phrase search (no quotes).
 
I completely uninstalled/reinstalled Elasticsearch and Enhanced Search, rebooted the server, and rebuilt the index, and I'm still having this issue. Searches with stop words return 0 results. If I omit the stop words, the search returns results. If I disable stop words, the original search returns results.
 
I've managed to confirm this now. However, testing across multiple versions, it appears to only happen in Elasticsearch 6. I'll move this to bugs, but I'll need to do more testing to see whether this is something on our end or an Elasticsearch bug. There are significant changes within XFES for Elasticsearch 6 (and thus likely large changes within Elasticsearch itself), so the cause could certainly be anywhere.

As it stands, the only workaround I could recommend is not using stop words for now.
 
Thanks. My workaround is to disable stop words, which it appears XF Community has done as well.
(Assuming you are using Enhanced Search) 👍.
 
I will need to do some testing, but there is a bug report with a merged patch targeted for Elasticsearch 6.4.1: https://github.com/elastic/elasticsearch/issues/33009 (it looks like there was at least one previous report, too).

Assuming that does fix the issue (and it certainly fits with the issue here), then this is solely an Elasticsearch bug, so upgrading to 6.4.1 when it's released would be the only resolution (or not using Elasticsearch 6.x yet, since that is where the bug seems to have been introduced).
 
Any chance of replacing stop words with common terms and performing an Elasticsearch Common Terms Query by default?

https://www.elastic.co/blog/stop-stopping-stop-words-a-look-at-common-terms-query

The Common Terms Query is exciting, since it is truly a win-win. You gain the speed of stop word removal, but maintain the precision of leaving stop words in the index. Stop words will still have their uses, but we envision the majority of queries converting to a Common query.

It is a powerful optimization that doesn’t sacrifice relevance for speed. Give it a shot with your data and see if it helps boost performance!
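A sketch of the suggestion, assuming a hypothetical `message` field and made-up document frequencies. The first function builds a `common` query body in the shape documented for Elasticsearch 6.x (the query was later deprecated in 7.3 and removed in 8.0). The second is only a simplified model of how that query splits terms, not Lucene's actual code: terms whose document frequency exceeds `cutoff_frequency` (a fraction of all documents when below 1, an absolute count otherwise) are treated as high-frequency, and in the real query they only contribute to scoring for documents that already match the low-frequency terms:

```python
import json

def common_terms_query(field, text, cutoff_frequency):
    """Build a request body for Elasticsearch's `common` terms query."""
    return {
        "query": {
            "common": {
                field: {"query": text, "cutoff_frequency": cutoff_frequency}
            }
        }
    }

def split_terms(terms, doc_freq, total_docs, cutoff_frequency):
    """Simplified model of the low/high-frequency split (not Lucene's code).

    A cutoff below 1 is a fraction of all documents; 1 or more is an
    absolute document count.
    """
    limit = cutoff_frequency * total_docs if cutoff_frequency < 1 else cutoff_frequency
    low = [t for t in terms if doc_freq.get(t, 0) <= limit]
    high = [t for t in terms if doc_freq.get(t, 0) > limit]
    return low, high

# Made-up statistics for a hypothetical 10,000-document index.
doc_freq = {"in": 6000, "the": 9000, "world": 40}
low, high = split_terms(["in", "the", "world"], doc_freq, 10_000, 0.01)
print(low, high)  # → ['world'] ['in', 'the']
print(json.dumps(common_terms_query("message", "in the world", 0.01)))
```

Here `world` alone decides which documents match, while `in` and `the` only adjust the ranking. If every query term is high-frequency, the `common` query documentation says all of them become required instead.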
 