Fixed Unlimited keywords in search can cause issues with search back-ends

Xon

Well-known member
Affected version
2.2.5
If, for whatever reason, a user copes a very long post into the search bar; XenForo will generally return an error instead of telling the user has too many search terms.

This is most noticeable with XFES, as by default it will error with;
SQL:
XFES\Elasticsearch\RequestException: Elasticsearch error: query_shard_exception, too_many_clauses: maxClauseCount is set to 1024 src/addons/XFES/Elasticsearch/Api.php:410

When XF\Search\Source\AbstractSource::parseKeywords parses the keyword list; it really should cause a user visible error message if too many keywords are extracted. Probably should be a function vs a constant to get the max keywords value.
 
I've got some code that just blanket sets the default max keyword limit to 1024 via a constant which is a reasonable default I think in all cases, but it's referenced from a method in the source class so it can be overridden on a per source basis.

That said, I'm not actually clear how we can get the real maxClauseCount / max_clause_count out from elasticsearch. Is that even possible? As best I can see it's not a runtime setting so it's not like we can just set it to 1024 and allow it to be configurable.

So the issue is unless we can get the effective limit for Elasticsearch this change isn't really going to help. If someone has a higher limit we'll block those searches, if someone has a lower limit they'll still hit the error.
 
This can actually be fetched, but requires an argument to be passed when fetching cluster settings, not index settings.


So;
Code:
curl -XGET "http://localhost:9200/_cluster/settings/?include_defaults"

But since this is Elasticsearch it may be under indices.query.bool.max_clause_count or index.query.bool.max_clause_count depending on version! ?? does make probing this mess easier.

If XFES does fetch this, probably something to cache or write to an XF option rather than fetching these settings every database query (ie save a http request + json encode/decode of a large blob)
 
Last edited:
This can actually be fetched, but requires an argument to be passed when fetching cluster settings, not index settings.
Judge Judy Reaction GIF


But since this is Elasticsearch it may be under indices.query.bool.max_clause_count or index.query.bool.max_clause_count depending on version! ?? does make probing this mess easier.
Come On Reaction GIF


If XFES does fetch this, probably something to cache or write to an XF option rather than fetching these settings every database query (ie save a http request + json encode/decode of a large blob)
Thanks. I hate it.
 
Thank you for reporting this issue, it has now been resolved. We are aiming to include any changes that have been made in a future XF release (2.2.7).

Change log:
Impose a limit on the maximum number of keywords that can be searched for (default: 1024) and allow XFES to fetch the max_clause_count configuration value where possible to avoid a shard exception.
There may be a delay before changes are rolled out to the XenForo Community.
 
Just to give everyone the gist of what the fix entails, there's a diff below.

Diff:
diff --git a/src/XF/Search/Source/AbstractSource.php b/src/XF/Search/Source/AbstractSource.php
index 1021993da28..56373c0b495 100644
--- a/src/XF/Search/Source/AbstractSource.php
+++ b/src/XF/Search/Source/AbstractSource.php
@@ -10,6 +10,8 @@
 
 abstract class AbstractSource
 {
+    const DEFAULT_MAX_KEYWORDS = 1024;
+
     protected $bulkIndexing = false;
 
     abstract public function isRelevanceSupported();
@@ -78,6 +80,11 @@ public function getWordSplitRange()
         return '\x00-\x21\x28\x29\x2C-\x2F\x3A-\x40\x5B-\x5E\x60\x7B\x7D-\x7F';
     }
 
+    public function getMaxKeywords(): int
+    {
+        return self::DEFAULT_MAX_KEYWORDS;
+    }
+
     public function parseKeywords($keywords, &$error = null, &$warning = null)
     {
         $splitRange = $this->getWordSplitRange();
@@ -159,6 +166,14 @@ public function parseKeywords($keywords, &$error = null, &$warning = null)
             )->render('raw');
         }
 
+        $keywordCount = count($output);
+        $maxKeywords = $this->getMaxKeywords();
+
+        if ($maxKeywords && $keywordCount > $maxKeywords)
+        {
+            $error = \XF::phrase('search_could_not_be_completed_because_number_of_keywords_exceeds_limit_x', ['maxKeywords' => \XF::language()->numberFormat($maxKeywords)]);
+        }
+
         return $this->finalizeParsedKeywords($output);
     }

diff --git a/src/addons/XFES/Elasticsearch/Api.php b/src/addons/XFES/Elasticsearch/Api.php
index 36383de524b..e45ccc5b1e8 100644
--- a/src/addons/XFES/Elasticsearch/Api.php
+++ b/src/addons/XFES/Elasticsearch/Api.php
@@ -310,6 +310,11 @@ public function createIndex(array $config = [])
         return $body;
     }
 
+    public function getClusterSettings(bool $includeDefaults = true)
+    {
+        return $this->requestFromRoot('get', '_cluster/settings' . ($includeDefaults ? '?include_defaults=true' : ''))->getBody();
+    }
+
     public function updateSettings(array $settings)
     {
         return $this->requestFromIndex('put', '_settings', $settings)->getBody();
@@ -333,6 +338,13 @@ public function requestFromIndex($method, $path, $data = null)
         return $this->request($method, "{$index}/{$path}", $data);
     }
 
+    public function requestFromRoot($method, $path, $data = null)
+    {
+        $path = ltrim($path, '/');
+
+        return $this->request($method, $path, $data);
+    }
+
     public function requestById($method, $type, $id, $data = null)
     {
         if ($this->isTypelessIndex())
diff --git a/src/addons/XFES/Search/Source/Elasticsearch.php b/src/addons/XFES/Search/Source/Elasticsearch.php
index bc24b04aa15..6184c95cf73 100644
--- a/src/addons/XFES/Search/Source/Elasticsearch.php
+++ b/src/addons/XFES/Search/Source/Elasticsearch.php
@@ -944,6 +944,41 @@ public function getWordSplitRange()
         return '\x00-\x21\x23-\x26\x28\x29\x2B\x2C\x2F\x3A-\x40\x5B-\x5E\x60\x7B-\x7F';
     }
 
+    public function getMaxKeywords(): int
+    {
+        return $this->getMaxClauseCount();
+    }
+
+    protected function getMaxClauseCount(): int
+    {
+        $options = \XF::options();
+        $xfesConfig = $options->xfesConfig;
+
+        if (!isset($xfesConfig['maxClauseCount'])
+            || !isset($xfesConfig['maxClauseCountChecked'])
+            || !$xfesConfig['maxClauseCountChecked']
+            || $xfesConfig['maxClauseCountChecked'] < \XF::$time - 60 * 60 * 6 // 6 hours
+        )
+        {
+            $clusterSettings = $this->es->getClusterSettings();
+
+            $maxClauseCount = $clusterSettings['defaults']['indices']['query']['bool']['max_clause_count']
+                ?? $clusterSettings['defaults']['index']['query']['bool']['max_clause_count']
+                ?? null;
+
+            if ($maxClauseCount === null)
+            {
+                $maxClauseCount = self::DEFAULT_MAX_KEYWORDS;
+            }
+
+            $xfesConfig['maxClauseCount'] = $maxClauseCount;
+            $xfesConfig['maxClauseCountChecked'] = \XF::$time;
+
+            \XF::repository('XF:Option')->updateOption('xfesConfig', $xfesConfig);
+        }
+
+        return intval($xfesConfig['maxClauseCount']);
+    }
+
     protected function finalizeParsedKeywords(array $parsed)
     {
         $query = '';
 
Top Bottom