Not a bug ElasticSearch v5 - _adjustFilteredDslForEsVersion incorrect handling of SHOULD

Xon

Well-known member
_adjustFilteredDslForEsVersion mutates the DSL structure into something valid for ElasticSearch v5 using the bool query.

This results in the ORs being turned into bool.should entries. The default minimum_should_match value is 1 only if there are no MUST (or filter) clauses; otherwise it is 0 (ref).

This leads to unexpected behaviour: searching for threads/posts can return non-thread/non-post content.

Stock output from _adjustFilteredDslForEsVersion when searching posts/threads for "test":
Code:
{
    "query": {
        "bool": {
            "must": [
                {
                    "query_string": {
                        "query": "test",
                        "fields": [
                            "title^3",
                            "message"
                        ],
                        "default_operator": "and"
                    }
                }
            ],
            "should": [
                {
                    "type": {
                        "value": "post"
                    }
                },
                {
                    "type": {
                        "value": "thread"
                    }
                }
            ]
        }
    }
}

If the MUST query matches (i.e. 'test' is in the text), the SHOULD clauses are not required to match and only affect scoring, so content from an undesired content type can be returned.

This did not happen pre-ES5, as the query was effectively "term1 AND term2 AND (term3 OR term4)", not "term1 AND term2 OR (term3 OR term4)".
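
To illustrate the difference, here is a minimal sketch (not stock output) of the same query with minimum_should_match set explicitly. The value 1 is my assumption about the intended behaviour: with it, at least one of the type clauses has to match, which should give the "AND (post OR thread)" semantics back.

```json
{
    "query": {
        "bool": {
            "must": [
                {
                    "query_string": {
                        "query": "test",
                        "fields": [
                            "title^3",
                            "message"
                        ],
                        "default_operator": "and"
                    }
                }
            ],
            "should": [
                {
                    "type": {
                        "value": "post"
                    }
                },
                {
                    "type": {
                        "value": "thread"
                    }
                }
            ],
            "minimum_should_match": 1
        }
    }
}
```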

This is particularly noticeable if you have custom search content (i.e. report content, which generally duplicates post content).
 
Unfortunately, I haven't actually managed to reproduce this. I can't get the query structure given in the example. If I go to search/?type=post and just type in "test", it generates this query:

Code:
{
    "query": {
        "bool": {
            "filter": [
                {
                    "bool": {
                        "should": [
                            {
                                "type": {
                                    "value": "post"
                                }
                            },
                            {
                                "type": {
                                    "value": "thread"
                                }
                            }
                        ]
                    }
                }
            ],
            "must": {
                "query_string": {
                    "query": "test",
                    "fields": [
                        "title^3",
                        "message"
                    ],
                    "default_operator": "and"
                }
            }
        }
    }
}

The difference here is that the "should" element is in a separate bool query (inside the filter), so it gets the default minimum_should_match value of 1.

Looking at the way we process constraints, "should" filters should always get their own bool filter:
Code:
$dsl['query']['filtered']['filter']['and'][] = array(
   'bool' => array('should' => $ors)
);
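
For reference, a rough sketch of the pre-v5 "filtered" structure that snippet contributes to, before it is adjusted for ElasticSearch v5. Only the "and" entry comes from the PHP above; the "query" part is assumed from the examples earlier in this thread rather than taken from the stock code.

```json
{
    "query": {
        "filtered": {
            "query": {
                "query_string": {
                    "query": "test",
                    "fields": [
                        "title^3",
                        "message"
                    ],
                    "default_operator": "and"
                }
            },
            "filter": {
                "and": [
                    {
                        "bool": {
                            "should": [
                                {
                                    "type": {
                                        "value": "post"
                                    }
                                },
                                {
                                    "type": {
                                        "value": "thread"
                                    }
                                }
                            ]
                        }
                    }
                ]
            }
        }
    }
}
```

Because the "should" list sits alone in its own bool here, at least one of the type clauses has to match.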

It looks like "should" is only actually used for content type limits, so I don't think this should apply to other constraints.

Can you reconfirm the query that's being generated for you (with the stock code)?
 