XF 1.5 ElasticSearch update query from 5.0 to 6.0

AndyB

I have an add-on called Similar Threads Plus which displays similar threads based on thread title matches.

The following query works fine in ElasticSearch 5.0:

PHP:
$data_string = '{
    "from" : 0, "size" : "' . $maximumResults . '",
    "query" : {
        "bool" : {
            "must" : {
                "match" : {
                    "title" : {
                        "query" : "' . $searchWord1 . ' ' . $searchWord2 .  ' ' . $searchWord3 . '",
                        "operator" : "and"
                    }
                }
            },
            "filter" : {
                "bool" : {
                    "must" : [
                    {
                        "term" : {
                            "node" : "' . $currentNodeId . '"
                        }
                    }
                    ],
                    "must_not" : {
                        "term" : {
                            "discussion_id" : "' . $currentThreadId . '"
                        }
                    }                                       
                }
            }
        }
    },
    "sort": {
        "date": {"order": "desc" }
    }                       
}';

The results are 5 thread IDs.

The query sort of works in ElasticSearch 6.0, but instead of getting 5 threads that match, I'm now getting a mixture of threads and posts. The results array looks like this:

PHP:
Array
(
    [0] => post-2153581
    [1] => thread-157934
    [2] => thread-157860
    [3] => post-2152604
    [4] => thread-157727
)

How should I change the query so that only threads are searched?
 
The portion of my add-on code that I needed to change was this:

5.0
PHP:
$ch = curl_init($esHost . ':' . $esPort . '/' . $indexName . '/thread/_search?pretty=true');

6.0
PHP:
$ch = curl_init($esHost . ':' . $esPort . '/' . $indexName . '/xf/_search?pretty=true');

What I still need to figure out is how to force the query to select only threads.
 
@AndyB, instead of manually building the JSON blob like that, I strongly recommend using json_encode on a PHP array. Additionally, when querying ElasticSearch as an API, you don't need "pretty=true".

Something like:
PHP:
$dsl = array(
    "from" => 0,
    "size" => $maximumResults,
    "query" => array(
        "bool" => array(
            "must" => array(
                "match" => array(
                    "title" => array(
                        "query" => $searchWord1 . ' ' . $searchWord2 . ' ' . $searchWord3,
                        "operator" => "and"
                    )
                )
            ),
            "filter" => array(
                "bool" => array(
                    "must" => array(
                        array(
                            "term" => array(
                                "node" => $currentNodeId
                            )
                        )
                    ),
                    "must_not" => array(
                        "term" => array(
                            "discussion_id" => $currentThreadId
                        )
                    )
                )
            )
        )
    ),
    "sort" => array(
        "date" => array("order" => "desc")
    )
);
$json = json_encode($dsl);

Note: you should use the XenEs API, as it will determine the index type (i.e. whether it is a single-type index or not). You will need an additional 'must' clause which limits your query to threads. The key for that clause changes from "_type" (ES 5.x support) to "type" (single-type index support), depending on the run-time configuration of ElasticSearch.
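
A rough sketch of that extra clause, assuming you already know whether the index is single-type. The function name and the $isSingleTypeIndex flag are made up for illustration; they are not part of the XenEs API, which should be the real source of that information.

```php
<?php
/**
 * Build the term clause that restricts a search to threads.
 *
 * $isSingleTypeIndex is a hypothetical flag: true for a single-type
 * index (key "type"), false for an ES 5.x style index (key "_type").
 */
function buildThreadTypeFilter(bool $isSingleTypeIndex): array
{
    $key = $isSingleTypeIndex ? 'type' : '_type';

    return array('term' => array($key => 'thread'));
}

// Example: add the clause next to the existing node filter.
$filter = array(
    'bool' => array(
        'must' => array(
            array('term' => array('node' => 12)), // 12 is an example node id
            buildThreadTypeFilter(true),
        ),
    ),
);

echo json_encode($filter), "\n";
```

The resulting array can be dropped into the "filter" part of the array-based query and passed through json_encode as usual.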
 
Got it.

Here's the query which works with ElasticSearch 6.0.

PHP:
$data_string = '{
	"from" : 0, "size" : "' . $maximumResults . '",
	"query" : {
		"bool" : {
			"must" : {
				"match" : {
					"title" : {
						"query" : "' . $searchWord1 . ' ' . $searchWord2 .  ' ' . $searchWord3 . '",
						"operator" : "and"
					}
				}
			},
			"filter" : {
				"bool" : {
					"must" : [
					{
						"term" : {
							"node" : "' . $currentNodeId . '"
						}
					},
					{
						"term" : {
							"type" : "thread"
						}
					}
					],
					"must_not" : {
						"term" : {
							"discussion_id" : "' . $currentThreadId . '"
						}
					}										
				}
			}
		}
	},
	"sort": { 
		"date": {"order": "desc" }
	}						
}';
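
For comparison, here is roughly what the same query could look like when built as a PHP array and passed through json_encode, as suggested earlier in the thread. This is only a sketch: the function name and its parameters are hypothetical, and it assumes the node and thread IDs can be sent as integers rather than quoted strings.

```php
<?php
/**
 * Build the "similar threads" query as a JSON string.
 * Hypothetical helper; the parameters mirror the variables used in the post.
 */
function buildSimilarThreadsQuery(array $searchWords, int $nodeId, int $threadId, int $maximumResults): string
{
    $dsl = array(
        'from' => 0,
        'size' => $maximumResults,
        'query' => array(
            'bool' => array(
                'must' => array(
                    'match' => array(
                        'title' => array(
                            // All words must match in the title
                            'query' => implode(' ', $searchWords),
                            'operator' => 'and',
                        ),
                    ),
                ),
                'filter' => array(
                    'bool' => array(
                        'must' => array(
                            array('term' => array('node' => $nodeId)),
                            // Restrict results to threads (single-type index key)
                            array('term' => array('type' => 'thread')),
                        ),
                        // Exclude the thread currently being viewed
                        'must_not' => array(
                            'term' => array('discussion_id' => $threadId),
                        ),
                    ),
                ),
            ),
        ),
        'sort' => array('date' => array('order' => 'desc')),
    );

    // json_encode escapes quotes and backslashes in the search words
    return json_encode($dsl);
}

$data_string = buildSimilarThreadsQuery(array('word1', 'word2', 'word3'), 12, 157934, 5);
```

The returned string can be sent with curl in the same way as the hand-built one.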
 
@AndyB, the way you are constructing the JSON request sent to ElasticSearch is insecure.

ElasticSearch can execute arbitrary scripts via dynamic scripting defined in the JSON payload. This has historically been insecure: they have gone through ~2 different scripting languages trying to find a secure one, and the one they are currently using hasn't been around for long.

XFES 2.0 still supports versions of ElasticSearch with these vulnerable scripting setups, especially if all dynamic scripting is allowed.

Blind string concatenation of payload and user data is rarely a good idea, whether with SQL, JSON, or other formats.
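
A small illustration of the concatenation problem, using a made-up search word and a cut-down payload (not the full query from the add-on). The injected key here is harmless, but the same trick could add any JSON the server is willing to accept.

```php
<?php
// Hypothetical user-supplied word containing a quote and extra JSON.
$searchWord = 'foo", "injected": "yes';

// Concatenation: the quote closes the string early and smuggles in a new key.
$concatenated = '{"query": "' . $searchWord . '", "operator": "and"}';
$fromConcat = json_decode($concatenated, true);

// json_encode: the quote is escaped, so the word stays a plain string value.
$encoded = json_encode(array('query' => $searchWord, 'operator' => 'and'));
$fromEncode = json_decode($encoded, true);

var_dump(array_key_exists('injected', $fromConcat)); // key was injected
var_dump(array_key_exists('injected', $fromEncode)); // key was not injected
```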
 