Steffen
Affected version: 2.0.10
XenForo's search query tokenizer turns the query "term1 - term2" into "term1 -term2": it removes the whitespace in front of "term2" and thereby negates it. I think this is not intuitive, and it doesn't match the behaviour of Elasticsearch's "simple_query_string" tokenizer either.
This can be fixed as follows:
Diff:
--- a/src/XF/Search/Source/AbstractSource.php
+++ b/src/XF/Search/Source/AbstractSource.php
@@ -151,7 +151,6 @@ abstract class AbstractSource
preg_match_all('/
(?<=[' . $splitRange .'\-\+\|]|^)
(?P<modifier>\-|\+|\||)
- [' . $splitRange .']*
(?P<term>"(?P<quoteTerm>[^"]+)"|[^' . $splitRange .'\-\+\|]+)
/ix', $keywords, $matches, PREG_SET_ORDER);
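To see the effect of the patch outside of XenForo, here is a minimal standalone sketch that runs the regex with and without the removed line. Note the assumptions: tokenize() is a hypothetical helper written for this post, and $splitRange is simply set to \s here. The real value in AbstractSource.php is not shown above and may well cover more characters.

```php
<?php
// Hypothetical helper, not part of XenForo: runs the tokenizer regex from the
// diff, either in its original form or with the patched line removed.
function tokenize(string $keywords, bool $patched): array
{
    // ASSUMPTION: the real $splitRange is not shown in the post; plain
    // whitespace is used as a stand-in.
    $splitRange = '\s';

    // The line removed by the patch: optional separators between modifier and term.
    $gap = $patched ? '' : '[' . $splitRange . ']*';

    $pattern = '/
        (?<=[' . $splitRange . '\-\+\|]|^)
        (?P<modifier>\-|\+|\||)
        ' . $gap . '
        (?P<term>"(?P<quoteTerm>[^"]+)"|[^' . $splitRange . '\-\+\|]+)
    /ix';

    preg_match_all($pattern, $keywords, $matches, PREG_SET_ORDER);

    // Glue modifier and term back together so the result is easy to read.
    $tokens = [];
    foreach ($matches as $match) {
        $tokens[] = $match['modifier'] . $match['term'];
    }
    return $tokens;
}

print_r(tokenize('term1 - term2', false)); // term1, -term2 (term2 is negated)
print_r(tokenize('term1 - term2', true));  // term1, term2  (lone "-" is ignored)
print_r(tokenize('term1 -term2', true));   // term1, -term2 (explicit negation still works)
```

With the patch, a modifier only applies when it is directly attached to its term, so a "-" surrounded by whitespace is dropped instead of negating the next word.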
I'm not 100% sure whether this could have unintended side effects. Maybe whitespace should still be allowed after "|", but not after "+" or "-".
PS: When using Elasticsearch, shouldn't XenForo just pass the raw query string to Elasticsearch and let its "simple_query_string" feature handle the query tokenization?
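For illustration, passing the raw string through could look roughly like the sketch below. This is not how XenForo builds its queries: buildRawQueryBody() is a hypothetical function, and the field names "title" and "message" are assumptions. Only the "simple_query_string" query type and its "query", "fields" and "default_operator" parameters come from Elasticsearch itself.

```php
<?php
// Hypothetical sketch: builds a search request body that hands the untouched
// keywords to Elasticsearch instead of tokenizing them in PHP first.
function buildRawQueryBody(string $keywords): array
{
    return [
        'query' => [
            'simple_query_string' => [
                // The raw user input; Elasticsearch interprets +, | and - itself.
                'query' => $keywords,
                // ASSUMPTION: example field names, not taken from XenForo's index mapping.
                'fields' => ['title', 'message'],
                'default_operator' => 'and',
            ],
        ],
    ];
}

// Print the JSON body that would be sent to the search endpoint.
echo json_encode(buildRawQueryBody('term1 - term2'), JSON_PRETTY_PRINT), "\n";
```

The obvious trade-off is that the MySQL search backend would still need its own tokenizer, so the two backends could end up interpreting the same query differently.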