Wildcard search with leading "*" fails

smallwheels

Affected version: 2.2.17, 2.3
Maybe it is once more just me being stupid, but I do have an issue with search. A wildcard like * works flawlessly as long as it is at the end of the word. If it is at the beginning of the word, there are no results. I am using the standard search on my forum (still on 2.2.17); however, this is also the case on the forums here, which should run a current version (and also Elasticsearch).

If, for example, I want to find posts that contain the word "fail2ban", I would expect searches for fail2ban, fail2* and *2ban to display equal results. All should find and return, among others, this recent post:


While the first two both return 9 pages of results, including said post within the top three posts on the first page, the third query "*2ban" returns 0 results.

Is this due to me being stupid or is it a bug?
 
TL;DR
* can only be used to expand a prefix; this is just how ES simple_query_string works - it's not how a wildcard commonly works for glob purposes.
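
To illustrate the difference, here is a minimal Python sketch. It does not call Elasticsearch; it only mimics the two behaviours on a made-up term list. The function names and the handling of a non-trailing * (stripping it and looking for the literal remainder) are assumptions of this sketch, not confirmed ES internals.

```python
from fnmatch import fnmatchcase

# Toy term dictionary standing in for the indexed words (made-up data).
TERMS = ["fail2ban", "fail2drop", "failure", "unban"]


def prefix_expand(query: str, terms: list[str]) -> list[str]:
    """Mimic prefix-only expansion: '*' is honoured only as the last character."""
    if query.endswith("*") and "*" not in query[:-1]:
        prefix = query[:-1]
        return [t for t in terms if t.startswith(prefix)]
    # Assumption of this sketch: any other '*' is dropped and the rest is
    # looked up as a literal term, so '*2ban' searches for the term '2ban'.
    literal = query.replace("*", "")
    return [t for t in terms if t == literal]


def glob_expand(query: str, terms: list[str]) -> list[str]:
    """Glob-style matching, where '*' may appear anywhere in the pattern."""
    return [t for t in terms if fnmatchcase(t, query)]


for q in ("fail2ban", "fail2*", "*2ban"):
    print(q, prefix_expand(q, TERMS), glob_expand(q, TERMS))
```

With prefix-only expansion the third query finds nothing, which matches what was reported above, while a glob-style matcher would find fail2ban.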

I am not sure that addresses the issue:

  • for one, as can be seen in the example above a * at the end of the string works as I had assumed as a wildcard for the query string, delivering the expected results. So it is obviously not limited to prefixes.
  • secondly the issue I had was not with * as a wildcard at the end of a query but in the front

However, the link you posted to the elasticsearch doc also says:

analyze_wildcard(Optional, Boolean) If true, the query attempts to analyze wildcard terms in the query string. Defaults to false. Note that, in case of true, only queries that end with a * are fully analyzed. Queries that start with * or have it in the middle are only normalized.

Which would indicate that a leading wildcard would never be used - exactly what I experienced. On the other hand the exact same documentation says:

allow_leading_wildcard(Optional, Boolean) If true, the wildcard characters * and ? are allowed as the first character of the query string. Defaults to true.

Which indicates that a leading * as a wildcard would be usable and working by default - which does not seem to be the case.

So what's true for XenForo? Obviously, in ElasticSearch it is a configuration parameter, I do however use the standard search on my forum and it does show the same behaviour.

Is this intentional or is it broken? Having a working wildcard at the end of a string but not at the front seems like odd behaviour.
 
On the other hand the exact same documentation says:
allow_leading_wildcard(Optional, Boolean) If true, the wildcard characters * and ? are allowed as the first character of the query string. Defaults to true.
This is for query_string, not simple_query_string which is used by XFES; simple_query_string does not support leading or middle wildcards.
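
As a rough sketch, the two query types look like this as request bodies (built here as plain Python dicts, nothing is sent to a server). The field name "message" is a hypothetical placeholder; the real XFES mapping and the body XFES actually sends may differ.

```python
import json

# Hypothetical field name, used only for illustration.
FIELD = "message"


def simple_query_string_body(text: str) -> dict:
    """Body for simple_query_string, which has no allow_leading_wildcard option."""
    return {"query": {"simple_query_string": {"query": text, "fields": [FIELD]}}}


def query_string_body(text: str, allow_leading_wildcard: bool = True) -> dict:
    """Body for query_string, where allow_leading_wildcard is a documented parameter."""
    return {
        "query": {
            "query_string": {
                "query": text,
                "fields": [FIELD],
                "allow_leading_wildcard": allow_leading_wildcard,
            }
        }
    }


print(json.dumps(simple_query_string_body("*2ban"), indent=2))
print(json.dumps(query_string_body("*2ban"), indent=2))
```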

Even if allow_leading_wildcard were supported and enabled for simple_query_string, it is strongly recommended not to use it anyway:

Warning

Allowing a wildcard at the beginning of a word (eg "*ing") is particularly heavy, because all terms in the index need to be examined, just in case they match. Leading wildcards can be disabled by setting allow_leading_wildcard to false.
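
A small Python sketch of why that is, using a sorted list as a stand-in for the term dictionary (this is an illustration under that assumption, not how Lucene stores terms internally): a prefix can be located by binary search and read off as one contiguous range, while a suffix forces a look at every term.

```python
from bisect import bisect_left

# Sorted toy term dictionary (made-up data).
TERMS = sorted(["motorboot", "paddelboot", "ruderboot", "segelboot", "segeln", "singing"])


def prefix_lookup(prefix: str, terms: list[str]) -> tuple[list[str], int]:
    """Return (matches, terms examined) for 'prefix*' on a sorted term list.

    The binary search steps themselves are not counted as examined terms.
    """
    start = bisect_left(terms, prefix)
    matches, examined = [], 0
    for term in terms[start:]:
        examined += 1
        if not term.startswith(prefix):
            break  # sorted order: no later term can match either
        matches.append(term)
    return matches, examined


def suffix_lookup(suffix: str, terms: list[str]) -> tuple[list[str], int]:
    """Return (matches, terms examined) for '*suffix': every term is checked."""
    return [t for t in terms if t.endswith(suffix)], len(terms)


print(prefix_lookup("segel", TERMS))
print(suffix_lookup("boot", TERMS))
```

On six terms the difference is trivial, but the suffix lookup always grows with the total number of distinct terms in the index.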
 
This is for query_string, not simple_query_string which is used by XFES; simple_query_string does not support leading or middle wildcards.
Ah, thank you! Didn't know that. So this basically means it is not possible in XF to successfully search with a leading wildcard at all?

That would be a really bad surprise. In German, we have loads of longish words where this kind of search comes in very handy (not to say it is essential). For example, we have Motorboot, Segelboot, Ruderboot, Paddelboot and a shipload more, all being boats of different kinds. If I wanted to search for every possible boat I would use *boot as my query string. But in XF it is impossible to perform a successful search like that? Wow! This implementation of search leaves quite something to be desired!

And as it does not seem to be mentioned anywhere, there is no way to be aware of this limitation beforehand. The behaviour seems to have changed: when searching before posting this issue I stumbled upon a request dating from 2015 that complained about full text search using leading and trailing wildcards by default:


A change very much for the worse since then. I haven't seen this kind of unhealthy search implementation anywhere else - at least not that I am aware of.

And the question is still open why a * at the end of a query string successfully finds things as expected, whereas according to the documentation this should not be the case.

XF search isn't great anyway - now I've learned that it is way worse than I assumed. In my eyes to a level that is - in all honesty - unacceptable. And that is even true for the paid advanced search. What a bummer!
 
I think leading * queries would be a lot more expensive than trailing *

There certainly are bigger search problems than this.
(1) XenForo doesn't log searches (an indication that they don't see it as really important).
(2) The results just don't make sense.
Here is my XenPorta search:
[Attachment: 1757687088500.webp]

XenPorta search from the Resources tab, manually restricted to just resources:

[Attachment: 1757687187567.webp]

In the history of XenForo searches, there is a 0.001% chance someone wanted the Turkish translation first.


This topic has come up many times, especially early on.


https://xenforo.com/community/searc...=wildcard&c[users]=Digital+Doctor&o=relevance (click here and then click search).
 
I think leading * queries would be a lot more expensive than trailing *
I get that, that's what the documentation says. But would it not be a better solution to make it configurable, so that forum owners can decide whether they are willing to pay the price, rather than rendering the search useless for a relevant search pattern - and not even making clear that this wildcard is not supported, but just displaying "0 results"? In effect pretending to have performed a search that was merely unsuccessful? I am really baffled - and now know where many of my failed search attempts come from.
There certainly are bigger search problems than this.
Probably. The relevance of the issue depends heavily on the language of the forum, and in German, with words like Donaudampfschifffahrtskapitänsmützenbandbefestigungsknopffarbe, it can be pretty relevant, believe me.

This topic has come up many times, especially early on.
So it wasn't in there initially, then was there in 2015 and now it is gone again.

In the history of XenForo searches, there is a 0.001% chance someone wanted the Turkish translation first.
I've read your post before. Indeed, the weight of recommendations in Enhanced Search has "a bit of a character" - or in other words: it is totally useless. The search is clearly one of the areas where XF shines on paper and disappoints dramatically in reality.
 
So this basically means it is not possible in XF to successfully search with a leading wildcard at all?
Correct, not with stock XFES.

That would be a really bad surprise. In German, we have loads of longish words where this kind of search comes in very handy (not to say it is essential). For example, we have Motorboot, Segelboot, Ruderboot, Paddelboot and a shipload more, all being boats of different kinds.
Yeah, German is famous for compound words - AFAIK no other language uses them as extensively.
For proper search on German content you really need decompounding.
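
To show what decompounding means here, a toy dictionary-based sketch in Python. The word-part list is made up and tiny, and the longest-match-first strategy is just one simple approach chosen for illustration; real decompounders need large dictionaries and handle linking elements, which this does not.

```python
# Tiny made-up dictionary of word parts; purely illustrative.
PARTS = {"motor", "segel", "ruder", "paddel", "boot"}


def decompound(word: str, parts: set[str] = PARTS) -> list[str]:
    """Split a compound into known parts, or return [word] if no full split exists."""
    word = word.lower()

    def split(rest: str):
        if not rest:
            return []
        # Try the longest known head first, then recurse on the remainder.
        for end in range(len(rest), 0, -1):
            head = rest[:end]
            if head in parts:
                tail = split(rest[end:])
                if tail is not None:
                    return [head] + tail
        return None  # no combination of known parts covers the rest

    result = split(word)
    return result if result else [word]


for w in ("Motorboot", "Segelboot", "Haus"):
    print(w, decompound(w))
```

If the parts are indexed alongside the full word, a plain search for "boot" finds all the boats without needing any leading wildcard.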

But in XF it is impossible to perform a successful search like that?
Correct.

And as it does not seem to be mentioned anywhere, there is no way to be aware of this limitation beforehand. The behaviour seems to have changed: when searching before posting this issue I stumbled upon a request dating from 2015 that complained about full text search using leading and trailing wildcards by default
The code is neither MySQL FULLTEXT search nor Elasticsearch - it's a plain LIKE query that doesn't use any indexes.
This (adapted to XenForo 2 code) would still work today, but expect query runtimes of several minutes if you do that on a post table with a few million entries, so this is obviously not really an option.
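
For illustration, a minimal sketch of such a LIKE search in Python. It uses an in-memory SQLite database as a stand-in for MySQL, with a hypothetical table and made-up rows (not the real XenForo schema), and it does not escape % or _ in the search term.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; table and rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE post (post_id INTEGER PRIMARY KEY, message TEXT)")
conn.executemany(
    "INSERT INTO post (message) VALUES (?)",
    [
        ("Mein Segelboot liegt im Hafen",),
        ("fail2ban blocks the address",),
        ("Nothing to see here",),
    ],
)

SQL = "SELECT post_id FROM post WHERE message LIKE ? ORDER BY post_id"


def like_search(term: str) -> list[int]:
    """Substring search: wildcards on both sides of the term."""
    return [row[0] for row in conn.execute(SQL, (f"%{term}%",))]


def query_plan(term: str) -> list[str]:
    """Return the textual plan steps SQLite reports for the same query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + SQL, (f"%{term}%",))
    return [row[-1] for row in rows]


print(like_search("boot"))
print(like_search("2ban"))
print(query_plan("boot"))
```

The plan reports a scan of the whole table: every message has to be read and compared, which is why this approach does not scale to millions of posts.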

And the question is still open why a * at the end of a query string finds successfully things as supposed whereas according to the documentation this should not be the case.
Why shouldn't it?
XenForo uses simple_query_string, which by default does support * to expand terms. This can be turned off by setting flags to any combination of
  • AND
  • ESCAPE
  • FUZZY
  • NEAR
  • NONE
  • NOT
  • OR
  • PHRASE
  • PRECEDENCE
  • SLOP
  • WHITESPACE
i.e. a combination that does not contain PREFIX
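
A small sketch of how such a flags value could be put together, again as a plain Python dict and not as something XFES exposes. The helper names are made up for this example; the flag names and the |-separated format are those of simple_query_string.

```python
# Individual operator flags of simple_query_string (ALL and NONE left out,
# since they are shorthands rather than single operators).
OPERATOR_FLAGS = [
    "AND", "ESCAPE", "FUZZY", "NEAR", "NOT", "OR",
    "PHRASE", "PRECEDENCE", "PREFIX", "SLOP", "WHITESPACE",
]


def flags_without(*disabled: str) -> str:
    """Join all operator flags except the disabled ones with '|'."""
    return "|".join(f for f in OPERATOR_FLAGS if f not in disabled)


def build_query(text: str, disabled: tuple[str, ...] = ()) -> dict:
    """Build a simple_query_string body with the given operators disabled."""
    return {
        "query": {
            "simple_query_string": {
                "query": text,
                "flags": flags_without(*disabled),
            }
        }
    }


# With PREFIX removed, a trailing * would no longer expand the term.
print(build_query("fail2*", disabled=("PREFIX",)))
```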

analyze_wildcard is off by default, but this does not affect term expansion via * - it just means that the term itself is not analyzed (stopword removal, stemming, ascii folding).

XF search isn't great anyway - now I've learned that it is way worse than I assumed.
XF search (and especially XFES) certainly can be improved in various parts, see https://xenforo.com/community/threads/more-intuitive-search-advanced-search.224418/post-1741844

But I also think people might need to rethink their expectations:
Elasticsearch is not Google, and XenForo (currently) uses only a small fraction of ES capabilities.
So thinking a forum search might be capable of producing results similar to Google is IMHO somewhat ridiculous.

To be honest, so far I haven't seen a forum platform that has a (much) better search - unfortunately.

If you don't mind spending (and probably wasting) some money you could give either Manticore or Meilisearch (or both) a try:

Neither lists decompounding as a feature, so don't expect this to be usable with those search implementations either.

We've used Manticore in the past (and still use it for older vBulletin 4 forums), from my experience it's not too bad.

and not even making clear that this wildcard is not supported, but just displaying "0 results"? In effect pretending to have performed a search that was merely unsuccessful?
It's not fake: XenForo actually runs a search on Elasticsearch (via simple_query_string), but this does not return results, so from a technical point of view it is correct to show "0 results" as there are ... well ... 0 results.
Though it probably shouldn't be too difficult to show a warning that there might be no results due to incorrect use of *.
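
Such a check could be as simple as the following sketch. It is plain Python with a made-up function name and made-up message texts, not XenForo code, and it only looks at whitespace-separated terms.

```python
def wildcard_warnings(query: str) -> list[str]:
    """Return a warning for every term that uses * anywhere but at its end."""
    warnings = []
    for term in query.split():
        if "*" not in term:
            continue
        if term.startswith("*"):
            warnings.append(f"'{term}': a leading * is not supported")
        elif "*" in term[:-1]:
            warnings.append(f"'{term}': a * in the middle of a word is not supported")
    return warnings


for q in ("fail2*", "*2ban", "fail*ban segel*"):
    print(q, wildcard_warnings(q))
```

The search could still be run as usual; the warnings would only explain why "0 results" might be down to the query syntax.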

You may also get 0 results if stopwords are used:
If stopwords are enabled with XFES and you perform a multi-term search that includes a stopword, you won't get any results from ES either.
This behaviour is different from MySQL FULLTEXT; MySQL just ignores stopwords.
While it is somewhat difficult for XFES to detect this, XFES could at least issue a warning when custom stopwords are used (but it doesn't do this either).
 