
Sphinx Search Engine

Adam Howard

Well-known member
I got tired of waiting.....

I have contacted Sphinx Search directly with a general inquiry regarding their sponsored and custom development.

Hopefully the reply will be favorable and we can involve Sphinx Search in developing an add-on for XenForo.
 

Attachments

Luis

Well-known member
Adam, would you kindly paste the text here? I think some users use a translator to read xenforo.com, and at the moment the translator still cannot read images.

Thanks, Luis.
 

Rudy

Well-known member
Yeah, it crashes. I have monit and a cron to keep it alive.

But Sphinx wasn't foolproof, either. Mine would go belly up about once every six months when it tried to reindex. It would just get stuck on the old index and ignore the deltas. I'd have to log in, delete the indices, and rebuild.
We'd occasionally get those hiccups with Sphinx also--not a big deal since reindexing was way faster than rebuilding indexes using XF/ES in tandem. But like you say, maybe once or twice a year it would happen. On our old server, I could rebuild indexes in maybe 15-20 minutes, and it could use the command line. XF/ES? Well over an hour for millions of posts, and this is on our new server. And as much as Java and/or ES fails, there's no way I'd recommend using it in a production environment. The key is this: sure, it may work fine on someone else's server, but it's how it works on OUR server that matters. And on our server, ES (and Java in general) are a failure.
 

Rudy

Well-known member
BTW, one other thing I noticed in regard to speed: ES is very slow. On our private testing forum, it ran quite well using XF's built-in search. Once I enabled the ES add-on, it now takes several seconds for a post to be accepted.
 

digitalpoint

Well-known member
BTW, one other thing I noticed in regard to speed: ES is very slow. On our private testing forum, it ran quite well using XF's built-in search. Once I enabled the ES add-on, it now takes several seconds for a post to be accepted.
I honestly think you might have something misconfigured.

My biggest worry about my vBulletin -> XenForo migration was replacing Sphinx search with Elastic Search. I'm running more or less a stock ES setup (not doing any "warming" of indexes or anything else). And for the searches since ES was started (1,002,361 queries), the average search time is 0.0376 seconds overall. Compare this to Sphinx search, which was also fast (the average was ~0.08 seconds). This is with 21,978,245 searchable documents, and they were the same searchable content that I had on Sphinx.

I'm fairly qualified as an expert with Sphinx (I was the one that built the Sphinx search that pretty much every big board uses on vBulletin 4): https://marketplace.digitalpoint.com/sphinx-search-for-vbulletin-4.870/item

All that being said, there are some things I prefer about Sphinx, and there are also some things I like better in ES vs. Sphinx. I don't like that Elastic Search takes considerably more memory for indexes: for us Sphinx took 4.16GB, while ES with the same data takes 7.64GB, roughly 84% more (more on that here).
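
As a quick check of the overhead implied by the two index sizes quoted above, here is a minimal Python sketch. Only the two sizes come from the post; the variable names are illustrative.

```python
# Index sizes quoted in the post, in GB.
sphinx_gb = 4.16
es_gb = 7.64

# Absolute and relative overhead of the ES index over the Sphinx index.
extra_gb = es_gb - sphinx_gb
ratio = es_gb / sphinx_gb
pct_more = (ratio - 1) * 100

print(f"{extra_gb:.2f} GB extra, {ratio:.2f}x the size, {pct_more:.0f}% more")
```

The ratio stays below 2x, so the ES index here is larger but not double the Sphinx one.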

If it takes anything more than a fraction of a second for stuff to be indexed with Elastic Search, your server most definitely has something abnormal going on.

You are welcome to try out searching stuff here to see how ES handles your queries: https://forums.digitalpoint.com/

Like I said, it's more or less an untuned/standard ES setup that doesn't even bother to warm the indexes.
 

Rudy

Well-known member
I honestly think you might have something misconfigured.
That's entirely possible. But, I'm doing the same thing you are basically--I just use a standard ES setup, allocating more than enough memory to accommodate it. To me, running ES on Java is inefficient--it's the same idea as running a second operating system in a virtual machine on your main OS. I have always disliked Java. In light of all of the recent security breaches in the browser Java VMs, I am far from confident of wanting to fire up Java on our server. Granted it is apples and oranges (or servers vs. browsers), but Java has been a perennial unwanted annoyance IMHO, much like Flash.

I've even noticed the same posting delays right here on XF's own forum.

I, too, dislike that ES is such a memory hog. Sphinx never used anywhere near as much memory on our old system. We had to provision double the memory just to accommodate where ES's index might grow to in the future. Sphinx used a fraction for the same data set. In fact, didn't Sphinx use the disk index primarily? I don't think I ever saw vB report a search result that took any longer than 0.4 seconds (which is far shorter than vB could even render the search results page).

Did you also have a hand in developing that Sphinx plugin which we used on vB 3.7/3.8? If so, that was a huge lifesaver for us. :) (If anything, it spoiled us as we'd grown so used to Sphinx. :D )

I'm doing as little as possible with Java and/or ES at this point--I just simply have no time to devote to it. I'm all for seeing a Sphinx add-on. If nothing happens here by then, I may look into adapting the existing ES add-on starting in mid May when I am not under such a tight schedule.

I do understand why XF may want to offer ES vs. Sphinx, as in most cases Sphinx needs to be compiled on the server it is being run on, and many don't have the ability to do that. But any server admin running a forum large enough to need a dedicated server should have that skill set IMHO.
 

digitalpoint

Well-known member
No... I didn't have anything to do with Sphinx on vB3... just vB4.

I'm not in disagreement with you... there are basically 2 things I don't like about Elastic Search. The memory requirements (but it's certainly not double or exponentially higher than Sphinx... but less is always better with memory as far as I'm concerned). My bigger issue is I don't particularly care for it running on Java either. But bottom line is for what it's doing (returning search results), it's about twice as fast as Sphinx at doing that (which still seems super strange to me with how that can even be... but it is for whatever reason).

I was actually figuring that once I was live on XenForo, I'd be making my own Sphinx-based search... but truthfully, I just haven't seen a need yet. As for the couple things I like better about Sphinx, there's just as many (if not more) things I ended up liking better in Elastic Search...

  • Schema-less makes it so when you add new content types for search (for example resource manager), you don't need to define specific search indexes like you do in Sphinx
  • Elastic Search is faster at returning search results (again, I still find this strange, but for me it's been the case)
  • Sphinx is weird with how it internally handles document IDs, so you have to pre-allocate certain blocks of document IDs for each content type
  • Sharding/clustering multiple nodes is nothing short of amazing in Elastic Search
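
To illustrate the document ID point in the list above: Sphinx needs every document ID in an index to be a unique non-zero integer, so content types whose own IDs overlap each need their own range. A minimal Python sketch, assuming a fixed block size per content type (the block size, type names, and function names are hypothetical and not taken from any actual add-on):

```python
# Hypothetical block size: each content type gets its own range of IDs.
BLOCK_SIZE = 1_000_000_000

# Hypothetical content types; the order fixes which block each one owns.
CONTENT_TYPES = ["post", "thread", "resource"]
OFFSETS = {name: i * BLOCK_SIZE for i, name in enumerate(CONTENT_TYPES)}


def to_doc_id(content_type: str, content_id: int) -> int:
    """Map a (content type, content ID) pair to one unique document ID."""
    if not 0 < content_id < BLOCK_SIZE:
        raise ValueError("content_id must fit inside its pre-allocated block")
    return OFFSETS[content_type] + content_id


def from_doc_id(doc_id: int) -> tuple[str, int]:
    """Recover the (content type, content ID) pair from a document ID."""
    block_index, content_id = divmod(doc_id, BLOCK_SIZE)
    return CONTENT_TYPES[block_index], content_id


# Post 42 and thread 42 no longer collide in the search index.
post_doc = to_doc_id("post", 42)
thread_doc = to_doc_id("thread", 42)
```

Adding a new content type under this scheme means reserving another block up front, which is the extra bookkeeping a schema-less index avoids.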

Coming from ME (the guy that made the Sphinx search for vB4), it's saying something... especially because I was so skeptical of using Elastic Search after being on Sphinx for years.

That being said... the new versions of Sphinx are looking pretty nice. 2.1 has support for schema-less indexes: http://sphinxsearch.com/blog/2013/02/07/sphinx-2-1-json-attributes/

If they work out node discovery/auto-sharding/clustering to work more like it does in Elastic Search, I might consider making a Sphinx search for XenForo. But as it is right now, I see the current version of Sphinx more or less on par with Elastic Search overall.
 

Adam Howard

Well-known member
I got a reply....

Seems as though they may be interested in the idea :)

View attachment 40316
Adam, would you kindly paste the text here? I think some users use a translator to read xenforo.com, and at the moment the translator still cannot read images.

Thanks, Luis.
QUOTE:

Hello Adam, Thank you for writing to Sphinx and for spearheading this custom development project. I am not one of the development minds around, so I am unable to give you an accurate hours estimate offhand, but I will forward this to our team and get some notes together. I am sure that we will have a few technical questions to shoot you in order to get the right proposal together. Oftentimes, the clients sponsoring a build have a set budget in place. Is this the case with your group? I am also curious about what piqued your interest in sponsoring this build. We look forward to working with your team and putting together this add-on.