Sphinx Search Engine

What is nice about Sphinx and ES is that you can change certain things, such as the stopwords list and minimum word length. Normally those are determined by MySQL and, if you are on a shared hosting account, those are parameters you don't ordinarily have access to.
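To make those two knobs concrete, here is a minimal Python sketch of what an indexer does with them: split the text into words, then drop stopwords and anything shorter than the minimum length. The stopword list is an illustrative assumption. The default of 4 mirrors MySQL's MyISAM `ft_min_word_len`. In Sphinx, the corresponding index settings are `stopwords` and `min_word_len`.

```python
import re

# Illustrative stopword list only; real engines ship much longer ones.
STOPWORDS = {"the", "and", "for", "with", "that", "this", "from"}

def index_terms(text: str, min_word_len: int = 4, stopwords=STOPWORDS) -> list[str]:
    """Return the terms an indexer would keep for this text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if len(w) >= min_word_len and w not in stopwords]

if __name__ == "__main__":
    post = "The PHP and SQL tips for XF with Sphinx"
    print(index_terms(post))                  # ['tips', 'sphinx'] (short terms vanish)
    print(index_terms(post, min_word_len=2))  # ['php', 'sql', 'tips', 'xf', 'sphinx']
```

On a shared host you are stuck with the first result. Being able to lower the minimum is what makes short forum terms like "XF" or "PHP" searchable.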

I am looking at Sphinx now as a way to index other sites of mine that might grow larger. I'm working on two WordPress projects that could grow a bit (as both are "network" versions with multiple WP sites). Having Sphinx could really enhance searching in those sites.

Getting the data indexed and out of the XF tables is not hard at all. It's processing those indexes that is going to take me a bit of work (and what I'd need help doing). If I don't get any response here, I may see about getting paid help.
That's what it all comes down to. Sometimes when we really need something done we gotta pay for it.
 
I know. We are not exactly rolling in cash, but this is the only thing holding us up right now. I don't think it would be too time-consuming for someone to modify this enough that it will work--someone who knows what they are doing could probably knock this off in two or three hours.

I know the one major shortcoming is that the indexes are not updated in real time, but we were delta-indexing our vB forum every two minutes and nobody ever noticed. If I really wanted to go all out and learn this, I'd rewrite it to take advantage of the real-time updating that Sphinx now offers. That is way more involved, though.

Maybe I will ask Jake. I heard that he probably knows the answer. :D
 
Excellent work, Rudy.

I'm glad this hasn't completely died. We've managed to get enough hardware together to handle ES, but I think Sphinx is worth pursuing.

My understanding is that later versions of Sphinx should be able to handle the instant updates that ES enjoys, but I've not really looked at it since ES became available.

Good luck with your efforts. Life is mad busy at the moment, but if I get a chance to help out, I'll do what I can.
 
BTW, anyone know if Sphinx has a stemmer built in? From what I'm reading it does by default, but only for English and Russian. Which of course would be fine for our forum. I'd seen other versions with libstemmer compiled in (including Win32/Win64 binaries), or you could compile it with libstemmer on other platforms...apparently to get support for other languages.

Stemming would be a neat thing to have. I haven't checked to see if it does this with the default Sphinx setup. If it has English stemmer support, maybe there is a way it can be enabled. It shouldn't result in any more coding work (other than a configuration change), but it would improve search results. I built stemmer support into an online clothing catalog I developed for a client.
 
Maybe I will ask Jake. I heard that he probably knows the answer. :D
Why not ask Floren?
 
Look at Jake's user title... ;)

What's a stemmer?

You use it to find and search for the roots of words. Search for the word "walked." The stemmer determines the root of that word is "walk," then searches for "walk" and all of its basic variations on the root word (walked, walking, walks, etc.). In my client's clothing catalog, she sells flood pants. Customers also call them "floods". We did not want to frustrate customers by not returning results for the alternate terms. How would customers even know if we had pants listed as "flood pants" or "floods"? The stemmer breaks it down to the root word "flood" and searches for all the variations.
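Under the hood, both the indexed text and the query are reduced to stems, so any form of a word matches any other. The Python sketch below shows those mechanics with a deliberately naive suffix-stripper. That stripper is an assumption made for illustration. It is not the Porter/Snowball stemmer Sphinx ships for English.

```python
from collections import defaultdict

# Naive suffix list, longest first; real stemmers (Porter, Snowball) have many more rules.
SUFFIXES = ("ing", "ed", "s")

def stem(word: str) -> str:
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three letters so short words like "is" survive.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each stem to the ids of documents containing any form of it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[stem(word)].add(doc_id)
    return index

def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Return documents containing every query word, compared by stem."""
    sets = [index.get(stem(w), set()) for w in query.split()]
    return set.intersection(*sets) if sets else set()

if __name__ == "__main__":
    catalog = {1: "Flood pants in navy", 2: "Navy floods for kids", 3: "Walking shorts"}
    idx = build_index(catalog)
    print(search(idx, "flood"))   # {1, 2}
    print(search(idx, "walked"))  # {3}
```

In Sphinx itself this is a configuration matter rather than code.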

For her catalog, I had also wanted to add a soundex search, since many Internet users do not pride themselves on good spelling. :D A soundex or similar function (PHP has a few choices) breaks words down into what they sound like phonetically and stores a calculated value based on that. So you could spell "Smith" as "Smyth" and it would compute to the same soundex value. This is a variation on what Google does when they offer you "Did you mean..." search results. For her catalog, with only several hundred products, we could easily store the keywords as soundex values in a separate column in a table, and then search on those values in addition to the standard searches, giving the soundex hits a lower relevance. Soundex searches in a forum would be a server killer, though, given the sheer volume of data. Metaphone gives a more accurate phonetic representation of words (it maps both "clock" and "klock" to KLK, which soundex misses because it keeps the first letter as-is), and there are other methods to determine similar words (Levenshtein distance, etc.).
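The Python sketch below shows two of the fuzzy-matching ideas above. The first is the classic American Soundex code: the first letter is kept, the remaining consonants are mapped to digits, and the result is padded to four characters. The second is a plain Levenshtein edit distance. PHP already has `soundex()`, `metaphone()` and `levenshtein()` built in, so this just spells out the same idea. The `fuzzy_score` weights (1.0 for an exact hit, 0.5 for a soundex-only hit) are an arbitrary assumption, meant to illustrate giving soundex hits a lower relevance.

```python
_CODES = {ch: digit
          for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                                 ("l", "4"), ("mn", "5"), ("r", "6"))
          for ch in letters}

def soundex(word: str) -> str:
    """American Soundex: first letter plus three digits, e.g. Robert -> R163."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            result += digit
        prev = digit  # vowels reset prev, so a repeated code after a vowel is kept
        if len(result) == 4:
            break
    return result.ljust(4, "0")

def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute ca -> cb
        prev = cur
    return prev[-1]

def fuzzy_score(query: str, keyword: str) -> float:
    """Exact hits rank above soundex-only hits (weights are illustrative)."""
    if query.lower() == keyword.lower():
        return 1.0
    if soundex(query) == soundex(keyword):
        return 0.5
    return 0.0

if __name__ == "__main__":
    print(soundex("Smith"), soundex("Smyth"))  # S530 S530
    print(soundex("clock"), soundex("klock"))  # C420 K420
    print(levenshtein("clock", "klock"))       # 1
```

Storing `soundex(keyword)` in its own column, as described above, makes the phonetic lookup an indexed equality match. Levenshtein, by contrast, has to compare against every candidate. That is part of why it gets expensive on a forum-sized vocabulary.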
 
MySQL 5.6 now supports InnoDB full-text search.

Sphinx Search has not yet caught up to MySQL 5.6 deployment. Once it has, would you mind, mlx, if someone used your base to build up from it (if you have no plans to continue this)?
 
I think interest in Sphinx is pretty much dead here--I went so far as asking (OK, maybe begging) someone to maybe pitch in and help me work on it, or even do the work and have us pay for it, but nobody stepped up to the plate. I have zero time right now to devote to doing anything forum-related.

Sorry, but the more I have seen and used ElasticSearch since our forum changeover, the more I dislike it. I think the coding is crap. And running on top of Java? Really? This is 2013, not 2002. Let's add even more bloat and resource waste by running on top of a crappy interpreter. (Rolls eyes.)

Given ES's poor performance, I can't wait any longer. Sphinx blows its doors off in performance--it served up search results literally instantly. It never crashed. And now that it supports instant index updates, there is zero reason it can't be used. Even when I ran the Sphinx indexer once per minute, the members never noticed the lag, and the hit on resources was unnoticeable. In fact, I prefer that--if Sphinx ever crashed, the next update after restart would catch the search index up to where it needs to be. When (not if...when...and when is often) ES crashes, you then have to go and rebuild the indexes, and it takes forever.
 
We rarely have problems with ES. So there is no urgent need for a change. And it has advantages. Easy clustering, for example. However, I would also prefer Sphinx if it were available.

The other thing is: XenForo search is rather useless if you really want to find something. I don't know how the results are handled and weighted but that handling is extra poor. It is MUCH easier to find what you're looking for if you use Google and site:xenforo.com
 
It's cool if it works for others and they are happy with it. But in our case, ES is a trainwreck--I want that off of my server as soon as I can get a Sphinx add-on either recoded by myself or made available as a paid add-on. I also use the example of Craigslist: if Sphinx is good enough for them with billions of listings, it's just fine for us. Used it for a few years without ever having to touch it (beyond initial setup and tweaking).

Google has always had better search results. You have to consider the billions of dollars and millions of labor hours put into refining their search algorithms--they've fine-tuned them far beyond what we have available to us as server admins. The fact that Google can support phrase searching while returning usable results makes it worthwhile.

Want to know what I'd really like? A search service offered by Google. Even if it cost us a small amount per month. What it would buy us is the ability to instantly submit our forum posts/threads/status updates, etc. to Google, rather than wait for them to be indexed, and an API would allow us to pull the results out of our index right into the forum. Permissions would still be handled by the forum of course--these search "instances" (if you will) would be private, not public. We would also have access to tuning aspects of the search index, and whether or not we wanted to enable different types of searches (such as stemmers, Soundex/Metaphone searches, etc.). Demand for this would be so little, though, that I doubt anyone would use it.
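The key design point in that wish list is the split of responsibilities: the external service does the ranking, and the forum keeps control of permissions. A minimal Python sketch of that split follows. It assumes a hypothetical search service that returns post IDs in rank order, and a hypothetical `can_view` check supplied by the forum. The forum drops hits the current user may not see, keeps the service's order, and then paginates.

```python
from itertools import islice
from typing import Callable, Iterable

def visible_page(ranked_ids: Iterable[int],
                 can_view: Callable[[int], bool],
                 page: int = 1,
                 per_page: int = 20) -> list[int]:
    """Keep the external ranking, drop hits this user may not see, then paginate."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    visible = (post_id for post_id in ranked_ids if can_view(post_id))
    start = (page - 1) * per_page
    return list(islice(visible, start, start + per_page))

if __name__ == "__main__":
    hits = [101, 205, 310, 402, 515]  # ids as an external search service might return them
    hidden = {205, 402}               # posts in forums this user cannot read
    print(visible_page(hits, lambda post_id: post_id not in hidden, per_page=2))  # [101, 310]
```

Permissions are applied after ranking. A user who can see only a few forums may therefore need the service to return a deeper result list before a page fills up.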
 
The other thing is: XenForo search is rather useless if you really want to find something. I don't know how the results are handled and weighted but that handling is extra poor. It is MUCH easier to find what you're looking for if you use Google and site:xenforo.com

Heh, I generally find anything I want using the search here first time around.
 
Nobody even bothered to contact me. I think many of the good developers have pretty much left XF due to the lawsuit uncertainty...a few I use currently have already been abandoned. *sigh*
All of the good ones that I know are busy with either major projects, or with making a living off the meager sales/customs that they can get off XF lately.

You might want to contact some of the developers directly, and I can pass word on to a friend who might be interested (though I'm not sure he'll have the time for a while).
 
Sorry, but the more I have seen and used ElasticSearch since our forum changeover, the more I dislike it. I think the coding is crap. And running on top of Java? Really? This is 2013, not 2002. Let's add even more bloat and resource waste by running on top of a crappy interpreter. (Rolls eyes.)

Yes, it's 2013, not 1999. Java isn't so bad anymore. Many search engines use it. Wake up!

My search runs well all the time.
 
I don't think I've ever read about ES crashing?
Make a thread about ES crashing to see if patterns emerge.

Yeah, it crashes. I have monit and a cron to keep it alive.
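monit does this job properly, but the core of a cron-driven keepalive is small. The minimal Python sketch below makes a few assumptions. It assumes Elasticsearch listens on its default HTTP port 9200, and that `service elasticsearch restart` restarts it; that command is hypothetical and depends on how ES was installed. The script only checks that the port accepts TCP connections. That is weaker than a real health check, because a hung node can still accept connections.

```python
import socket
import subprocess
import sys
from typing import Callable

ES_HOST = "127.0.0.1"
ES_PORT = 9200  # Elasticsearch's default HTTP port
# Hypothetical restart command; it depends on how ES was installed.
RESTART_CMD = ["service", "elasticsearch", "restart"]

def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if something accepts a TCP connection on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def check_and_restart(host: str, port: int, restart: Callable[[], None]) -> bool:
    """Call restart() if the port is dead; return True when a restart was triggered."""
    if is_listening(host, port):
        return False
    restart()
    return True

def run_restart() -> None:
    subprocess.run(RESTART_CMD, check=False)

def dry_run() -> None:
    print("would run:", " ".join(RESTART_CMD))

if __name__ == "__main__":
    # Example crontab entry (path is hypothetical):
    #   * * * * * python3 /path/to/es_watchdog.py --restart
    action = run_restart if "--restart" in sys.argv[1:] else dry_run
    if check_and_restart(ES_HOST, ES_PORT, action):
        print("search service was not accepting connections", file=sys.stderr)
```

Without `--restart` the script only reports what it would do, which makes it safe to try by hand before adding it to cron.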

But Sphinx wasn't foolproof, either. Mine would go belly up about once every six months when it tried to reindex. It would just get stuck on the old index and ignore the deltas. I'd have to log in, delete the indices, and rebuild.
 