XF 1.1 Tell me more about ElasticSearch

Wildcat Media

Well-known member
Having searched here in the XF forum as well as other forums online (including their own site), I am finding little information about what I need to know about ElasticSearch. The questions I have so far:

1) Does ElasticSearch offer wildcard or phrase-based searches, or is that a limitation imposed by XenForo?

2) Does the ElasticSearch configuration allow for custom stopword lists and minimum search word length?

3) Does ElasticSearch return results based on relevance?

4) What type of resources does a server need? We already have a dedicated box. Sphinx runs almost effortlessly even when our vB installation bogs things down, and the index is a manageable size given that we have 7.5+ million posts and 800+ users online in peak hours.

5) With Sphinx, I have to run delta indexing every few minutes, then do a master rebuild each night during off-peak hours. Does ElasticSearch update immediately, or do I have to run cron on the server to generate an index?

6) Is this a case where we can disable FULLTEXT indexes on our database tables?

I'm sure I will have more. We now have approval from our staff to convert our vB3.7 to XF--everyone seems to be looking forward to something newer and better! I've been fielding questions about the search in XF to see if it improves on what we have with vB and Sphinx. (And I have found that Sphinx was returning better results than vB's own built-in search.)
 

1) Yes, ElasticSearch offers wildcards in * and ? format: * matches a chain of characters and ? matches a single character.
E.g.
Base: Finish this sentence
Chain: Finish this ********
Single: Finish this senten?e
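
To make the two wildcard forms concrete, here is a minimal Python sketch. It uses the standard-library fnmatch module, whose * and ? follow the same "any run of characters" / "exactly one character" convention. It only illustrates the pattern semantics; it is not how ElasticSearch evaluates a query internally, and the helper name matches() is made up for this example. (fnmatch also treats [...] specially, which is not part of the example above.)

```python
from fnmatch import fnmatchcase


def matches(pattern, text):
    """Return True if text matches pattern, where * is any run of
    characters and ? is exactly one character (illustration only)."""
    # fnmatchcase takes the text first, then the pattern.
    return fnmatchcase(text, pattern)


base = "Finish this sentence"

# Chain wildcard: * stands in for the whole last word.
print(matches("Finish this *", base))

# Single wildcard: ? stands in for exactly one character.
print(matches("Finish this senten?e", base))

# ? cannot stand in for two characters, so this does not match.
print(matches("Finish this senten?", base))
```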

2) Yes. Add something like the following to your elasticsearch.yml file:

Code:
index:
    analysis:
        analyzer:
            default:
                type: standard
                stopwords: "word1,word2,word3,word4,word5,word6"
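
For anyone unsure what a stopword list and a minimum word length actually do to a search, here is a rough Python sketch of the idea. It is not ElasticSearch code: the function name filter_terms, the placeholder stopwords, and the minimum length of 3 are all assumptions for illustration, and the minimum-length part is not covered by the config above.

```python
import re

# Placeholder stopwords, mirroring the made-up names in the config above.
STOPWORDS = {"word1", "word2", "word3"}

# Hypothetical minimum term length; shorter terms are dropped.
MIN_LENGTH = 3


def filter_terms(text, stopwords=STOPWORDS, min_length=MIN_LENGTH):
    """Lowercase the text, split it into word tokens, then drop
    stopwords and tokens shorter than min_length."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in stopwords and len(t) >= min_length]


# "word1" is a stopword; "is" and "a" are too short to be kept.
print(filter_terms("Word1 is a rare vinyl pressing"))
```

Only the terms that survive this kind of filtering end up searchable, which is why the stopword list needs to be chosen before the index is built.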

3) Yes

4) You would want to set aside around 8 GB of available memory for an index of that size. If you want to try the experimental mappings, you may get that down to around 5 GB: http://xenforo.com/community/threads/how-to-apply-custom-mapping.31103/

5) Set it and forget it. Updates happen near instantaneously.

6) Yes
 
Why the large memory requirement? Our Sphinx search works fine on our limited setup (we're on an older dedicated server), returning results from Sphinx usually under 0.5 seconds. Our deltas usually are built within mere seconds. Or are you talking of serving the ES index from a RAM disk?

For the phrase-based searches, do you usually put those in quotes? That is like the Google standard for it. If I search without quotes, I expect the search to retrieve any or all terms, but not the phrase. That is one thing I will need to explain to the masses when the site goes live.
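
To spell out the distinction I mean between "any terms" and "the phrase", here is a small Python sketch. It is purely illustrative: the function names and the sample post are made up, and it says nothing about how ElasticSearch or XenForo actually parse a query.

```python
def any_term_match(query, text):
    """True if at least one query word appears anywhere in the text."""
    words = set(text.lower().split())
    return any(term in words for term in query.lower().split())


def phrase_match(query, text):
    """True only if the query words appear in the text consecutively
    and in the same order."""
    q = query.lower().split()
    w = text.lower().split()
    if not q:
        return False
    # Slide a window of len(q) words across the text.
    return any(w[i:i + len(q)] == q for i in range(len(w) - len(q) + 1))


post = "this object has been blocked by the filter"

# Unquoted style: both words occur somewhere, so this matches.
print(any_term_match("blocked object", post))

# Quoted style: the words are not adjacent in this order, so no match.
print(phrase_match("blocked object", post))

# Quoted style with the exact run of words: this matches.
print(phrase_match("has been blocked", post))
```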
 
It has nothing to do with a RAM disk.

The memory requirement is simply due to how ES works; however, the "requirement" is for speed. In theory it should run on less memory, but it would have to rely more on loading data from disk (more I/O, so slower), whereas if it can hold the index in RAM it will return results faster.

No quotes needed. E.g., do a search on xenforo for: this object *** been blocked
 
My main issue is that we would never be able to afford a server that had enough memory for web server, mysqld and 8GB additional set aside for a search index. We're on only 4GB right now, but Sphinx runs blindingly fast running from disk. (I have it on the web server disk rather than the database disk.) I may see if we can upgrade to SSDs down the road when we get a server upgrade, but even there it starts raising costs beyond our budget.

It would be interesting to see how well ES runs from the disk as opposed to setting aside memory. I bet it is still far better than FULLTEXT. :)

Helpful input Slavik--thanks much!
 
Unfortunately, with that amount of posts and only 4 GB of RAM to run httpd and MySQL plus whatever else you have going, it isn't going to run well at all.

As Walter said, look to outsource the search server elsewhere :)
 
I just don't understand why Sphinx works so phenomenally fast in our setup (even as inefficient as vB is...and even when we are at maximum loads) from the disk, yet ES apparently can't handle an index without being a memory hog. What little I've read about ES online (there is not much out there, compared to Sphinx) has not impressed me at all, even despite some of its helpful features.

All our data has to remain on our server for legal reasons, so outsourcing is not an option for us.

Are these memory requirements listed anywhere on the XF site regarding the add-on for ES? If not, they should be. Had I known, I would not have planned to move to XF in the first place. I may need to see if we can find the funds to get someone to write a Sphinx add-on for XF, since ES is becoming more and more unacceptable the more I find out about it.
 