How to: Basic Elasticsearch installation. (RHEL/SUSE)

Slavik


@Floren has an amazing repo for RHEL 6 and 7. The Elasticsearch RPM he provides is perfectly set up, and I currently suggest using it.

The old manual setup guide can be found below.

Step 1) Install the Axivo Repo: https://www.axivo.com/resources/repository-setup.1/

Step 2) Install ElasticSearch: https://www.axivo.com/resources/elasticsearch-setup.11/




This guide shows how to do a basic (vanilla, get-up-and-go) install of Elasticsearch (0.90.0 Beta 1), the Elasticsearch Service Wrapper and the required Java Runtime Environment (JRE) (1.7.0_17) on RHEL / SUSE. It does not cover running Elasticsearch under a dedicated user.

For Debian/Ubuntu users, a guide can be found here.

This guide assumes the user has basic knowledge of SSH and has logged in as root before starting the steps below. It also assumes the user does not currently have any JRE installed. You can check whether a JRE is installed by typing:

Code:
java -version

As of writing, the current file locations for JRE are as follows:

32 bit
Code:
http://download.oracle.com/otn-pub/java/jdk/7u17-b02/jre-7u17-linux-i586.rpm

64 bit
Code:
http://download.oracle.com/otn-pub/java/jdk/7u17-b02/jre-7u17-linux-x64.rpm

The guide is shown using the 64 bit install. If you are using a 32 bit system, change the file names as appropriate.
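
If you would rather not work out the file name by hand, the choice can be scripted. This is only a sketch: the function name jre_rpm_for_arch is made up for illustration, and it assumes `uname -m` reports x86_64 on 64 bit systems and an i386 to i686 value on 32 bit ones.

```shell
# Hypothetical helper: map an architecture string (as printed by `uname -m`)
# to the JRE RPM file name used in this guide.
jre_rpm_for_arch() {
    case "$1" in
        x86_64)
            echo "jre-7u17-linux-x64.rpm" ;;
        i386|i486|i586|i686)
            echo "jre-7u17-linux-i586.rpm" ;;
        *)
            echo "unsupported architecture: $1" >&2
            return 1 ;;
    esac
}

# Example with a fixed value; on a real system pass "$(uname -m)" instead.
jre_rpm_for_arch x86_64
```

On a live system, something like `RPM=$(jre_rpm_for_arch "$(uname -m)")` would give the file name to use in the wget and rpm commands.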

Please note, whilst this is a simple and easy setup, I take no responsibility for any damages or losses that may occur to your system by following the steps below. If you are unsure at any stage, please ask for assistance or seek the help of a qualified Linux Systems Administrator.

Installing the JRE
Type the following commands into your SSH terminal.
Code:
cd /tmp
wget http://download.oracle.com/otn-pub/java/jdk/7u17-b02/jre-7u17-linux-x64.rpm
rpm -ivh jre-7u17-linux-x64.rpm
java -version

Assuming everything was done correctly, you should get the following output.

Code:
# java -version
java version "1.7.0_17"
Java(TM) SE Runtime Environment (build 1.7.0_17)
Java HotSpot(TM) 64-Bit Server VM (build 22.0-b10, mixed mode)

Install Elasticsearch

Code:
cd /
curl -L -O -k https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.90.0.Beta1.zip
unzip elasticsearch-0.90.0.Beta1.zip
mv elasticsearch-0.90.0.Beta1 elasticsearch

Install the Elasticsearch Service Wrapper

Code:
curl -L -k http://github.com/elasticsearch/elasticsearch-servicewrapper/tarball/master | tar -xz
mv *servicewrapper*/service elasticsearch/bin/
elasticsearch/bin/service/elasticsearch install
ln -s `readlink -f elasticsearch/bin/service/elasticsearch` /usr/local/bin/rcelasticsearch
rcelasticsearch start

Assuming everything was done correctly, you should see the following output.

Code:
rcelasticsearch start
Starting ElasticSearch...
Waiting for ElasticSearch......
running: PID: xxxxx
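
The wrapper reports a PID once the process has been launched, which may be slightly before Elasticsearch answers HTTP requests. A small polling loop can confirm it is actually responding. This is a sketch only: the function name wait_for_es is invented here, and it assumes curl is installed and that Elasticsearch is listening on the default port 9200.

```shell
# Hypothetical helper: poll an HTTP endpoint until it answers or we give up.
# Usage: wait_for_es HOST PORT TRIES
wait_for_es() {
    host=$1; port=$2; tries=$3
    while [ "$tries" -gt 0 ]; do
        # -s: quiet, -f: treat HTTP errors as failure, --max-time: do not hang
        if curl -s -f --max-time 5 "http://$host:$port/" >/dev/null 2>&1; then
            echo "Elasticsearch is responding on $host:$port"
            return 0
        fi
        tries=$((tries - 1))
        # Pause between attempts, but not after the last one.
        if [ "$tries" -gt 0 ]; then sleep 1; fi
    done
    echo "No response from $host:$port" >&2
    return 1
}
```

For example, `wait_for_es localhost 9200 30` would wait up to roughly 30 seconds after `rcelasticsearch start`.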

Basic Configuration

You should do some basic configuration of Elasticsearch before installing the addon in XenForo.


1) Open up /elasticsearch/config/elasticsearch.yml and on line 32 edit

Code:
# cluster.name: elasticsearch

To

Code:
cluster.name: PUT-SOMETHING-UNIQUE-HERE

On line 199 edit

Code:
# network.host: 192.168.0.1

to

Code:
network.host: 127.0.0.1

On line 211 edit

Code:
# http.port: 9200

to

Code:
http.port: 9200

Save and Close
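
Line numbers can shift between Elasticsearch releases, so matching on the setting name may be more robust than counting lines. The following is a rough sketch, not part of the original steps: set_yaml_key is a made-up helper, it assumes the setting sits at the start of a line (optionally commented out with #), and it rewrites every line that matches, so check the file afterwards.

```shell
# Hypothetical helper: set "key: value" in a YAML file, replacing an
# existing or commented-out line for that key.
# The function body runs in a subshell so its variables do not leak.
set_yaml_key() (
    file=$1; key=$2; value=$3
    tmp=$(mktemp) || exit 1
    # Write the edited copy to a temp file, then copy it back so the
    # original file keeps its ownership and permissions.
    sed -E "s|^#? ?${key}:.*|${key}: ${value}|" "$file" > "$tmp" \
        && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    exit "$status"
)

# Demo on a scratch file rather than the real config.
demo=$(mktemp)
printf '# cluster.name: elasticsearch\n' > "$demo"
set_yaml_key "$demo" cluster.name my-forum-search
cat "$demo"
rm -f "$demo"
```

Against the real file this would be called as, for example, `set_yaml_key /elasticsearch/config/elasticsearch.yml network.host 127.0.0.1`. Values containing a `|` character would break the sed expression, so the sketch is only suitable for simple values like the ones in this guide.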


2) Open up /elasticsearch/bin/service/elasticsearch.conf on line 2 edit

Code:
set.default.ES_HEAP_SIZE=1024

To a number suitable for the size of your forum.

I recommend approximately 1 GB for the HEAP_SIZE per 1 million posts on your forum.

1 Million Posts: 1024
2 Million Posts: 2048
3 Million Posts: 3072
4 Million Posts: 4096
etc

This does not mean the service will use all of that memory, only that it will have it at its disposal if required.

So for example a 3 Million Post forum would edit

Code:
set.default.ES_HEAP_SIZE=1024

to

Code:
set.default.ES_HEAP_SIZE=3072



Save and Exit.
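
The rule of thumb above is simple arithmetic and can be scripted. A minimal sketch, assuming partial millions are rounded up and that 1024 is the smallest value suggested (both are my assumptions, not part of the guide); heap_mb_for_posts is a hypothetical name.

```shell
# Hypothetical helper: suggest ES_HEAP_SIZE (in MB) from a post count,
# using the guide's rule of thumb of about 1024 MB per million posts.
# Assumption: partial millions round up, with a 1024 MB minimum.
heap_mb_for_posts() {
    posts=$1
    millions=$(( (posts + 999999) / 1000000 ))
    if [ "$millions" -lt 1 ]; then millions=1; fi
    echo $(( millions * 1024 ))
}

heap_mb_for_posts 3000000   # prints 3072
```

The result is only a starting point. It should still be checked against the RAM actually available on the server.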


3) Optional - Move the Elasticsearch data directory.

Your Linux install may be configured in such a way that your install partition is only a few GB in size, and placing a large Elasticsearch index there is not ideal.

In that case you will want to move the index directory to a different, larger location (in this example /var/elasticsearch).

Code:
cd /var
mkdir elasticsearch

Open up /elasticsearch/config/elasticsearch.yml on line 143 edit

Code:
# path.data: /path/to/data

to

Code:
path.data: /var/elasticsearch

Save and Exit
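
Before settling on a location it can help to compare the free space on the candidate partitions. A small sketch, assuming a POSIX df (the -P flag keeps each filesystem on one line); free_kb is a hypothetical helper name.

```shell
# Hypothetical helper: print the free space, in KB, of the filesystem
# that holds the given path.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Compare the root partition with the filesystem holding /var.
echo "/    : $(free_kb /) KB free"
echo "/var : $(free_kb /var) KB free"
```

If both paths report the same figure they are most likely on the same partition, and moving the data directory between them gains nothing.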

4) Restart the Elasticsearch Service

In SSH type

Code:
rcelasticsearch restart

You should get the following output

Code:
rcelasticsearch restart
Stopping ElasticSearch...
Stopped ElasticSearch.
Starting ElasticSearch...
Waiting for ElasticSearch......
running: PID: xxxxx

Elasticsearch is now running with your updated config.



Install the XenForo Enhanced Search Addon

1) Put your board into maintenance mode*

2) Download the addon from your customer area at http://xenforo.com/customers/

3) Follow the instructions found at http://xenforo.com/help/enhanced-search/

4) Wait for your indexes to be rebuilt

5) Open your board.

6) Install the index pre-warmer.

As of 0.90.0 Beta an index pre-warmer is available. This keeps your search index "warm" in active memory, so when a search is done the access latency is greatly reduced.

Installing it is simple. In SSH, run the following, replacing *INDEX NAME* with the name of your ES index.

Code:
curl -XPUT localhost:9200/*INDEX NAME*/_warmer/warmer_1 -d '{
    "query" : {
        "match_all" : {}
    }
}'

You should have the following returned

Code:
{"ok":true,"acknowledged":true}
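
To avoid hand-editing the URL, the same request can be wrapped in a function that takes the index name as an argument. This is only a sketch: warmer_url and install_warmer are names invented here, "myforum" is a placeholder index name, and the host, port and warmer name are the ones from the command above.

```shell
# Hypothetical helpers wrapping the warmer request from this guide.
# Build the warmer URL for a given index name (localhost:9200 assumed).
warmer_url() {
    echo "localhost:9200/$1/_warmer/warmer_1"
}

# Send the same match_all warmer body as the manual command.
install_warmer() {
    curl -XPUT "$(warmer_url "$1")" -d '{
    "query" : {
        "match_all" : {}
    }
}'
}

# Show the URL that would be used for a placeholder index name.
warmer_url myforum
```

Running `install_warmer myforum` against a live node should then return the same acknowledgement as the manual command.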


*You may leave your board open during the re-index process.

Congratulations. Your board should now be running XenForo Enhanced Search.
 
This is the part a few of us are trying to work out: the factors affecting index size and speed, and from those a memory requirement. Having said that, 1 GB per million posts seems like a reasonable enough starting point, and any custom mapping that reduces that requirement is a bonus. For the time being, a "suck it and see" approach seems to be the only way to get those answers.
Thanks for the info. In other words, a large forum will use memory equivalent to the size of its posts. From what I read on the Elastic group, most users require double that calculation... but this is where custom mapping comes into play, I guess.
DP is releasing an addon, and there's a nice pretty UI you can find on GitHub (https://github.com/mobz/elasticsearch-head), or more simply hit this in and work out the numbers yourself. Also remember memory usage is linked to user load.
There is another one also, called BigDesk. I cannot work out the numbers myself, unfortunately, because I don't have access to any large forums running XenForo with enhanced search. That's why I'm posting here, to get feedback from users who have experience with that. There is an interesting part in your comment: "memory usage is linked to user load." You obviously noticed something different when many users perform many searches at the same time. Can you share your findings with us?

Looks like you are right on the multiple clients usage and memory. A search on the Elastic group showed that there are severe issues with large indices when memory is low. In one case, the user had 34 million documents indexed into a 70GB index, which in real life requires 70GB of RAM allocated to get proper search speed. He allocated 10GB and has several clients (web nodes, I presume) accessing the search box.
 
It's very hard to quantify in terms of real figures, but put simply, it would appear that once an index file has been opened for a particular search term and is in active memory, any subsequent searches for that term have an incredibly low impact on the server. As such, if the range of what is being searched for is narrow, so are the actual memory requirements.

In contrast, if the range of search terms is particularly wide, ES has to load multiple index files, and this obviously has negative implications.

I guess a board like IGN has a lot of users searching for a wide variety of topics, yet on a board such as http://p8ntballer-forums.com/ the search terms are quite constrained to a handful of primary searches.
 
... once an index file has been opened for a particular search term and is in active memory any subsequent searches for that term have an incredibly low impact on the server ...
Are we talking about the scenario where index data is partially stored into memory? Because the indices would not need to be loaded into memory since they are already there.
 
Yes, for example if you did a search on p8ntballer for "ego" the result will probably be almost instantaneous, whereas something much more random may take a second or two. That's simply because "ego" is one of the most searched terms on our site.
 
I don't think the waiting period is an issue. Honestly, as a board owner I would not care if users wait 2 seconds for results. The main concern is hardware usage: I don't want to be forced to add several servers to a large forum just for searching. Based on your experience, will a forum with 50 million posts be able to run the enhanced search without adding to, modifying or improving the hardware it already uses with vBulletin and Sphinx?

To get back to my previous example, 46 million posts use the current db server to store the Sphinx indices, and Sphinx uses 3.3GB of RAM to serve data stored in 27GB worth of indices. Even though Sphinx processes the results in 0.057 seconds, in real life the end result is consistently returned in about 0.2 seconds. No additional server resources are used, and Sphinx lowers the server load and CPU usage.
 
What do the current server specs look like?
 
For db, an Intel quad with 32GB of RAM and RAID10 15K disks. Most memory is used by MySQL.

As in MySQL is actually using that available memory, or just has it available to it?

I would say yes. However, we are now in territory where I don't have the facilities or the knowledge to advise, as I have never really worked on anything of such a scale before.

My thoughts would be to offload the ES service to the web server (as I'm guessing that would be highly optimised nginx?), which I guess is under less stress than the MySQL one.

Realistically, I think to answer your questions some sort of test bed (maybe in collaboration between the likes of you, DP and IGN) would need to be set up (10M posts, possibly a distributed cluster) to see what can be done and set that as a starting point, loading it up with siege and possibly a web script generating random search strings to really hammer it and find out how best to alter things.
 
Web nodes are pretty busy; the db box has the least load/CPU usage and dual NICs for decent data transfer.
 
Hey guys,
I recently ran into an issue trying to set up Elasticsearch, and I'd like to point something out for anybody else who is having problems. I run a virtual dedicated server with 1GB of RAM; it's only running a single website right now, so that's not much of an issue. But it wouldn't run Elasticsearch! I was eventually led to the Elasticsearch log files, which showed that the virtual machine wasn't initializing because there was no heap memory available.

So I spent a few hours trying to either A) remove the memory limitation or B) update OpenJDK from 1.6 to 1.7. Both answers were wrong.

The default heap size for Elasticsearch is 1024MB, 1 gigabyte of RAM. That's 100% of my server's RAM. Once I went into the Elasticsearch configuration file and noticed it was set to use up to 1GB of memory by default, I changed the value to 256:512 (my board isn't that big) and voilà! It worked.
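
One way to avoid this on small servers is to look at the installed RAM before picking a heap size. The sketch below assumes a Linux /proc/meminfo and uses half of total RAM as a ceiling, which is a commonly quoted Elasticsearch guideline rather than anything from this guide; both function names are made up.

```shell
# Hypothetical helper: total RAM in MB, read from /proc/meminfo (Linux only).
total_ram_mb() {
    awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo
}

# Hypothetical helper: cap a heap size at half of the given RAM (both in MB).
max_heap_mb() {
    echo $(( $1 / 2 ))
}

# A 1 GB server, as in the post above, gets a ceiling of 512 MB.
max_heap_mb 1024   # prints 512
```

On a real server the two can be combined as `max_heap_mb "$(total_ram_mb)"`. Anything else running on the same box (web server, MySQL) needs to fit in what is left.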
 
On a large server you may hit the error "Too many open files".
To fix this error, add the following lines to /etc/security/limits.conf:

Code:
elasticsearch soft nofile 32000
elasticsearch hard nofile 32000
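
To check whether such entries are present, and what limit the current shell has, something like the following could be used. It is a sketch only: has_nofile_limit is a hypothetical helper, and the pattern assumes the plain "user type nofile value" layout shown above.

```shell
# Hypothetical helper: succeed if FILE has a nofile entry for USER.
# Usage: has_nofile_limit FILE USER
has_nofile_limit() {
    grep -Eq "^[[:space:]]*$2[[:space:]]+(soft|hard|-)[[:space:]]+nofile[[:space:]]+[0-9]+" "$1"
}

# Show the open-file limit of the current shell.
ulimit -n
```

For example, `has_nofile_limit /etc/security/limits.conf elasticsearch && echo "limit configured"`. Note that limits.conf entries apply to the user named in the first column, so these lines only help if Elasticsearch actually runs as a user called elasticsearch.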
 
Elasticsearch failed on my new server. First I tried to get the RPM with wget, but it didn't work. I then downloaded it with the AuthParameter. I did all the steps in the manual but got an error.
 
Try updating to the latest JRE and ES versions.

I performed an install only yesterday running the most up-to-date versions and all worked fine.
 