
How to: Basic Elasticsearch installation. (RHEL/SUSE)

Slavik

XenForo moderator
Staff member
#1
Basic Elasticsearch Installation (RHEL / SUSE)

@Floren has an amazing repo for RHEL 6 and 7; the Elasticsearch RPM he provides is perfectly set up, and I currently suggest using it.

The old manual setup guide can be found below.

Step 1) Install the Axivo Repo: https://www.axivo.com/resources/repository-setup.1/

Step 2) Install ElasticSearch: https://www.axivo.com/resources/elasticsearch-setup.11/




This guide is provided to show how to do a basic (vanilla, get-up-and-go) install of Elasticsearch (0.90.0 Beta 1), the Elasticsearch Service Wrapper and the required Java Runtime Environment (JRE) (1.7.0_17) on RHEL / SUSE. This guide will not cover running a dedicated Elasticsearch user.

For Debian/Ubuntu users, a guide can be found here.

This guide assumes the user has basic knowledge of SSH and prior to starting the steps below has logged in as root. This guide also assumes the user does not currently have any JRE installed. You can check if you have JRE installed by typing

Code:
java -version
As of writing, the current file locations for JRE are as follows:

32 bit
Code:
http://download.oracle.com/otn-pub/java/jdk/7u17-b02/jre-7u17-linux-i586.rpm
64 bit
Code:
http://download.oracle.com/otn-pub/java/jdk/7u17-b02/jre-7u17-linux-x64.rpm
The guide is shown using the 64 bit install; if you are using a 32 bit system, change the file names as appropriate.

Please note, whilst this is a simple and easy setup, I take no responsibility for any damages or losses that may occur to your system by following the steps below. If you are unsure at any stage, please ask for assistance or seek the help of a qualified Linux Systems Administrator.

Installing the JRE
Type the following commands into your SSH terminal.
Code:
cd /tmp
wget http://download.oracle.com/otn-pub/java/jdk/7u17-b02/jre-7u17-linux-x64.rpm
rpm -ivh jre-7u17-linux-x64.rpm
java -version
Assuming everything was done correctly, you should get the following output.

Code:
# java -version
java version "1.7.0_17"
Java(TM) SE Runtime Environment (build 1.7.0_17)
Java HotSpot(TM) 64-Bit Server VM (build 22.0-b10, mixed mode)
Install Elasticsearch

Code:
cd /
curl -L -O -k https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.90.0.Beta1.zip
unzip elasticsearch-0.90.0.Beta1.zip
mv elasticsearch-0.90.0.Beta1 elasticsearch
Install the Elasticsearch Service Wrapper

Code:
curl -L -k http://github.com/elasticsearch/elasticsearch-servicewrapper/tarball/master | tar -xz
mv *servicewrapper*/service elasticsearch/bin/
elasticsearch/bin/service/elasticsearch install
ln -s `readlink -f elasticsearch/bin/service/elasticsearch` /usr/local/bin/rcelasticsearch
rcelasticsearch start
Assuming everything was done correctly, you should see the following output.

Code:
rcelasticsearch start
Starting ElasticSearch...
Waiting for ElasticSearch......
running: PID: xxxxx
Basic Configuration

You should do some basic configuration of Elasticsearch before installing the addon in XenForo.


1) Open up /elasticsearch/config/elasticsearch.yml and on line 32 edit

Code:
# cluster.name: elasticsearch
To

Code:
cluster.name: PUT-SOMETHING-UNIQUE-HERE
on line 199 edit

Code:
# network.host: 192.168.0.1
to

Code:
network.host: 127.0.0.1
On line 211 edit

Code:
# http.port: 9200
to

Code:
http.port: 9200
Save and Close
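If you prefer a non-interactive approach, the three edits above can be sketched as sed one-liners. This assumes the default commented-out lines shipped with 0.90.0 Beta 1 (adjust the patterns if your file differs), and PUT-SOMETHING-UNIQUE-HERE is a placeholder you should change:

```shell
# Apply the three config changes in place, keeping a backup of the original.
CONF=/elasticsearch/config/elasticsearch.yml
cp "$CONF" "$CONF.bak"

# Uncomment and set a unique cluster name (placeholder value).
sed -i 's|^# cluster.name: elasticsearch|cluster.name: PUT-SOMETHING-UNIQUE-HERE|' "$CONF"

# Bind only to localhost so the node is not reachable from outside.
sed -i 's|^# network.host: 192.168.0.1|network.host: 127.0.0.1|' "$CONF"

# Uncomment the HTTP port so it is set explicitly.
sed -i 's|^# http.port: 9200|http.port: 9200|' "$CONF"
```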


2) Open up /elasticsearch/bin/service/elasticsearch.conf on line 2 edit

Code:
set.default.ES_HEAP_SIZE=1024
To a number suitable for the size of your forum.

I recommend approximately 1 GB for the HEAP_SIZE per 1 million posts on your forum.

1 Million Posts: 1024
2 Million Posts: 2048
3 Million Posts: 3072
4 Million Posts: 4096
etc

This does not mean the service will use all of that memory; however, it will have it at its disposal if required.
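The rule of thumb above (1024 MB of heap per million posts) is easy to work out in the shell. A quick sketch; POSTS_MILLIONS is just an illustrative variable, not part of any config:

```shell
# Rough heap sizing per the guide's rule of thumb: 1024 MB per million posts.
POSTS_MILLIONS=3
ES_HEAP_SIZE=$((POSTS_MILLIONS * 1024))
echo "set.default.ES_HEAP_SIZE=$ES_HEAP_SIZE"   # prints set.default.ES_HEAP_SIZE=3072
```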

So, for example, a 3 million post forum would edit

Code:
set.default.ES_HEAP_SIZE=1024
to

Code:
set.default.ES_HEAP_SIZE=3072


Save and Exit.
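The same heap change can also be made non-interactively. A sketch with sed, using the 3 million post example value (swap in the number appropriate for your forum):

```shell
# Set the wrapper heap size in place, keeping a backup of the original.
# 3072 is the example value for a 3 million post forum.
WRAPPER=/elasticsearch/bin/service/elasticsearch.conf
cp "$WRAPPER" "$WRAPPER.bak"
sed -i 's|^set.default.ES_HEAP_SIZE=.*|set.default.ES_HEAP_SIZE=3072|' "$WRAPPER"

# Confirm the change took effect.
grep '^set.default.ES_HEAP_SIZE=' "$WRAPPER"
```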


3) Optional - Move the Elasticsearch data directory.

Your Linux install may be configured in such a way that your install partition is only a few GB in size, and placing a large Elasticsearch index there is not ideal.

In that case, you will want to move the index directory to a different, larger location (in this example, /var/elasticsearch).

Code:
cd /var
mkdir elasticsearch
Open up /elasticsearch/config/elasticsearch.yml on line 143 edit

Code:
# path.data: /path/to/data
to

Code:
path.data: /var/elasticsearch
Save and Exit

4) Restart the Elasticsearch Service

In SSH type

Code:
rcelasticsearch restart
You should get the following output

Code:
rcelasticsearch restart
Stopping ElasticSearch...
Stopped ElasticSearch.
Starting ElasticSearch...
Waiting for ElasticSearch......
running: PID: xxxxx
Elasticsearch is now running with your updated config.



Install the XenForo Enhanced Search Addon

1) Put your board into maintenance mode*

2) Download the addon from your customer area at http://xenforo.com/customers/

3) Follow the instructions found at http://xenforo.com/help/enhanced-search/

4) Wait for your indexes to be rebuilt

5) Open your board.

6) Install the index pre-warmer.

As of 0.90.0 Beta, an index pre-warmer is available. This keeps your search index "warm" in active memory, so when a search is run, access latency is greatly reduced.

Installing this is simple: in SSH, run the following, replacing *INDEX NAME* with the name of your ES index.

Code:
curl -XPUT localhost:9200/*INDEX NAME*/_warmer/warmer_1 -d '{
    "query" : {
        "match_all" : {}
    }
}'
You should see the following returned:

Code:
{"ok":true,"acknowledged":true}

*You may leave your board open during the re-index process.

Congratulations. Your board should now be running XenForo Enhanced Search.
 

Slavik

XenForo moderator
Staff member
#5
lol ... I realised you'd written that one after I'd fought my way through this one!! DOH!

Whilst it is currently working on my server, would I be better off using the Debian wrapper as opposed to this one?
In theory they should be doing the exact same thing; the RHEL one just comes a little more pre-configured than the Debian one, and it has the settings for the service wrapper in an external file as opposed to in the wrapper itself.
 

tmb

Active member
#7
Definitely saved me some time. Thanks. Now I just need to get around to updating to 1.1 so I can finish installing the search add-on.
 

graham_w

Active member
#9
Thanks for this great, easy-to-use guide! Was very quick to install on a CentOS 5.7 x64 box using this guide, and rebuilding the cache as we speak :)
 

shawn

Well-known member
#10
Yeah, might want to edit the title and add CentOS to the list... just in case folks don't know it's RHEL-based. Should help with search queries, too.
 

SneakyDave

Well-known member
#12
Yeah, might want to edit the title and add CentOS to the list... just in case folks don't know it's RHEL-based. Should help with search queries, too.
If you're running CentOS and don't know it's RHEL based, you probably shouldn't be running it! Just a joke!
 

Floren

Well-known member
#13
I recommend approximately 256 MB for the MIN_MEM and 1 GB for the MAX_MEM per 1 million posts on your forum.

1 Million Posts: 256:1024
2 Million Posts: 512:2048
3 Million Posts: 768:3072
4 Million Posts: 1024:4096
etc

This will not mean the service will use all that available memory, however it will have it at its disposal if required.
Just curious, how did you get those numbers? I just try to make sure the memory requirements are accurate, because a board with over 30mil posts will require 32GB of RAM just to run the search. Compared to Sphinx, which needs only 512MB for the same number of posts.... this is a BIG difference. Does anyone know what the I/O impact on disks is to read/write the index data?

Looks like it is confirmed: you will need a lot of memory to run Elastic on a large forum.
A forum with 9mil posts and 1GB of allocated memory returns search results in 6-7 seconds, slower than MySQL. That confirms the calculations posted by Slavik above.
 

digitalpoint

Well-known member
#14
a board with over 30mil posts will require 32GB of RAM just to run the search. Compared to Sphinx, which needs only 512MB for the same number of posts.... this is a BIG difference. Does anyone know what the I/O impact on disks is to read/write the index data?

Looks like it is confirmed: you will need a lot of memory to run Elastic on a large forum.
A forum with 9mil posts and 1GB of allocated memory returns search results in 6-7 seconds, slower than MySQL. That confirms the calculations posted by Slavik above.
I was actually wondering the same thing myself. I purchased the XF enhanced search, but I haven't had time to mess with it yet. But seeing the speed and memory requirements has me a little worried. My current Sphinx setup for vB takes about 2GB of memory for around 25M searchable documents spread across 16 searchable content types (posts, users, PMs, FAQs, articles, blogs, etc.), and results NEVER take more than 0.1 seconds for the most obscure search (usually more like 0.02 seconds).

But we'll see how it works for me before too long...
 

Floren

Well-known member
#15
My current Sphinx setup for vB takes about 2GB of memory for around 25M searchable documents spread across 16 searchable content types (posts, users, PMs, FAQs, articles, blogs, etc.), and results NEVER take more than 0.1 seconds for the most obscure search (usually more like 0.02 seconds).
My thoughts exactly, related to query speed. Honestly, 2GB is a lot for 25mil posts. Maybe it's because you spread it across a lot of indices. I recently set up Searchlight on the XDA-Developers site with workers as threads, and they use less than 1GB of RAM with 30,000 online users and 20mil posts.

I want to see what IGN says about the memory consumption on their test board, and also find out about the I/O impact on disks; I have no idea how often the data is read/written to the indices.
 

digitalpoint

Well-known member
#16
Well, it's not 25M posts... it's ~18M posts, and the non-post stuff tends to be larger on average. For example, PMs tend to be larger than posts. Users are also searchable, and they end up being fairly large bits of data, since we make everything about the user searchable (for example, every email they ever used). We also make IPs searchable via Sphinx on all content types, so that's an extra 25M 32-bit numbers, etc...

Hopefully the new search doesn't take the amount of memory people are saying or requires continuous "warm-up" to be fast. I have quite a bit of work to do still before I can really dig into it.
 

lazy llama

Well-known member
#17
Looks like it is confirmed: you will need a lot of memory to run Elastic on a large forum.
A forum with 9mil posts and 1GB of allocated memory returns search results in 6-7 seconds, slower than MySQL. That confirms the calculations posted by Slavik above.
That is on my test system, which is very low spec, though; I've never run MySQL searches on it, but I suspect they'd be slower. Sadly, I don't have the disk space on it to do a like-for-like comparison with Sphinx either. Sphinx did appear to be slightly quicker and use fewer resources, though. It certainly used less disk space.
I'm hoping to investigate further over the next few days, and I'm certainly not ruling anything out at this early stage.
 

Slavik

XenForo moderator
Staff member
#19
Just curious, how did you get those numbers? I just try to make sure the memory requirements are accurate, because a board with over 30mil posts will require 32GB of RAM just to run the search. Compared to Sphinx, which needs only 512MB for the same number of posts.... this is a BIG difference. Does anyone know what the I/O impact on disks is to read/write the index data?

Looks like it is confirmed: you will need a lot of memory to run Elastic on a large forum.
A forum with 9mil posts and 1GB of allocated memory returns search results in 6-7 seconds, slower than MySQL. That confirms the calculations posted by Slavik above.
I was actually wondering the same thing myself. I purchased the XF enhanced search, but I haven't had time to mess with it yet. But seeing the speed and memory requirements has me a little worried. My current Sphinx setup for vB takes about 2GB of memory for around 25M searchable documents spread across 16 searchable content types (posts, users, PMs, FAQs, articles, blogs, etc.), and results NEVER take more than 0.1 seconds for the most obscure search (usually more like 0.02 seconds).

But we'll see how it works for me before too long...

Please remember, my recommended settings are for this basic setup guide only.

If you are clustering a large forum, I would expect you to be using custom mapping, which may reduce the requirements by up to half. However, information like that belongs in an advanced guide (underway) for much larger boards, which also covers things such as setting up dedicated Elasticsearch users etc.

What I would like to know from you guys with the 20m+ forums: what OS are you running on?
 

Floren

Well-known member
#20
Maybe it's just because it's using a minimum word length of 1 or something... {shrug}
You mean the Star enabled in Sphinx? I use the Star only for thread titles, or else the indices will become huge. Searching for 1 char does not increase the index if you don't enable the Star. I presume the same logic is used for Elastic?

Even if it is half of the memory usage with a custom mapping, Elastic is using LARGE amounts of memory compared to Sphinx. Doing a quick search on Nabble for other Elastic user experiences revealed that all big users had a common issue: Elastic needs several machines with lots of memory in order to be query efficient. Obviously we are talking about millions of documents, not the average Joe user. The most relevant case for us is this thread, where the developer confirms the high memory usage for only 5mil posts. He also recommends using several boxes with small amounts of memory instead of one large one, for increased shard performance? Not sure. Plus, reading the guide tells me Elastic requires constant warm-up if your indices are not completely stored in memory:
The index can either be stored in-memory (no persistence) or on-disk (the default). In-memory indices provide better performance at the cost of limiting the index size to the amount of available physical memory.
Let me know if I misunderstood.