
How to: Basic Elasticsearch installation. (RHEL/SUSE)

digitalpoint

Well-known member
#21
What I would like to know from you guys with the 20m+ post forums is: what OS are you running on?
I was using SUSE Linux Enterprise until last September. At that point I switched everything over to openSUSE (same underpinnings, but free versus ~$940 per 3 years per server, and also not a generation behind).
 

Marcus

Well-known member
#22
It works now on my CentOS installation.

I set the configuration in this section as follows, and the search index can be rebuilt now:

Code:
############################## Network And HTTP ###############################

# ElasticSearch, by default, binds itself to the 0.0.0.0 address, and listens
# on port [9200-9300] for HTTP traffic and on port [9300-9400] for node-to-node
# communication. (the range means that if the port is busy, it will automatically
# try the next port).

# Set the bind address specifically (IPv4 or IPv6):
#
# network.bind_host: 192.168.0.1

# Set the address other nodes will use to communicate with this node. If not
# set, it is automatically derived. It must point to an actual IP address.
#
# network.publish_host: 192.168.0.1

# Set both 'bind_host' and 'publish_host':
#
network.host: 127.0.0.1

# Set a custom port for the node to node communication (9300 by default):
#
# transport.tcp.port: 9300

# Enable compression for all communication between nodes (disabled by default):
#
# transport.tcp.compress: true

# Set a custom port to listen for HTTP traffic:
#
http.port: 9200

# Set a custom allowed content length:
#
# http.max_content_length: 100mb

# Disable HTTP completely:
#
# http.enabled: false

In addition, I asked my server management to open port 9200 (before I started this thread); I am not sure whether this was necessary.
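With network.host set to 127.0.0.1 as above, Elasticsearch should only listen on the loopback interface, so I suspect opening port 9200 externally was unnecessary. A quick local check (a minimal sketch, assuming a default single-node install):

Code:
# should return the node's name/version JSON if ES is up on loopback
curl http://127.0.0.1:9200/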
 

Rob

Well-known member
#24
I can start the service manually no problem, but it will not start automatically on server reboot.

Any ideas?

Thanks
 

Rob

Well-known member
#26
Well, I followed the RHEL/SUSE install instructions in this thread. Do I really need to do that? If so, how do I make a script?
 

Slavik

XenForo moderator
Staff member
#27
Well, I followed the RHEL/SUSE install instructions in this thread. Do I really need to do that? If so, how do I make a script?
If you ran the "elasticsearch/bin/service/elasticsearch install" command, it should start automatically on boot; if it isn't, that would suggest a server configuration or installation error.
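For reference, the usual sequence on RHEL/CentOS looks roughly like this (a sketch, assuming the service wrapper registers itself under the name elasticsearch):

Code:
# register the init script (run from the elasticsearch directory)
bin/service/elasticsearch install
# confirm it is set to start on boot (RHEL/CentOS)
chkconfig --list elasticsearch
# start it now without rebooting
service elasticsearch start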
 

p4guru

Well-known member
#29
Curious, are ElasticSearch's memory requirements still as large as what Floren/Shawn etc. have discussed in the older posts on this thread? Max 1GB per 1 million posts?
 

Slavik

XenForo moderator
Staff member
#30
Curious, are ElasticSearch's memory requirements still as large as what Floren/Shawn etc. have discussed in the older posts on this thread? Max 1GB per 1 million posts?
Custom mapping seems to be able to reduce it by between 1/4 and 1/2, depending on your board.
 

Floren

Well-known member
#32
Still pretty hefty for a 30m-post forum, but a reduction from 30GB to 8-15GB is still nice.
Isn't that custom mapping designed to produce slow output results? I'm trying to understand what the "store" part does:
"message" : {"type" : "string", "store" : "no"}
The idea of storing the strings in memory is to allow a quick search through them based on the original search query and to avoid a warm-up. You search for specific keywords in the "message" string (for example); they are processed through advanced search and a number of IDs is returned.

I was wondering if anyone can post some comparative results between the full memory index usage and a custom mapping one. Testing the IGN search shows their results pulled on average in 2 to 3 seconds.
 

Slavik

XenForo moderator
Staff member
#33
Isn't that custom mapping designed to produce slow output results? I'm trying to understand what the "store" part does:
"message" : {"type" : "string", "store" : "no"}
The idea of storing the strings in memory is to allow a quick search through them based on the original search query and to avoid a warm-up. You search for specific keywords in the "message" string (for example); they are processed through advanced search and a number of IDs is returned.

I was wondering if anyone can post some comparative results between the full memory index usage and a custom mapping one. Testing the IGN search shows their results pulled on average in 2 to 3 seconds.
http://xenforo.com/community/threads/how-to-apply-custom-mapping.31103/#post-355331
 

Floren

Well-known member
#34
I'm sorry, but that post does not answer my question. It does not say anything about the memory usage or how the data is stored. From my experience, you can either store the data on disk or in memory. Does the "store" setting mean it is stored on disk or in memory? The data has to be stored somewhere, or else how can you search it?

The moment you store the data on disk, Elastic will need a warm-up and produce slow results. It has been confirmed by many, including the Elastic developers.
 

EQnoble

Well-known member
#35
http://www.elasticsearch.org/guide/reference/index-modules/store.html
Simple FS

The simplefs type is a straightforward implementation of file system storage (maps to Lucene SimpleFsDirectory) using a random access file. This implementation has poor concurrent performance (multiple threads will bottleneck). It is usually better to use the niofs type when you need index persistence.
NIO FS

The niofs type stores the shard index on the file system (maps to Lucene NIOFSDirectory) using NIO. It allows multiple threads to read from the same file concurrently. It is not recommended on Windows because of a bug in the SUN Java implementation.
MMap FS

The mmapfs type stores the shard index on the file system (maps to Lucene MMapDirectory) by mapping a file into memory (mmap). Memory mapping uses up a portion of the virtual memory address space in your process equal to the size of the file being mapped. Before using this class, be sure you have plenty of virtual address space.
dunno if that helps
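If it does, the store type from those docs is set in elasticsearch.yml; a minimal sketch, assuming the 0.19-era setting name from that page:

Code:
# elasticsearch.yml -- pick the shard storage implementation
# (niofs for concurrent on-disk reads; "memory" keeps indices in RAM)
index.store.type: niofs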
 

Slavik

XenForo moderator
Staff member
#36
I'm sorry, but that post does not answer my question. It does not say anything about the memory usage or how the data is stored. From my experience, you can either store the data on disk or in memory. Does the "store" setting mean it is stored on disk or in memory? The data has to be stored somewhere, or else how can you search it?

The moment you store the data on disk, Elastic will need a warm-up and produce slow results. It has been confirmed by many, including the Elastic developers.
When a document is indexed under default settings, an extra field is created in the index which stores the original document alongside the indexed content in case it is required (in most cases, this would be used to return and show the actual search result).

XenForo does not operate the search like this. Instead, the search queries the content as normal, and ES returns the content ID, which is then shown from the database rather than from the document in ES. As this document is never used, it is effectively dead data; by removing it, you reduce the index size on disk (and in turn, when the files are opened and stored in memory, the memory requirement).

Nothing actually changes in how ES works for XenForo; the ES instance still works as before, we are just removing unused information.

I threw this together, maybe it will help you visualise it in a simple way.

[Attached image: flow.jpg]

As you can see, the XenForo search doesn't require that original document to be stored. All the mapping does is chop off that extra field.
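For illustration only, creating an index with such a mapping could look roughly like this; the index name "xenforo" and type name "post" are hypothetical, see the custom mapping thread linked above for the real details:

Code:
# Hypothetical sketch: the "message" field is indexed (searchable)
# but not stored, and the _source copy of the document is disabled
curl -XPUT 'http://localhost:9200/xenforo' -d '{
  "mappings" : {
    "post" : {
      "_source" : {"enabled" : false},
      "properties" : {
        "message" : {"type" : "string", "store" : "no"}
      }
    }
  }
}'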
 

Floren

Well-known member
#37
When a document is indexed under default settings, an extra field is created in the index which stores the original document alongside the indexed content in case it is required (in most cases, this would be used to return and show the actual search result).
Thanks, now I understand. So by default Elastic stores the same document data twice: one copy for searching and one for returning the actual document in case it is needed.
As you can see, the XenForo search doesn't require that original document to be stored. All the mapping does is chop off that extra field.
Now, how do we measure the performance and hardware needs? From other users' experiences, the number of documents implies twice that in memory: for example, 500k posts will require 1GB of RAM, 700k posts 1.4GB, etc. But that is not accurate at all; documents are not equal in size. The custom mapping will eliminate a portion of the memory usage, so what do we end up with in real life? You cannot guess in that area; there has to be a real calculation factor that allows you to determine the memory usage precisely.

For example, with Sphinx it is very easy. All you have to do is sum the total size of all indices that are stored in memory, and you know the actual memory usage for sure:

[Attached image: memory.png]

In my example, for 46 million posts, 3.3GB of RAM is used to produce index results extracted in 0.057 seconds.
There has to be a method that allows us to be very precise in determining the memory used. A large forum cannot "guess"; it has to take all factors into consideration, or it might end up with outrageous new hardware requirements.

I talk about memory storage a lot because it seems to be the ONLY viable direction for Elastic in order to produce proper search results in a reasonable time. Storing the data on disk will simply produce undesirably slow results. And since memory data is volatile, what happens if I reboot the server? How long does it take for the memory indices to be rebuilt? Is the index data created locally as a file and then its contents stored into memory?
 

Slavik

XenForo moderator
Staff member
#40
A large forum cannot "guess"; it has to take all factors into consideration, or it might end up with outrageous new hardware requirements.

I talk about memory storage a lot because it seems to be the ONLY viable direction for Elastic in order to produce proper search results in a reasonable time. Storing the data on disk will simply produce undesirably slow results. And since memory data is volatile, what happens if I reboot the server? How long does it take for the memory indices to be rebuilt? Is the index data created locally as a file and then its contents stored into memory?
This is the part a few of us are trying to work out: the factors affecting index sizes and speed, and attempting to work out a requirement. Having said that, the 1GB per million posts figure seems like a reasonable enough starting point, and any custom mapping reducing that requirement is a bonus... but for the time being it seems a "suck it and see" approach is the only way to get those answers.

ES indexes are written to disk as files within the /es/data directory. When a file is requested, it is loaded into memory (at which point it remains open for a period of time before being closed), unless you specifically set the storage type to memory, in which case the indexes are built and stored exclusively in your server's memory, with the associated risks of doing so. That is why some of us are using a little bash script to randomly load search terms, keeping the files in active memory to speed up searches.
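The script itself is nothing fancy; a rough sketch of the idea (the word list and interval here are made up, not what anyone is actually running):

Code:
#!/bin/bash
# Hypothetical cache warmer: fire a random search at the local node
# periodically so the index files stay resident in memory.
TERMS=(server install search linux forum)
while true; do
    TERM=${TERMS[$RANDOM % ${#TERMS[@]}]}
    curl -s "http://localhost:9200/_search?q=${TERM}" > /dev/null
    sleep 300
done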


So can you post the memory usage on indices?
DP is releasing an addon; there's a nice pretty UI you can find on github (https://github.com/mobz/elasticsearch-head), or more simply, run the command below and work out the numbers yourself. Also remember that memory usage is linked to user load. :)

Code:
curl -XGET 'http://localhost:9200/_cluster/nodes/stats?pretty=true'


Code:
curl -XGET 'http://localhost:9200/_cluster/nodes/stats?pretty=true'
{
  "cluster_name" : "testbedES",
  "nodes" : {
    "myP4CtBhRLOH3BB1NtDYUw" : {
      "name" : "Stardust",
      "indices" : {
        "store" : {
          "size" : "948mb",
          "size_in_bytes" : 994055264
        },
        "docs" : {
          "count" : 1317585,
          "deleted" : 2472
        },
        "indexing" : {
          "index_total" : 8749,
          "index_time" : "1.4m",
          "index_time_in_millis" : 85390,
          "index_current" : 0,
          "delete_total" : 84,
          "delete_time" : "855ms",
          "delete_time_in_millis" : 855,
          "delete_current" : 0
        },
        "get" : {
          "total" : 0,
          "time" : "0s",
          "time_in_millis" : 0,
          "exists_total" : 0,
          "exists_time" : "0s",
          "exists_time_in_millis" : 0,
          "missing_total" : 0,
          "missing_time" : "0s",
          "missing_time_in_millis" : 0,
          "current" : 0
        },
        "search" : {
          "query_total" : 139160,
          "query_time" : "1.5h",
          "query_time_in_millis" : 5475259,
          "query_current" : 0,
          "fetch_total" : 103345,
          "fetch_time" : "4.8h",
          "fetch_time_in_millis" : 17426649,
          "fetch_current" : 0
        },
        "cache" : {
          "field_evictions" : 0,
          "field_size" : "15mb",
          "field_size_in_bytes" : 15823076,
          "filter_count" : 6,
          "filter_evictions" : 0,
          "filter_size" : "826kb",
          "filter_size_in_bytes" : 845824
        },
        "merges" : {
          "current" : 0,
          "current_docs" : 0,
          "current_size" : "0b",
          "current_size_in_bytes" : 0,
          "total" : 1,
          "total_time" : "86ms",
          "total_time_in_millis" : 86,
          "total_docs" : 19,
          "total_size" : "38.9kb",
          "total_size_in_bytes" : 39885
        },
        "refresh" : {
          "total" : 8632,
          "total_time" : "1.2m",
          "total_time_in_millis" : 74320
        },
        "flush" : {
          "total" : 5155,
          "total_time" : "9.3m",
          "total_time_in_millis" : 562688
        }
      },
      "os" : {
        "timestamp" : 1336296997532,
        "uptime" : "515 hours, 52 minutes and 41 seconds",
        "uptime_in_millis" : 1857161000,
        "load_average" : [ 0.46, 0.35, 0.33 ],
        "cpu" : {
          "sys" : 2,
          "user" : 3,
          "idle" : 93
        },
        "mem" : {
          "free" : "5.3gb",
          "free_in_bytes" : 5727977472,
          "used" : "2.4gb",
          "used_in_bytes" : 2622029824,
          "free_percent" : 77,
          "used_percent" : 22,
          "actual_free" : "6gb",
          "actual_free_in_bytes" : 6451220480,
          "actual_used" : "1.7gb",
          "actual_used_in_bytes" : 1898786816
        },
        "swap" : {
          "used" : "11.9mb",
          "used_in_bytes" : 12550144,
          "free" : "3.7gb",
          "free_in_bytes" : 4001366016
        }
      },
      "process" : {
        "timestamp" : 1336296997533,
        "open_file_descriptors" : 585,
        "cpu" : {
          "percent" : 0,
          "sys" : "5 minutes, 38 seconds and 720 milliseconds",
          "sys_in_millis" : 338720,
          "user" : "14 minutes, 35 seconds and 220 milliseconds",
          "user_in_millis" : 875220,
          "total" : "20 minutes, 13 seconds and 940 milliseconds",
          "total_in_millis" : 1213940
        },
        "mem" : {
          "resident" : "596.9mb",
          "resident_in_bytes" : 625917952,
          "share" : "11mb",
          "share_in_bytes" : 11628544,
          "total_virtual" : "4.5gb",
          "total_virtual_in_bytes" : 4902260736
        }
      },
      "jvm" : {
        "timestamp" : 1336296997533,
        "uptime" : "515 hours, 46 minutes, 22 seconds and 165 milliseconds",
        "uptime_in_millis" : 1856782165,
        "mem" : {
          "heap_used" : "238.2mb",
          "heap_used_in_bytes" : 249812744,
          "heap_committed" : "509.9mb",
          "heap_committed_in_bytes" : 534708224,
          "non_heap_used" : "36.9mb",
          "non_heap_used_in_bytes" : 38785000,
          "non_heap_committed" : "57.7mb",
          "non_heap_committed_in_bytes" : 60506112
        },
        "threads" : {
          "count" : 41,
          "peak_count" : 63
        },
        "gc" : {
          "collection_count" : 9209,
          "collection_time" : "34 seconds and 935 milliseconds",
          "collection_time_in_millis" : 34935,
          "collectors" : {
            "ParNew" : {
              "collection_count" : 9191,
              "collection_time" : "34 seconds and 750 milliseconds",
              "collection_time_in_millis" : 34750
            },
            "ConcurrentMarkSweep" : {
              "collection_count" : 18,
              "collection_time" : "185 milliseconds",
              "collection_time_in_millis" : 185
            }
          }
        }
      },
      "network" : {
        "tcp" : {
          "active_opens" : 76254,
          "passive_opens" : 8693590,
          "curr_estab" : 19,
          "in_segs" : 260179338,
          "out_segs" : 121374258,
          "retrans_segs" : 1126081,
          "estab_resets" : 31153,
          "attempt_fails" : 11703,
          "in_errs" : 0,
          "out_rsts" : 38940
        }
      },
      "transport" : {
        "server_open" : 7,
        "rx_count" : 0,
        "rx_size" : "0b",
        "rx_size_in_bytes" : 0,
        "tx_count" : 0,
        "tx_size" : "0b",
        "tx_size_in_bytes" : 0
      },
      "http" : {
        "current_open" : 1,
        "total_opened" : 35904
      }
    }
  }
}
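If you only want the headline numbers out of that wall of JSON, indices.store.size and jvm.mem.heap_used are the main ones to watch; a quick-and-dirty filter, assuming the same endpoint:

Code:
curl -s 'http://localhost:9200/_cluster/nodes/stats?pretty=true' \
  | grep -E '"(size|heap_used|resident)" :'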