How to: Apply custom mapping.

Slavik

XenForo moderator
Staff member
This function is now built into XenForo Enhanced Search under AdminCP > Tools > Enhanced Search Setup.



Apply custom mapping to your Elasticsearch Index

This guide is provided for users who may wish to apply custom mapping to their Elasticsearch Index.

Please note this guide is considered experimental at the time of posting. Further mappings will follow.

This guide assumes you have basic knowledge of SSH and have logged in as root before starting the steps below. Please note that, whilst this is a simple and easy guide, I take no responsibility for any damages or losses that may occur to your system by following the steps below. If you are unsure at any stage, please ask for assistance or seek the help of a qualified Linux Systems Administrator.

Why do it?

Elasticsearch is a highly memory-intensive process. Unoptimised users with high load levels can expect to require approximately 1 GB of memory per 1 million posts to maintain satisfactory search times. This guide will attempt to reduce these requirements by approximately one quarter to one half.

The mapping in this guide was provided by Mike Tougeron from IGN. All credit for the mappings remains with him.

Step 1

Switch to your Elasticsearch data directory.

Code:
cd /var/elasticsearch

Check the current size of your index

Code:
du -ach

In the test index I used for this example the index size was 2.9 GB.
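
If you only want the grand total rather than a line for every file, a summary call is enough. This is a minimal sketch: `index_size` is a made-up helper name, and it assumes you have already switched to the data directory as shown above.

```shell
# Print a single human-readable total for a directory.
# -s = summarise (one line), -h = human-readable units.
# index_size is a hypothetical helper; it defaults to the current directory.
index_size() {
    du -sh "${1:-.}" | cut -f1
}

# Run from inside the data directory (for example after: cd /var/elasticsearch)
index_size .
```

Note the figure down so you can compare it with the size after re-indexing.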

Step 2

Locate your index name.

Navigate to the subfolders of your data directory until you are within the /nodes/0/indices directory.

A directory here will have the name of your Elasticsearch index. This will usually be the name of your database unless specified otherwise in the Elasticsearch settings within your XenForo Admin CP.
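
Rather than browsing by hand, you could let `find` list the candidate directories for you. This is only a sketch: `list_indices` is a made-up helper name, it assumes the /nodes/0/indices layout described above, and it makes no assumption about the folder names in between.

```shell
# List the names of the directories that sit directly inside any
# .../nodes/0/indices folder below the given data directory.
# list_indices is a hypothetical helper, not part of Elasticsearch.
list_indices() {
    find "$1" -type d -path '*/nodes/0/indices/*' \
        ! -path '*/nodes/0/indices/*/*' 2>/dev/null \
        | sed 's|.*/||' | sort
}

# Data directory as used in this guide; adjust if yours lives elsewhere.
list_indices /var/elasticsearch
```

Each name printed is a candidate for YOUR_INDEX_NAME in the next step.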

Step 3

Apply the new mapping. If you use PuTTY as your SSH client, you can copy all of the code below and right-click within the terminal window to have it entered for you. If you use this method, change "YOUR_INDEX_NAME" to the name of your index before pasting into PuTTY.

Code:
curl -XPUT 'http://localhost:9200/YOUR_INDEX_NAME/post/_mapping?ignore_conflicts=true' -d '
{
    "post" : {
        "_source" : {
            "enabled" : false
        },
        "properties" : {
            "message" : {"type" : "string", "store" : "no"},
            "title" : {"type" : "string", "store" : "no", "index" : "no"},
            "date" : {"type" : "long", "store" : "yes"},
            "user" : {"type" : "long", "store" : "yes"},
            "discussion_id" : {"type" : "long", "store" : "yes"},
            "node" : {"type" : "long", "store" : "no"},
            "prefix" : {"type" : "long", "store" : "no"},
            "thread" : {"type" : "long", "store" : "no", "index" : "no"}
        }
    }
}'

Code:
curl -XPUT 'http://localhost:9200/YOUR_INDEX_NAME/thread/_mapping?ignore_conflicts=true' -d '
{
    "thread" : {
        "_source" : {
            "enabled" : false
        },
        "properties" : {
            "message" : {"type" : "string", "store" : "no", "index" : "no"},
            "title" : {"type" : "string", "store" : "no"},
            "date" : {"type" : "long", "store" : "yes"},
            "user" : {"type" : "long", "store" : "yes"},
            "discussion_id" : {"type" : "long", "store" : "yes"},
            "thread_id" : {"type" : "long", "store" : "yes"},
            "node" : {"type" : "long", "store" : "no"},
            "prefix" : {"type" : "long", "store" : "no"},
            "thread" : {"type" : "long", "store" : "no", "index" : "no"}
        }
    }
}'
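
To check that Elasticsearch accepted the mappings, you could read them back before re-indexing. This is a sketch only: `show_mapping` is a made-up wrapper name, it assumes Elasticsearch is listening on localhost:9200 as in the commands above, and YOUR_INDEX_NAME is still a placeholder.

```shell
# Read back the mappings for an index so they can be checked by eye.
# pretty=true asks Elasticsearch to indent the JSON it returns.
# show_mapping is a hypothetical wrapper around curl.
show_mapping() {
    curl -s -XGET "http://localhost:9200/$1/_mapping?pretty=true"
}

# Example call (replace the placeholder with your index name first):
# show_mapping YOUR_INDEX_NAME
```

Compare the "post" and "thread" entries in the output with what you sent.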

Step 4

Re-index your board from the XenForo Admin CP.

Step 5

Switch to your Elasticsearch data directory.

Code:
cd /var/elasticsearch

Check the new size of your index

Code:
du -ach

In the test index I used for this example the index size was now 1.5 GB.

Hopefully, your index size should now be reduced.
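
The saving can be put as a percentage with a one-line calculation. This is a sketch using the two sizes from the test index in this guide; `percent_saved` is a made-up helper name, and both sizes must be given in the same unit.

```shell
# Percentage saved between a "before" and an "after" size (same unit).
# percent_saved is a hypothetical helper; awk does the floating point maths.
percent_saved() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (before - after) / before * 100 }'
}

# Sizes in GB from the test index in this guide.
percent_saved 2.9 1.5
```

For the test index this works out at roughly 48%, which sits at the upper end of the one quarter to one half range mentioned earlier.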
 
That's great, thanks. Will give it a go on our test board later.

Only thing missing is that IGN also had some mappings which had timeouts, sizes, etc in them. Are they relevant/useful here?
 
IGN's settings are customised for their own setup and are not relevant to the mappings.

Also, do we need to delete the current indices before applying the mapping?

No, upon initiating a re-index XenForo removes the indices for you.
 
Thanks for the How-To. Looks useful, especially for the larger boards.

Any chance of explaining what the custom mapping actually does and how it differs from the default?
I'm guessing it sets ES so that it doesn't index title and thread for posts, and message and thread number for threads (i.e. removes indexing of things which aren't searchable).
 
It's to do with how ES stores the data on indexing. By default, the entirety of a post or thread gets sent to ES and stored within the ES storage directories. However, that stored copy is not actually used by XenForo to locate relevant posts or threads in searches. XenForo locates the relevant content from the index and then loads the relevant information from MySQL, so ES is effectively storing information which it does not need to.

This mapping stops that extra unused information from being stored.
 
Thanks, I like to know what changes do, rather than blindly following and that seems to tie up with the "store" "no" pairs in the code posted.

http://www.elasticsearch.org/guide/reference/mapping/source-field.html

The _source field is an automatically generated field that stores the actual JSON that was used as the indexed document. It is not indexed (searchable), just stored. When executing “fetch” requests, like get or search, the _source field is returned by default.
Though very handy to have around, the source field does incur storage overhead within the index. For this reason, it can be disabled.

However, if you remove the _source field, you need to tell ES what you would like it to store and return when a search is requested. Taking threads as an example:

http://www.elasticsearch.org/guide/reference/mapping/core-types.html

"message" : {"type" : "string", "store" : "no", "index" : "no"},
The type is a string, not an integer. The message itself is not stored in ES because, if it is ever requested, it is returned from MySQL and not ES. "index" is set to "no" to stop the field being run through the analyzer and searched.

http://www.elasticsearch.org/guide/reference/mapping/analyzer-field.html

Now remember, when a search is run against ES in XenForo, all it does is search the index for the requested item and return the ID of the relevant result, which is then passed on to MySQL to retrieve and display the relevant data. ES itself does not return the actual post or thread; that is handled by MySQL, so there is no need for ES to keep that data lying around (and thus increase your index size).
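
You can see this for yourself by running a search directly against Elasticsearch. This is a sketch only: `es_search_posts` is a made-up wrapper name, it assumes localhost:9200 and the placeholder index name used in the guide, and the query XenForo itself sends is likely to be different from this simple URI search.

```shell
# Run a simple URI search against the "post" type and pretty-print the reply.
# $1 = index name, $2 = a single search word (no spaces or special characters).
# es_search_posts is a hypothetical wrapper around curl.
es_search_posts() {
    curl -s -XGET "http://localhost:9200/$1/post/_search?q=message:$2&pretty=true"
}

# Example call (replace the placeholder with your index name first):
# es_search_posts YOUR_INDEX_NAME paintball
```

With _source disabled, each hit should carry identifiers such as _id and _score rather than the text of the post.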
 
Excellent post - thanks for the share (Mike) and the instructions (Slavik) - I'll give this a go with CycleChat and see how it works out.

Does this improve the speed of the results at all or just lower the data/memory footprint on your server?
 
Well hopefully the latter improves the former ;)

I hope KAM don't mind me temporarily opening up my test forum to the public just to show you the effects (or I can just use one of my spare XenForo licenses if needs be).

Test forum
http://tierten.co.uk/

Live forum
http://p8ntballer-forums.com/


Perform a search on both. The test forum uses the index with the mapping; the live forum is without. As you can see, both return relevant and identical results (if you ignore the age difference from when the dump for the test forum was made).
 
Hmm is this correct or not? It seems to be missing the store:no parts?

Code:
{"post":{"properties":{"message":{"type":"string"},"title":{"type":"string"},"node":
{"type":"long"},"thread":{"type":"long"},"prefix":{"type":"long"},"discussion_id":
{"type":"long"},"date":{"type":"long"},"user":{"type":"long"}}},"profile_post":{"properties":
{"message":{"type":"string"},"title":{"type":"string"},"profile_user":
{"type":"long"},"discussion_id":{"type":"long"},"date":{"type":"long"},"user":
{"type":"long"}}},"thread":{"properties":{"message":{"type":"string"},"title":
{"type":"string"},"node":{"type":"long"},"thread":{"type":"long"},"prefix":
{"type":"long"},"discussion_id":{"type":"long"},"date":{"type":"long"},"user":{"type":"long"}}}}}
 
Thanks Slavik for the guide. Maybe I missed it, but for the test forum's before (2.9 GB) vs after (1.5 GB) sizes, what was the test forum's post count?
 