How to: Apply custom mapping.

Slavik

XenForo moderator
Staff member
This function is now built into XenForo Enhanced Search under AdminCP > Tools > Enhanced Search Setup.



Apply custom mapping to your Elasticsearch Index

This guide is for users who wish to apply a custom mapping to their Elasticsearch index.

Please note this guide is considered experimental at the time of posting. Further mappings will follow.

This guide assumes you have basic knowledge of SSH and have logged in as root before starting the steps below. Please note that, whilst this is a simple guide, I take no responsibility for any damage or loss that may occur to your system by following the steps below. If you are unsure at any stage, please ask for assistance or seek the help of a qualified Linux systems administrator.

Why do it?

Elasticsearch is a highly memory-intensive process. Users with an unoptimised index and high load levels can expect to need approximately 1 GB of memory per 1 million posts to maintain satisfactory search times. This guide aims to reduce these requirements by roughly one quarter to one half.

The mapping in this guide was provided by Mike Tougeron from IGN. All credit for the mappings remains with him.

Step 1

Switch to your Elasticsearch data directory.

Code:
cd /var/elasticsearch

Check the current size of your index (the -s flag prints a single total for the directory instead of listing every file):

Code:
du -sh

In the test index I used for this example, the index size was 2.9 GB.
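If you would rather record the size as a single number that you can compare after re-indexing, a small shell function such as the one below may help. It is a sketch: the name `dir_size_kb` is hypothetical, and the usage line assumes the `/var/elasticsearch` data directory used in this guide.

```shell
# Hypothetical helper: print the disk usage, in kilobytes, of everything
# under a directory, as a bare number suitable for comparing later.
dir_size_kb() {
    # -s: one summary line for the whole directory; -k: 1 KB units.
    # du prints "<size><TAB><path>", so keep only the first field.
    du -sk "$1" | cut -f1
}
```

For example, `dir_size_kb /var/elasticsearch > ~/es_size_before.txt` saves the figure so you can compare it with the result after Step 5.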

Step 2

Locate your index name.

Navigate through the subdirectories of your data directory until you are in the /nodes/0/indices directory.

Each directory here is named after an Elasticsearch index. Your XenForo index will usually be named after your database unless you specified otherwise in the Elasticsearch settings in your XenForo Admin CP.
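You can also list the candidate index names directly from the shell. The sketch below assumes the `<data directory>/<cluster name>/nodes/0/indices/<index name>` layout described above; `list_indices` is a hypothetical helper name.

```shell
# Hypothetical helper: print the name of every index directory found under
# an Elasticsearch data directory, assuming the layout
# <data dir>/<cluster name>/nodes/0/indices/<index name>.
list_indices() {
    for d in "$1"/*/nodes/0/indices/*/; do
        # If nothing matches, the loop sees the literal pattern; skip it.
        [ -d "$d" ] || continue
        # basename drops the leading path and the trailing slash.
        basename "$d"
    done
}
```

For example, `list_indices /var/elasticsearch` prints one index name per line.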

Step 3

Apply the new mapping. If you use PuTTY as your SSH client, you can copy each block of code below and right-click within the terminal window to paste it. If you use this method, replace "YOUR_INDEX_NAME" with your index name before pasting into PuTTY.

Code:
curl -XPUT 'http://localhost:9200/YOUR_INDEX_NAME/post/_mapping?ignore_conflicts=true' -d '
{
    "post" : {
        "_source" : {
            "enabled" : false
        },
        "properties" : {
            "message" : {"type" : "string", "store" : "no"},
            "title" : {"type" : "string", "store" : "no", "index" : "no"},
            "date" : {"type" : "long", "store" : "yes"},
            "user" : {"type" : "long", "store" : "yes"},
            "discussion_id" : {"type" : "long", "store" : "yes"},
            "node" : {"type" : "long", "store" : "no"},
            "prefix" : {"type" : "long", "store" : "no"},
            "thread" : {"type" : "long", "store" : "no", "index" : "no"}
        }
    }
}'

Code:
curl -XPUT 'http://localhost:9200/YOUR_INDEX_NAME/thread/_mapping?ignore_conflicts=true' -d '
{
    "thread" : {
        "_source" : {
            "enabled" : false
        },
        "properties" : {
            "message" : {"type" : "string", "store" : "no", "index" : "no"},
            "title" : {"type" : "string", "store" : "no"},
            "date" : {"type" : "long", "store" : "yes"},
            "user" : {"type" : "long", "store" : "yes"},
            "discussion_id" : {"type" : "long", "store" : "yes"},
            "thread_id" : {"type" : "long", "store" : "yes"},
            "node" : {"type" : "long", "store" : "no"},
            "prefix" : {"type" : "long", "store" : "no"},
            "thread" : {"type" : "long", "store" : "no", "index" : "no"}
        }
    }
}'
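Before re-indexing, it may be worth checking that Elasticsearch accepted the new mapping. The sketch below wraps a GET request to the index's `_mapping` endpoint; `show_mapping` is a hypothetical name, and `ES_URL` is an assumed variable that defaults to the `http://localhost:9200` address used above.

```shell
# Hypothetical helper: fetch the current mapping of an index so you can
# confirm that "_source" is disabled and the "store" settings took effect.
# ES_URL is an assumption; it falls back to the address used in this guide.
show_mapping() {
    curl -s -XGET "${ES_URL:-http://localhost:9200}/$1/_mapping?pretty=true"
}
```

Running `show_mapping YOUR_INDEX_NAME` should print the `post` and `thread` mappings, with `_source` shown as disabled.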

Step 4

Re-index your board from the XenForo Admin CP.

Step 5

Switch to your Elasticsearch data directory.

Code:
cd /var/elasticsearch

Check the new size of your index:

Code:
du -sh

In the same test index, the size was now 1.5 GB.

Hopefully, your index is now smaller.
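To put the before and after figures into perspective, the percentage reduction is (before - after) / before * 100. For the test index above, (2.9 - 1.5) / 2.9 works out to about 48%, at the upper end of the quarter-to-half range mentioned at the start. A hypothetical helper for doing the same with your own figures (both sizes must be in the same unit):

```shell
# Hypothetical helper: percentage reduction from a "before" size to an
# "after" size, printed with one decimal place.
size_reduction() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (before - after) / before * 100 }'
}
```

For example, `size_reduction 2.9 1.5` gives the 48.3% figure for the test index. A negative result means the index grew rather than shrank.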
 
That's interesting.

I wonder whether there is still a performance benefit. Presumably, not storing the data requires less memory, and there is less to write and read?

I'm going to do some more testing in the week. I'm currently dying of man-flu so I might have missed something obvious as my concentration is not 100% at the moment.
 
(Re)building the cache kept crashing on my test server. I upgraded the server to Ubuntu 12.04 and added this mapping.

And guess what, rebuilding finished!

22.5 million posts, and the total index size is 9.9 GB (oddly, at 22.0 million posts the directory size was 12 GB).
 
I just tried this on a forum with around 7 million posts.

Before applying the mapping, the indexes were 4.7 GB. After mapping and re-indexing via the control panel, they actually grew to 5.9 GB.
I then deleted the indexes in Elasticsearch and rebuilt them (with the mapping still in place), and the indexes were back to 4.7 GB.
 
Interesting how some indexes shrink quite a bit while others don't.

I am going to give this a shot tonight, after our peak hours. I finally found a service wrapper that can keep ES running, so it will keep us running until we can get someone to write us a Sphinx add-on.

Now remember: when XenForo runs a search against ES, all ES does is search the index for the requested terms and return the IDs of the relevant results. Those IDs are then passed on to MySQL, which retrieves and displays the relevant data. ES itself does not return the actual post or thread; that is handled by MySQL, so there is no need for ES to keep that data lying around (and thus increase your index size).

I'm confused by this: if "relevant data" includes the body content of a post (the message itself), wouldn't that mean it would need to be stored in the index? Or is the index storing the occurrences of the words and phrases, and using that to return what XenForo needs?

I have noticed that phrase-based searching in our forum is very weak; it can't even reliably find a title like "Who's Next", so ES hasn't been an improvement in that respect. We still send our members to Google to search for titles like this, since it works so much better. I wonder if there are any other mappings or configuration changes we can make to improve this. I'd like to think any search add-on like Sphinx or ES could improve these search results.

My head is elsewhere lately...finishing up a very rough semester. Shifting from economics, accounting and advanced algebra to XenForo is too much of a stretch. :D
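The point quoted above, that ES only hands back IDs which XenForo then looks up in MySQL, can be seen with a direct query. The sketch below runs a URI search for a single term against the `message` field of the `post` type; `search_posts` and `ES_URL` are hypothetical names. With `_source` disabled by the mapping in this guide, each hit should carry metadata such as `_id` rather than the post text. This also bears on the question above: `message` is still indexed (and therefore searchable), it is just not stored.

```shell
# Hypothetical helper: search the "message" field of the "post" type for one
# term. -G turns the --data-urlencode pairs into a URL-encoded query string.
# ES_URL is an assumption; it falls back to http://localhost:9200.
search_posts() {
    curl -s -G "${ES_URL:-http://localhost:9200}/$1/post/_search" \
        --data-urlencode "q=message:$2" \
        --data-urlencode "pretty=true"
}
```

For example, `search_posts YOUR_INDEX_NAME elasticsearch` should list matching post IDs without any message bodies.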
 
If you update to the latest version of Enhanced Search (ES) and use the "optimise" button, the mappings will be applied automatically for you.
 
It's already built into 1.1.4? Sweet! Yes, I've already dropped and reindexed after upgrading to the latest and greatest ES version last night. That's good: one less step for me to worry about, and it also explains why I've noticed it using less memory.
 