
IGN's ElasticSearch _mapping

Discussion in 'Enhanced Search Support' started by Mike Tougeron, Jan 20, 2012.

  1. Mike Tougeron

    Mike Tougeron Well-Known Member

    At IGN we use the following _mapping for our ElasticSearch index. With roughly 3 million messages, it reduced the size of our index by more than 30%.

    Code:
    curl -XPUT 'http://localhost:9200/xenforo_ign/post/_mapping' -d '
    {
        "post" : {
            "_source" : {
                "enabled" : false
            },
            "properties" : {
                "message" : {"type" : "string", "store" : "no"},
                "title" : {"type" : "string", "store" : "no", "index" : "no"},
                "date" : {"type" : "long", "store" : "yes"},
                "user" : {"type" : "long", "store" : "yes"},
                "discussion_id" : {"type" : "long", "store" : "yes"},
                "node" : {"type" : "long", "store" : "no"},
                "prefix" : {"type" : "long", "store" : "no"},
                "thread" : {"type" : "long", "store" : "no", "index" : "no"}
            }
        }
    }'
     
    curl -XPUT 'http://localhost:9200/xenforo_ign/thread/_mapping' -d '
    {
        "thread" : {
            "_source" : {
                "enabled" : false
            },
            "properties" : {
                "message" : {"type" : "string", "store" : "no", "index" : "no"},
                "title" : {"type" : "string", "store" : "no"},
                "date" : {"type" : "long", "store" : "yes"},
                "user" : {"type" : "long", "store" : "yes"},
                "discussion_id" : {"type" : "long", "store" : "yes"},
                "thread_id" : {"type" : "long", "store" : "yes"},
                "node" : {"type" : "long", "store" : "no"},
                "prefix" : {"type" : "long", "store" : "no"},
                "thread" : {"type" : "long", "store" : "no", "index" : "no"}
            }
        }
    }'
     
    $> curl 'http://localhost:9200/xenforo_ign/_settings'
    {
        "xenforo_ign": {
            "settings": {
                "index.analysis.analyzer.default.language": "English", 
                "index.analysis.analyzer.default.type": "snowball", 
                "index.number_of_replicas": "1", 
                "index.number_of_shards": "5"
            }
        }
    }
     
    $> curl 'http://localhost:9200/xenforo_ign/_status'
    {
        "_shards": {
            "failed": 0, 
            "successful": 10, 
            "total": 10
        }, 
        "indices": {
            "xenforo_ign": {
                "docs": {
                    "deleted_docs": 3714, 
                    "max_doc": 3437784, 
                    "num_docs": 3434070
                }, 
                "index": {
                    "primary_size": "1.4gb", 
                    "primary_size_in_bytes": 1574012401, 
                    "size": "2.9gb", 
                    "size_in_bytes": 3148021951
                }, 
                "merges": {
                    "current": 0, 
                    "total": 3302, 
                    "total_time": "33.5m", 
                    "total_time_in_millis": 2010177
                }, 
                "refresh": {
                    "total": 31017, 
                    "total_time": "29.9m", 
                    "total_time_in_millis": 1794563
                }, 
    
     
    Robust, p4guru, CyclingTribe and 5 others like this.
  2. giorgino

    giorgino Well-Known Member

    Hi Mike, can you explain this a little more?
    thx
     
  3. ragtek

    ragtek Guest

  4. giorgino

    giorgino Well-Known Member

    Thank you ragtek, but this goes beyond my comprehension (which comprehension? :D)
     
  5. Mike Tougeron

    Mike Tougeron Well-Known Member

    Basically, by default ElasticSearch stores the entire document you send to it (the _source) alongside the index. If you have a lot of large messages, the index and its storage can grow pretty big. XenES doesn't actually use the message from the original document sent to ES (it takes the IDs from the search results and then loads the content from MySQL), so this mapping keeps that extra data from being stored inside ES. In the thread index, the message field is sent but never searched, so I disabled both storing and indexing for that field.

    p.s., you may have noticed I didn't include the profile posts index. That's because we are using MyIGN for that data not XenForo.
     
    p4guru, giorgino and Walter like this.
  6. Mike

    Mike XenForo Developer Staff Member

    Of course I can't find it now, but I'm pretty sure that ES doesn't store fields by default.
     
  7. Mike Tougeron

    Mike Tougeron Well-Known Member

    It doesn't store the fields by default, but it does store the _source.

    http://www.elasticsearch.org/guide/reference/mapping/source-field.html
    This means that if you don't store the _source you need to explicitly store fields that you want returned. In the case of XenForo this includes fields like date, user_id, etc.
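
    A rough sketch of what that means at query time, assuming a search against the post type: with _source disabled, the request has to name the stored fields it wants back. The query text and the choice of fields are my own illustration, not the exact request XenES sends, and "fields" is the search parameter for stored fields in ES releases of this era. The script only prints the curl command, so nothing is run against your cluster.

```shell
# Host and index name taken from the mapping commands; override via environment if yours differ.
ES_HOST="${ES_HOST:-http://localhost:9200}"
ES_INDEX="${ES_INDEX:-xenforo_ign}"

# Illustrative query (an assumption, not the actual XenES request).
# "fields" lists fields mapped with "store": "yes"; with _source disabled,
# a field that is not stored (e.g. message) cannot be returned in the hits.
SEARCH_BODY='{
    "query" : {"query_string" : {"default_field" : "message", "query" : "elasticsearch"}},
    "fields" : ["discussion_id", "user", "date"]
}'

SEARCH_URL="$ES_HOST/$ES_INDEX/post/_search"

# Print the command for review instead of executing it.
echo "curl -XGET '$SEARCH_URL' -d '$SEARCH_BODY'"
```

    The hits then carry only the IDs and dates, which is all that is needed to load the real content from MySQL.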
     
    Mike likes this.
  8. Mike Tougeron

    Mike Tougeron Well-Known Member

    btw, fwiw, I thought so originally too. It wasn't until I blew up our dev ES install that I did the deep dive research. :p
     
  9. Floren

    Floren Well-Known Member

    Mike, what is the value for Elastic MAX memory allocation (ES_MAX_MEM) on your setup? How many posts does IGN have now? I'm trying to determine the proper ratio between the number of posts and the total memory usage Elastic needs to serve search queries fast.

    Can you run Siege on your test server to emulate 30,000 online users while performing some random searches in parallel? Let me know the response time you get on search queries. I presume these are the average numbers IGN gets on a regular basis, not peak time.
     
  10. Mike Tougeron

    Mike Tougeron Well-Known Member

    The eng who maintains our ES farm isn't in yet but I sent him an email to find out more about our config for you.

    124,779 discussions & 3,370,918 posts. But when we are finished with the migration we'll have approx 70m posts.
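
    For a rough idea of where that puts the index, here is a back-of-the-envelope projection from the index stats in my first post. It assumes the index grows linearly with document count, which is only an approximation (message length, merges and deleted docs all shift it), so treat the result as a ballpark, not a measurement.

```shell
# Figures copied from the index stats output in the first post.
PRIMARY_BYTES=1574012401   # primary_size_in_bytes
NUM_DOCS=3434070           # num_docs

# Approximate post count after the migration (from this post).
TARGET_DOCS=70000000

# Assumption: index size scales linearly with the number of documents.
BYTES_PER_DOC=$(( PRIMARY_BYTES / NUM_DOCS ))
PROJECTED_BYTES=$(( PRIMARY_BYTES * TARGET_DOCS / NUM_DOCS ))
PROJECTED_GB=$(( PROJECTED_BYTES / 1024 / 1024 / 1024 ))

echo "bytes per doc (primary): $BYTES_PER_DOC"
echo "projected primary size:  ${PROJECTED_GB}gb (before replicas)"
```

    With one replica, as in our settings, the total on disk would be about double the primary figure.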

    I don't have a "real" performance environment right now so I can't really scale our testing accurately. :( Once we're fully migrated to XenForo I'll post lots of info about what we're using, our performance stats, etc.
     
  11. RobParker

    RobParker Well-Known Member

    I have ES up and running on our test install and the search index rebuilt.

    What would I need to do to implement this? Is it literally just a case of pasting the above on the command line?
     
  12. Slavik

    Slavik XenForo Moderator Staff Member

    Yup :) Edited to your node obviously
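
    Roughly like this, assuming a hypothetical index called my_forum_index: the part of the URL that changes is the index name your XenForo install writes to, not the data directory on disk. The host and index name below are placeholders.

```shell
# Hypothetical values: replace with your own ES host and index name.
ES_HOST="http://localhost:9200"
ES_INDEX="my_forum_index"

# The mapping URL has the form host/index/type/_mapping.
POST_MAPPING_URL="$ES_HOST/$ES_INDEX/post/_mapping"
THREAD_MAPPING_URL="$ES_HOST/$ES_INDEX/thread/_mapping"

# Print the URLs to use in place of the xenforo_ign ones in the curl -XPUT commands.
echo "$POST_MAPPING_URL"
echo "$THREAD_MAPPING_URL"
```

    The JSON bodies of the two curl -XPUT commands stay the same; only the URLs change.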
     
  13. RobParker

    RobParker Well-Known Member

    Cheers

    I think I might need a little help if you don't mind.

    1) Our data is stored at /elasticsearch/data/cloud-spurs/nodes/0

    How would I then change http://localhost:9200/xenforo_ign/post/_mapping for our node? I don't quite see how that relates.

    2) Also, this seems very specific to IGN's setup:

    Code:
    $> curl 'http://localhost:9200/xenforo_ign/_status'
    {
        "_shards": {
            "failed": 0, 
            "successful": 10, 
            "total": 10
        }, 
        "indices": {
            "xenforo_ign": {
                "docs": {
                    "deleted_docs": 3714, 
                    "max_doc": 3437784, 
                    "num_docs": 3434070
                }, 
                "index": {
                    "primary_size": "1.4gb", 
                    "primary_size_in_bytes": 1574012401, 
                    "size": "2.9gb", 
                    "size_in_bytes": 3148021951
                }, 
                "merges": {
                    "current": 0, 
                    "total": 3302, 
                    "total_time": "33.5m", 
                    "total_time_in_millis": 2010177
                }, 
                "refresh": {
                    "total": 31017, 
                    "total_time": "29.9m", 
                    "total_time_in_millis": 1794563
                }, 
    
    
    Should I be using those numbers or somehow deriving my own? Our elasticsearch.yml is the default (apart from setting our cluster name).
     
  14. Slavik

    Slavik XenForo Moderator Staff Member

    I'll put a guide up later tonight or tomorrow that will explain it better.
     
    p4guru, Walter and RobParker like this.
  15. p4guru

    p4guru Well-Known Member

    thanks much appreciated :)
     
