Intermittent "Elasticsearch server returned no response" Errors

DeltaHF

I've been running ES on an SSD-based VPS with great success for months now. Within the past two days, though, my users have been complaining of intermittent slowness when submitting posts, and my XenForo Server Error log is filling up with reports like "XenForo_Exception: Elasticsearch server returned no response. Is it running? Elasticsearch indexing failed for post-9843906".

Each episode seems to last 5 to 15 minutes. I rebooted the server yesterday, but it made no difference.

My ES stats are below. There are no unusual reports in /elasticsearch/logs, and I have confirmed with the datacenter (this VPS is in a different datacenter than the webserver which hosts my forum) that there have been no connectivity issues (although I'm still slightly suspicious on this front).

Any ideas?
Code:
$ curl -XGET 'http://localhost:9200/_cluster/nodes/stats?pretty=true'
{
  "cluster_name" : "gtplanet_elasticsearch",
  "nodes" : {
    "YhjOQPm8SBKhpCeYffl1cQ" : {
      "timestamp" : 1404529162456,
      "name" : "Avalanche",
      "transport_address" : "inet[/107.170.29.55:9300]",
      "hostname" : "search.gtplanet.net",
      "indices" : {
        "docs" : {
          "count" : 10050867,
          "deleted" : 128439
        },
        "store" : {
          "size" : "3.4gb",
          "size_in_bytes" : 3697540447,
          "throttle_time" : "21.5s",
          "throttle_time_in_millis" : 21546
        },
        "indexing" : {
          "index_total" : 5950,
          "index_time" : "19.7s",
          "index_time_in_millis" : 19794,
          "index_current" : 0,
          "delete_total" : 86,
          "delete_time" : "144ms",
          "delete_time_in_millis" : 144,
          "delete_current" : 0
        },
        "get" : {
          "total" : 0,
          "get_time" : "0s",
          "time_in_millis" : 0,
          "exists_total" : 0,
          "exists_time" : "0s",
          "exists_time_in_millis" : 0,
          "missing_total" : 0,
          "missing_time" : "0s",
          "missing_time_in_millis" : 0,
          "current" : 0
        },
        "search" : {
          "open_contexts" : 0,
          "query_total" : 5655,
          "query_time" : "3.6m",
          "query_time_in_millis" : 216243,
          "query_current" : 0,
          "fetch_total" : 4680,
          "fetch_time" : "44.4s",
          "fetch_time_in_millis" : 44490,
          "fetch_current" : 0
        }
      }
    }
  }
}
 
Code:
$ curl -XGET 'http://localhost:9200'
{
  "ok" : true,
  "status" : 200,
  "name" : "Mister Doll",
  "version" : {
    "number" : "0.90.10",
    "build_hash" : "0a5781f44876e8d1c30b6360628d59cb2a7a2bbb",
    "build_timestamp" : "2014-01-10T10:18:37Z",
    "build_snapshot" : false,
    "lucene_version" : "4.6"
  },
  "tagline" : "You Know, for Search"
}
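
Since connectivity between the web server and this VPS is still a suspect, a rough probe like the one below, run from the web server, could help show whether the outages line up with network trouble. This is only a sketch: `classify_probe` and `probe_es` are hypothetical helper names, the 5-second timeout and 1-second "slow" threshold are arbitrary assumptions, and it assumes curl and awk are installed.

```shell
# Classify one probe result: prints "down", "slow", or "ok".
# $1 = HTTP status reported by curl (000 means no response at all)
# $2 = total request time in seconds
# $3 = "slow" threshold in seconds (assumed default: 1)
classify_probe() {
    code="$1"; secs="$2"; limit="${3:-1}"
    if [ "$code" = "000" ]; then
        echo "down"
    elif awk -v s="$secs" -v l="$limit" 'BEGIN { exit !(s > l) }'; then
        echo "slow"
    else
        echo "ok"
    fi
}

# Probe an Elasticsearch URL once and print a timestamped verdict.
# The default URL is the local address used elsewhere in this thread.
probe_es() {
    url="${1:-http://localhost:9200}"
    # -w prints the HTTP status and total time; --max-time caps the wait at 5s.
    result=$(curl -s -o /dev/null --max-time 5 -w '%{http_code} %{time_total}' "$url")
    set -- $result
    echo "$(date '+%Y-%m-%d %H:%M:%S') $(classify_probe "$1" "$2") code=$1 time=$2s"
}
```

Calling `probe_es 'http://search.gtplanet.net:9200'` once a minute (from cron, say) and appending the output to a file would give a timeline to compare against the XenForo error log. That URL assumes port 9200 is reachable from the web server at that hostname.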
 
I dug further into the syslog (/var/log/messages on CentOS) and noticed that iptables had been reporting segfaults around the same times the problems occurred. Some research showed that this typically happens when the system is out of memory...

I upgraded the RAM on the VPS, and the problems are now solved.
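
The syslog check described above can be sketched roughly as follows. `count_memory_events` is a hypothetical helper, the search patterns are an assumption about what the kernel typically logs under memory pressure, and reading /var/log/messages usually requires root.

```shell
# Count syslog lines that hint at memory pressure: segfaults and
# out-of-memory / oom-killer messages (patterns are an assumption).
# $1 = log file to scan (defaults to the CentOS syslog path).
count_memory_events() {
    log_file="${1:-/var/log/messages}"
    # -c counts matching lines, -i ignores case, -E enables alternation.
    grep -ciE 'segfault|out of memory|oom-killer' "$log_file"
}
```

Running `count_memory_events` with no argument scans /var/log/messages, and `free -m` shows how much memory is in use at the moment.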
 
I'm starting to get these errors and my search is not working right now: anything you search for returns "no results found". I have not seen this happen since we converted to XF back in May. Does that mean the server is running out of memory? I'm still running XF 1.3.0 and Enhanced Search 1.0.3. I know it's time to upgrade, but I haven't gotten around to it. I just need to figure out what's causing it and how to fix it.
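
A possible first check, offered as a sketch rather than a diagnosis: ask Elasticsearch for its cluster health and see whether it answers at all and what status it reports. `health_status` is a hypothetical helper that pulls the "status" value out of the JSON; it assumes the response contains a single "status" field.

```shell
# Read cluster health JSON on stdin and print the value of its
# "status" field (green, yellow, or red). Prints nothing if absent.
health_status() {
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}
```

It could be used as `curl -s 'http://localhost:9200/_cluster/health?pretty=true' | health_status`. No output at all would suggest the node is not responding, and "red" means at least one primary shard is unassigned.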
 