Server issue When XF can't connect to ES, content creation literally stops

Marcus

Well-known member
When my forum cannot connect to the ES server, content effectively cannot be created anymore. Whenever a user posts content (creates a post, replies to a post), it may take minutes to post.

The forum works as normal with two exceptions: content is no longer created, and there is an error entry in the ACP error log.

A solution might be to decouple content creation from search index creation.
 
Which version of XFES are you using?

Some months ago (less than 6 I think) I had a problem on my server which caused the ES process to be killed at the OS level. I noticed it 3 days later and during that period everything else (excluding search) was working fine.
 
Content will not be created anymore, and there is an error entry within acp error log.
What happens when you try to create content? Is there an error? Does that coincide with an error being logged in the Server Error Log? What is that error?
 
I use the latest XF with the latest XFES.
Code:
# curl 1.2.3.4:9200
{
  "status" : 200,
  "name" : "Sagittarius",
  "cluster_name" : "myforum",
  "version" : {
    "number" : "1.4.4",
    "build_hash" : "c88f77ffc81301dfa9dfd81ca2232f09588bd512",
    "build_timestamp" : "2015-02-19T13:05:36Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.3"
  },
  "tagline" : "You Know, for Search"
}
After some digging, it looks like the content will be created, but the delay is 10 minutes or longer. That means the browser just shows the hourglass, and when the user reloads the page the content is not there.

The tricky part is that there are no error messages in the ACP error log about the delayed content creation. My guess is that once the user creates content, XenForo's DataWriter keeps trying to connect to the Elasticsearch server until it times out. And in my setup it seems it doesn't time out either.

When the user gives up after a minute or more and refreshes the page, the post is still not visible, and the thread has not been created in the forum. Maybe that is because the DataWriter did not finish its work.
 
Maybe there is no time limit on XenForo's side when connecting to the nonexistent Elasticsearch server. One interesting bit of information :) It was not even possible to ping the Elasticsearch server; it was completely hidden behind its firewall setup. There were two different errors (I think I have around 30 pages of error messages because of this issue):
Code:
XenForo_Exception: Elasticsearch server returned no response. Is it running? Elasticsearch indexing failed - library/XenES/Search/SourceHandler/ElasticSearch.php:850

#0 library/XenES/Search/SourceHandler/ElasticSearch.php(965): XenES_Search_SourceHandler_ElasticSearch->_logSearchResponseError(false, false, 'Elasticsearch i...')
#1 library/XenES/Search/SourceHandler/ElasticSearch.php(63): XenES_Search_SourceHandler_ElasticSearch->_triggerFailedIndexError(false, Array, true)
#2 library/XenForo/Search/Indexer.php(44): XenES_Search_SourceHandler_ElasticSearch->insertIntoIndex('post', xxxx, 'test', 'test', xxxx, xxx, xxx, Array)
(...)
#15 library/XenForo/FrontController.php(134): XenForo_FrontController->dispatch(Object(XenForo_RouteMatch))
#16 index.php(13): XenForo_FrontController->run()
#17 {main}

array(3) {
  ["url"] => string(49) "http://www.xxx/forums/xxx/add-thread"
  ["_GET"] => array(1) {
    ["/forums/xxx/add-thread"] => string(0) ""
  }
  ["_POST"] => array(13) {
    ["title"] => string(4) "test"
    ["message_html"] => string(11) "<p>test</p>"
    ["_xfRelativeResolver"] => string(52) "http://www.forums/xxx/create-thread"
    ["attachment_hash"] => string(32) "bc41131f646e122cadf056bc9302ed43"
    ["watch_thread"] => string(1) "1"
    ["watch_thread_state"] => string(1) "1"
    ["discussion_open"] => string(1) "1"
    ["_set"] => array(2) {
      ["discussion_open"] => string(1) "1"
      ["sticky"] => string(1) "1"
    }
    ["poll"] => array(5) {
      ["question"] => string(0) ""
      ["responses"] => array(2) {
        [0] => string(0) ""
        [1] => string(0) ""
      }
      ["max_votes_type"] => string(6) "single"
      ["change_vote"] => string(1) "1"
      ["view_results_unvoted"] => string(1) "1"
    }
    ["_xfToken"] => string(8) "********"
    ["_xfRequestUri"] => string(24) "/forums/xxx/create-thread"
    ["_xfNoRedirect"] => string(1) "1"
    ["_xfResponseType"] => string(4) "json"
  }
}

This is the other one:
Code:
Error info
XenForo_Exception: Elasticsearch server returned no response. Is it running? - library/XenES/Search/SourceHandler/ElasticSearch.php:850

#0 library/XenES/Search/SourceHandler/ElasticSearch.php(305): XenES_Search_SourceHandler_ElasticSearch->_logSearchResponseError(false, true)
#1 library/XenForo/Search/SourceHandler/Abstract.php(115): XenES_Search_SourceHandler_ElasticSearch->executeSearch('modified here', false, Array, Array, false, '200')
#2 library/XenForo/Search/Searcher.php(79): XenForo_Search_SourceHandler_Abstract->searchGeneral('modified here', Array, 'relevance', '200')
(...)
#7 index.php(13): XenForo_FrontController->run()
#8 {main}

array(3) {
  ["url"] => string(41) "http://www.xxx/search/search"
  ["_GET"] => array(1) {
    ["/search/search"] => string(0) ""
  }
  ["_POST"] => array(5) {
    ["keywords"] => string(15) "search tearm (modified here)"
    ["users"] => string(0) ""
    ["date"] => string(0) ""
    ["nodes"] => array(1) {
      [0] => string(2) "55"
    }
    ["_xfToken"] => string(8) "********"
  }
}

The Elasticsearch server was not reachable due to an error in my firewall configuration. However, my bug report is that XenForo should handle this the way it handles mail server connection errors: in that case, the content is still created instantly.
 
As I said in post #2, when the ES service went down on my server, XF was still working fine; I didn't experience the issue you are reporting here.

The delay *might* be related to an add-on...
 
If it's taking multiple minutes to timeout, then that's actually not respecting the stream timeout that PHP would be using. That is set to 45 by the ES interaction stuff, which in the scope of indexing is certainly long. (It's generally long when it comes to searching, though if it has to load a lot of data off a disk with resource contention, it could be triggered.) At best, we could lower the timeout when on the indexing side of things, but if it's going longer than that, it's a moot point. We may be able to adjust the timing of the indexing operation. Ideally, we may be able to move the actual indexing code to be outside of a transaction, but I don't think that's necessarily something that will be done in the short term.

It sounds like your firewall setup was set to just silently drop packets rather than sending a refused response. This isn't an unreasonable setup in the common case, but it creates a situation where services have to wait as they have no idea whether the packet has been received, so you end up waiting or potentially retransmitting the packet before eventually giving up. If Elasticsearch actually errors out (e.g., isn't running), this is handled gracefully -- it's really the determination of failure taking significant time and causing issues here.
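
To illustrate the difference between a refused connection and silently dropped packets, here is a minimal Python sketch. It is not XenForo or XFES code, and `probe` and its `connect_timeout` parameter are hypothetical names. A refused connection fails almost immediately, while dropped packets only surface once a connect timeout expires, so an explicit, short connect timeout is what bounds the wait.

```python
import socket


def probe(host, port, connect_timeout=2.0):
    """Classify a TCP connect attempt as 'ok', 'refused', 'timeout' or 'error'.

    Hypothetical helper for illustration only.
    """
    try:
        # create_connection gives up after connect_timeout seconds
        with socket.create_connection((host, port), timeout=connect_timeout):
            return "ok"
    except ConnectionRefusedError:
        # The host answered with a reset: nothing is listening (fast failure).
        return "refused"
    except socket.timeout:
        # No answer at all within the limit, e.g. a firewall dropping packets.
        return "timeout"
    except OSError:
        # Anything else (unreachable network, DNS failure, ...).
        return "error"
```

Against a host whose firewall drops packets, a call such as `probe("1.2.3.4", 9200)` would be expected to return "timeout" after roughly `connect_timeout` seconds instead of hanging for minutes.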

On a side note, if additional applications are being brought in to run XenForo (Elasticsearch, Memcached, etc), they do need to be treated at the same level of importance as your web server and MySQL to ensure they're running healthily as there will be application-level issues if they're not.
 
How was this bug resolved in the end?
If ES has one of its (many) conniptions and all shards fail (hopefully someday it will become a stable product), then this will be resolved by XenForo treating an ES error as a soft fail that is logged as a warning or notice, so that the user essentially isn't impacted. Indexing is soft and can always be rebuilt, but a frozen or white screen is a poor experience for users.
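
The soft-fail idea could look roughly like the following Python sketch. This is an assumption about the general pattern, not how XenForo or XFES works; `create_post`, `SearchUnavailable`, `store`, `indexer` and `retry_queue` are all hypothetical names. The content is saved first, and an indexing failure is only logged and queued for a later rebuild.

```python
import logging
from collections import deque

log = logging.getLogger("forum.search")


class SearchUnavailable(Exception):
    """Raised by the (hypothetical) indexer when the search server is unreachable."""


def create_post(post, store, indexer, retry_queue):
    """Persist the post first, then try to index it.

    An indexing failure never blocks or undoes the save; the post id is
    queued so the index can be rebuilt later.
    """
    store.append(post)  # stand-in for the committed database write
    try:
        indexer(post)
    except SearchUnavailable as exc:
        log.warning("search indexing deferred for post %s: %s", post["id"], exc)
        retry_queue.append(post["id"])
    return post["id"]
```

A background job could then drain `retry_queue` once the search server is reachable again, which matches the point that the index can always be rebuilt.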
 