XF 1.3 Elasticsearch + Locked DB Strange Errors

Discussion in 'XenForo Questions and Support' started by Wesker, Apr 7, 2016.

  1. Wesker

    Please note: one of the well-known third-party custom development teams for XenForo has asked us to post a ticket here, as they are stuck on this issue.

    Earlier today, a disk drive filled to capacity, which briefly took the server down until we truncated the DB disk drive.

    Since the server was restored, everything appears to be running fine, except that the server error log is flooded with errors.

    Error 1: Elasticsearch

    Elasticsearch is installed and appears to be running:

    ES running
    [root@server config]# curl
    {
      "status" : 200,
      "name" : "Arclight",
      "cluster_name" : "elasticsearch",
      "version" : {
        "number" : "1.4.5",
        "build_hash" : "2aaf797f2a571dcb779a3b61180afe8390ab61f9",
        "build_timestamp" : "2015-04-27T08:06:06Z",
        "build_snapshot" : false,
        "lucene_version" : "4.10.4"
      },
      "tagline" : "You Know, for Search"
    }

    However, the error log is being bombarded with pages and pages of errors like this one:

    XenForo_Exception: Elasticsearch server returned no response. Is it running? Elasticsearch indexing failed - library/XenES/Search/SourceHandler/ElasticSearch.php:833

    After this 3rd party team reviewed the
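The indexing error above means XenForo got no reply from Elasticsearch, even though the node answers a plain request in the curl output. One way to tell "no response at all" apart from "responding but unhealthy" is to query the cluster health API with a hard timeout. This is a minimal Python sketch, assuming Elasticsearch listens on the default http://localhost:9200 (the URL in the curl command above is not shown); `probe` and `classify_health` are hypothetical helper names, not part of XenForo or XenES.

```python
import json
import urllib.request

# Assumption: Elasticsearch listens on its default HTTP port on this host.
ES_URL = "http://localhost:9200"


def classify_health(payload: dict) -> str:
    """Turn a /_cluster/health response body into a short verdict."""
    status = payload.get("status")
    if status == "green":
        return "ok"
    if status == "yellow":
        # All primaries allocated, some replicas not (normal on a single-node cluster).
        return "degraded"
    if status == "red":
        # At least one primary shard is unassigned; writes to it fail.
        return "failing"
    return "unknown"


def probe(base_url: str = ES_URL, timeout: float = 5.0) -> str:
    """Ask for cluster health with a timeout; report 'unreachable' on any failure."""
    try:
        with urllib.request.urlopen(base_url + "/_cluster/health", timeout=timeout) as resp:
            return classify_health(json.load(resp))
    except (OSError, ValueError):
        # URLError and socket timeouts are OSError subclasses; ValueError covers bad JSON.
        return "unreachable"
```

For example, `probe()` returning `failing` would be consistent with shard data damaged by the disk incident, while `unreachable` points at the service or network rather than the index. The probe only reports; it repairs nothing.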

    Error 2: Mysqli statement execute error : Lock wait timeout exceeded; try restarting transaction

    The second issue is that the DB is being hit with the same SQL error over and over. The developers have told me the server itself looks fine.

    [2016-04-07 14:49:26,269][DEBUG][action.index ] [Arclight] observer: timeout notification from cluster service. timeout setting [1m], time since start [1m]
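The MySQL message itself names the client-side remedy: error 1205 (lock wait timeout) asks the application to run the transaction again. A minimal Python sketch of that retry pattern follows; `DatabaseError` and `run_with_retry` are hypothetical names standing in for whatever the real database driver exposes, and the backoff values are arbitrary.

```python
import random
import time

# MySQL error number for "Lock wait timeout exceeded; try restarting transaction".
ER_LOCK_WAIT_TIMEOUT = 1205


class DatabaseError(Exception):
    """Hypothetical stand-in for a driver exception carrying the MySQL error number."""

    def __init__(self, errno: int, message: str = "") -> None:
        super().__init__(message)
        self.errno = errno


def run_with_retry(transaction, attempts: int = 3, base_delay: float = 0.5):
    """Re-run a whole transaction when it fails with a lock wait timeout.

    `transaction` is a zero-argument callable that begins, executes and commits
    the complete unit of work, so each retry replays it from the start.
    """
    for attempt in range(1, attempts + 1):
        try:
            return transaction()
        except DatabaseError as exc:
            if exc.errno != ER_LOCK_WAIT_TIMEOUT or attempt == attempts:
                raise  # other errors, or out of retries: let them surface
            # Exponential backoff with jitter so retries don't pile onto the same lock.
            time.sleep(base_delay * 2 ** (attempt - 1) * random.uniform(0.5, 1.0))
```

By default InnoDB rolls back only the statement that timed out, not the whole transaction, so the callable should roll back before the error propagates. Retrying only hides the symptom: if writes are slow at the storage level, the waits will keep recurring.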

    Also note that I'm still using XenForo 1.3; I will upgrade to 1.5 once one more add-on has been fixed to support it.
  2. Wesker

    Members are having a difficult time posting new threads and replies. Slowness and lock-ups are the biggest problem.
  3. Mike

    Both of these make me think there's significant server load. What are the load averages?

    Is there anything in Elasticsearch's log?

    In terms of MySQL, what is showing in the process list (SHOW FULL PROCESSLIST;)? Running queries? Long-running connections? How about InnoDB (SHOW ENGINE INNODB STATUS;)?

    If this all started with a disk issue, then its side effects are the likely cause. For example, there could be data corruption in either data store (MySQL or Elasticsearch), causing various knock-on issues.
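To answer the process-list questions quickly, the output can be exported and filtered for connections that have spent a long time in their current state. A hedged Python sketch, assuming the list was saved in tab-separated batch mode (for example `mysql -B -e "SHOW FULL PROCESSLIST" > processlist.tsv`); the 30-second threshold and the `long_running` name are illustrative choices.

```python
import csv
import io

# Assumption: the list was exported in batch mode, e.g.
#   mysql -B -e "SHOW FULL PROCESSLIST" > processlist.tsv
# which prints a tab-separated header row and one row per connection.


def long_running(tsv_text: str, min_seconds: int = 30, skip_idle: bool = True):
    """Return (Id, Command, Time, State, Info) for rows at least min_seconds old."""
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    hits = []
    for row in rows:
        # "Sleep" rows are idle connections. They are skipped by default, but an idle
        # connection can still hold locks from an uncommitted transaction.
        if skip_idle and row["Command"] == "Sleep":
            continue
        time_text = row["Time"] or ""
        seconds = int(time_text) if time_text.isdigit() else 0  # time in current state
        if seconds >= min_seconds:
            hits.append((row["Id"], row["Command"], seconds, row["State"], row["Info"]))
    # Longest first: these are the likeliest candidates for holding a lock.
    return sorted(hits, key=lambda hit: hit[2], reverse=True)
```

Running it with `skip_idle=False` as well is worthwhile, since a long-idle connection sitting on an open transaction is a classic cause of lock wait timeouts.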
  4. Wesker

    Will check on this in a moment. It doesn't appear we're being DDoSed. The server is running fine; it's just busy processing DB queries.

    By the Elasticsearch log, you mean /admin.php?brsql-log/, correct?

    If so, yes, there are many searches. We can still load Elasticsearch, but there are errors everywhere, so I assume others may be getting errors too.

    Yes, there are long-running connections. Let me check on the rest for you.

    This is probably our best lead. What do you recommend we do here so we can quickly check this?
  5. Wesker

    By the way, Mike, I just want to say thank you for replying. You guys do a great job on support.
  6. MattW

    Try /var/log/elasticsearch/ on the server itself, not via the XenForo ACP.
  7. Wesker

    [root@server ~]# /var/log/elasticsearch/
    -bash: /var/log/elasticsearch/: is a directory
  8. MattW

    That's the directory where the logs are stored.
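Since /var/log/elasticsearch/ is a directory, the log files inside it have to be listed and read rather than executed. A small Python sketch that picks the newest `*.log` file there and pulls out WARN and ERROR lines, assuming the `[timestamp][LEVEL][component] message` layout of the Elasticsearch log line quoted earlier in the thread; `newest_log` and `problems` are hypothetical helpers.

```python
import re
from pathlib import Path

# Layout assumed from the ES 1.x log line quoted in the thread:
#   [2016-04-07 14:49:26,269][DEBUG][action.index ] [Arclight] observer: ...
# Level and component fields may be padded with trailing spaces.
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[(?P<level>\w+)\s*\]\[(?P<component>[^\]]+)\]\s*(?P<message>.*)$"
)


def newest_log(directory: str = "/var/log/elasticsearch") -> Path:
    """Pick the most recently modified *.log file in the directory."""
    logs = sorted(Path(directory).glob("*.log"), key=lambda p: p.stat().st_mtime)
    if not logs:
        raise FileNotFoundError(f"no *.log files in {directory}")
    return logs[-1]


def problems(path: Path, levels=("WARN", "ERROR"), tail: int = 500):
    """Return (timestamp, level, message) for matching lines among the last `tail` lines."""
    lines = path.read_text(errors="replace").splitlines()[-tail:]
    found = []
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match["level"] in levels:
            found.append((match["ts"], match["level"], match["message"].strip()))
    return found
```

For example, `problems(newest_log())` lists recent warnings and errors. It reads the whole file before keeping the last `tail` lines, which is fine for a quick look but not for multi-gigabyte logs.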
  9. Wesker

    Okay, the log file was accidentally removed, so I'm waiting for new entries to accumulate. Will send it in a bit.

    The main issue, though, is the DB locks. It could all be connected to one underlying cause.
  10. Wesker

  11. Wesker

    1) Still no luck so far on the lock/deadlock in MySQL.
    2) I have yet to experience any issues with search itself. We're getting errors, but search seems to be running smoothly.

    #1 is obviously the bigger issue.
  12. Wesker

    The server load is normal.
  13. Wesker

    Where can I send the InnoDB status and process list to you guys?
  14. Mike

    You can send it in a conversation.

    You'll also want to look at any MySQL logs, particularly from any point after the disk issue. They'll most likely be in /var/log.
  15. Wesker

    • Sent list to your conversation inbox
    • Asked the third-party team to review the SQL logs
  16. Wesker

    Per developers:

    "We have already looked into the sql logs as requested, but nothing there was helpful in resolving the issue."
  17. Wesker

    Just to confirm: there is no data corruption.
  18. Wesker

    Is there any way I can pay a fee to have the XF team expedite this?
  19. Mike

    This isn't really something that's officially covered by our support (server configuration). The fact that everything happened after a hardware/disk issue, rather than after any change to XenForo, certainly points to that as the underlying problem. I don't know what knock-on effects it could have had on the data stores of Elasticsearch and MySQL, or even on the general functioning of the server.

    Based on the logs you sent me, everything is just running very slowly. I see evidence of queries waiting 30 seconds for a particular lock to be released. The query itself is trivial, though I can't tell what else its transaction might be doing. There are a couple of transactions running for over 30 seconds (without MySQL showing them as waiting). In normal operation, the vast majority of XF pages would complete in under a second.
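The figures in that reply (30-second lock waits, transactions active for over 30 seconds) come from the TRANSACTIONS section of the InnoDB status output, and a quick scan can pull them out of a saved dump. A Python sketch under stated assumptions: the line formats in the comments are typical of MySQL 5.5 to 5.7 and may differ in other versions; `summarize` is a hypothetical helper.

```python
import re

# Assumed line formats from the TRANSACTIONS section of SHOW ENGINE INNODB STATUS
# (exact wording can differ between MySQL versions):
#   ---TRANSACTION 3F0A21, ACTIVE 35 sec starting index read
#   ------- TRX HAS BEEN WAITING 30 SEC FOR THIS LOCK TO BE GRANTED:
ACTIVE_TRX = re.compile(r"^---TRANSACTION (\S+), ACTIVE (\d+) sec")
LOCK_WAIT = re.compile(r"TRX HAS BEEN WAITING (\d+) SEC FOR THIS LOCK")


def summarize(status_text: str, min_seconds: int = 30) -> dict:
    """Collect long-running transactions and long lock waits from an InnoDB status dump."""
    long_trx = []
    lock_waits = []
    for line in status_text.splitlines():
        trx = ACTIVE_TRX.match(line)
        if trx and int(trx.group(2)) >= min_seconds:
            long_trx.append((trx.group(1), int(trx.group(2))))
        wait = LOCK_WAIT.search(line)
        if wait and int(wait.group(1)) >= min_seconds:
            lock_waits.append(int(wait.group(1)))
    return {"long_transactions": long_trx, "lock_waits_seconds": lock_waits}
```

Running it on dumps taken a few minutes apart shows whether the same transaction IDs stay active, which would point at a stuck lock holder rather than general slowness.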

    Unfortunately, I don't have a great recommendation. It's worth pointing out that, in general, the issues seem to be with writes: they're slow in MySQL, and your Elasticsearch error was a failed indexing operation. That might even point to something in the underlying file system. You could try restarting the services, or even wiping your Elasticsearch index and rebuilding it (you could do something similar by dumping and restoring the MySQL DB), though I'm not convinced that would help if there's no data corruption.

    It certainly won't solve the issue, but you probably want to turn off your add-on that shows the online user status in posts. It will exacerbate the issues a bit.
  20. Wesker

    This is resolved now.

