Cache Rebuild Error - ES Stopped Working, Again

Anthony Parsons

ES has been having issues on and off for the past few days, even though it shows as running on the server every single time I check. Now ES has completely stopped working.

Neither the server software nor XF has been updated, yet all sites suddenly have issues with ES. At present I've had to revert search back to MySQL on my main sites, and I'm using a small forum to play around with and test ES. So far nothing has fixed this issue, and any help is certainly welcome.

I have read through many threads and posts here about this same issue and tried many of the suggestions (adding limits, changing configuration, etc.), still to no avail.

Every time I go to rebuild the cache it gives the error: No response returned from Elasticsearch. Is it running?

A server error on a smaller site gives:

PHP:
Error Info
 
XenForo_Exception: Elasticsearch server returned no response. Is it running? Elasticsearch indexing failed for post- - library/XenES/Search/SourceHandler/ElasticSearch.php:721
Generated By: Anthony, 17 minutes ago
 
Stack Trace
 
#0 /home/mycombat/public_html/library/XenES/Search/SourceHandler/ElasticSearch.php(748): XenES_Search_SourceHandler_ElasticSearch->_logSearchResponseError(false, true, 'Elasticsearch i...')
#1 /home/mycombat/public_html/library/XenES/Search/SourceHandler/ElasticSearch.php(67): XenES_Search_SourceHandler_ElasticSearch->_assertIndexSuccessful(false, 'post')
#2 /home/mycombat/public_html/library/XenForo/Search/Indexer.php(125): XenES_Search_SourceHandler_ElasticSearch->finalizeRebuildSet()
#3 /home/mycombat/public_html/library/XenForo/CacheRebuilder/SearchIndex.php(93): XenForo_Search_Indexer->finalizeRebuildSet()
#4 /home/mycombat/public_html/library/XenForo/ControllerHelper/CacheRebuild.php(26): XenForo_CacheRebuilder_SearchIndex->rebuild(0, Array, NULL)
#5 /home/mycombat/public_html/library/XenForo/ControllerAdmin/Tools.php(78): XenForo_ControllerHelper_CacheRebuild->rebuildCache(Array, 'http://www.myco...', 'admin.php?tools...', true)
#6 /home/mycombat/public_html/library/XenForo/FrontController.php(310): XenForo_ControllerAdmin_Tools->actionCacheRebuild()
#7 /home/mycombat/public_html/library/XenForo/FrontController.php(132): XenForo_FrontController->dispatch(Object(XenForo_RouteMatch))
#8 /home/mycombat/public_html/admin.php(13): XenForo_FrontController->run()
#9 {main}
 
Request State
 
array(3) {
  ["url"] => string(57) "http://www.mycombatptsd.com/admin.php?tools/cache-rebuild"
  ["_GET"] => array(1) {
    ["tools/cache-rebuild"] => string(0) ""
  }
  ["_POST"] => array(5) {
    ["process"] => string(1) "1"
    ["caches"] => string(63) "[["SearchIndex",{"content_type":"","batch":"500","delay":"5"}]]"
    ["position"] => string(1) "0"
    ["redirect"] => string(51) "http://www.mycombatptsd.com/admin.php?tools/rebuild"
    ["_xfToken"] => string(53) "1,1334789763,1581f0874c143b78b236b8d0ea26fdb852d380e7"
  }
}

I have no idea what is wrong with this thing or why suddenly now.

Anyone with ideas would be greatly appreciated.

I have restarted it over and over, and it shows as running on the server.
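A running process and a responding HTTP endpoint are two different things, so it may help to test the second directly. The following is only a minimal sketch: it assumes ES is listening for HTTP on its default port 9200 on localhost (adjust the URL to whatever host and port the XenForo ES settings point at), and the function name `es_responds` is made up for illustration.

```python
import http.client
import urllib.error
import urllib.request


def es_responds(base_url="http://localhost:9200", timeout=5.0):
    """Return True if an HTTP GET to base_url receives any HTTP response.

    Assumption: base_url is the ES HTTP endpoint (9200 is the ES default).
    """
    # An empty ProxyHandler bypasses any proxy configured in the environment,
    # since this is meant to talk to a local service directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(base_url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a 2xx status.
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, reset, or timed out: nothing usable answered.
        return False


if __name__ == "__main__":
    print("ES responds:", es_responds())
```

If this prints `False` while the process list still shows ES, the process is up but not accepting connections, which would be consistent with the "no response" error from the cache rebuild.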

[Attachment: Screen Shot 2012-04-19 at 9.09.49 AM.webp]
 
It seems to have been logging a lot of this recently:

[2012-04-17 08:49:15,519][WARN ][netty.channel.socket.nio.NioServerSocketPipelineSink] Failed to accept a connection.
java.io.IOException: Too many open files
at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:163)
at org.elasticsearch.common.netty.channel.socket.nio.NioServerSocketPipelineSink$Boss.run(NioServerSocketPipelineSink.java:244)
at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:44)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)
[2012-04-17 08:49:16,031][WARN ][index.shard.service ] [Shiver Man] [mycombat_***][4] Failed to perform scheduled engine refresh
org.elasticsearch.index.engine.RefreshFailedEngineException: [mycombat_***][4] Refresh failed
at org.elasticsearch.index.engine.robin.RobinEngine.refresh(RobinEngine.java:789)
at org.elasticsearch.index.shard.service.InternalIndexShard.refresh(InternalIndexShard.java:412)
at org.elasticsearch.index.shard.service.InternalIndexShard$EngineRefresher$1.run(InternalIndexShard.java:699)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)
Caused by: java.io.FileNotFoundException: /var/elasticsearch/elasticsearch/nodes/0/indices/mycombat_***/4/index/_rb.prx (Too many open files)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:441)
at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:306)
at org.elasticsearch.index.store.Store$StoreDirectory.createOutput(Store.java:416)
at org.elasticsearch.index.store.Store$StoreDirectory.createOutput(Store.java:388)
at org.apache.lucene.index.FormatPostingsPositionsWriter.<init>(FormatPostingsPositionsWriter.java:43)
at org.apache.lucene.index.FormatPostingsDocsWriter.<init>(FormatPostingsDocsWriter.java:57)
at org.apache.lucene.index.FormatPostingsTermsWriter.<init>(FormatPostingsTermsWriter.java:33)
at org.apache.lucene.index.FormatPostingsFieldsWriter.<init>(FormatPostingsFieldsWriter.java:51)
at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:85)
at org.apache.lucene.index.TermsHash.flush(TermsHash.java:113)
at org.apache.lucene.index.DocInverter.flush(DocInverter.java:70)
at org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:60)
at org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:581)
at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3623)
at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3588)
at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:452)
at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:401)
at org.apache.lucene.index.DirectoryReader.doOpenFromWriter(DirectoryReader.java:428)
at org.apache.lucene.index.DirectoryReader.doOpenIfChanged(DirectoryReader.java:448)
at org.apache.lucene.index.DirectoryReader.doOpenIfChanged(DirectoryReader.java:396)
at org.apache.lucene.index.IndexReader.openIfChanged(IndexReader.java:520)
at org.elasticsearch.index.engine.robin.RobinEngine.refresh(RobinEngine.java:764)
... 5 more
 
These stand out:

Rich (BB code):
[2012-04-17 08:49:15,519][WARN ][netty.channel.socket.nio.NioServerSocketPipelineSink] Failed to accept a connection.
java.io.IOException: Too many open files
Caused by: java.io.FileNotFoundException: /var/elasticsearch/elasticsearch/nodes/0/indices/mycombat_***/4/index/_rb.prx (Too many open files)

Unless they are normal for ES?
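"Too many open files" is the message Java reports when the process has hit its open file descriptor limit, so it is worth knowing how often it appears. As a rough sketch, the script below counts lines containing that message per hour of log time. It assumes the log entries start with a timestamp in the `[2012-04-17 08:49:15,519]` form shown above; the function name and the sample text are illustrative only.

```python
import re
from collections import Counter

# Matches the leading "[YYYY-MM-DD HH:MM:SS,mmm]" of an ES log entry.
LOG_HEADER = re.compile(r"^\[(\d{4}-\d{2}-\d{2}) (\d{2}):\d{2}:\d{2},\d{3}\]")
MARKER = "Too many open files"


def count_fd_exhaustion_by_hour(log_text):
    """Count lines mentioning MARKER, grouped by the hour of the
    most recent timestamped entry above them."""
    counts = Counter()
    current_hour = None
    for line in log_text.splitlines():
        match = LOG_HEADER.match(line)
        if match:
            current_hour = "%s %s:00" % (match.group(1), match.group(2))
        if MARKER in line and current_hour is not None:
            counts[current_hour] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "[2012-04-17 08:49:15,519][WARN ][netty] Failed to accept a connection.\n"
        "java.io.IOException: Too many open files\n"
    )
    for hour, n in sorted(count_fd_exhaustion_by_hour(sample).items()):
        print(hour, n)
```

To run it against a real log, read the file and pass its contents in; the log path depends on the install, so it is left out here.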
 
I have an idea what it may be. Unfortunately it's late and I'm about to go to bed, but it may be that the transport client is hanging on a corrupt ES node.

For now I would suggest (if you haven't already) completely killing the ES and Java processes (PIDs).

Ensure the raised file limits are correctly set for your Java/ES users.

Then update both Java and ES, clear your current indexes, and re-index everything.
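On the file limits point: a limit raised in the system configuration only applies to processes started afterwards, so it can help to check what the running ES process actually has. On Linux the kernel exposes this in `/proc/<pid>/limits`. The sketch below parses the "Max open files" row from that file; the helper names are made up for illustration, and the PID has to be looked up separately.

```python
def parse_max_open_files(limits_text):
    """Return (soft, hard) from the 'Max open files' row of
    /proc/<pid>/limits content. 'unlimited' is returned as None."""
    label = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(label):
            fields = line[len(label):].split()
            soft, hard = fields[0], fields[1]

            def to_int(value):
                return None if value == "unlimited" else int(value)

            return to_int(soft), to_int(hard)
    raise ValueError("no 'Max open files' row found")


def read_limits_for_pid(pid):
    """Read and parse the limits of a live process (Linux only)."""
    with open("/proc/%d/limits" % pid) as handle:
        return parse_max_open_files(handle.read())


if __name__ == "__main__":
    import os
    # Demonstration on this script's own process; substitute the ES PID.
    print(read_limits_for_pid(os.getpid()))
```

If the soft limit printed for the ES process is still the old, low value, the raised limit is not reaching the user or session that starts ES.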
 
Yep, Slavik fixed this up for me, and I am extremely grateful for his very fast assistance. Wiredtree are good, but their techs have pretty much no real experience with this specific software, so they couldn't provide much assistance. So again, thank you very much, Slavik, for your help in rectifying this on my sites.
 
What did you do to resolve this?

I had a site throwing similar errors yesterday. Luckily it was a test site, so no big deal. BUT at the same time it was saying ES wasn't running, I was able to re-cache a 1 million post board on the same server.

I ended up removing the ES folder, installing a new copy of the latest ES, and reindexing. Issues gone.
 
Pretty much what you've done: clear down the old node and re-create it.
 