XF 1.3 Slow queries around xf_search_index

Hi guys,

We are seeing our db spike to 100% CPU usage occasionally and using MySql workbench I see a lot of queries that are essentially searches for content or content created by users (via xenforo/library/XenForo/Search/SourceHandler/MySqlFt.php). I'm not sure if this is what's causing the issue but the queries frequently have the status of "waiting for table lock" and running one of the example queries took 50 seconds to return a result.

Is this an obvious case for https://xenforo.com/help/enhanced-search? I believe we had it before but it stopped working and then we migrated to 1.3 (with new engineers) so any knowledge of that is probably gone. If it is something we need to do, can we do something to help mitigate the slowness until we migrate? Disabling search is obviously an extreme measure but anything else from the db side?
 
Wow, how were you guys ever managing with MySQL full text search? Generally speaking, you'll start to see potential performance issues with around 1 million records. (I suppose if you only indexed the stuff after the upgrade...)

XF Enhanced Search (Elasticsearch-based) would pretty much be a must unless you guys plan on killing search entirely (though there are a couple functions that use the data that aren't directly query-based searches).
 
Bear in mind that Enhanced Search is really just a wrapper to interact with Elasticsearch. Elasticsearch will need to be setup and have shards setup and memory allocated to actually index all of your documents. If you want to index 100 million posts, the index will need to be spread across a number of boxes and quite a lot of memory will need to be allocated, though it's hard to say exactly how much. In terms of pure index size, anecdotally, I'm seeing around 300MB per million documents as listed in the XF ACP. That may not be the amount of memory Elasticsearch actually wants though.
 
Bear in mind that Enhanced Search is really just a wrapper to interact with Elasticsearch. Elasticsearch will need to be setup and have shards setup and memory allocated to actually index all of your documents. If you want to index 100 million posts, the index will need to be spread across a number of boxes and quite a lot of memory will need to be allocated, though it's hard to say exactly how much. In terms of pure index size, anecdotally, I'm seeing around 300MB per million documents as listed in the XF ACP. That may not be the amount of memory Elasticsearch actually wants though.

You can actually get away with vastly less ram if you are using SSDs. I've got a 3 node cluster with total memory of 1.5 gb of allocated ram for 16.1 million posts, rather than the suggested 4.8gb and performance is excellent.

Thanks for the info guys. I don't think we have anyone with extensive knowledge of Elasticsearch so I might be back with more questions when we implement it.
 
  • Like
Reactions: Xon
@Mike We have an ES cluster set up and we are now preparing to do a production deploy. Here's our big issue: We took a snapshot of our production db and then built the indices of the ES cluster in a staging environment based off of that production snapshot.

We were hoping to now just install the Enhanced Search plugin on production and then point it at the ES cluster created on staging. Will this cause issues? We were hoping to avoid the day and a half it takes to build the indices in production by doing it this way but now I'm concerned that whatever handles adding new indices to ES (I'm assuming cron job) will fail because it won't know where it left off since we've never technically built the indices on ES in the production environment.

Any thoughts?
 
The search system updates the index when content is changed/created. If your staging setup is a copy of production with all the same data, then you can just point your production DB at that cluster (ensure you set the same index name as it will default to the DB name) and it should work. That said, you won't have any content that has been created and is only in the production DB or any older content edits. The only way to handle that would be to reindex. That said, you could do that without dropping the old index and it should allow the index to be updated with the new data seamlessly.
 
That said, you won't have any content that has been created and is only in the production DB or any older content edits. The only way to handle that would be to reindex. That said, you could do that without dropping the old index and it should allow the index to be updated with the new data seamlessly.

We assumed this we be the case. If we reindex manually will it try to reindex all of the existing posts/content or just what isn't currently in ElasticSearch?
 
It will reindex everything. You don't have to choose the option to delete the index though; it should just merge the changes/additions in.
 
Back
Top Bottom