Server issue  Search Index Rebuilding an Epic Fail

  • Thread starter Thread starter Floris
  • Start date Start date
Everything from XF is in InnoDB as far as I know, but phpmyadmin will disclose that obviously.

That said, it is using FULLTEXT by MySQL, but then they're using their internal system to do smarter search.
It's the XenForo indexing of the posts that's killing it.

I completely gave up doing it for big-board converts I am doing. They tell me they gave up after 2 hours and can't afford the downtime financially and users are just out of luck. After few hours it's killing the servers and they haven't even done a million posts, imagine having to do 4 or 40 million.
 
Floris, really there's only so much that can be done from what I can tell - I believe it's mostly down to server configuration. The best approach is to disable the keys, do the build, and then re-enable the keys. Trying to insert hundreds of thousands of rows into an FT index is always going to be an issue, but there's really nothing we can do about that.

If you have 4 million posts, MySQL FT isn't even going to be viable anyway. (I'd consider about a million to be the cap.)
 
So what you're telling me is to advice big boards to not consider xenforo as they will end up with no viable search engine, and users complaining they can't find their posts in their profile or in search. And that they can't rebuild search index unless they get a professional to customize the database to do the rebuild, or at least to spend time tweaking and trying.

This is a XenForo issue.

What you recommend I do with my site, revert back to vBulletin so I can use FULLTEXT and give the users access to their content or tell the users to stfu and point blame at me being too poor to invest in an even bigger setup with faster hard drives, more cpu cores and memory so I can run the one time rebuild? Or just tell them our board is too big to run properly on the xenforo software?

I am sure you will tell me now my reaction is childish.

Or seeing how previous moderating is done on this site: it will most likely disappear from the pool.

But the reality is that I just have 750k posts to rebuild and a lot of big customers you're aiming for that run IPB or vB right now will find this issue to be an issue as well.

And that's simply a fact. I can't even get passed 1 million on the bigger boards, and these guys have 20 million posts, 4 million posts, 12 million and one site I quit a few weeks ago had 43 million.

Thankfully 3 of them are still testing and not live.
 
Floris, really there's only so much that can be done from what I can tell - I believe it's mostly down to server configuration. The best approach is to disable the keys, do the build, and then re-enable the keys. Trying to insert hundreds of thousands of rows into an FT index is always going to be an issue, but there's really nothing we can do about that.

If you have 4 million posts, MySQL FT isn't even going to be viable anyway. (I'd consider about a million to be the cap.)
ok then let's not use the FT engine then...it's obviously a barrier to growth in the product. Your statement above does not jive. It's not a server limitation if MySQL FT engine is capped at about 1 mil. If that's the case you need to use another engine. I'm assuming the FT engine has to do with search? If that's the case using FT isn't a "smarter" choice. How about using sphinx? Unless you aren't interested in the "larger forums" market right now?
 
Replacing it from Fixed to Server Issue is an insult and I find it childish and unprofessional. I've shown clearly that I have tried this on various setups, home system (2 cpus, 4 cores, 12gb ram, 7tb sata2 32mb cache drives, ssd boot drive, and dedicated ram drive), vps (2 cpus, quad cores, 15k rpm drive and 4gb ram), dedicated (xeon quad core, 2gb ram, 250gb sata2), and even big ass web sites that have 2500+ online users and millions of posts and an array of servers dual quad core cpus with 8gb or more on raid setups). So far everywhere I try where it's over 500k it simply doesn't work.

Server issue ..

Thanks for trying to improve your unusable feature. A showstopper
 
So what you're telling me is to advice big boards to not consider xenforo as they will end up with no viable search engine, and users complaining they can't find their posts in their profile or in search. And that they can't rebuild search index unless they get a professional to customize the database to do the rebuild, or at least to spend time tweaking and trying.

This is a XenForo issue.

What you recommend I do with my site, revert back to vBulletin so I can use FULLTEXT and give the users access to their content or tell the users to stfu and point blame at me being too poor to invest in an even bigger setup with faster hard drives, more cpu cores and memory so I can run the one time rebuild? Or just tell them our board is too big to run properly on the xenforo software?

I am sure you will tell me now my reaction is childish.

Or seeing how previous moderating is done on this site: it will most likely disappear from the pool.

But the reality is that I just have 750k posts to rebuild and a lot of big customers you're aiming for that run IPB or vB right now will find this issue to be an issue as well.

And that's simply a fact. I can't even get passed 1 million on the bigger boards, and these guys have 20 million posts, 4 million posts, 12 million and one site I quit a few weeks ago had 43 million.

Thankfully 3 of them are still testing and not live.

think sphinx might work in this instance?
 
Replacing it from Fixed to Server Issue is an insult and I find it childish and unprofessional. I've shown clearly that I have tried this on various setups, home system, vps, dedicated, and even big ass web sites that have 2500+ online users and millions of posts and an array of servers. So far everywhere I try where it's over 500k it simply doesn't work.

Server issue ..

Thanks for trying to improve your unusable feature. A showstopper

this is not a server issue. By their own admission relying on MySQL FT is a limitation on scalability. There are ways to code around it..like use sphinx or a different search engine..or maybe using postgres later on. I would make this won't fix not server issue....unless the devs can provide proof of 2 mil plus boards converting successfully with full configuration details posted. In that case it very well is a server operator issue..right now the statements of the devs don't jive with this label.
 
think sphinx might work in this instance?

FT works just fine once indexed, using the working feature isn't the issue, it's the rebuilding of large amount of data that is.

There is no need to invest time/resources in third party unofficial sphinx setup just because a one time feature doesn't work.
 
this is not a server issue. By their own admission relying on MySQL FT is a limitation on scalability. There are ways to code around it..like use sphinx or a different search engine..or maybe using postgres later on. I would make this won't fix not server issue....unless the devs can provide proof of 2 mil plus boards converting successfully with full configuration details posted. In that case it very well is a server operator issue..right now the statements of the devs don't jive with this label.

They can of course proof this. Even by just removing keys and rebuilding and hopefully getting through it and returning keys or a board that had 400k posts, converted, and now has 600k posts since ..

There's nothing wrong with the default or customized or optimized setups that I've tried.
 
I'm simply saying run: ALTER TABLE xf_search_index DISABLE KEYS; before you run the rebuild. If it completes successfully, you can then run ALTER TABLE xf_search_index ENABLE KEYS; This is really a standard thing for improving bulk insert speed in MySQL, but it requires a bit of a human touch -- basically because of how the web works -- so it's not something we could drop in automatically. The issue isn't strictly the size of the data, it's trying to do all the inserts together, with MySQL modifying the index after every insert (which beta 6 folds together). For reference, I do remember someone who had a 1+ million post board who had no problems with the rebuild (and actually commented on how fast it ran), so the high load is not necessarily intrinsic.

Floris, I'm pretty sure I have offered, but I am willing to login to your server to try to see what can be done to improve it or get debugging info.

ok then let's not use the FT engine then...it's obviously a barrier to growth in the product. Your statement above does not jive. It's not a server limitation if MySQL FT engine is capped at about 1 mil. If that's the case you need to use another engine. I'm assuming the FT engine has to do with search? If that's the case using FT isn't a "smarter" choice. How about using sphinx? Unless you aren't interested in the "larger forums" market right now?
Really, the Sphinx stuff isn't particularly relevant to this discussion, as it is specifically about MySQL FT; the load is clearly caused by MySQL, which would only negligibly being involved with Sphinx. Given an option, I wouldn't recommend MySQL FT search - it's really not particularly good, but Sphinx isn't feasible for most people to even access (it needs a daemon to run).
 
FT works just fine once indexed, using the working feature isn't the issue, it's the rebuilding of large amount of data that is.

There is no need to invest time/resources in third party unofficial sphinx setup just because a one time feature doesn't work.
well considering the devs stated MYSQL FT isn't viable past 1 mil I would label that as an XF problem since they are obviously aware of this limitation. This also means if you have a DB problem with the index and need to rebuild you won't have search for hours..days..longer?... because you effectively can't rebuild the indexes without. If you can't get the index to rebuild you have no search. Since the FT search has an apparent performance cap @ 1mil then it's time to use another engine...one that can handle large search indexes. I would look into how either phpbb does it(or another product for reference)..or start integrating sphinx.
 
I'm simply saying run: ALTER TABLE xf_search_index DISABLE KEYS; before you run the rebuild. If it completes successfully, you can then run ALTER TABLE xf_search_index ENABLE KEYS; This is really a standard thing for improving bulk insert speed in MySQL, but it requires a bit of a human touch -- basically because of how the web works -- so it's not something we could drop in automatically. The issue isn't strictly the size of the data, it's trying to do all the inserts together, with MySQL modifying the index after every insert (which beta 6 folds together). For reference, I do remember someone who had a 1+ million post board who had no problems with the rebuild (and actually commented on how fast it ran), so the high load is not necessarily intrinsic.

Floris, I'm pretty sure I have offered, but I am willing to login to your server to try to see what can be done to improve it or get debugging info.


Really, the Sphinx stuff isn't particularly relevant to this discussion, as it is specifically about MySQL FT; the load is clearly caused by MySQL, which would only negligibly being involved with Sphinx. Given an option, I wouldn't recommend MySQL FT search - it's really not particularly good, but Sphinx isn't feasible for most people to even access (it needs a daemon to run).
actually it is. by your own admission the MySQL Ft engine isn't all the good and causes issues. If that's the case it's tome to use another search engine..which means something else. Sphinx is perfectly relevant.
 
Not being able to index old data means a user has a thread from last year called

'my favorite soup'

and then you go to their profile and click on 'threads created' and the soup thread isn't listed.
or you go to search dropdown and search for threads started with soup, and you find nothing.

Users get upset, leave the site, or complain tha their content is deleted, etc, not understanding the situation.

newly created content automatically gets indexed properly.

Currently I can't find 500,000+ posts on my site, or 3 million on the other web site.

Yes, you can go to the forum, and browse through 195 paginated pages to find that thread .. it's actually there as content obviously.
 
I'm simply saying run: ALTER TABLE xf_search_index DISABLE KEYS; before you run the rebuild. If it completes successfully, you can then run ALTER TABLE xf_search_index ENABLE KEYS; This is really a standard thing for improving bulk insert speed in MySQL, but it requires a bit of a human touch -- basically because of how the web works -- so it's not something we could drop in automatically. The issue isn't strictly the size of the data, it's trying to do all the inserts together, with MySQL modifying the index after every insert (which beta 6 folds together). For reference, I do remember someone who had a 1+ million post board who had no problems with the rebuild (and actually commented on how fast it ran), so the high load is not necessarily intrinsic.
.

if this is what's required..how about making that command part of the rebuild process scripts? Also could you link to the thread about the fast rebuild? I'm unable to locate it..no snarky comments intended. I want to check that persons server configuration..:)
 
actually it is. by your own admission the MySQL Ft engine isn't all the good and causes issues. If that's the case it's tome to use another search engine..which means something else. Sphinx is perfectly relevant.
When the discussion is about rebuilding the MySQL FT index, suggesting Sphinx isn't really relevant. :) Regardless, as I said, if you're on a shared server, you really have no other option than what's there.

if this is what's required..how about making that command part of the rebuild process scripts? Also could you link to the thread about the fast rebuild? I'm unable to locate it..no snarky comments intended. I want to check that persons server configuration..:)
It needs a human touch because it's a problem if it doesn't get re-enabled. The person needs to be aware that it happened. We could add the query, but if the index didn't finish for some reason, the search system would be left in a bad state.

Also, the thread: http://xenforo.com/community/thread...forum-with-2-8-million-posts.7003/#post-99869
 
So it seems you are going to need either loads of ram to hold the hugely expanded search index(which requires significant mysql tweaks) or lots of fast mechanical disks or a couple of ssd's(raid 1) for fast access to the index.
 
When the discussion is about rebuilding the MySQL FT index, suggesting Sphinx isn't really relevant. :) Regardless, as I said, if you're on a shared server, you really have no other option than what's there.


It needs a human touch because it's a problem if it doesn't get re-enabled. The person needs to be aware that it happened. We could add the query, but if the index didn't finish for some reason, the search system would be left in a bad state.

Also, the thread: http://xenforo.com/community/thread...forum-with-2-8-million-posts.7003/#post-99869


So to use XenForo on a big board, so ppl can search for their many years of created content:

"please consider that this was done with a server with a very good disk system (8x SAS RAID10) which is a major factor."

^ copied from that thread you linked to .

I doubt every admin with 1 million posts or more will have a 8x SAS raid10 setup
 
Floris, I really don't get why you are so upset? Why not simply accept Mikes kind offer?
Floris, I'm pretty sure I have offered, but I am willing to login to your server to try to see what can be done to improve it or get debugging info.
 
Floris, I really don't get why you are so upset? Why not simply accept Mikes kind offer?

After spending hours on this with every release since the first alpha ... frustration starts to creep in.

Additionally I can't grant Mike access to the databases of people who I help convert. Who are asking for refunds because I can't complete the convert. Also .. quite frustrating.

I was hoping to spend my Saturday differently, instead I spend more than my eyes allow on this stupid issue that gets flagged as 'server issue' or 'fixed'.

Thanks Mike for the offer, when I am in a better mood I will consider it.
 
Top Bottom