XF 2.1 Death From Ten Million Reactions

Lum

Member
I run Broken Forum, a general-discussion board mostly focused on computer gaming. It's been running for about 8 years now, and I just recently upgraded to XF 2.1.

We're now seeing daily cleanup times in the 2-3 hour range locking users out of the board, and I strongly suspect it's due to migrating the old like system to the new reaction system.

Our users like likes. A lot. To the point that in 8 years, we've logged close to 11 million likes. I'm fairly certain that this is an extreme edge case for this system.

If necessary, there's no real reason to store 10 million likes, and I don't think we'd lose much by ditching the first 3 years' worth or so to lighten the load - is there a good way to do this? Or is there another solution? (I'm currently running on Amazon Lightsail and bumped the dedicated database up to 2 GB RAM from 1 GB to see if that helps in the meantime.)
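
Before pruning anything, it may help to measure how much a date cutoff would actually remove. Below is a minimal Python sketch using an in-memory SQLite table as a stand-in. The table name `reaction_content` and the `reaction_date` Unix-timestamp column are assumptions for illustration, not a confirmed XenForo schema, and the sample rows are made up. It only counts rows and deletes nothing.

```python
import sqlite3
from datetime import datetime, timezone


def count_older_than(conn, cutoff_ts):
    """Return (rows older than cutoff, total rows) for the stand-in table."""
    old = conn.execute(
        "SELECT COUNT(*) FROM reaction_content WHERE reaction_date < ?",
        (cutoff_ts,),
    ).fetchone()[0]
    total = conn.execute("SELECT COUNT(*) FROM reaction_content").fetchone()[0]
    return old, total


# Hypothetical stand-in table; a real install would be queried instead.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reaction_content ("
    "reaction_content_id INTEGER PRIMARY KEY, "
    "reaction_date INTEGER NOT NULL)"
)

# Made-up sample data: one row per listed year, dated 1 June (UTC).
sample_years = [2011, 2012, 2013, 2014, 2016, 2018, 2019]
rows = [
    (int(datetime(year, 6, 1, tzinfo=timezone.utc).timestamp()),)
    for year in sample_years
]
conn.executemany("INSERT INTO reaction_content (reaction_date) VALUES (?)", rows)

# Everything before 1 January 2014 would fall under the proposed cutoff.
cutoff = int(datetime(2014, 1, 1, tzinfo=timezone.utc).timestamp())
old, total = count_older_than(conn, cutoff)
print(f"{old} of {total} rows predate the cutoff")
```

The same COUNT query could be run read-only against a real database first. Any actual deletion is a separate question, since other stored totals may depend on those rows.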
 
I think we need to explore this in a little bit more detail because I'm not 100% clear on what you are seeing, exactly.
"We're now seeing daily cleanup times in the 2-3 hour range"
We do have a daily cleanup cron entry. This doesn't touch anything to do with reactions (nor did it touch anything to do with likes). We also have an hourly cleanup cron entry. This doesn't touch any of that either.

So I'm trying to understand the correlation between what you have said, how you have ascertained it is the daily cleanup, and what has led you to believe it is related to likes/reactions.

It's also worth noting that fundamentally reactions aren't actually a great deal different from likes. The system certainly looks and behaves a lot differently, but at its most basic level, all we've done is add an additional column to the table where we store reactions to indicate which reaction was given.

Also, what actually happens when users are "locked out"? Do they just experience slow loading times? Or are errors displayed to them? Are the issues happening at the same time every day?

Do you have any slow query logging enabled on the server and are any slow queries being logged around the time this issue is going on?
 
It was just a guess based on the time reported for the extended downtimes lining up with the daily cleanup cron job (they report 2-3 hours of the server coming back with an nginx 503 Temporarily Unavailable error). I've turned on slow query logging on my server (it's not on by default) so I should have some more feedback for you in a bit.
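
Once the slow query log has some entries, one way to check whether they cluster around the daily cron time is to total the logged query time per hour of day. Below is a rough Python sketch, assuming the log uses the MySQL 5.7+ layout with ISO-style `# Time:` lines followed by `# Query_time:` lines. The sample entries are invented for illustration and are not from the real server.

```python
import re
from collections import Counter

# Assumes MySQL 5.7+ slow-log headers, e.g. "# Time: 2019-05-14T03:00:07.123456Z".
TIME_RE = re.compile(r"^# Time: \d{4}-\d{2}-\d{2}T(\d{2}):")
QUERY_TIME_RE = re.compile(r"^# Query_time: ([\d.]+)")


def slow_seconds_by_hour(lines):
    """Sum Query_time per hour of day (as logged) across slow-log entries."""
    totals = Counter()
    hour = None
    for line in lines:
        match = TIME_RE.match(line)
        if match:
            hour = int(match.group(1))
            continue
        match = QUERY_TIME_RE.match(line)
        if match and hour is not None:
            totals[hour] += float(match.group(1))
    return totals


# Invented sample log; the queries are placeholders.
sample = """\
# Time: 2019-05-14T03:00:07.123456Z
# User@Host: xf[xf] @ localhost []  Id:    12
# Query_time: 95.500000  Lock_time: 0.000100 Rows_sent: 0  Rows_examined: 100
SET timestamp=1557802807;
SELECT 1;
# Time: 2019-05-14T03:02:10.000000Z
# Query_time: 40.250000  Lock_time: 0.000100 Rows_sent: 0  Rows_examined: 100
SELECT 1;
# Time: 2019-05-14T15:30:00.000000Z
# Query_time: 2.500000  Lock_time: 0.000100 Rows_sent: 1  Rows_examined: 100
SELECT 1;
""".splitlines()

totals = slow_seconds_by_hour(sample)
for hour, seconds in sorted(totals.items()):
    print(f"{hour:02d}:00  {seconds:.2f}s")
```

For a real log, the lines could come from `open(path)` instead of the sample string. If one hour dominates and it matches the reported outage window, the queries logged in that hour are the ones worth posting here.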
 