Fixed 1.3 Too many users causes forum_list to crash

Jim Boy

We've found out, the hard way, that in 1.3 forum_list cannot handle a large number of users unless we raise our script memory limit.

In debug mode forum_list was consuming 140MB of RAM, even after turning off any suspect add-ons and all widgets. It looks as if the culprit is XenForo_Model_Session::prepareSessionActivityFetchOptions(). Changing the amount of time that a user counts as online to one minute alleviated the issue and dramatically reduced memory usage. The SQL call is below.

Code:
SELECT session_activity.*,
       user.*,
       user_profile.*,
       user_option.*
FROM xf_session_activity AS session_activity
LEFT JOIN xf_user AS user ON
    (user.user_id = session_activity.user_id)
LEFT JOIN xf_user_profile AS user_profile ON
    (user_profile.user_id = user.user_id)
LEFT JOIN xf_user_option AS user_option ON
    (user_option.user_id = user.user_id)
WHERE (session_activity.view_date > 1411390264)
ORDER BY session_activity.view_date DESC

Run Time: 0.078202

EXPLAIN:
Select Type | Table            | Type   | Possible Keys | Key       | Key Len | Ref                               | Rows | Extra
SIMPLE      | session_activity | range  | view_date     | view_date | 4       |                                   | 9105 | Using where
SIMPLE      | user             | eq_ref | PRIMARY       | PRIMARY   | 4       | bigfooty.session_activity.user_id | 1    |
SIMPLE      | user_profile    | eq_ref | PRIMARY       | PRIMARY   | 4       | bigfooty.user.user_id             | 1    |
SIMPLE      | user_option     | eq_ref | PRIMARY       | PRIMARY   | 4       | bigfooty.user.user_id             | 1    |
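The effect of shrinking the online cutoff window can be sketched with an in-memory miniature of xf_session_activity (the table here is a simplified stand-in; the row counts and even spread of timestamps are illustrative assumptions, not figures from this thread):

```python
import sqlite3

# Hypothetical miniature of xf_session_activity: one row per active session.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE xf_session_activity (user_id INTEGER, view_date INTEGER)")

now = 1411390264  # the epoch timestamp from the WHERE clause above
# Seed 9000 sessions spread evenly over the last hour (illustrative data).
rows = [(i, now - (i % 3600)) for i in range(9000)]
conn.executemany("INSERT INTO xf_session_activity VALUES (?, ?)", rows)

def online_rows(window_seconds):
    """Rows the online-list query would fetch for a given cutoff window."""
    cutoff = now - window_seconds
    return conn.execute(
        "SELECT COUNT(*) FROM xf_session_activity WHERE view_date > ?",
        (cutoff,),
    ).fetchone()[0]

hour = online_rows(3600)   # a one-hour window sweeps up every seeded session
minute = online_rows(60)   # the one-minute window from the thread
```

With sessions spread evenly, the one-minute cutoff fetches a small fraction of the rows the one-hour cutoff does, which is why the memory drop was so dramatic: every fetched row drags the three joined user tables along with it.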

Without any sort of limit on the data retrieved, this will remain an issue. Of course it only happens when we are super busy, but that is exactly the time when we don't want it happening.

We will be migrating to 1.4 in about a month or so, depending on the stability of the add-ons we require.
 
Debug mode isn't intended to be used on a live forum. By using it, you commit to using a lot more memory and time to build the page, due to the extra processing and memory storage requirements of debug mode.
 
We have an administration box which isn't accessed by forum users, which allows us to switch debug mode on or off without affecting the general site. That admin box allows for bigger memory usage, has a much greater timeout tolerance, and does all of the deferred work. Anyway, switching the online timeout to one minute knocked a full 100MB off the memory used.
 
I can experiment with cutting out the profile/option tables which would reduce the memory usage (and am looking at that for 1.4), but a more dramatic change would have to wait until version 2 I think (it would probably necessitate taking what is now 1 query that overfetches and turning it into up to 3 queries).
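The "up to 3 queries" split can be sketched as follows, using a deliberately minimal schema (the table layouts, columns, and sample users here are hypothetical, and only one lookup table is shown where XenForo joins three): fetch the narrow activity rows first, then batch-load only the user columns the online list actually renders, instead of dragging user.*, user_profile.*, and user_option.* through one big join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE xf_session_activity (user_id INTEGER, view_date INTEGER);
CREATE TABLE xf_user (user_id INTEGER PRIMARY KEY, username TEXT);
""")
# Hypothetical sample data: user 1 has two sessions, one row is too old.
conn.executemany("INSERT INTO xf_session_activity VALUES (?, ?)",
                 [(1, 100), (2, 90), (1, 80), (3, 50)])
conn.executemany("INSERT INTO xf_user VALUES (?, ?)",
                 [(1, "alice"), (2, "bob"), (3, "carol")])

cutoff = 60

# Query 1: fetch only the narrow activity rows, no joins.
activity = conn.execute(
    "SELECT user_id, view_date FROM xf_session_activity "
    "WHERE view_date > ? ORDER BY view_date DESC", (cutoff,)).fetchall()

# Query 2: batch-load just the columns the page needs, once per distinct
# user, rather than repeating the full user row for every activity row.
user_ids = sorted({uid for uid, _ in activity})
placeholders = ",".join("?" * len(user_ids))
users = dict(conn.execute(
    f"SELECT user_id, username FROM xf_user WHERE user_id IN ({placeholders})",
    user_ids).fetchall())

# Stitch the two result sets together in application code.
online = [(users[uid], view_date) for uid, view_date in activity]
```

The trade-off is the classic one: more round trips, but each result set stays narrow, and a user who appears in many activity rows is fetched only once.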
 
I wonder what owners of big forums will do in this situation?

They will use appropriate server hardware and configurations for the resource-intensive operations they are conducting. Quite a few boards have several thousand users online at one time. The OP even stated the fix; however, it's just throwing more resources at a beast that might be optimized in the future.
 
I'm surprised people still run their forums on hardware.

You're getting almost the entire profile of every user who has done anything from some point in time onward, which I'm guessing is now minus the online time. The query returned 9105 rows above. How many did it return with the one-minute online time?
 
How many did it return with the 1 minute online time?

Should be an easy answer: anyone whose last session activity was within a minute. The rows are still there. If you have 9105 people online in the last hour, 1000 of those in the last 15 minutes, and 100 of them online in the last minute, you're only going to return 100 rows, while there are 9105 in the table from the last hour to hour and a half.

The table is emptied on a regular basis. I've never seen anyone in ours past 1.5 hours.
 
The query returned 9105 rows above. How many did it return with the 1 minute online time?

From recollection it was around 3000, although I think the highest peak minute would have been significantly higher. It really does depend on external factors; you cannot make a simplistic division. On this particular occasion, people were following the event on TV, and as soon as an ad break hit, there was a huge spike as people jumped on the site. Roughly, it's safe to keep the returned rows below 6000 if you have a script limit of 128MB.
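That 6000-row guideline is consistent with the figures earlier in the thread. A rough back-of-envelope check, assuming memory scales roughly linearly with rows fetched (a simplification, since PHP overhead is not all per-row):

```python
# The thread reports ~140 MB consumed at 9105 fetched rows in debug mode.
per_row_kb = 140 * 1024 / 9105           # ~15.7 KB per fetched row
budget_rows = 128 * 1024 / per_row_kb    # rows that fit a 128 MB limit, ~8300
```

A linear estimate puts the 128MB ceiling around 8300 rows, so capping at 6000 leaves sensible headroom for the rest of the page build.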
 

Awesome man, thanks for the update. It's something I'll watch for.
 
The approach has changed pretty significantly for this area of code in 2.0, so performance issues with this should generally be resolved there.
 