XF 1.4 Slow queries on very large forums

nrep

Well-known member
I'm testing out upgrading a forum with several million threads, which contains many forum categories with hundreds of thousands of posts.

I did a test XF import and it works really well (and fast) on smaller forum sections, however if I start browsing a deep forum section page (i.e. page 10,000 of a forum), then things crawl to a halt. The further back I go, the worse it gets. I can use up to page 2,000 without much of a delay, but if I start to access page 10,000+ I get timeouts (over 30 seconds).

Here's an example of one of the slow queries I get from debug mode:

Code:
SELECT thread.*
    ,
    user.*, IF(user.username IS NULL, thread.username, user.username) AS username,
    deletion_log.delete_date, deletion_log.delete_reason,
    deletion_log.delete_user_id, deletion_log.delete_username,
        IF(thread_read.thread_read_date > 1420400047, thread_read.thread_read_date, 1420400047) AS thread_read_date,
    IF(thread_watch.user_id IS NULL, 0,
        IF(thread_watch.email_subscribe, 'watch_email', 'watch_no_email')) AS thread_is_watched,
    thread_user_post.post_count AS user_post_count
FROM xf_thread AS thread

    LEFT JOIN xf_user AS user ON
        (user.user_id = thread.user_id)
    LEFT JOIN xf_deletion_log AS deletion_log ON
        (deletion_log.content_type = 'thread' AND deletion_log.content_id = thread.thread_id)
    LEFT JOIN xf_thread_read AS thread_read ON
        (thread_read.thread_id = thread.thread_id
        AND thread_read.user_id = 1)
    LEFT JOIN xf_thread_watch AS thread_watch
        ON (thread_watch.thread_id = thread.thread_id
        AND thread_watch.user_id = 1)
    LEFT JOIN xf_thread_user_post AS thread_user_post
        ON (thread_user_post.thread_id = thread.thread_id
        AND thread_user_post.user_id = 1)
WHERE (thread.node_id = 152) AND (thread.sticky = 0) AND (thread.discussion_state IN ('visible','deleted','moderated'))
ORDER BY thread.last_post_date DESC
LIMIT 20 OFFSET 162380
Run Time: 12.230421

Select Type    Table    Type    Possible Keys    Key    Key Len    Ref    Rows    Extra      
SIMPLE    thread    ref    node_id_last_post_date,node_id_sticky_state_last_post    node_id_last_post_date    4    const    552480    Using where      
SIMPLE    user    eq_ref    PRIMARY    PRIMARY    4    testsite.thread.user_id    1          
SIMPLE    deletion_log    eq_ref    PRIMARY    PRIMARY    31    const,testsite.thread.thread_id    1    Using where      
SIMPLE    thread_read    eq_ref    user_id_thread_id,thread_id    user_id_thread_id    8    const,testsite.thread.thread_id    1          
SIMPLE    thread_watch    eq_ref    PRIMARY,thread_id_email_subscribe    PRIMARY    8    const,testsite.thread.thread_id    1          
SIMPLE    thread_user_post    eq_ref    PRIMARY,user_id    PRIMARY    8    testsite.thread.thread_id,const    1

This forum runs really well on vBulletin, so I'm very puzzled as to why it crawls on XF? I would consider MySQL to be well optimised - other sites on the server fly, as does the VB version of the site.
 
If I browse the forum as a guest, things are faster but this query slows things down. This is probably worse in a way, as spiders will be the ones crawling these pages the most:

Code:
SELECT thread.*
    ,
    user.*, IF(user.username IS NULL, thread.username, user.username) AS username,
    NULL AS thread_read_date,
    0 AS thread_is_watched,
    0 AS user_post_count
FROM xf_thread AS thread

    LEFT JOIN xf_user AS user ON
        (user.user_id = thread.user_id)
WHERE (thread.node_id = 152) AND (thread.sticky = 0) AND (thread.discussion_state IN ('visible'))
ORDER BY thread.last_post_date DESC
LIMIT 20 OFFSET 238340
Run Time: 4.383607
Select Type    Table    Type    Possible Keys    Key    Key Len    Ref    Rows    Extra      
SIMPLE    thread    ref    node_id_last_post_date,node_id_sticky_state_last_post    node_id_last_post_date    4    const    552480    Using where      
SIMPLE    user    eq_ref    PRIMARY    PRIMARY    4    sitename.thread.user_id    1
 
Large page offsets will always be potentially problematic. MySQL is likely fetching all of the results and then throwing away the first few hundred thousand unfortunately.

This is one of the reasons why you can limit forums to only display threads from a recent period of time.
 
You can still get to the older threads if needed by going to the last page and clicking the link provided; the goal would really just be to make it hit a fair bit less and to reduce the expense of the count queries as well. Keep in mind that Google can still discover all of the old threads because of the sitemap.

An approach like that may help but it's a rather significant change.
 
Thanks Mike, I really want to try and keep the functionality as-is, without limiting anything as I'm sure there will be an impact on search traffic (as although the sitemap helps google find it, it doesn't allocate value to the page in the way a link would).

I'll see if there's anything I can do to break that query up, as I really want to get this working. I appreciate this sort of optimisation is out of the remit for XF product support, but if you have any suggestions for what I can look in to then please do let me know, as I'm not great with MySQL queries. It's my last stumbling block for this particular site, but I'm sure it can be improved somehow. :confused:

If anyone else is reading this and has any suggestions that I could try in terms of splitting the query then please do let me know. I noticed that if I do a simple query like (SELECT thread_id FROM xf_thread as thread ORDER BY thread_id LIMIT 20 OFFSET 238340) then it takes around the same amount of time for the first run, but it is significantly faster thereafter - even speeding up other queries with different offsets as part of this must be cached. If I can somehow split the query to sort the rows using this smaller query, I'm hoping that this may mean that I can join in the rest of the results. Does that make sense, as I'm a real novice with MySQL.
 
I'm sure there will be an impact on search traffic
I don't believe so. Even if date limiting the node's thread list within ACP, as per Mike's suggestion, the final page of the forum list still has a very visible 'Click here to display older threads' link. You can see it in action at https://netrider.net.au/forum/general-motorcycling-discussion.56/ and I've had no SEO or Google search adverse results as a result, and had mine date limited to 1 year for almost 2 years.
 
I noticed that if I do a simple query like (SELECT thread_id FROM xf_thread as thread ORDER BY thread_id LIMIT 20 OFFSET 238340) then it takes around the same amount of time for the first run, but it is significantly faster thereafter - even speeding up other queries with different offsets as part of this must be cached.
If it applies to other limits, that may imply that a lot of the data set isn't actually in memory. Has your server been tuned for InnoDB? (http://www.percona.com/blog/2013/09/20/innodb-performance-optimization-basics-updated/) Notably the buffer pool size.
 
If it applies to other limits, that may imply that a lot of the data set isn't actually in memory. Has your server been tuned for InnoDB? (http://www.percona.com/blog/2013/09/20/innodb-performance-optimization-basics-updated/) Notably the buffer pool size.

I'd consider it tuned as it runs other large sites very well (and it was tuned by someone who knows more than I do), however there is a risk that I'm not seeing the true query speed because the site isn't public so the cache hasn't filled yet. Even so, it'll still be slower than it could otherwise be.

Here's a link to one of the IGN pages that exhibits a similar problem (and I'm guessing they've got a beefy DB server that's well optimised):

http://www.ign.com/boards/forums/the-vestibule.5296/page-68235

I don't believe so. Even if date limiting the node's thread list within ACP, as per Mike's suggestion, the final page of the forum list still has a very visible 'Click here to display older threads' link. You can see it in action at https://netrider.net.au/forum/general-motorcycling-discussion.56/ and I've had no SEO or Google search adverse results as a result, and had mine date limited to 1 year for almost 2 years.

Thanks @Mouth you could well be right, but I'm still very wary about this as it makes it much harder for search engines to see links to older content, and I strongly suspect it would be devauled as such. I can't afford to take the risk in this instance, so I'm being very cautious.
 
it makes it much harder for search engines to see links to older content
Disagree. To a search bot, there is no difference between seeing a page jump/counter link and seeing the link on the last page (of a date limited forum). It will arrive at the same content in the same way. Sitemap submission is the most important anyway, and the search bot just needs to be able to access the sitemap listed page to index the content it finds.

A slow page display speed will impact your SEO much more than where your link to old content is displayed. And, IMHO, user experience with page load speed is much more important than being very cautious about when a search bot sees a link to old content. But, understand it's your site and your decision. I wouldn't be sidelining an XF migration though because of it.
 
I've just checked another large site and @digitalpoint 's forum takes a long while to load older forum pages too, in this case it was almost 75 seconds:

https://forums.digitalpoint.com/forums/domains.59/page-3548

DP.webp

A slow page display speed will impact your SEO much more than where your link to old content is displayed. And user experience with page load speed is much more important that being very cautious about than when a search bot sees a link to old content. But, understand it's your site and your decision. I wouldn't be sidelining an XF migration though because of it.

Yeah it's the speed/SEO combo that bothers me most of all - so even if I do stop older links from displaying then it'll still result in the same problem. I don't think it'll stop the migration, but I'll need to figure something out first (at worse, perhaps splitting the forum sections by year to an archive).
 
It's more an issue with how MySQL needs to process all records before the limit clause (meaning it has to process all records before the ones you want thousands of pages deep).

I actually started working on a different way I getting threads (not using MySQL LIMIT). It's how I do it here: https://marketplace.digitalpoint.com/sites page 10000 would be just as fast as page 1... I started to make an addon to apply the same system to threads in forums, but I got busy doing other more important stuff and never finished it.
 
I forgot I use the same paging principle for the main portal page as well... https://www.digitalpoint.com/

I'll see about revisiting the thread paging thing sometime and roll it into my internal "Scalability" addon when I have some time.

Honestly though it hasn't been a big deal... Not like anyone pages that deep anyway without searching for what they are looking for, so...
 
so even if I do stop older links from displaying then it'll still result in the same problem
Still disagree. The search bot is primarily following your sitemap submission with threads, not page links with forum listings. The bot just see's it as a duplicate page already seen via sitemap submission and doesn't follow.
Not like anyone pages that deep anyway without searching for what they are looking for, so...
Yes.
 
Mhm. This is quite problematic. Surely we are not going to limit how many threads are displayed.
@Xon have you found a way to fix this?

Have you tried the addon on your site, as it does make a lot of improvements that help things significantly - I could be wrong and wonder if the "better thread list page threshold" option tackles this problematic query.
 
I use most of @Xon 's addons. These are an asset that any medium-big board needs. I hope he bundles them all up as one big performance addon. It should be core really.
Its too early to say as I recently migrated my big board and am trying to identify all performance issues and address them one by one.
 
Top Bottom