1. This site uses cookies. By continuing to use this site, you are agreeing to our use of cookies. Learn More.

XF 1.4 Slow queries on very large forums

Discussion in 'Troubleshooting and Problems' started by nrep, Feb 3, 2015.

  1. nrep

    nrep Well-Known Member

    I'm testing out upgrading a forum with several million threads, which contains many forum categories with hundreds of thousands of posts.

    I did a test XF import and it works really well (and fast) on smaller forum sections, however if I start browsing a deep forum section page (i.e. page 10,000 of a forum), then things crawl to a halt. The further back I go, the worse it gets. I can use up to page 2,000 without much of a delay, but if I start to access page 10,000+ I get timeouts (over 30 seconds).

    Here's an example of one of the slow queries I get from debug mode:

    Code:
    SELECT thread.*
        ,
        user.*, IF(user.username IS NULL, thread.username, user.username) AS username,
        deletion_log.delete_date, deletion_log.delete_reason,
        deletion_log.delete_user_id, deletion_log.delete_username,
            IF(thread_read.thread_read_date > 1420400047, thread_read.thread_read_date, 1420400047) AS thread_read_date,
        IF(thread_watch.user_id IS NULL, 0,
            IF(thread_watch.email_subscribe, 'watch_email', 'watch_no_email')) AS thread_is_watched,
        thread_user_post.post_count AS user_post_count
    FROM xf_thread AS thread
    
        LEFT JOIN xf_user AS user ON
            (user.user_id = thread.user_id)
        LEFT JOIN xf_deletion_log AS deletion_log ON
            (deletion_log.content_type = 'thread' AND deletion_log.content_id = thread.thread_id)
        LEFT JOIN xf_thread_read AS thread_read ON
            (thread_read.thread_id = thread.thread_id
            AND thread_read.user_id = 1)
        LEFT JOIN xf_thread_watch AS thread_watch
            ON (thread_watch.thread_id = thread.thread_id
            AND thread_watch.user_id = 1)
        LEFT JOIN xf_thread_user_post AS thread_user_post
            ON (thread_user_post.thread_id = thread.thread_id
            AND thread_user_post.user_id = 1)
    WHERE (thread.node_id = 152) AND (thread.sticky = 0) AND (thread.discussion_state IN ('visible','deleted','moderated'))
    ORDER BY thread.last_post_date DESC
    LIMIT 20 OFFSET 162380
    Run Time: 12.230421
    
    Select Type    Table    Type    Possible Keys    Key    Key Len    Ref    Rows    Extra      
    SIMPLE    thread    ref    node_id_last_post_date,node_id_sticky_state_last_post    node_id_last_post_date    4    const    552480    Using where      
    SIMPLE    user    eq_ref    PRIMARY    PRIMARY    4    testsite.thread.user_id    1          
    SIMPLE    deletion_log    eq_ref    PRIMARY    PRIMARY    31    const,testsite.thread.thread_id    1    Using where      
    SIMPLE    thread_read    eq_ref    user_id_thread_id,thread_id    user_id_thread_id    8    const,testsite.thread.thread_id    1          
    SIMPLE    thread_watch    eq_ref    PRIMARY,thread_id_email_subscribe    PRIMARY    8    const,testsite.thread.thread_id    1          
    SIMPLE    thread_user_post    eq_ref    PRIMARY,user_id    PRIMARY    8    testsite.thread.thread_id,const    1        
    
    
    This forum runs really well on vBulletin, so I'm very puzzled as to why it crawls on XF? I would consider MySQL to be well optimised - other sites on the server fly, as does the VB version of the site.
     
  2. nrep

    nrep Well-Known Member

    If I browse the forum as a guest, things are faster but this query slows things down. This is probably worse in a way, as spiders will be the ones crawling these pages the most:

    Code:
    SELECT thread.*
        ,
        user.*, IF(user.username IS NULL, thread.username, user.username) AS username,
        NULL AS thread_read_date,
        0 AS thread_is_watched,
        0 AS user_post_count
    FROM xf_thread AS thread
    
        LEFT JOIN xf_user AS user ON
            (user.user_id = thread.user_id)
    WHERE (thread.node_id = 152) AND (thread.sticky = 0) AND (thread.discussion_state IN ('visible'))
    ORDER BY thread.last_post_date DESC
    LIMIT 20 OFFSET 238340
    Run Time: 4.383607
    Select Type    Table    Type    Possible Keys    Key    Key Len    Ref    Rows    Extra      
    SIMPLE    thread    ref    node_id_last_post_date,node_id_sticky_state_last_post    node_id_last_post_date    4    const    552480    Using where      
    SIMPLE    user    eq_ref    PRIMARY    PRIMARY    4    sitename.thread.user_id    1        
    
    
     
  3. Mike

    Mike XenForo Developer Staff Member

    Large page offsets will always be potentially problematic. MySQL is likely fetching all of the results and then throwing away the first few hundred thousand unfortunately.

    This is one of the reasons why you can limit forums to only display threads from a recent period of time.
     
    otto likes this.
  4. nrep

    nrep Well-Known Member

  5. Mike

    Mike XenForo Developer Staff Member

    You can still get to the older threads if needed by going to the last page and clicking the link provided; the goal would really just be to make it hit a fair bit less and to reduce the expense of the count queries as well. Keep in mind that Google can still discover all of the old threads because of the sitemap.

    An approach like that may help but it's a rather significant change.
     
    otto likes this.
  6. nrep

    nrep Well-Known Member

    Thanks Mike, I really want to try and keep the functionality as-is, without limiting anything as I'm sure there will be an impact on search traffic (as although the sitemap helps google find it, it doesn't allocate value to the page in the way a link would).

    I'll see if there's anything I can do to break that query up, as I really want to get this working. I appreciate this sort of optimisation is out of the remit for XF product support, but if you have any suggestions for what I can look in to then please do let me know, as I'm not great with MySQL queries. It's my last stumbling block for this particular site, but I'm sure it can be improved somehow. :confused:

    If anyone else is reading this and has any suggestions that I could try in terms of splitting the query then please do let me know. I noticed that if I do a simple query like (SELECT thread_id FROM xf_thread as thread ORDER BY thread_id LIMIT 20 OFFSET 238340) then it takes around the same amount of time for the first run, but it is significantly faster thereafter - even speeding up other queries with different offsets as part of this must be cached. If I can somehow split the query to sort the rows using this smaller query, I'm hoping that this may mean that I can join in the rest of the results. Does that make sense, as I'm a real novice with MySQL.
     
  7. Mouth

    Mouth Well-Known Member

    I don't believe so. Even if date limiting the node's thread list within ACP, as per Mike's suggestion, the final page of the forum list still has a very visible 'Click here to display older threads' link. You can see it in action at https://netrider.net.au/forum/general-motorcycling-discussion.56/ and I've had no SEO or Google search adverse results as a result, and had mine date limited to 1 year for almost 2 years.
     
  8. Mike

    Mike XenForo Developer Staff Member

    If it applies to other limits, that may imply that a lot of the data set isn't actually in memory. Has your server been tuned for InnoDB? (http://www.percona.com/blog/2013/09/20/innodb-performance-optimization-basics-updated/) Notably the buffer pool size.
     
  9. nrep

    nrep Well-Known Member

    I'd consider it tuned as it runs other large sites very well (and it was tuned by someone who knows more than I do), however there is a risk that I'm not seeing the true query speed because the site isn't public so the cache hasn't filled yet. Even so, it'll still be slower than it could otherwise be.

    Here's a link to one of the IGN pages that exhibits a similar problem (and I'm guessing they've got a beefy DB server that's well optimised):

    http://www.ign.com/boards/forums/the-vestibule.5296/page-68235

    Thanks @Mouth you could well be right, but I'm still very wary about this as it makes it much harder for search engines to see links to older content, and I strongly suspect it would be devauled as such. I can't afford to take the risk in this instance, so I'm being very cautious.
     
  10. Mouth

    Mouth Well-Known Member

    Disagree. To a search bot, there is no difference between seeing a page jump/counter link and seeing the link on the last page (of a date limited forum). It will arrive at the same content in the same way. Sitemap submission is the most important anyway, and the search bot just needs to be able to access the sitemap listed page to index the content it finds.

    A slow page display speed will impact your SEO much more than where your link to old content is displayed. And, IMHO, user experience with page load speed is much more important than being very cautious about when a search bot sees a link to old content. But, understand it's your site and your decision. I wouldn't be sidelining an XF migration though because of it.
     
    eva2000 likes this.
  11. nrep

    nrep Well-Known Member

    I've just checked another large site and @digitalpoint 's forum takes a long while to load older forum pages too, in this case it was almost 75 seconds:

    https://forums.digitalpoint.com/forums/domains.59/page-3548

    DP.png

    Yeah it's the speed/SEO combo that bothers me most of all - so even if I do stop older links from displaying then it'll still result in the same problem. I don't think it'll stop the migration, but I'll need to figure something out first (at worse, perhaps splitting the forum sections by year to an archive).
     
  12. digitalpoint

    digitalpoint Well-Known Member

    It's more an issue with how MySQL needs to process all records before the limit clause (meaning it has to process all records before the ones you want thousands of pages deep).

    I actually started working on a different way I getting threads (not using MySQL LIMIT). It's how I do it here: https://marketplace.digitalpoint.com/sites page 10000 would be just as fast as page 1... I started to make an addon to apply the same system to threads in forums, but I got busy doing other more important stuff and never finished it.
     
    nrep likes this.
  13. nrep

    nrep Well-Known Member

    Cheers @digitalpoint - if you do ever consider making that addon then please do let me know :D. That market place is DAMN fast!
     
  14. digitalpoint

    digitalpoint Well-Known Member

    I forgot I use the same paging principle for the main portal page as well... https://www.digitalpoint.com/

    I'll see about revisiting the thread paging thing sometime and roll it into my internal "Scalability" addon when I have some time.

    Honestly though it hasn't been a big deal... Not like anyone pages that deep anyway without searching for what they are looking for, so...
     
    nrep likes this.
  15. Mouth

    Mouth Well-Known Member

    Still disagree. The search bot is primarily following your sitemap submission with threads, not page links with forum listings. The bot just see's it as a duplicate page already seen via sitemap submission and doesn't follow.
    Yes.
     

Share This Page