Fixed Add use index to merge posts query

Azaly

Active member
Affected version
2.2.3
In large thread merging posts query took long time in some environment:
Code:
analyze
    -> SELECT post_id, post_date, user_id, username, reaction_score, reactions
    -> FROM xf_post
    -> WHERE thread_id = 14369
    -> ORDER BY post_date
    -> LIMIT 1;

+------+-------------+---------+-------+-------------------------------------------------------------+-----------+---------+-------+---------+------------+----------+------------+-------------+
| id   | select_type | table   | type  | possible_keys                                               | key       | key_len | ref   | rows    | r_rows     | filtered | r_filtered | Extra       |
+------+-------------+---------+-------+-------------------------------------------------------------+-----------+---------+-------+---------+------------+----------+------------+-------------+
|    1 | SIMPLE      | xf_post | index | thread_id_post_date,thread_id_position,thread_id_score_date | post_date | 4       | const | 3880759 | 4018744.00 |     0.52 |       0.00 | Using where |
+------+-------------+---------+-------+-------------------------------------------------------------+-----------+---------+-------+---------+------------+----------+------------+-------------+
1 row in set (21.908 sec)

@Xon suggested to add the use index hint for this query. In this case query time is ok:
Code:
analyze
    -> SELECT post_id, post_date, user_id, username, reaction_score, reactions
    -> FROM xf_post use index (thread_id_post_date)
    -> WHERE thread_id = '14369'
    -> ORDER BY post_date
    -> LIMIT 1;
+------+-------------+---------+------+---------------------+---------------------+---------+-------+-------+--------+----------+------------+-------------+
| id   | select_type | table   | type | possible_keys       | key                 | key_len | ref   | rows  | r_rows | filtered | r_filtered | Extra       |
+------+-------------+---------+------+---------------------+---------------------+---------+-------+-------+--------+----------+------------+-------------+
|    1 | SIMPLE      | xf_post | ref  | thread_id_post_date | thread_id_post_date | 4       | const | 20172 | 1.00   |   100.00 |     100.00 | Using where |
+------+-------------+---------+------+---------------------+---------------------+---------+-------+-------+--------+----------+------------+-------------+
1 row in set (0.001 sec)
More detail:
 
I don't think there's necessarily harm in adding an index hint to this query, but equally, it is a super simple query and if MySQL/MariaDB's optimizer isn't picking out the correct index, that seems like a concerning failure.

It's very possible that this relates to your forum's quite unexpected post:thread ratio. Looking at your site's stats, there are an average of around 800 posts per thread and there are a bunch of threads with over 20,000 replies. For comparison, our average thread length here is 8 (randomly checking a few other large XF sites, they average around 13). This might cause issues with the optimizer statistics for example.

As evidenced by one of your other threads, there's an example of a newer version of MariaDB using alternative optimizer approaches and leading to bad query plans suddenly. That certainly could be at play here too. This is definitely one of the difficulties with MariaDB as it drifts further from MySQL's own behavior as it's not quite as much of a drop in replacement.
 
It's very possible that this relates to your forum's quite unexpected post:thread ratio.
We many years used large thread (max to 20 000 published post) with mariadb 5.5 with no problem in post merging. Since November 2020 we've upgraded to mariadb 10.5.6 but faced with this problem only in February 2021. At this time I've performed only minor forum upgrade (to 2.2.3 from 2.2.2) and tested custom threads fields (I do not used them before). So… not sure what happened and why we have this problem only now. With adding use index in src/XF/Entity/Thread.php performance a bit better 😖.

MariaDB using alternative optimizer approaches and leading to bad query plans suddenly. That certainly could be at play here too.
I didn't know that! Now I beware to upgrade mariadb 😨
 
i had a big issue with slow queries using the API to get threads. was taking 1-5 seconds sometimes. seemed to be using wrong index's and stuff.

this was happening on MariaDB 10.4.8.

Have upgraded to 10.4.13 and problem has disappeared.
 
Just for the record, we ran into the same issue after upgrading from Percona Server 5.7(.27) to Percona Server 8.0(.25). Moving posts from a thread with ~ 8000 posts caused the mentioned query to take more than 40 seconds which resulted in the "Oops" message when the client-side JavaScript ran into a timeout after 35 seconds.

Excerpt from the MySQL slow query log:
Code:
# Query_time: 48.018114  Lock_time: 0.000013  Rows_sent: 1  Rows_examined: 22751007  Rows_affected: 0  Bytes_sent: 478
SET timestamp=1630916324;
SELECT post_id, post_date, user_id, username, reaction_score, reactions
            FROM xf_post
            WHERE thread_id = '1894677'
            ORDER BY post_date
            LIMIT 1;

Optimizing or analyzing the xf_post table didn't help. Adding USE INDEX (thread_id_post_date) to the query did help.

(Something I don't understand: When I run the unmodified query in a MySQL shell then it runs in no time: 0.00 sec. But when XenForo runs it then it takes forever. Could XenForo somehow trigger this bug by setting MySQL session variables or something?)
 
@Steffen Can you run explain format=json on the query (or analyze format=json if supported by that version)? This will dump a lot more details than a standard explain which could help troubleshoot why this is happening.
 
On SpaceBattles, I grabbed the longest thread with;
SQL:
select thread_id,reply_count from xf_thread order by reply_count desc limit 1;
It has 83852 replies.

Running;
SQL:
analyze
SELECT post_id, post_date, user_id, username, reaction_score, reactions
FROM xf_post
WHERE thread_id = '370782'
ORDER BY post_date
LIMIT 1;
Gives the output;
Code:
+------+-------------+---------+-------+-----------------------------------------------------------------------+-----------+---------+-------+----------+-------------+----------+------------+-------------+
| id   | select_type | table   | type  | possible_keys                                                         | key       | key_len | ref   | rows     | r_rows      | filtered | r_filtered | Extra       |
+------+-------------+---------+-------+-----------------------------------------------------------------------+-----------+---------+-------+----------+-------------+----------+------------+-------------+
|    1 | SIMPLE      | xf_post | index | thread_id_post_date,thread_id_score_date,thread_id_position_post_date | post_date | 4       | const | 45721965 | 20106107.00 |     0.35 |       0.00 | Using where |
+------+-------------+---------+-------+-----------------------------------------------------------------------+-----------+---------+-------+----------+-------------+----------+------------+-------------+
1 row in set (2 min 51.652 sec)
(json format output)
JSON:
{
  "query_block": {
    "select_id": 1,
    "r_loops": 1,
    "r_total_time_ms": 172191,
    "table": {
      "table_name": "xf_post",
      "access_type": "index",
      "possible_keys": [
        "thread_id_post_date",
        "thread_id_score_date",
        "thread_id_position_post_date"
      ],
      "key": "post_date",
      "key_length": "4",
      "used_key_parts": ["post_date"],
      "ref": ["const"],
      "r_loops": 1,
      "rows": 45721999,
      "r_rows": 2.01e7,
      "r_total_time_ms": 170007,
      "filtered": 0.3495,
      "r_filtered": 5e-6,
      "attached_condition": "xf_post.thread_id <=> '370782'"
    }
  }
}
And yes that is nearly 3 minutes to run the query and return the query plan.

This degenerates into what is essentially a table-scan checking some 20 million rows.

With the index hint;
Code:
+------+-------------+---------+------+---------------------+---------------------+---------+-------+--------+--------+----------+------------+-------------+
| id   | select_type | table   | type | possible_keys       | key                 | key_len | ref   | rows   | r_rows | filtered | r_filtered | Extra       |
+------+-------------+---------+------+---------------------+---------------------+---------+-------+--------+--------+----------+------------+-------------+
|    1 | SIMPLE      | xf_post | ref  | thread_id_post_date | thread_id_post_date | 4       | const | 177448 |   1.00 |   100.00 |     100.00 | Using where |
+------+-------------+---------+------+---------------------+---------------------+---------+-------+--------+--------+----------+------------+-------------+
1 row in set (0.000 sec)
JSON:
{
  "query_block": {
    "select_id": 1,
    "r_loops": 1,
    "r_total_time_ms": 0.0492,
    "table": {
      "table_name": "xf_post",
      "access_type": "ref",
      "possible_keys": ["thread_id_post_date"],
      "key": "thread_id_post_date",
      "key_length": "4",
      "used_key_parts": ["thread_id"],
      "ref": ["const"],
      "r_loops": 1,
      "rows": 177448,
      "r_rows": 1,
      "r_total_time_ms": 0.0279,
      "filtered": 100,
      "r_filtered": 100,
      "attached_condition": "xf_post.thread_id <=> '370782'"
    }
  }
}

~0.05ms vs 2.8 minutes.

Likely need to patch both rebuildLastPostInfo and rebuildLastPostInfo.

:Edit:
So the crazy thing is this is hitting the longest thread on SpaceBattles (with ~41 million posts in xf_post) for a thread with 83852 replies, but not Sufficient Velocity (with ~16 million posts in xf_post) for a thread with 446620 replies.

MySQL is just reporting stupidly undercounted row estimates for one site, and not the other.
 
Last edited:
I can reproduce this as well (MariaDB 10.4.20, CLI) one one database, though with faster responses compred to @Xon as our DB is not nearly as large and the largest thread only has about 18K replies.

I couldn't reproduce it on anther DB (about 7 million posts, longest thread has 50K posts).
 
Does the index hint have a negative performance on that other DB? I shouldn’t think so but we’ve seen similar things in the past.
 
For me, on the site which works, it is already picking the correct index so adding the use index doesn't change anything.
 
For me, on the site which works, it is already picking the correct index so adding the use index doesn't change anything.
Same here.

Affected Site
Code:
ANALYZE FORMAT=JSON  SELECT post_id, post_date, user_id, username, reaction_score, reactions FROM xf_post WHERE thread_id = 299058 ORDER BY post_date LIMIT 1;
 {
  "query_block": {
    "select_id": 1,
    "r_loops": 1,
    "r_total_time_ms": 3793.6,
    "table": {
      "table_name": "xf_post",
      "access_type": "index",
      "possible_keys": [
        "thread_id_post_date",
        "thread_id_position",
        "thread_id_score_date"
      ],
      "key": "post_date",
      "key_length": "4",
      "used_key_parts": ["post_date"],
      "ref": ["const"],
      "r_loops": 1,
      "rows": 4411158,
      "r_rows": 1.14e6,
      "r_total_time_ms": 3543.1,
      "filtered": 0.8014,
      "r_filtered": 8.8e-5,
      "attached_condition": "xf_post.thread_id <=> 299058"
    }
  }
}

ANALYZE FORMAT=JSON  SELECT post_id, post_date, user_id, username, reaction_score, reactions FROM xf_post USE INDEX (thread_id_post_date) WHERE thread_id = 299058 ORDER BY post_date LIMIT 1;
 {
  "query_block": {
    "select_id": 1,
    "r_loops": 1,
    "r_total_time_ms": 0.1372,
    "table": {
      "table_name": "xf_post",
      "access_type": "ref",
      "possible_keys": ["thread_id_post_date"],
      "key": "thread_id_post_date",
      "key_length": "4",
      "used_key_parts": ["thread_id"],
      "ref": ["const"],
      "r_loops": 1,
      "rows": 36084,
      "r_rows": 1,
      "r_total_time_ms": 0.1034,
      "filtered": 100,
      "r_filtered": 100,
      "attached_condition": "xf_post.thread_id <=> 299058"
    }
  }
}

Not affected Site
Code:
ANALYZE FORMAT=JSON SELECT post_id, post_date, user_id, username, reaction_score, reactions FROM xf_post WHERE thread_id = 2652039 ORDER BY post_date LIMIT 1;
{
  "query_block": {
    "select_id": 1,
    "r_loops": 1,
    "r_total_time_ms": 0.057,
    "table": {
      "table_name": "xf_post",
      "access_type": "ref",
      "possible_keys": ["thread_id_post_date", "thread_id_position"],
      "key": "thread_id_post_date",
      "key_length": "4",
      "used_key_parts": ["thread_id"],
      "ref": ["const"],
      "r_loops": 1,
      "rows": 82120,
      "r_rows": 1,
      "r_total_time_ms": 0.0362,
      "filtered": 100,
      "r_filtered": 100,
      "attached_condition": "xf_post.thread_id <=> 2652039"
    }
  }
}

ANALYZE FORMAT=JSON SELECT post_id, post_date, user_id, username, reaction_score, reactions FROM xf_post USE INDEX (thread_id_post_date) WHERE thread_id = 2652039 ORDER BY post_date LIMIT 1;
{
  "query_block": {
    "select_id": 1,
    "r_loops": 1,
    "r_total_time_ms": 0.0645,
    "table": {
      "table_name": "xf_post",
      "access_type": "ref",
      "possible_keys": ["thread_id_post_date"],
      "key": "thread_id_post_date",
      "key_length": "4",
      "used_key_parts": ["thread_id"],
      "ref": ["const"],
      "r_loops": 1,
      "rows": 82120,
      "r_rows": 1,
      "r_total_time_ms": 0.0395,
      "filtered": 100,
      "r_filtered": 100,
      "attached_condition": "xf_post.thread_id <=> 2652039"
    }
  }
}
 
@Steffen Can you run explain format=json on the query (or analyze format=json if supported by that version)?

EXPLAIN FORMAT=JSON:
JSON:
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "2981.89"
    },
    "ordering_operation": {
      "using_filesort": false,
      "table": {
        "table_name": "xf_post",
        "access_type": "ref",
        "possible_keys": [
          "thread_id_post_date",
          "thread_id_position",
          "thread_id_score_date"
        ],
        "key": "thread_id_post_date",
        "used_key_parts": [
          "thread_id"
        ],
        "key_length": "4",
        "ref": [
          "const"
        ],
        "rows_examined_per_scan": 8238,
        "rows_produced_per_join": 8238,
        "filtered": "100.00",
        "cost_info": {
          "read_cost": "2158.09",
          "eval_cost": "823.80",
          "prefix_cost": "2981.89",
          "data_read_per_join": "16M"
        },
        "used_columns": [
          "post_id",
          "thread_id",
          "user_id",
          "username",
          "post_date",
          "reaction_score",
          "reactions"
        ]
      }
    }
  }
}

EXPLAIN ANALYZE:
Code:
-> Limit: 1 row(s)  (cost=2981.89 rows=1) (actual time=0.020..0.020 rows=1 loops=1)
    -> Index lookup on xf_post using thread_id_post_date (thread_id=1894677)  (cost=2981.89 rows=8238) (actual time=0.019..0.019 rows=1 loops=1)

("ERROR 1235 (42000): This version of MySQL doesn't yet support 'FORMAT=JSON with EXPLAIN ANALYZE'")





As I said above, the hint-less query is lightning-fast for me when I execute it in a MySQL shell. I therefore wanted to find out whether the problem is caused by some MySQL connection setting used by XenForo and created the following script to execute the query in the context of XenForo:

PHP:
<?php

namespace XF\Cli\Command;

use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;

class MysqlTest extends Command
{
    use JobRunnerTrait;

    protected function configure()
    {
        $this->setName('xf:mysql-test');
    }

    protected function execute(InputInterface $input, OutputInterface $output)
    {
        $app = \XF::app();

        var_dump($app->db()->fetchAll("SELECT post_id, post_date, user_id, username, reaction_score, reactions FROM xf_post WHERE thread_id = '1894677' ORDER BY post_date LIMIT 1"));
    }
}

(Save the script as "src/XF/Cli/Command/MysqlTest.php" and then run "php cmd.php xf:mysql-test".)

It is lightning-fast this way, too.

I don't know why it is slow when executed while actually moving posts. Maybe it's affected by other queries in the same transaction.
 
We've run into the same issue with the Ip::getLoggedIp method. Adding "USE INDEX (content_type_content_id)" fixes queries that would cause a timeout after 30+ seconds.
 
We've run into the same issue with the Ip::getLoggedIp method. Adding "USE INDEX (content_type_content_id)" fixes queries that would cause a timeout after 30+ seconds.
This function is flawed in a significant way.

Ip::getLoggedIp is only called with the content type 'user'. Except all non-content ip records are store under user/user id/<action> triplet, with only an index on content type/content id columns. This logs every user login IP, and such including the registration/confirmation IPs.

There was a bug in older version which logged a login ip on practically every request for users.

The real fix is to record the xf_ip.ip_id field for the registration record somewhere on a related user table. This turns scanning an arbitrary number of rows using string matching to just a primary key lookup.
 
Last edited:
Top Bottom