Fixed Wrong use of quote in getUsersByIds

Bugfix

Member
Hello,

analyzing slow queries we found a weird use of "_getDb()->quote" in:
XenForo_Model_User -> getUsersByIds
This code is used:
Code:
WHERE user.user_id IN (' . $this->_getDb()->quote($userIds) . ')
There are 2 Problems:
1) Assuming this array:
Code:
Array
(
  [0] => 139226
  [1] => 114548
  [2] => 123568
  [3] => 126662
)
you get this SQL:
Code:
WHERE user.user_id IN ('139226', 114548, 123568, 126662)
So, only the first userid is quoted which doesn't make sense.

2) Not using index when quoting an integer:
Because of the quote, the primary index "xf_user.user_id" isn't used:

With 1st id quoted:
Code:
mysql> EXPLAIN SELECT xf_user.user_id FROM xf_user WHERE xf_user.user_id IN ('139226', 114548, 123568, 126662);
+----+-------------+---------+-------+---------------+------------+---------+------+--------+--------------------------+
| id | select_type | table   | type  | possible_keys | key        | key_len | ref  | rows   | Extra                    |
+----+-------------+---------+-------+---------------+------------+---------+------+--------+--------------------------+
|  1 | SIMPLE      | xf_user | index | PRIMARY       | user_state | 1       | NULL | 356315 | Using where; Using index |
+----+-------------+---------+-------+---------------+------------+---------+------+--------+--------------------------+
1 row in set (0.00 sec)

Without quoting:
Code:
mysql> EXPLAIN SELECT xf_user.user_id FROM xf_user WHERE xf_user.user_id IN (139226, 114548, 123568, 126662);
+----+-------------+---------+-------+---------------+---------+---------+------+------+--------------------------+
| id | select_type | table   | type  | possible_keys | key     | key_len | ref  | rows | Extra                    |
+----+-------------+---------+-------+---------------+---------+---------+------+------+--------------------------+
|  1 | SIMPLE      | xf_user | range | PRIMARY       | PRIMARY | 4       | NULL |    4 | Using where; Using index |
+----+-------------+---------+-------+---------------+---------+---------+------+------+--------------------------+
1 row in set (0.00 sec)


Performance difference:
  • With 1st id quoted:
Code:
mysql> SELECT SQL_NO_CACHE xf_user.user_id FROM xf_user WHERE xf_user.user_id IN ('139226', 114548, 123568, 126662);
+---------+
| user_id |
+---------+
|  114548 |
|  123568 |
|  126662 |
|  139226 |
+---------+
4 rows in set (0.36 sec)
  • Without quoting:
Code:
mysql> SELECT SQL_NO_CACHE xf_user.user_id FROM xf_user WHERE xf_user.user_id IN (139226, 114548, 123568, 126662);
+---------+
| user_id |
+---------+
|  114548 |
|  123568 |
|  126662 |
|  139226 |
+---------+
4 rows in set (0.00 sec)


Notable performance differences if many user id’s used (e.g. ~1.000 user id’s):

With 1st quoted:

958 rows in set (4.88 sec)

Without quotes:

958 rows in set (0.01 sec)

All statements performed with optimizer hint ‘SQL_NO_CACHE’ in order to disable mysql statement cache.

My workaround now was using this code:
Code:
WHERE user.user_id IN (' . implode(",", $userIds) . ')
Would be great to get feedback since I absolutely want to avoid changing XF-code.

regards,
Harald
 
What's the stack trace to that point where they're passed in as mixed types? Technically we could run intval on it for example, but I want to identify if this is come from custom code (which would be adjusted to pass in consistent types).
 
I tried to reproduce the exact example mentioned in the OP and I couldn't.

Well, the index thing was a different story, (for me, at least), but the quoted user IDs output was correct.
 
Sometimes this function gets arrays like this:
Code:
Array
(
  [0] => 139226
  [1] => 114548
  [2] => 123568
  [3] => 126662
)

sometimes like this:
Code:
Array
(
  [139226] => 139226
  [114548] => 114548
  [123568] => 123568
  [126662] => 126662
)
 
That page doesn't use that function. I'm still interested in the stack trace that caused this (as it has passed in mixed types and I wouldn't expect that to be common).
 
But in XenForo_ControllerPublic_Member there is a call in line 1501:
PHP:
$users = $userModel->getUsersByIds($userIds, array(
                'join' => XenForo_Model_User::FETCH_USER_PRIVACY
            ));
That particular function is nothing specifically to do with the page mentioned.

It generates the session activity details for when viewing pages in that controller, e.g. "Chris D was last seen: Viewing member member profile X"

That said, there's still nothing there that would involve mixed types. The $userIds passed into the get users query are specifically cast to int:
PHP:
$userIds = array();
foreach ($activities AS $activity)
{
   if (!empty($activity['params']['user_id']))
   {
      $userIds[$activity['params']['user_id']] = intval($activity['params']['user_id']);
   }
}
The array keys aren't, but they aren't used anyway.

This function would be potentially extended by add-ons which may have an impact here.

Stack traces would usually be associated with server error logs, if no errors have been logged relating to this, then specifically the information we need to find out is what code called getUsersByIds() with an array of user IDs of mixed types. Obviously at some point you identified this happening with that function, so we need a bit more context.
 
The $userIds passed into the get users query are specifically cast to int:
This is true - i can see it. Problem is the generated SQL - not the incoming array.

This function would be potentially extended by add-ons which may have an impact here.
I will search for this.

if no errors have been logged relating to this
That is true.

Obviously at some point you identified this happening with that function, so we need a bit more context.
That was the slow query log of the mysql-server.
 
This is true - i can see it. Problem is the generated SQL - not the incoming array.
I have to disagree.

The function that generates the SQL is in Zend_Db_Adapter_Abstract::quote().

Consider that we're inputting an array so this function is recursive, if it's an array it runs each value through the same function, and then ultimately implodes the modified array with a comma.

We don't pass a type in so it will hit the::_quote() method, which is this:

PHP:
if (is_int($value)) {
    return $value;
} elseif (is_float($value)) {
    return sprintf('%F', $value);
}
return "'" . addcslashes($value, "\000\n\r\\'\"\032") . "'";

As you can see here, if the value passed in (or one of the array elements we pass in) is an int already, it isn't quoted. If it isn't an int, then that's when it is quoted and slashes etc. are added accordingly.

What this means is, the input array going into the quote method is something like:

PHP:
array(
    '139226' => '139226',
    '114548' => 114548,
    '123568' => 123568
);
...and so on

Notice the first array value is a string, the rest are ints. When quoted that would produce this:
Code:
'139226', 114548, 123568
So, ultimately, the issue does lie in the incoming array and the function is producing the correct SQL based on that input.

Hence why we need to try and trace where those mixed types are coming from.
 
Just to be clear, if everything is passed in as a string, the index is used as expected. If everything is passed in as an int, the index is used as expected. It's specifically down to mixed types.

It's important to understand where this is coming from.
 
I have to disagree.
What I ment is this:
Even if the SQL is created with a mixed userid-string, the incoming user-id-array is CORRECT!

I added this code to the function:
PHP:
$sql = 'SELECT user.*
                    ' . $joinOptions['selectFields'] . '
                FROM xf_user AS user
                ' . $joinOptions['joinTables'] . '
                WHERE user.user_id IN (' . $this->_getDb()->quote($userIds) . ')
                ' . $orderClause . '';
       
        $uids = print_r($userIds, true);
        $fp = fopen('/home/myproject/web/logs/logfile.txt', 'a');
        fwrite($fp, $uids."\n");
        fwrite($fp, $sql."\n");
        fwrite($fp, "---------------------\n");
        fclose($fp);

Then I clicked this link:
https://www.mydomain.at/members/
and switched the tabs.
After about 20sec I disabled the logging again.

The txt - file is as follows:

Code:
Array
(
    [288973] => 288973
)

SELECT user.*
                    ,
                    user_privacy.*
                FROM xf_user AS user
               
                    LEFT JOIN xf_user_privacy AS user_privacy ON
                        (user_privacy.user_id = user.user_id)
                WHERE user.user_id IN (288973)
                ORDER BY user.username
---------------------
Array
(
    [0] => 260201
    [1] => 261004
    [2] => 247814
    [3] => 151934
    [4] => 70965
    [5] => 274332
    [6] => 259197
    [7] => 158299
    [8] => 161593
    [9] => 366857
    [10] => 77923
    [11] => 311940
    [12] => 80895
    [13] => 184793
    [14] => 238598
    [15] => 405656
    [16] => 335763
    [17] => 375491
    [18] => 405113
    [19] => 405112
    [20] => 239395
    [21] => 397597
    [22] => 365023
    [23] => 207069
    [24] => 236561
    [25] => 398083
    [26] => 208544
    [27] => 292646
    [28] => 246125
    [29] => 122117
    [30] => 197316
    [31] => 79927
    [32] => 398867
    [33] => 171116
    [34] => 394599
    [35] => 52948
    [36] => 257868
    [37] => 373592
    [38] => 293721
    [39] => 368210
    [40] => 325768
    [41] => 385434
    [42] => 390988
    [43] => 324850
    [44] => 200810
    [45] => 220237
    [46] => 324522
    [47] => 205384
    [48] => 333963
    [49] => 92234
    [50] => 77328
    [51] => 248009
    [52] => 259309
    [53] => 294490
    [54] => 312322
    [55] => 22393
    [56] => 380300
    [57] => 364625
    [58] => 365212
    [59] => 304352
    [60] => 398419
    [61] => 301488
    [62] => 103110
    [63] => 310911
    [64] => 319458
    [65] => 293508
    [66] => 337856
    [67] => 338
    [68] => 93276
    [69] => 217957
    [70] => 382960
    [71] => 270625
    [72] => 321628
    [73] => 35070
    [74] => 377090
    [75] => 214615
    [76] => 405664
    [77] => 340680
    [78] => 273693
    [79] => 248738
    [80] => 383982
    [81] => 268845
    [82] => 208148
    [83] => 238302
    [84] => 391277
    [85] => 31793
    [86] => 321811
    [87] => 215725
    [88] => 232783
    [89] => 11972
    [90] => 317645
    [91] => 163216
    [92] => 396936
    [93] => 287397
    [94] => 399084
    [95] => 235346
    [96] => 264849
    [97] => 232244
    [98] => 5732
    [99] => 293331
    [100] => 332954
    [101] => 354275
    [102] => 174058
    [103] => 339132
    [104] => 286235
    [105] => 77558
    [106] => 312454
    [107] => 247512
    [108] => 199654
    [109] => 363313
    [110] => 209242
    [111] => 257177
    [112] => 338173
    [113] => 32127
    [114] => 3560
    [115] => 152317
    [116] => 293794
    [117] => 211377
    [118] => 75961
    [119] => 321275
    [120] => 19324
    [121] => 160479
    [122] => 89684
    [123] => 43582
    [124] => 371176
    [125] => 95315
    [126] => 24857
    [127] => 323751
    [128] => 273127
    [129] => 24985
    [130] => 265249
    [131] => 221362
    [132] => 389634
    [133] => 134005
    [134] => 361128
    [135] => 223595
    [136] => 182344
    [137] => 300471
    [138] => 244842
    [139] => 182272
    [140] => 113274
    [141] => 273363
    [142] => 78774
    [143] => 216933
    [144] => 252096
    [145] => 330652
    [146] => 292267
    [147] => 99279
    [148] => 252098
    [149] => 221475
    [150] => 347135
    [151] => 42193
    [152] => 31784
    [153] => 21990
    [154] => 164359
)

SELECT user.*
                   
                FROM xf_user AS user
               
                WHERE user.user_id IN (260201, 261004, 247814, 151934, 70965, 274332, 259197, 158299, 161593, 366857, 77923, 311940, 80895, 184793, 238598, 405656, 335763, 375491, 405113, 405112, 239395, 397597, 365023, 207069, 236561, 398083, 208544, 292646, 246125, 122117, 197316, 79927, 398867, 171116, 394599, 52948, 257868, 373592, 293721, 368210, 325768, 385434, 390988, 324850, 200810, 220237, 324522, 205384, 333963, 92234, 77328, 248009, 259309, 294490, 312322, 22393, 380300, 364625, 365212, 304352, 398419, 301488, 103110, 310911, 319458, 293508, 337856, 338, 93276, 217957, 382960, 270625, 321628, 35070, 377090, 214615, 405664, 340680, 273693, 248738, 383982, 268845, 208148, 238302, 391277, 31793, 321811, 215725, 232783, 11972, 317645, 163216, 396936, 287397, 399084, 235346, 264849, 232244, 5732, 293331, 332954, 354275, 174058, 339132, 286235, 77558, 312454, 247512, 199654, 363313, 209242, 257177, 338173, 32127, 3560, 152317, 293794, 211377, 75961, 321275, 19324, 160479, 89684, 43582, 371176, 95315, 24857, 323751, 273127, 24985, 265249, 221362, 389634, 134005, 361128, 223595, 182344, 300471, 244842, 182272, 113274, 273363, 78774, 216933, 252096, 330652, 292267, 99279, 252098, 221475, 347135, 42193, 31784, 21990, 164359)
                ORDER BY user.username
---------------------
Array
(
    [0] => 367394
)

SELECT user.*
                    ,
                    user_privacy.*,
                    IF(user_follow.user_id IS NOT NULL, 1, 0) AS following_170312
                FROM xf_user AS user
               
                    LEFT JOIN xf_user_privacy AS user_privacy ON
                        (user_privacy.user_id = user.user_id)
                    LEFT JOIN xf_user_follow AS user_follow ON
                        (user_follow.user_id = user.user_id AND user_follow.follow_user_id = 170312)
                WHERE user.user_id IN (367394)
                ORDER BY user.username
---------------------
Array
(
    [0] => 301488
    [1] => 16670
    [2] => 24985
    [3] => 31784
    [4] => 52948
    [5] => 68281
    [6] => 81571
    [7] => 105080
    [8] => 117571
    [9] => 145129
    [10] => 155646
    [11] => 158257
    [12] => 160662
    [13] => 164316
    [14] => 164359
    [15] => 165063
    [16] => 167994
    [17] => 169523
    [18] => 185778
    [19] => 193797
    [20] => 196448
    [21] => 198878
    [22] => 200563
    [23] => 206211
    [24] => 207905
    [25] => 218217
    [26] => 220786
    [27] => 225789
    [28] => 225951
    [29] => 226220
    [30] => 231845
)

SELECT user.*
                   
                FROM xf_user AS user
               
                WHERE user.user_id IN ('301488', 16670, 24985, 31784, 52948, 68281, 81571, 105080, 117571, 145129, 155646, 158257, 160662, 164316, 164359, 165063, 167994, 169523, 185778, 193797, 196448, 198878, 200563, 206211, 207905, 218217, 220786, 225789, 225951, 226220, 231845)
                ORDER BY user.username
---------------------
Array
(
    [166582] => 166582
)

SELECT user.*
                    ,
                    user_privacy.*
                FROM xf_user AS user
               
                    LEFT JOIN xf_user_privacy AS user_privacy ON
                        (user_privacy.user_id = user.user_id)
                WHERE user.user_id IN (166582)
                ORDER BY user.username
---------------------

Have a look at the 2nd last log:
Code:
Array
(
    [0] => 301488 <--- no qoutes!
    [1] => 16670
    [2] => 24985
and the sql:
Code:
WHERE user.user_id IN ('301488', 16670, 24985

Problem:
I have over 20 installed addons...
I cannot foreclose that the sql is called by an addon.
For example i have installed "Live Update" which makes an ajax call every 3 seconds.
Maybe it is from this addon.

Do you have an idea how I can find out which file is calling this function and makes the problem?

Best regards
 
It will potentially generate a lot of noise, but in your logging file if you add a line that prints the following, that should give more info:
PHP:
json_encode(debug_backtrace());
 
I found a little "bug" in my understanding: print_r does not show the datatype:
PHP:
$string = (string)"12345";
    $integer = (int)67890;
   
    $arr[0] = $string;
    $arr[1] = $integer;
    echo '<pre>';
    print_r($arr);
    echo '</pre>';
    var_dump($string);
    var_dump($integer);

Code:
Array
(
    [0] => 12345
    [1] => 67890
)

string(5) "12345"
int(67890)

so, as you say, it is possible that the first array-item is a string.
 
It will potentially generate a lot of noise, but in your logging file if you add a line that prints the following, that should give more info:
here is the output from the "problem-call":
Code:
[{"file":"\/home\/myproject\/web\/library\/XenGallery\/Model\/Album.php","line":437,"function":"getUsersByIds","class":"XenForo_Model_User","object":{},"type":"->","args":[["301488",16670,24985,31784,52948,68281,81571,105080,117571,145129,155646,158257,160662,164316,164359,165063,167994,169523,185778,193797,196448,198878,200563,206211,207905,218217,220786,225789,225951,226220,231845,232061,234816,236847,237622,244842,257177,260232,265165,267670,271211,290110,293552,293857,306716,311067,311161,314736,324451,331282,332504,334125,339132,340781,341279,342097,344566,349466,353388,360738,364465,366582,369913,373673,386018,386792,394983,395014,398083]]}
 
I've made some changes in XFMG where this might happen, including the specific example.

There were a few calls to getUsersByIds() where the array of user IDs may have been built from a string, such as a serialized value of shared album users, as an example. We try and cast these to ints now where applicable.

Thanks :)
 
Top Bottom