• This site uses cookies. By continuing to use this site, you are agreeing to our use of cookies. Learn more.

Need help optimizing a query...

Jaxel

Well-known member
#1
So in my code, I am using the following query:
Code:
SELECT * FROM
(
    SELECT EWRmedio_keywords.*, COUNT(EWRmedio_keylinks.keylink_id) AS count
    FROM EWRmedio_keywords
        LEFT JOIN EWRmedio_keylinks ON (EWRmedio_keylinks.keyword_id = EWRmedio_keywords.keyword_id)
    WHERE EWRmedio_keywords.keyword_id IN (
        SELECT keyword_id FROM EWRmedio_keylinks WHERE media_id IN (6878, 6877, 6876, 6875)
    )
    GROUP BY EWRmedio_keywords.keyword_id
    ORDER BY count DESC
    LIMIT 10
) t
ORDER BY keyword_text ASC
Its pretty simple; looking at the code from the inside->out, I can explain what it does...
  1. Code:
    SELECT keyword_id FROM EWRmedio_keylinks WHERE media_id IN (6878, 6877, 6876, 6875)
    Gets a list of keyword_ids for any keyword linked to a specific set of media_ids.
  2. Code:
    SELECT EWRmedio_keywords.*, COUNT(EWRmedio_keylinks.keylink_id) AS count
    FROM EWRmedio_keywords
        LEFT JOIN EWRmedio_keylinks ON (EWRmedio_keylinks.keyword_id = EWRmedio_keywords.keyword_id)
    WHERE EWRmedio_keywords.keyword_id IN ( --- step 1 --- )
    GROUP BY EWRmedio_keywords.keyword_id
    ORDER BY count DESC
    LIMIT 10
    Fetches extended information about those keywords linked to those keyword_ids found in step 1; it also counts the number of times in total those keywords have ever been linked to all media (not just those 4 specific media selected in step 1). Only gets the top 10 results sorted by counts.
  3. Code:
    SELECT * FROM
    ( --- step 2 --- ) t
    ORDER BY keyword_text ASC
    Takes all the results from step 2, and sorts them alphabetically.
This is some pretty complicated code... but surprisingly, it runs pretty well and only takes about 0.1 seconds to execute. This code is what lets me get the new *booru-style menu in my media library.

So what needs to be optimized? Well I recently added the ability to not only tag keywords, but to tag users to media... and naturally, I want to be able to add a *booru-style menu for users underneath the menu for keywords. So I've changed the code to do the same queries, but for users, instead of keywords:
Code:
SELECT *
FROM
(
    SELECT xf_user.*, COUNT(EWRmedio_userlinks.userlink_id) AS count
    FROM xf_user
        LEFT JOIN EWRmedio_userlinks ON (EWRmedio_userlinks.user_id = xf_user.user_id)
    WHERE xf_user.user_id IN (
        SELECT user_id FROM EWRmedio_userlinks WHERE media_id IN (6878, 6877, 6876, 6875)
    )
    GROUP BY xf_user.user_id
    ORDER BY count DESC
    LIMIT 10
) t
ORDER BY username ASC
But this code... wow, it runs SOOOOO much slower than the keyword version. Taking sometimes as long as 6 seconds to process. I assume its taking so long because of the sheer size of the xf_user table? What can I do to optimize this code?
 

Jaxel

Well-known member
#2
Okay... I did some more testing...

If I run step 1 by itself with the following, it runs super fast:
Code:
SELECT username_id
FROM EWRmedio_userlinks
WHERE media_id IN (6878, 6877, 6876, 6875, 6874, 6873, 6872, 6871, 6870, 6869, 6868, 6867, 6866, 6865, 6864, 6863, 6862, 6861, 6860, 6859, 6858, 6857, 6856, 6855, 6854, 6853, 6852, 6851, 6850, 6849)
GROUP BY username_id
Results:
Code:
28, 205, 306, 327, 564, 779, 1248, 1369, 1566, 1989, 2056, 2162, 3907, 4072, 4702, 5374, 5526, 9725, 13182, 14300

If I then run step 2 by itself with the results from step 1, it also runs super fast...
Code:
SELECT xf_user.*, COUNT(EWRmedio_userlinks.userlink_id) AS count
FROM xf_user
    LEFT JOIN EWRmedio_userlinks ON (EWRmedio_userlinks.username_id = xf_user.user_id)
WHERE xf_user.user_id IN (28, 205, 306, 327, 564, 779, 1248, 1369, 1566, 1989, 2056, 2162, 3907, 4072, 4702, 5374, 5526, 9725, 13182, 14300)
GROUP BY xf_user.user_id
ORDER BY count DESC
LIMIT 20
However, if I run both step 1 and step 2 at the same time, it runs super slow. Why does combining two very fast queries equate to a slow query?
 

Jaxel

Well-known member
#3
Okay, I separated the code into two queries and now its running in 0.15 seconds instead of 6 seconds.
Code:
if (!$userIDs = $this->_getDb()->fetchAll("
    SELECT username_id
        FROM EWRmedio_userlinks
    WHERE media_id IN (" . $this->_getDb()->quote($mediaIDs) . ")
    GROUP BY username_id
"))
{
    return array();
}

if (!$users = $this->_getDb()->fetchAll("
    SELECT *
    FROM
    (
        SELECT xf_user.*, COUNT(EWRmedio_userlinks.userlink_id) AS count
        FROM xf_user
            LEFT JOIN EWRmedio_userlinks ON (EWRmedio_userlinks.username_id = xf_user.user_id)
        WHERE xf_user.user_id IN (" . $this->_getDb()->quote($userIDs) . ")
        GROUP BY xf_user.user_id
        ORDER BY count DESC
        LIMIT ?
    ) t
    ORDER BY username ASC
", XenForo_Application::get('options')->EWRmedio_displaybooru))
{
    return array();
}
I would still like to know WHY the code was running so slow if anyone can help me with that.
 
#5
Part of your problem might be indexing. You should probably consider adding a key for the user_id column in the EWRmedio_userlinks table. Another problem can be from using a SELECT inside another SELECT. I believe your Indexing can end up going out the window by combining multiple selects in a single query.

You might considering running an explain query on your original query to see what the results are. It should list how many rows were scanned.