1. This site uses cookies. By continuing to use this site, you are agreeing to our use of cookies. Learn More.

Need help optimizing a query...

Discussion in 'General PHP and MySQL Discussions' started by Jaxel, Dec 13, 2013.

  1. Jaxel

    Jaxel Well-Known Member

    So in my code, I am using the following query:
    Code:
    SELECT * FROM
    (
        SELECT EWRmedio_keywords.*, COUNT(EWRmedio_keylinks.keylink_id) AS count
        FROM EWRmedio_keywords
            LEFT JOIN EWRmedio_keylinks ON (EWRmedio_keylinks.keyword_id = EWRmedio_keywords.keyword_id)
        WHERE EWRmedio_keywords.keyword_id IN (
            SELECT keyword_id FROM EWRmedio_keylinks WHERE media_id IN (6878, 6877, 6876, 6875)
        )
        GROUP BY EWRmedio_keywords.keyword_id
        ORDER BY count DESC
        LIMIT 10
    ) t
    ORDER BY keyword_text ASC
    Its pretty simple; looking at the code from the inside->out, I can explain what it does...
    1. Code:
      SELECT keyword_id FROM EWRmedio_keylinks WHERE media_id IN (6878, 6877, 6876, 6875)
      Gets a list of keyword_ids for any keyword linked to a specific set of media_ids.
    2. Code:
      SELECT EWRmedio_keywords.*, COUNT(EWRmedio_keylinks.keylink_id) AS count
      FROM EWRmedio_keywords
          LEFT JOIN EWRmedio_keylinks ON (EWRmedio_keylinks.keyword_id = EWRmedio_keywords.keyword_id)
      WHERE EWRmedio_keywords.keyword_id IN ( --- step 1 --- )
      GROUP BY EWRmedio_keywords.keyword_id
      ORDER BY count DESC
      LIMIT 10
      
      Fetches extended information about those keywords linked to those keyword_ids found in step 1; it also counts the number of times in total those keywords have ever been linked to all media (not just those 4 specific media selected in step 1). Only gets the top 10 results sorted by counts.
    3. Code:
      SELECT * FROM
      ( --- step 2 --- ) t
      ORDER BY keyword_text ASC
      Takes all the results from step 2, and sorts them alphabetically.
    This is some pretty complicated code... but surprisingly, it runs pretty well and only takes about 0.1 seconds to execute. This code is what lets me get the new *booru-style menu in my media library.

    So what needs to be optimized? Well I recently added the ability to not only tag keywords, but to tag users to media... and naturally, I want to be able to add a *booru-style menu for users underneath the menu for keywords. So I've changed the code to do the same queries, but for users, instead of keywords:
    Code:
    SELECT *
    FROM
    (
        SELECT xf_user.*, COUNT(EWRmedio_userlinks.userlink_id) AS count
        FROM xf_user
            LEFT JOIN EWRmedio_userlinks ON (EWRmedio_userlinks.user_id = xf_user.user_id)
        WHERE xf_user.user_id IN (
            SELECT user_id FROM EWRmedio_userlinks WHERE media_id IN (6878, 6877, 6876, 6875)
        )
        GROUP BY xf_user.user_id
        ORDER BY count DESC
        LIMIT 10
    ) t
    ORDER BY username ASC
    But this code... wow, it runs SOOOOO much slower than the keyword version. Taking sometimes as long as 6 seconds to process. I assume its taking so long because of the sheer size of the xf_user table? What can I do to optimize this code?
     
  2. Jaxel

    Jaxel Well-Known Member

    Okay... I did some more testing...

    If I run step 1 by itself with the following, it runs super fast:
    Code:
    SELECT username_id
    FROM EWRmedio_userlinks
    WHERE media_id IN (6878, 6877, 6876, 6875, 6874, 6873, 6872, 6871, 6870, 6869, 6868, 6867, 6866, 6865, 6864, 6863, 6862, 6861, 6860, 6859, 6858, 6857, 6856, 6855, 6854, 6853, 6852, 6851, 6850, 6849)
    GROUP BY username_id
    Results:
    Code:
    28, 205, 306, 327, 564, 779, 1248, 1369, 1566, 1989, 2056, 2162, 3907, 4072, 4702, 5374, 5526, 9725, 13182, 14300

    If I then run step 2 by itself with the results from step 1, it also runs super fast...
    Code:
    SELECT xf_user.*, COUNT(EWRmedio_userlinks.userlink_id) AS count
    FROM xf_user
        LEFT JOIN EWRmedio_userlinks ON (EWRmedio_userlinks.username_id = xf_user.user_id)
    WHERE xf_user.user_id IN (28, 205, 306, 327, 564, 779, 1248, 1369, 1566, 1989, 2056, 2162, 3907, 4072, 4702, 5374, 5526, 9725, 13182, 14300)
    GROUP BY xf_user.user_id
    ORDER BY count DESC
    LIMIT 20
    
    However, if I run both step 1 and step 2 at the same time, it runs super slow. Why does combining two very fast queries equate to a slow query?
     
  3. Jaxel

    Jaxel Well-Known Member

    Okay, I separated the code into two queries and now its running in 0.15 seconds instead of 6 seconds.
    Code:
    if (!$userIDs = $this->_getDb()->fetchAll("
        SELECT username_id
            FROM EWRmedio_userlinks
        WHERE media_id IN (" . $this->_getDb()->quote($mediaIDs) . ")
        GROUP BY username_id
    "))
    {
        return array();
    }
    
    if (!$users = $this->_getDb()->fetchAll("
        SELECT *
        FROM
        (
            SELECT xf_user.*, COUNT(EWRmedio_userlinks.userlink_id) AS count
            FROM xf_user
                LEFT JOIN EWRmedio_userlinks ON (EWRmedio_userlinks.username_id = xf_user.user_id)
            WHERE xf_user.user_id IN (" . $this->_getDb()->quote($userIDs) . ")
            GROUP BY xf_user.user_id
            ORDER BY count DESC
            LIMIT ?
        ) t
        ORDER BY username ASC
    ", XenForo_Application::get('options')->EWRmedio_displaybooru))
    {
        return array();
    }
    I would still like to know WHY the code was running so slow if anyone can help me with that.
     
  4. ceribik

    ceribik Active Member

  5. King Arceus

    King Arceus Member

    Part of your problem might be indexing. You should probably consider adding a key for the user_id column in the EWRmedio_userlinks table. Another problem can be from using a SELECT inside another SELECT. I believe your Indexing can end up going out the window by combining multiple selects in a single query.

    You might considering running an explain query on your original query to see what the results are. It should list how many rows were scanned.
     

Share This Page