1. This site uses cookies. By continuing to use this site, you are agreeing to our use of cookies. Learn More.

UNION ALL vs...

Discussion in 'General PHP and MySQL Discussions' started by Lawrence, Apr 18, 2016.

  1. Lawrence

    Lawrence Well-Known Member

    For an add-on I am currently working on I am using UNION ALL to retrieve user data for forum home... <snip>

    Edit: never mind, took a new approach, :)
     
    Last edited: Apr 18, 2016
  2. Xon

    Xon Well-Known Member

    This is good, because UNION ALL makes pagination of any sort utterly evil :p
     
    Lawrence likes this.
  3. Mouth

    Mouth Well-Known Member

    How so?
    Considering that a temporary table is built as the result of any union, if where constraints are appropriate at each query of the union then using limit on the output of the temporary table is simple.
     
    Lawrence likes this.
  4. Xon

    Xon Well-Known Member

    It is the amount of data that needs to be inserted into the temporary table to implement pagination when using unions.

    Consider wanting to paginate via limit/offset the following:
    Code:
    select *
    from a
    where a.foo != 'bar'
    UNION ALL
    select *
    from b
    where a.bar != 'foo'
    

    What you need todo is:

    Code:
    select *
    from (
    select *
    from a
    where a.foo != 'bar'
    UNION ALL
    select *
    from b
    where a.bar != 'foo'
    ) c
    limit 50
    offset 10000
    
    It gets worse when you add-joins and accidently make the select clause contain too many things. MySQL implements early-row binding, this means the first 10000 rows are returned from the inner query to the outer query, and then the last 50 are returns to the client.

    Throw in, that most versions of MySQL are horrible at optimizing this too boot.

    I modified a XenForo add-on to move a join inside each of the union and to the outer select, and this went from a page taking 5-10 seconds to less than 1 second when on the 800'th page of results.
     
  5. Lawrence

    Lawrence Well-Known Member

    @Xon @Mouth thanks for starting a discussion about this. I decided to let admins choose the type of query to execute if they decide to not use total viewers per level 1 category/forum/page node on forum home.

    Retrieving the totals only, is just a simple query using GROUP BY.

    If admins decide to display x number of user names/avatars under each level 1 node on forum home (no pagination is required as the more link is set for an individual node id and will work like the followers overlay):

    Option One:
    UNION ALL does work well and guarantees there will be members to display per level 1 forum if there are any. It's just noticeably slower to execute, and can become quite large if there are a number of level 1 nodes on forum home. I use INNER JOIN for the user data, and that appears to be a little faster than LEFT JOIN.

    Option Two:
    Retrieve all user_ids, and node_ids they are viewing, and then loop through them to sort by node_id, and check the user counts, array_slice the number of users to the limit set by the admin (+ 1, for the more links) for each node_id, and then use another query to load in the user data. This way needs an additional query, but the total user records retrieved in the second query will be limited to number of level_1 nodes * the limit setting + 1. This option places the work load on PHP.

    The way I stand on either choice is that of six of one, half dozen of the other, though the latter choice would likely be more efficient overall.
     
    Xon likes this.
  6. Xon

    Xon Well-Known Member

    Push as many joins outside of the UNION if you can.

    Instead of:
    Code:
    select *
    from a
    join d on d.id = a.id
    where a.foo != 'bar'
    UNION ALL
    select *
    from b
    join d on d.id = b.id
    where b.bar != 'foo'
    
    Try:
    Code:
    select *
    from
    (select a.id
    from a
    where a.foo != 'bar'
    UNION ALL
    select b.id
    from b
    where b.bar != 'foo'
    ) c
    join d on d.id = c.id
    
    This can be much faster as MySQL does early-binding, this means the contents of your select clause pulls in a lot of data that you aren't expecting.

    Yes, changing your SELECT clause can massively impact performance! :(
     
    Lawrence likes this.
  7. Mouth

    Mouth Well-Known Member

    Not totally sure I understand what your objective/output is. But it does sounds like a few joins would be more optimised than a union. That said, I can't imagine you'd be pulling back more than a few thousand records in your union result set, so a union should be quite fast too.
     
    Lawrence and Xon like this.
  8. Lawrence

    Lawrence Well-Known Member

    I'm using UNION ALL on the same table. The number of UNION ALL's generated is dependant on the number of level 1 nodes on forum home. I had to INNER JOIN the user table in each UNION ALL for the WHERE clause can check users visibility/activity view (if XF version > 1.4) settings. This worked well, and returned the visible members per level one node, up to the LIMIT set.

    Although the above works, it was slow. So I took Xon's advise and moved the INNER JOIN for the user table out of the UNION ALL's and placed it at the end. This improved the speed by about 25%, or maybe more, and shrunk the query down by almost half in bytes. But the visibility check doesn't get done until after the users viewing each node was fetched. As a result, if category one had 7 members viewing, and one was invisible, it would pull the limit (6 for my testing), and then filter out the invisible member and only returned 5 members even though 6 are visible.

    To get around this, I added a new column to my table that sets the users visibly status, so now the user table only needs to be joined using ON user_id = users_viewing.user_id. With this new column I rewrote the other queries used for single nodes/threads, and so they are faster too as their is no longer a need for a WHERE clause applied to the user table.
     
    Xon likes this.
  9. Xon

    Xon Well-Known Member

    Instead of programmatically generating a UNION ALL and repeating the same query a lot, can you rewrite the WHERE clause on the node table to use IN rather than = ?
     
  10. Lawrence

    Lawrence Well-Known Member

    That was the first query I wrote (IN or =, depending if it is the last node or not), and it works as expected when viewing a first level category, or deeper.

    I replaced the UNION ALL for forum home. Although it works great, to respect the view permission for each child node, it just got too complex for each level 1 node displayed on forum home. I replaced it with a method that generates two queries: one to get the totals that meet the criteria, and returns them sorted by level 1 node ids, and the second to retrieve the same data, but with the user records included. As the first query handled the view permissions and member privacy settings, the second query did not need to do this. So, the correct number of user records is retrieved for each level 1 node.

    For forum home, I added a second type of query option admins can select. The second option uses just one query, but has to retrieve more user records, and perform the view permission check afterwards to get the correct number of members viewing each level one node.

    So to get members viewing each level 1 node to be displayed on forum home, it's 2 queries vs. one query that uses more memory; a choice I thought best for forum owners to decide based on their server specs.
     
    Mouth and Xon like this.

Share This Page