Best practices for MySQL indexing?

Jaxel

Indexing has always been a bit hard to figure out. It trades a performance hit on database writes for better performance on database reads. It has always been my understanding that any column you would use in a WHERE clause should be indexed. But I often find myself wondering about excessive use of indexes...

Prime example would be my XenAtendo addon. The events table of this addon is as follows:
Code:
$table->addColumn('user_id',           'int', 10);
$table->addColumn('username',          'varchar', 100);
$table->addColumn('thread_id',         'int', 10);
$table->addColumn('event_id',          'int', 10)->autoIncrement();
$table->addColumn('event_date',        'int', 10);
$table->addColumn('event_repeat',      'varchar', 50);
$table->addColumn('event_state',       'enum')->values(['visible', 'moderated']);
$table->addColumn('event_title',       'varchar', 255);
$table->addColumn('event_description', 'mediumtext');
$table->addColumn('event_location',    'varchar', 255);
$table->addColumn('event_address',     'varchar', 255);
$table->addColumn('event_latitude',    'double');
$table->addColumn('event_longitude',   'double');
$table->addColumn('event_geocode',     'blob');
$table->addColumn('event_rsvp',        'int', 1);
$table->addColumn('event_featured',    'int', 1);

Naturally, I would want to define event_id as the primary key... one index down. Then, to prevent collisions, I would want to define thread_id as a unique key, because a thread can only be linked to one event... two indexes down.
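
For illustration, those two keys might look like this in plain MySQL DDL. This is only a sketch: the table name events_demo is made up, and only a few of the columns from the schema are included.

```sql
-- Hypothetical table name; only the columns relevant to the two keys are shown.
CREATE TABLE events_demo (
    event_id  INT UNSIGNED NOT NULL AUTO_INCREMENT,
    thread_id INT UNSIGNED NOT NULL,
    user_id   INT UNSIGNED NOT NULL,
    PRIMARY KEY (event_id),            -- index 1: one row per event
    UNIQUE KEY thread_id (thread_id)   -- index 2: rejects a second event for the same thread
) ENGINE=InnoDB;

INSERT INTO events_demo (thread_id, user_id) VALUES (100, 1);

-- This second insert fails with a duplicate-key error (1062),
-- because thread 100 is already linked to an event.
INSERT INTO events_demo (thread_id, user_id) VALUES (100, 2);
```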

But then I think about all the possible search queries I would want to include in this system:
  • user_id - search for events from a single user
  • event_date - search for events created after certain dates
  • event_state - search by event state for moderation purposes
  • event_latitude - search for events based on their location
  • event_longitude - search for events based on their location
  • event_featured - search for events which are marked as featured
Now we have 8 indexes... that's half the table. In other modifications I have, you can filter rows based on every column... which would have you indexing the entire table. At what point do you just say "f**k it" and index nothing, because you're tired of dealing with it?
 
As far as I know, the only way to determine what you need to index (outside of running automated tools that log all MySQL queries and analyze them, if such a tool even exists) is to generate a bunch of data automatically, then perform things like searches and normal browsing yourself, using debug mode.

You would then go to XF's debug tool (accessible by clicking the run time link in the footer when debug mode is enabled) and manually read the output of the queries there. What you are looking for in particular are high query execution times (obviously), but also the words "Using filesort". If you can somehow avoid the filesort, whether by indexing, by redesigning your DB, or whatever you have to do, that will save you a lot of headaches once large forums start using your mod.
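
As a rough sketch of what to look for, assuming a hypothetical events_demo table with the columns listed in the first post, you can run the same kind of query by hand with EXPLAIN. The index name state_date is made up, and whether the optimizer actually picks it depends on your data.

```sql
-- Hypothetical query: newest visible events.
EXPLAIN
SELECT event_id, event_title
FROM events_demo
WHERE event_state = 'visible'
ORDER BY event_date DESC
LIMIT 20;
-- In the output, check the "key" column (NULL means no index was chosen)
-- and the "Extra" column for "Using filesort".

-- One candidate fix: a composite index matching the filter column first,
-- then the sort column, so rows can be read in index order.
ALTER TABLE events_demo ADD INDEX state_date (event_state, event_date);

-- Re-run the EXPLAIN above and compare the "key" and "Extra" columns.
```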

Alternatively, if you want to talk to someone who actually knows what they are talking about, I would recommend chatting to @eva2000 as his pinky finger knows more about database design than I do :P


Fillip
 
There is write overhead to maintaining an index, but generally, unless the table has a high volume of writes, an extra index will not be an issue. As Dave mentioned, generating an EXPLAIN plan is the best method to determine whether an index is needed. MySQL maintains statistics on each table. It uses these statistics to determine the best path to access the data. This path can change over time. For example, if you have a table with 10 rows, MySQL may choose to do a full table scan because it's cheaper to scan all 10 rows than it is to scan an index to find the correct row. Over time that table grows to 1000 rows and MySQL changes its execution plan to use an index because the cost is lower than scanning every row.
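
If you want to look at those statistics yourself, something like the following should work. The table name events_demo and the user_id value are placeholders, not anything from a real install.

```sql
-- Refresh the stored key distribution statistics for the table.
ANALYZE TABLE events_demo;

-- The Cardinality column is the estimated number of distinct values per index.
SHOW INDEX FROM events_demo;

-- See which access path the optimizer picks right now.
-- type = ALL means a full table scan; type = ref (with "key" filled in)
-- means an index lookup.
EXPLAIN SELECT * FROM events_demo WHERE user_id = 42;
```

Running the EXPLAIN again after the table has grown is an easy way to see the plan change over time.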

Also, MySQL will try to use the best index, so even if you index a column in the WHERE clause, that index may never be used. For example, if you are using the primary key in the WHERE clause, then that index will probably be used and no other indexes will be used. MySQL knows there will only be 1 row for a primary key value, so using that index will be the fastest path to the data. Run this and you will find every index that has not been used since the last time you restarted MySQL.

SQL:
SELECT object_schema,
   object_name,
   index_name
FROM performance_schema.table_io_waits_summary_by_index_usage
WHERE index_name IS NOT NULL
AND count_star = 0
ORDER BY object_schema, object_name;

Here are your top 10 most used indexes:

SQL:
SELECT object_schema, object_name, index_name, count_star
FROM performance_schema.table_io_waits_summary_by_index_usage
WHERE index_name IS NOT NULL
AND count_star > 0
ORDER BY count_star DESC
LIMIT 10;

Here you can see the read and write activity per table. You probably have far more reads than writes.

SQL:
SELECT object_type, object_schema, object_name
     , count_star, count_read, count_write, count_fetch
     , count_insert, count_update, count_delete
  FROM performance_schema.table_io_waits_summary_by_table
WHERE count_star > 0
ORDER BY count_star DESC;

There are a bunch of options for tracking down bad queries if you think you have them. You can log queries that do not use an index, but like I said, some queries are faster without one. You can also set a minimum number of examined rows for logging, so you avoid logging queries on small tables.
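
A minimal sketch of those logging options, set at runtime. The threshold of 1000 rows is an arbitrary example value, and you need sufficient privileges to change global variables.

```sql
-- Turn on the slow query log.
SET GLOBAL slow_query_log = 'ON';

-- Also log statements that do not use an index for row lookups.
SET GLOBAL log_queries_not_using_indexes = 'ON';

-- Skip statements that examine fewer rows than this (example value).
-- A global change only applies to connections opened afterwards.
SET GLOBAL min_examined_row_limit = 1000;

-- Find out where the log is being written.
SHOW VARIABLES LIKE 'slow_query_log_file';
```

These settings are lost on restart unless you also put them in your MySQL config file.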
 
"Alternatively, if you want to talk to someone who actually knows what they are talking about, I would recommend chatting to @eva2000 as his pinky finger knows more about database design than I do"
No expert, but I know enough.
You might be interested in using the EXPLAIN statement to test index changes on your database. It's a little convoluted, but once you understand the paths your searches are taking, it can be pretty useful.

https://dev.mysql.com/doc/refman/5.7/en/using-explain.html

https://www.sitepoint.com/using-explain-to-write-better-mysql-queries/
+1 for EXPLAIN. When @DragonByte Tech mentioned automated tools: you can write scripts to more easily gather and collate all the required data, including the MySQL slow query log, EXPLAIN output, etc. You still need human analysis to make sense of the collated data, though.

"Run this and you will find every index that has not been used since the last time you restarted MySQL."
Those queries only work if the performance schema is actually enabled. I usually disable it by default, as the performance schema itself has some performance overhead too.
 

True, but the overhead is very small. The benefit of having that data to make informed decisions far outweighs the overhead it takes to maintain it, IMO. The performance schema is enabled by default as of MySQL 5.6.6, I believe.

If anyone wants to check, this query should do it.

SQL:
SHOW VARIABLES LIKE 'performance_schema';
 