On MySQL....
I was recently asked to give a talk about MySQL and query optimization for a PHP Users group. Flattered and excited, I realized I packed wayyyy too much into the the talk than I can do in 15 minutes. So I'm going to expand it here in more prose.
As has been the case for this whole blog in its short history, this is just my experience and knowledge - it definitely isn't the final word, and I'm nowhere near an expert.
First, a note:
I use a custom pager in MySQL, which makes results a lot easier to see. It is this:
pager less -niSFX
I don't know where I found it, as it was a while ago, but its fantastic.
MySQL's Explain
She is a terrible mistress, the Explain. Hard to get formatted to look right, not very clear, and a little archaic looking. That said, with a little help, it is actually really great.
I started using EXPLAIN a lot when I got this position, because I spent a lot of time tracking down performance sinks and bugs in queries that were 2-5 years old. It took me a while to really get a hold of it, but now I run explain on any query before I run it. First, it'll give you a rough idea of how long something will take because it estimates rows. Second, it is somewhat like linting your queries.
Explain has a bunch of sections to it, which I'll run through.
Select Type
This kind of shows you how MySQL is treating each part of your query. You will usually see SIMPLE, PRIMARY, and DEPENDENT SUBQUERY. If you happen to see DERIVED or UNCACHEABLE SUBQUERY, you'll probably have some performance issues, and there are usually ways to restructure your query to avoid them. UNION RESULT is fine, it just happens to show you what comes out.
Table
This is the alias you gave the table - a good reason to use clear, readable aliases instead of just t1, t2, etc.
Type
What kind of joining is happening. These are listed in order from best to worst performance.
- system and const - really fast, because its basically one value.
- eq_ref - each row in table 1 has an exact match in table 2 on some kind of unique key
- ref - two indexes are compared with equals or not equals
- index_merge - a fake composite index made out of two other indexes
- range - using the index to find a range of values
- all - full table scan on every match
There are some that I've excluded because I've never seen them.
possible_keys
Lists the keys that could be used. If you see NULL, you should probably make an index on something.
key
Which key the was actually used to process the join. If you find that it is not the one you want, you can use FORCE INDEX. If you don't see anything at all, you may either have to force an index, or the index you used isn't good enough for the join.
ref
Which columns are being compared against the index in the previous column.
Extra
This is pretty much where all the magic is from Explain. The other columns give you some information, but this is where a lot of the performance sinks will really show up.
Here are some things that are bad:
- using temporary - MySQL is creating a temporary table for you, whether it be in memory or on disk. Either way, this is slower than not using temporary, since it still has to make a table, put the data in it, and (possible) index it.
- Using filesort - MySQL is going to look at your entire result set again to complete the query. This can, and usually will, double the length of time the query takes.
- Full scan on NULL key - something bad happened in a subquery. I dont really know much more than that.
- Impossible WHERE/HAVING - something in your where or having clause is deemed impossible, such as WHERE 2 = 1. I dont know why MySQL doesn't actually just error out on these, but keep it in mind. These will actually just process forever.
Not Exists - MySQL converted a LEFT JOIN into a NOT EXISTS. I've only seen this a few times.
And some good:
- using index
- using where
- using index for group by
Putting it all together
Say you have a query, that is kind of gross, and you want to fix it. I'm using MySQL's Sakila database for my examples.
SELECT
f.title,
f.release_year
FROM
film AS f
JOIN film_actor AS fa
ON fa.film_id = f.film_id
JOIN actor AS a
ON fa.actor_id = a.actor_id
WHERE
a.last_name IN ('GUINESS', 'GABLE', 'KILMER', 'GARLAND', 'NOLTE', 'CAGE')
order by 1 limit 10;
This pops out with an explain that includes:
+----+-------------+-------+--------+-----------------------------+---------------------+---------+-------------------+------+-----------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+-----------------------------+---------------------+---------+-------------------+------+-----------------------------------------------------------+
| 1 | SIMPLE | a | range | PRIMARY,idx_actor_last_name | idx_actor_last_name | 137 | NULL | 18 | Using where; Using index; Using temporary; Using filesort |
| 1 | SIMPLE | fa | ref | PRIMARY,idx_fk_film_id | PRIMARY | 2 | sakila.a.actor_id | 1 | Using index |
| 1 | SIMPLE | f | eq_ref | PRIMARY | PRIMARY | 2 | sakila.fa.film_id | 1 | |
+----+-------------+-------+--------+-----------------------------+---------------------+---------+-------------------+------+-----------------------------------------------------------+
Looking through this, you can see that its definitely using a bunch of keys, but its pulling temporary and filesort on the end result. Those are going to make the result a lot slower - unfortunately, the data set I'm using is so small that the results pop out instantly, and really dont show the performance sink.
The output of the Explain can show you how to improve things, but it really just takes some playing with to get good at.
Some Performance Notes
There are a few things you really want to avoid in MySQL when you are going to join tables. Here's is a quick list of things that I recommend you avoid:
- Functions. If you have a datetime column that may be indexed, and want to find things that happened on Valentines day, don't use date(datecolumn) = '2015-02-14'. This will kill the ability to use an index. It's even worse when you are joining, something like LEFT(column, 4) = othercolumn.
- Mixed Charsets, collations, database engines - just make sure all your stuff matches as much as possible.
- LIKE - The like clause can be extremely helpful, but it also throws out the index if you start with a wildcard. LIKE will, however, use the index if it starts with anything but a wildcard.
Thats it for now. I'll probably make smaller posts over time that show of some specific, smaller techniques.