查询优化

譬如在我们的 employees 表中,有大概三十万行的数据 :

+--------+------------+------------+--------------+--------+------------+
| emp_no | birth_date | first_name | last_name | gender | hire_date |
+--------+------------+------------+--------------+--------+------------+
| 10001 | 1953-09-02 | Georgi | Facello | M | 1986-06-26 |
...

我们想找到 first_name 为 Chirstian 的结果,下面比较一下加上索引前后查询所耗费的时间,从结果我们可以看出使用索引可以使查询效率提高 20 倍。

mysql> select * from employees where first_name = 'Chirstian';
mysql> alter table employees add index first_name (first_name);
mysql> select * from employees where first_name = 'Chirstian';
mysql> SHOW PROFILES;
+----------+------------+---------------------------------------------------------+
| Query_ID | Duration | Query |
+----------+------------+---------------------------------------------------------+
| 1 | 0.17415400 | select * from employees where first_name = 'Chirstian' |
| 2 | 1.03130100 | alter table employees add index first_name (first_name) |
| 3 | 0.00869100 | select * from employees where first_name = 'Chirstian' |
+----------+------------+---------------------------------------------------------+

最左前缀匹配原则

还拿前面的 employees 表举例,比如我们建立一个(birth_date, first_name, last_name ) 的组合索引。

mysql> alter table employees add index bd_fn_ln (birth_date, first_name, last_name);

下面的查询是用到索引的:

mysql> select * from employees where birth_date = '1954-05-01' and first_name = 'Chirstian' and last_name = 'Koblick';
mysql> select * from employees where birth_date = '1954-05-01' and first_name = 'Chirstian';
mysql> select * from employees where birth_date = '1954-05-01' and last_name = 'Koblick';

下面是这三个查询 explain 结果。

mysql> explain select * from employees where birth_date = '1954-05-01' and first_name = 'Chirstian' and last_name = 'Koblick';
+----+-------------+-----------+------+---------------+----------+---------+-------------------+------+-----------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+------+---------------+----------+---------+-------------------+------+-----------------------+
| 1 | SIMPLE | employees | ref | bd_fn_ln | bd_fn_ln | 97 | const,const,const | 1 | Using index condition |
+----+-------------+-----------+------+---------------+----------+---------+-------------------+------+-----------------------+
1 row in set (0.00 sec)
mysql> explain select * from employees where birth_date = '1954-05-01' and first_name = 'Chirstian' ;
+----+-------------+-----------+------+---------------+----------+---------+-------------+------+-----------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+------+---------------+----------+---------+-------------+------+-----------------------+
| 1 | SIMPLE | employees | ref | bd_fn_ln | bd_fn_ln | 47 | const,const | 1 | Using index condition |
+----+-------------+-----------+------+---------------+----------+---------+-------------+------+-----------------------+
1 row in set (0.01 sec)
mysql> explain select * from employees where birth_date = '1954-05-01' and last_name = 'Koblick';
+----+-------------+-----------+------+---------------+----------+---------+-------+------+-----------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+------+---------------+----------+---------+-------+------+-----------------------+
| 1 | SIMPLE | employees | ref | bd_fn_ln | bd_fn_ln | 3 | const | 60 | Using index condition |
+----+-------------+-----------+------+---------------+----------+---------+-------+------+-----------------------+
1 row in set (0.00 sec)

虽然结果都是一条,不过前两个查询都用到了联合索引。最后一个只用到了birth_date这一个索引,所以会在birth_date = 1954-05-01 的 60 结果中遍历last_name来找到等于Koblick的结果。还有, 如果 where 中都是精确匹配(使用'='号),那它们的顺序不会影响索引的使用。

而下面这个查询因为没用到组合索引的最左列,所以不会用到索引而是遍历了所有的数据,这就是最左前缀匹配:

mysql> select * from employees where first_name = 'Chirstian' and last_name = 'Koblick';
+--------+------------+------------+-----------+--------+------------+
| emp_no | birth_date | first_name | last_name | gender | hire_date |
+--------+------------+------------+-----------+--------+------------+
| 10004 | 1954-05-01 | Chirstian | Koblick | M | 1986-12-01 |
+--------+------------+------------+-----------+--------+------------+
1 row in set (0.18 sec)
mysql> explain select * from employees where first_name = 'Chirstian' and last_name = 'Koblick';
+----+-------------+-----------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+------+---------------+------+---------+------+--------+-------------+
| 1 | SIMPLE | employees | ALL | NULL | NULL | NULL | NULL | 299468 | Using where |
+----+-------------+-----------+------+---------------+------+---------+------+--------+-------------+
1 row in set (0.00 sec)

选择区分度高的列作为索引

区分度:count(distinct col)/count(*)。 区分度是一个介于 0 和 1 之间的小数,越接近 1 区分度越高,越适合做索引。 原因很容易理解,比如一个辞典中全是以 a 和 b 开头的单词,那么按照首字母简历一个目录(索引),那么目录上一共就两条,每条的范围对应差不多半本辞典,那这个目录(索引)毫无用处。相反,一个班级的学生信息以学号做索引,那么区分度为 1,只要找到学号就能直接找到相对应的学生信息,这个索引就非常有效。

不要在比较运算符左侧使用函数或进行计算

在 sql 语句的比较运算符左侧使用函数或进行计算会使索引失效。

mysql> explain select * from employees where emp_no + 1 = 10005;
+----+-------------+-----------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+------+---------------+------+---------+------+--------+-------------+
| 1 | SIMPLE | employees | ALL | NULL | NULL | NULL | NULL | 299468 | Using where |
+----+-------------+-----------+------+---------------+------+---------+------+--------+-------------+
1 row in set (0.11 sec)
mysql> explain select * from employees where emp_no = 10005-1;
+----+-------------+-----------+-------+---------------+---------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+-------+---------------+---------+---------+-------+------+-------+
| 1 | SIMPLE | employees | const | PRIMARY | PRIMARY | 4 | const | 1 | NULL |
+----+-------------+-----------+-------+---------------+---------+---------+-------+------+-------+
1 row in set (0.00 sec)