数据库最左原则以及理解

本次的实验是基于Mysql8版本。首先在数据库中有一个表,其结构如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
mysql> show create table org;
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| org | CREATE TABLE `org` (
`org_id` int(5) NOT NULL,
`org_name` varchar(255) DEFAULT NULL,
`org_parent_id` int(5) DEFAULT NULL,
PRIMARY KEY (`org_id`),
KEY `index_name` (`org_name`,`org_parent_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

可以看到在这个表中,有一个主键org_id以及一个联合索引index_name。其他的并无特别之处。

最左原则

在Mysql里面,有一个最左原则,官网的介绍如下:

The name index is an index over the last_name and first_name columns. The index can be used for lookups in queries that specify values in a known range for combinations of last_name and first_name values. It can also be used for queries that specify just a last_name value because that column is a leftmost prefix of the index (as described later in this section). Therefore, the name index is used for lookups in the following queries。

既然是最左原则,那么尝试着写一条SQL如下:

1
2
3
4
5
6
7
8
mysql> explain SELECT * FROM org WHERE org_name='1' and  org_parent_id=1;
+----+-------------+-------+------------+------+---------------+------------+---------+-------------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------------+---------+-------------+------+----------+-------------+
| 1 | SIMPLE | org | NULL | ref | index_name | index_name | 773 | const,const | 1 | 100.00 | Using index |
+----+-------------+-------+------------+------+---------------+------------+---------+-------------+------+----------+-------------+
1 row in set, 1 warning (0.01 sec)

可以看到其type是 ref 类型的,于是去mysql官网寻找ref类型的索引是什么:

All rows with matching index values are read from this table for each combination of rows from the previous tables. ref is used if the join uses only a leftmost prefix of the key or if the key is not a PRIMARY KEY or UNIQUE index (in other words, if the join cannot select a single row based on the key value). If the key that is used matches only a few rows, this is a good join type.

啥意思呢,就是当使用的索引符合最左原则的时候,且索引即不是主键也不是唯一索引。那么它所使用的类型就是ref

那如果将org_nameorg_parant_id反着呢?

1
2
3
4
5
6
7
mysql> explain SELECT * FROM org WHERE org_parent_id =1  and org_name='1';
+----+-------------+-------+------------+------+---------------+------------+---------+-------------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------------+---------+-------------+------+----------+-------------+
| 1 | SIMPLE | org | NULL | ref | index_name | index_name | 773 | const,const | 1 | 100.00 | Using index |
+----+-------------+-------+------------+------+---------------+------------+---------+-------------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

说好的最左原则呢。。。索引的org_parent_id不是在右边的嘛。

其实在这种情况下,无论org_parent_id在左边还是在右边,对于查询结果来说,都是一样的。既然都是一样的,那么Mysql的优化器就直接把该条语句给优化掉了,所以你会发现无论是org_name在左还是在右边都是会使用到索引。

那再换一条SQL呢?

1
2
3
4
5
6
7
8
mysql> explain SELECT * FROM org WHERE org_name>'1' and  org_parent_id=1;
+----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+--------------------------+
| 1 | SIMPLE | org | NULL | index | index_name | index_name | 773 | NULL | 9 | 11.11 | Using where; Using index |
+----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)

这就尴尬了…由于在最左原则里面,若前一个索引使用了><等比较符的时候,后面一个是不会进行索引查询的,这个跟Mysql的索引结构有关系(文章后面会介绍),但是为啥这里又会走索引呢? 这个时候由于在Extra里面看到了Using index,猜测难道是发生了索引覆盖? 于是我又建立了一个表。
于是立即又新建了一张表

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
mysql> show create table org_copy;
+----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| org_copy | CREATE TABLE `org_copy` (
`org_id` int(5) NOT NULL,
`org_name` varchar(255) DEFAULT NULL,
`org_parent_id` int(5) DEFAULT NULL,
`org_copy` varchar(255) DEFAULT NULL,
PRIMARY KEY (`org_id`),
KEY `index_name` (`org_name`,`org_parent_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
+----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

这时,我再次执行刚才的sql

1
2
3
4
5
6
7
8
mysql> explain SELECT * FROM org_copy WHERE org_name>'一级部门' and  org_parent_id=1;
+----+-------------+----------+------------+-------+---------------+------------+---------+------+------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+-------+---------------+------------+---------+------+------+----------+-----------------------+
| 1 | SIMPLE | org_copy | NULL | range | index_name | index_name | 768 | NULL | 1 | 11.11 | Using index condition |
+----+-------------+----------+------------+-------+---------------+------------+---------+------+------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)

看起来的确是由于发生了索引覆盖,导致了虽然不符合最左原则,但是还是走了一个索引。

索引覆盖

什么是索引覆盖呢?
org这个表里面,有三个字段,但是这三个字段分别是一个主键索引和一个联合索引,由于我查询的字段就包含在索引里面,那么
而恰好这个表的所有字段都在索引里面,这就导致了mysql可以直接从索引里面获取到所需要的数据,那么此时就不必要再去通过磁盘IO去查询额外的字段数据了。也就不需要进行

继续回到最左原则,由于最左原则要求在聚合索引里面,每一个列是要为一个等值连接。即,如果org_name使用了非等值连接,那么就会导致org_parent_id无法使用最左原则。例子如下:若org_parent_id使用非等值连接,为了避免索引覆盖SQL的影响,所以此次的SQL在 copy表里面执行。

1
2
3
4
5
6
7
8
mysql> explain SELECT * FROM org_copy WHERE org_name='1' and  org_parent_id>1;
+----+-------------+----------+------------+-------+---------------+------------+---------+------+------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+-------+---------------+------------+---------+------+------+----------+-----------------------+
| 1 | SIMPLE | org_copy | NULL | range | index_name | index_name | 773 | NULL | 1 | 100.00 | Using index condition |
+----+-------------+----------+------------+-------+---------------+------------+---------+------+------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)

可以看到确实是使用了聚合索引,同时最后一列的 Using index condition也表明,这条查询语句会先通过索引来过滤出符合的数据,然后从过滤出来的数据里面在使用我们的where条件进行二次过滤,最终找出符合的数据。

解释

在Mysql里面,索引是以B+树的形式实现的,而索引又分为一级索引(主键)和二级索引(该文章中的index_name),其实在InnoDB里面,二级索引的叶子节点所存放的数据就是主键索引所对应的地址,这也称之为回表
回到文章中来,在这篇文章中介绍的index_name索引里面,mysql是根据索引列的顺序,一个一个在索引里面进行查找过滤,如果该索引在某一列断开了,例如索引A的列是(A、B、C),但是在SQL里面却是where A ='a' and C='c',此时就会只使用该索引AA列,这个跟B+树的特点有关。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
mysql> show create table index_test;
+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| index_test | CREATE TABLE `index_test` (
`a` int(5) NOT NULL,
`b` varchar(255) DEFAULT NULL,
`c` int(5) DEFAULT NULL,
`d` varchar(255) DEFAULT NULL,
`f` varchar(255) DEFAULT NULL,
PRIMARY KEY (`a`),
KEY `index_name` (`b`,`c`,`d`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT * FROM `index_test` WHERE b='b' and d='d';
+----+-------------+------------+------------+------+---------------+------------+---------+-------+------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+------+---------------+------------+---------+-------+------+----------+-----------------------+
| 1 | SIMPLE | index_test | NULL | ref | index_name | index_name | 768 | const | 1 | 50.00 | Using index condition |
+----+-------------+------------+------------+------+---------------+------------+---------+-------+------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)


可以看到ref就一个const,所以基本可以推测目前只用了b列,那么换成WHERE b='b' and c='c'呢?

1
2
3
4
5
6
7
8
mysql> EXPLAIN SELECT * FROM `index_test` WHERE b='b' and c='c';
+----+-------------+------------+------------+------+---------------+------------+---------+-------------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+------+---------------+------------+---------+-------------+------+----------+-------+
| 1 | SIMPLE | index_test | NULL | ref | index_name | index_name | 773 | const,const | 1 | 100.00 | NULL |
+----+-------------+------------+------------+------+---------------+------------+---------+-------------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)

所有对于使用最左原则来进行调优的话,首先应该注意的是索引列最好都是等值连接,并且中途最好不要有任何的断裂。这样才能发挥出联合索引的优势

作者

Somersames

发布于

2019-06-04

更新于

2021-12-05

许可协议

评论