1200字范文 > 8 查询优化-关联查询优化-子查询优化-Order by 关键字优化-Group by 关键字优化-

8 查询优化-关联查询优化-子查询优化-Order by 关键字优化-Group by 关键字优化-

时间：2024-08-11 07:15:08

8、查询优化

8.1、准备数据

8.1.1、建表

CREATE TABLE `dept` (`id` INT ( 11 ) NOT NULL AUTO_INCREMENT,`deptName` VARCHAR ( 30 ) DEFAULT NULL,`address` VARCHAR ( 40 ) DEFAULT NULL,`ceo` INT NULL,PRIMARY KEY ( `id` ) ) ENGINE = INNODB AUTO_INCREMENT = 1 DEFAULT CHARSET = utf8;CREATE TABLE `emp` (`id` INT ( 11 ) NOT NULL AUTO_INCREMENT,`empno` INT NOT NULL,`name` VARCHAR ( 20 ) DEFAULT NULL,`age` INT ( 3 ) DEFAULT NULL,`deptId` INT ( 11 ) DEFAULT NULL,PRIMARY KEY ( `id` ) #CONSTRAINT `fk_dept_id` FOREIGN KEY (`deptId`) REFERENCES `t_dept` (`id`)) ENGINE = INNODB AUTO_INCREMENT = 1 DEFAULT CHARSET = utf8;

8.1.2、设置参数允许创建函数

# 创建函数，假如报错：This function has none of DETERMINISTIC......# 由于开启过慢查询日志 bin-log, 我们就必须为我们的 function 指定一个参数。# 主从复制，主机会将写操作记录在 bin-log 日志中。从机读取 bin-log 日志，执行语句来同步数据。# 如果使用函数来操作数据，会导致从机和主机操作时间不一致。所以，默认情况下，mysql不开启创建函数设置# 查看mysql是否允许创建函数：show variables like 'log_bin_trust_function_creators';# 命令开启：允许创建函数设置：set global log_bin_trust_function_creators=1; # 不加global只是当前窗口有效。mysqld重启，上述参数又会消失。# 永久方法： # linux下：/etc/f下f[mysqld]加上：log_bin_trust_function_creators=1

8.1.3、创建函数

# 产生随机字符串DELIMITER $$CREATE FUNCTION rand_string ( n INT ) RETURNS VARCHAR ( 255 ) BEGINDECLARE chars_str VARCHAR ( 100 ) DEFAULT 'abcdefghijklmnopqrstuvwxyzABCDEFJHIJKLMNOPQRSTUVWXYZ';DECLARE return_str VARCHAR ( 255 ) DEFAULT '';DECLARE i INT DEFAULT 0;WHILE i < n DOSET return_str = CONCAT(return_str,SUBSTRING( chars_str, FLOOR( 1+RAND ()* 52 ), 1 ));SET i = i + 1;END WHILE;RETURN return_str;END $$ #假如要删除DELIMITER;DROP FUNCTION rand_string;

随机产生部门编号

# 用于随机产生多少到多少的编号DELIMITER $$CREATE FUNCTION rand_num ( from_num INT, to_num INT ) RETURNS INT ( 11 ) BEGINDECLARE i INT DEFAULT 0;SET i = FLOOR(from_num + RAND()*(to_num - from_num + 1 ));RETURN i;END $$ # 假如要删除DELIMITER;DROP FUNCTION rand_num;

8.1.4、创建存储过程

创建往 emp 表中插入数据的存储过程

DELIMITER $$CREATE PROCEDURE insert_emp ( `start` INT, max_num INT ) BEGINDECLARE i INT DEFAULT 0;SET autocommit = 0; # 设置手动提交事务REPEAT# 循环SET i = i + 1;# 赋值INSERT INTO emp ( empno, NAME, age, deptid )VALUES((`start` + i), rand_string ( 6 ), rand_num ( 30, 50 ), rand_num ( 1, 10000 ));UNTIL i = max_num END REPEAT;COMMIT;# 提交事务END $$ # 删除DELIMITER;DROP PROCEDURE insert_emp;

创建往 dept 表中插入数据的存储过程

# 执行存储过程，往 dept 表添加随机数据DELIMITER $$CREATE PROCEDURE `insert_dept` ( max_num INT ) BEGINDECLARE i INT DEFAULT 0;SET autocommit = 0;REPEATSET i = i + 1;INSERT INTO dept ( deptname, address, ceo )VALUES(rand_string ( 8 ), rand_string ( 10 ), rand_num ( 1, 500000 ));UNTIL i = max_num END REPEAT;COMMIT;END $$ # 删除DELIMITER;DROP PROCEDURE insert_dept;

8.1.5、向表中插入数据

# 执行存储过程，往 dept 表添加1万条数据CALL insert_dept(10000); # 执行存储过程，往 emp 表添加50万条数据CALL insert_emp(100000,500000);

8.1.6、批量删除某个表上的所有索引

DELIMITER $$CREATE PROCEDURE `proc_drop_index` (dbname VARCHAR ( 200 ), tablename VARCHAR ( 200 )) BEGINDECLARE done INT DEFAULT 0;DECLARE ct INT DEFAULT 0;DECLARE _index VARCHAR ( 200 ) DEFAULT '';DECLARE _cur CURSOR FOR SELECT index_name FROM information_schema.STATISTICS WHERE table_schema = dbname AND table_name = tablename AND seq_in_index = 1 AND index_name <> 'PRIMARY';#每个游标必须使用不同的declare continue handler for not found set done=1来控制游标的结束DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 2; # 若没有数据返回,程序继续,并将变量 done 设为 2OPEN _cur;FETCH _cur INTO _index;WHILE _index <> '' DOSET @str = CONCAT( "drop index ", _index, " on ", tablename );PREPARE sql_str FROM @str;EXECUTE sql_str;DEALLOCATE PREPARE sql_str;SET _index = '';FETCH _cur INTO _index;END WHILE;CLOSE _cur;END $$# 执行存储过程CALL proc_drop_index("tbname", "emp");CALL proc_drop_index("tbname", "dept");

8.2、单值索引建立及常见索引失效

8.2.1、全值匹配

最常见的 sql 类型：全值匹配

explain select sql_no_cache * from emp where emp.age=30 and deptid=4 and emp.name = 'abcd';

这时我们的索引应该如何建立？

create index idx_age on emp(age);create index idx_age_deptid on emp(age,deptid);# 包括上面的索引create index idx_age_deptid_name on emp(age,deptid,NAME);# 包括上面两条索引，但不包括下面索引create index idx_deptId_age on emp(deptId, age);

8.2.2、最左前缀法则

# 删除所有索引create index idx_age_deptId_name on emp(age, deptId, `name`);

注：如果第一个字段都跳过了，那么整个复合索引都用不到了

8.2.3、计算函数导致索引失效

# 创建索引create index idx_name on emp(`name`);# 索引生效explain select sql_no_cache * from emp where emp.name like 'abc%';# 索引失效explain select sql_no_cache * from emp where left(emp.name,3) = 'abc';

8.2.4、范围条件导致复合索引部分失效

# 创建索引create index idx_age_deptId_name on emp(age, deptId, `name`);# deptId 右边的列失效（不包括 deptId）explain select sql_no_cache * from emp where emp.age=30 and emp.deptId>20 and emp.name = 'abc' ;

解决方法：

重建索引，将涉及到范围条件的列放在最后；

# 创建索引create index idx_age_name_deptId on emp(age, `name`, deptId);

)]

8.2.5、不等于(!= 或者<>)索引失效

explain select sql_no_cache * from emp where emp.name != 'abc';

8.2.6、is not null 无法使用索引，is null 可使用索引

# 创建索引create index idx_age on emp(age);# 索引生效explain select sql_no_cache * from emp where age is null;# 索引失效explain select sql_no_cache * from emp where age is not null;

8.2.7、like 以通配符 % 开头索引失效

# 创建索引create index idx_name on emp(`name`);# 索引生效explain select sql_no_cache * from emp where name like 'ab%';# 索引失效explain select sql_no_cache * from emp where name like '%ab%';

8.2.8、类型转换导致索引失效

# 创建索引create index idx_name on emp(`name`);# 索引生效explain select sql_no_cache * from emp where name = '123';# 索引失效explain select sql_no_cache * from emp where name = 123;

8.2.9、索引优化建议

对于单键索引，尽量选择过滤性更好的索引（例如：手机号，邮件，身份证）在选择组合索引的时候，过滤性最好的字段在索引字段顺序中，位置越靠前越好。选择组合索引时，尽量包含 where 中更多字段的索引组合索引出现范围查询时，尽量把这个字段放在索引次序的最后面尽量避免造成索引失效的情况

8.4、关联查询优化

8.4.1、建表

#分类create table if not exists `class` (`id` int(10) unsigned not null auto_increment,`card` int(10) unsigned not null,primary key (`id`));#图书create table if not exists `book` (`bookid` int(10) unsigned not null auto_increment,`card` int(10) unsigned not null,primary key (`bookid`));insert into class(card) values(floor(1 + (rand() * 20)));insert into class(card) values(floor(1 + (rand() * 20)));insert into class(card) values(floor(1 + (rand() * 20)));insert into class(card) values(floor(1 + (rand() * 20)));insert into class(card) values(floor(1 + (rand() * 20)));insert into class(card) values(floor(1 + (rand() * 20)));insert into class(card) values(floor(1 + (rand() * 20)));insert into class(card) values(floor(1 + (rand() * 20)));insert into class(card) values(floor(1 + (rand() * 20)));insert into class(card) values(floor(1 + (rand() * 20)));insert into class(card) values(floor(1 + (rand() * 20)));insert into class(card) values(floor(1 + (rand() * 20)));insert into class(card) values(floor(1 + (rand() * 20)));insert into class(card) values(floor(1 + (rand() * 20)));insert into class(card) values(floor(1 + (rand() * 20)));insert into class(card) values(floor(1 + (rand() * 20)));insert into class(card) values(floor(1 + (rand() * 20)));insert into class(card) values(floor(1 + (rand() * 20)));insert into class(card) values(floor(1 + (rand() * 20)));insert into class(card) values(floor(1 + (rand() * 20)));insert into book(card) values(floor(1 + (rand() * 20)));insert into book(card) values(floor(1 + (rand() * 20)));insert into book(card) values(floor(1 + (rand() * 20)));insert into book(card) values(floor(1 + (rand() * 20)));insert into book(card) values(floor(1 + (rand() * 20)));insert into book(card) values(floor(1 + (rand() * 20)));insert into book(card) values(floor(1 + (rand() * 20)));insert into book(card) values(floor(1 + (rand() * 20)));insert into book(card) values(floor(1 + (rand() * 20)));insert into book(card) values(floor(1 + (rand() * 20)));insert into book(card) values(floor(1 + (rand() * 20)));insert into book(card) values(floor(1 + (rand() * 20)));insert into book(card) values(floor(1 + (rand() * 20)));insert into book(card) values(floor(1 + (rand() * 20)));insert into book(card) values(floor(1 + (rand() * 20)));insert into book(card) values(floor(1 + (rand() * 20)));insert into book(card) values(floor(1 + (rand() * 20)));insert into book(card) values(floor(1 + (rand() * 20)));insert into book(card) values(floor(1 + (rand() * 20)));insert into book(card) values(floor(1 + (rand() * 20)));

8.4.2、左连接查询优化（left join）

# 添加索引优化create index idx_book_card on book(card);# 【右表被驱动表】，可以避免全表扫描

# 添加索引优化create index idx_class_card class(card);# 【左表驱动表】，无法避免全表扫描

8.4.3、内连接自动选择驱动表（inner join）

8.4.4、索引优化建议

保证被驱动表的 join 字段被索引left join 时，选择小表作为驱动表，大表作为被驱动表inner join 时，mysql 会自动将小结果集的表选为驱动表。选择相信 mysql 优化策略。能够直接多表关联的尽量直接关联，不用子查询

8.5、子查询优化

优化思路：

将子查询转换成连接查询后，建立相关索引

# 先创建一个索引create index idx_ceo on dept (ceo);# ①不推荐explain select * from emp a where a.id not in (select b.ceo from dept b where b.ceo is not null);# ②推荐explain select a.* from emp a left join dept b on a.id = b.ceowhere b.id is null;

注：不要使用not in或者not exists

8.6、分组、排序优化

8.6.1、没有 limit 条件，不会使用索引

# 创建索引create index idx_age_deptId_name on emp (age, deptId, `name`);# 不使用索引explain select sql_no_cache * from emp order by age,deptId;# 使用索引explain select sql_no_cache * from emp order by age,deptId limit 10;

8.6.2、在使用 where 条件时，需要注意索引顺序

在 where 和 order 同时存在的情况下，需要注意索引的顺序

# 创建索引create index idx_emp_age_deptId_empno on emp (age, deptId, empno);

8.6.3、order 上出现两个方向相反的字段，会出现 Using file sort

在 order 上出现两个方向相反的字段，会出现 Using file sort，文件排序，需要全盘扫描排序

# 创建索引create index idx_age_deptId_name on emp (age, deptId, `name`);

8.6、范围条件和排序同时存在的优化方案

8.6.1、待优化的 sql

# 删除所有索引（除主键外）call proc_drop_index('test','emp');# 需要优化的 sqlexplain select sql_no_cache * from emp where age =30 and empno < 101000 order by `name`;

8.6.2、分析三个字段复合索引

首先有三个字段，所以我们考虑三个字段的复合索引（age, empno, name）因为 where 中出现了范围条件，所以我们考虑把 empno 放在最后；最后三个字段的复合索引顺序为 age, name, empno；

8.6.3、分析两个字段的复合索引

因为三个字段的复合索引中 empno 无用，所以我们考虑两个字段的复合索引（age, name 或者 age, empno）；

8.6.4、总结

当范围条件和排序同时存在时，索引的建立只能二选其一；所有的排序都是在条件过滤之后才执行的，所以，如果条件过滤掉大部分数据的话，剩下几百几千条数据进行排序其实并不是很消耗性能，即使索引优化了排序，但实际提升性能很有限。相对的 empno<101000 这个条件，如果没有用到索引的话，要对几万条的数据进行扫描，这是非常消耗性能的，所以索引放在这个字段上性价比最高，是最优选择。当【范围条件】和【group by 或者 order by】的字段出现二选一时，优先观察条件字段的过滤数量，如果过滤的数据足够多，而需要排序的数据并不多时，优先把索引放在范围字段上。反之，亦然。

8.7、双路排序和单路排序（理论）

如果排序不在索引列上，就会使用 filesort 排序：mysql 就会使用双路排序和单路排序；

双路排序（又叫回表排序模式）：先根据相应的条件取出相应的排序字段和可以直接定位行数据的行 ID，然后在 sort buffer 中进行排序，排序完后需要再次取回其它需要的字段；MySQL 4.1 之前使用的就是双路排序；

单路排序：是一次性取出满足条件行的所有字段，然后在sort buffer中进行排序；

8.7.1、双路排序（慢）

双路排序需要两次扫描磁盘第一次扫描磁盘，根据过滤条件（where）筛选出满足条件的行，将所有满足条件的行 ID（可以根据行 ID 寻址到这一行上的所有数据）和 order 中的字段绑定放到 sort buffer 中，对其进行排序第二次扫描磁盘，MySQL 根据排序好的 ID，再次扫描表，按顺序取出所有数据；

举个例子：

select * from user where age = 30 order by name；

MySQL 根据where age = 30将所有满足age = 30的行 ID 和name字段绑定，放入 sort buffer 中；在 sort buffer 中根据name字段排序；根据排序好的行 ID，寻址到每一行的所有数据；将数据返回给客户端；

8.7.2、单路排序（快）

一次性根据过滤条件将一整行数据全部加载到 sort buffer 中；在 sort buffer 中根据 order 中的字段对一整行进行排序；排序好之后直接返回给客户端；

注：单路排序虽然快，但是单路排序可能导致大量 I/O 操作。在 sort buffer 中单路要比双路排序占用很多空间，因为单路是把所有字段都取出, 所以有可能取出的数据的总大小超出了 sort_buffer 的容量，导致每次只能取 sort_buffer 容量大小的数据，进行排序（创建 tmp 文件，多路合并），排完再取 sort_buffer 容量大小，再排.....从而多次I/O。

8.8、Order by 关键字优化

Order by 时 select * 是一个大忌。只 Query 需要的字段，这点非常重要。在这里的影响是：

当 Query 的字段大小总和小于 max_length_for_sort_data 而且排序字段不是 TEXT|BLOB 类型时，会用改进后的算法——单路排序，否则用老算法——多路排序。两种算法的数据都有可能超出 sort_buffer 的容量，超出之后，会创建 tmp 文件进行合并排序，导致多次 I/O，但是用单路排序算法的风险会更大一些，所以要提高 sort_buffer_size。

尝试提高 sort_buffer_size

不管用哪种算法，提高这个参数都会提高效率，当然，要根据系统的能力去提高，因为这个参数是针对每个进程（connection）的 1M-8M 之间调整。 MySQL5.7，InnoDB 存储引擎默认值是 1048576 字节，1MB。

show variables like '%sort_buffer_size%';

尝试提高 max_length_for_sort_data

提高这个参数，会增加用改进算法的概率。

SHOW VARIABLES LIKE ‘%max_length_for_sort_data%’; #默认1024字节

但是如果设的太高，数据总容量超出 sort_buffer_size 的概率就增大，明显症状是高的磁盘 I/O 活动和低的处理器使用率。如果需要返回的列的总长度大于 max_length_for_sort_data，使用双路算法，否则使用单路算法。1024-8192 字节之间调整;

8.9、Group by 关键字优化

group by 使用索引的原则几乎跟order by一致，唯一区别：

group by 先排序再分组，遵照索引建的最佳左前缀法则当无法使用索引列，增大 max_length_for_sort_data 和 sort_buffer_size 参数的设置where 高于 having,能写在 where 限定的条件就不要写在 having 中了group by 没有过滤条件，也可以用上索引。Order By 必须有过滤条件才能使用上索引。

8 查询优化-关联查询优化-子查询优化-Order by 关键字优化-Group by 关键字优化-双路排序和单路排序

本内容不代表本网观点和政治立场，如有侵犯你的权益请联系我们处理。

网友评论

网友评论仅供其表达个人看法，并不表明网站立场。