关于MYSQL DML(UPDATE DELETE)中的子查询问题和ERROR 1093 (HY000)错误-白红宇

从5.6开始MYSQL的子查询进行了大量的优化，5.5中只有EXISTS strategy，在5.7中包含如下:

IN(=ANY)

--Semi-join

--table pullout(最快的,子查询条件为唯一键)

--first match

--semi-join materialization

--loosescan

--duplicateweedout

--Materialization

--EXISTS strategy(最慢的)

NOT IN( <>ALL)

--Materialization

--EXISTS strategy(最慢的)

而(not)exist却没有任何优化还是关联子查询的方式，这和ORACLE不一样，ORACLE中in、exists

都可以使用半连接(semi)优化.所以MYSQL中尽量使用in不要用exists。not in不能使用semi-join

要小心使用，更不要用not exists，关于上面每一个含义可以参考官方手册和mariadb手册。

我们简单的看一个列子，

使用semi-join materialization优化的

mysql> explain select * from testde1 where testde1.id in(select id from testde2);

+----+--------------+-------------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+

+----+--------------+-------------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+

+----+--------------+-------------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+

3 rows in set, 1 warning (0.00 sec)

Note (Code 1003): /* select#1 */ select `test`.`testde1`.`id` AS `id` from `test`.`testde1` semi join (`test`.`testde2`) where (`test`.`testde1`.`id` = ``.`id`)

semi join (`test`.`testde2`) 说明了问题

禁用semi join使用Materialization优化

mysql> set optimizer_switch='semijoin=off';

Query OK, 0 rows affected (0.00 sec)

mysql> explain select * from testde1 where testde1.id in(select id from testde2);

+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+

+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+

+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+

2 rows in set, 1 warning (0.00 sec)

Note (Code 1003): /* select#1 */ select `test`.`testde1`.`id` AS `id` from `test`.`testde1` where (`test`.`testde1`.`id`,`test`.`testde1`.`id` in ( (/* select#2 */ select `test`.`testde2`.`id` from `test`.`testde2` where 1 ), (`test`.`testde1`.`id` in on where ((`test`.`testde1`.`id` = `materialized-subquery`.`id`)))))

materialized-subquery`.`id`)说明了问题

禁用join使用Materialization

ysql> set optimizer_switch='materialization=off';

Query OK, 0 rows affected (0.00 sec)

mysql> explain select * from testde1 where testde1.id in(select id from testde2);

+----+--------------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+

+----+--------------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+

+----+--------------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+

2 rows in set, 1 warning (0.00 sec)

Note (Code 1003): /* select#1 */ select `test`.`testde1`.`id` AS `id` from `test`.`testde1` where (`test`.`testde1`.`id`,(/* select#2 */ select 1 from `test`.`testde2` where ((`test`.`testde1`.`id`) = `test`.`testde2`.`id`)))

使用DEPENDENT SUBQUERY 关联子查询优化,这也是最慢的。这和

select * from testde1 where exists (select * from testde2 where testde1.id=testde2.id);的执行计划完全一致，

testde1大表必须作为驱动表

mysql> explain select * from testde1 where exists (select * from testde2 where testde1.id=testde2.id);

+----+--------------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+

+----+--------------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+

+----+--------------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+

2 rows in set, 2 warnings (0.00 sec)

Note (Code 1276): Field or reference 'test.testde1.id' of SELECT #2 was resolved in SELECT #1

Note (Code 1003): /* select#1 */ select `test`.`testde1`.`id` AS `id` from `test`.`testde1` where exists(/* select#2 */ select 1 from `test`.`testde2` where (`test`.`testde1`.`id` = `test`.`testde2`.`id`))

同时在官方文档也说明了在DML中的子查询用不到SEMI优化和Materialization优化，只能使用exists言外之意就是只能使用关联子查询，转换为exists的格式。

那么速度可想而知，这种方式明显是外层表取出一行，驱动内层表一次，顺序固定，而jion的时候一般会选取小表作为驱动表性能更好。所以建议我们使用join

的方式来删除

原文如下：

A limitation on UPDATE and DELETE statements that use a subquery to modify a

single table is that the optimizer does not use semi-join or materialization subquery

optimizations. As a workaround, try rewriting them as multiple-table UPDATEand

DELETEstatements that use a join rather than a subquery.

实际就是下面的执行计划：

mysql> explain delete from testde1 where id in (select id from testde2);

+----+--------------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+

+----+--------------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+

+----+--------------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+

2 rows in set (0.00 sec)

转换为了：

mysql> explain delete from testde1 where exists (select * from testde2 where testde1.id=testde2.id);

+----+--------------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+

+----+--------------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+

+----+--------------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+

2 rows in set, 1 warning (0.00 sec)

可以看完全一样

应该使用：

mysql> explain delete testde1 from testde1,testde2 where testde1.id=testde2.id;

+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+

+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+

+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+

2 rows in set (0.00 sec)

这里我们看到小表testde2做了驱动表。

最后来说明一下这个报错：

mysql> delete from testde1 where id in(select testde1.id from testde1,testde2 where testde1.id=testde2.id );

ERROR 1093 (HY000): You can't specify target table 'testde1' for update in FROM clause

我们先不管他有没有意义，这个报错再手册上叫做ER_UPDATE_TABLE_USED，我们首先来分析一下这个报错

这样的delete会进行exists展开那么testde1既是修改条件的来源也是修改的对象，这样是不允许的。那么如何修改呢？

实际上就需要select testde1.id from testde1,testde2 where testde1.id=testde2.id 的结果保存在一个临时表中，

不要exists展开，手册中给出的方法是

方法一、建立一个algorithm=temptable 的视图

方法二、建立一个普通视图同时修改SET optimizer_switch = 'derived_merge=off';

其目的都在于不展开选取第二种方式测试：

mysql> create view myt1

-> as

-> select testde1.id from testde1,testde2 where testde1.id=testde2.id;

Query OK, 0 rows affected (0.02 sec)

mysql> delete from testde1 where id in (select * from myt1);

ERROR 1443 (HY000): The definition of table 'myt1' prevents operation DELETE on table 'testde1'.

mysql> SET optimizer_switch = 'derived_merge=off';

Query OK, 0 rows affected (0.00 sec)

mysql> delete from testde1 where id in (select * from myt1);

Query OK, 2 rows affected (0.00 sec)

mysql> flush status;

Query OK, 0 rows affected (0.02 sec)

mysql> delete from testde1 where id in (select * from myt1);

Query OK, 2 rows affected (0.03 sec)

mysql> show status like '%tmp%';

+-------------------------+-------+

| Variable_name | Value |

+-------------------------+-------+

| Created_tmp_disk_tables | 0 |

| Created_tmp_files | 0 |

| Created_tmp_tables | 2 |

+-------------------------+-------+

3 rows in set (0.01 sec)

看看执行计划：

mysql> explain delete from testde1 where id in (select * from myt1);

+----+--------------------+------------+------------+----------------+---------------+-------------+---------+------+------+----------+----------------------------------------------------+

+----+--------------------+------------+------------+----------------+---------------+-------------+---------+------+------+----------+----------------------------------------------------+

| 2 | DEPENDENT SUBQUERY | | NULL | index_subquery | | | 5 | func | 2 | 100.00 | Using index |

+----+--------------------+------------+------------+----------------+---------------+-------------+---------+------+------+----------+----------------------------------------------------+

4 rows in set (0.00 sec)

可以看到子查询作为了一个整体，从status和执行计划dervied都可以看到使用了临时表，这样可行，但是性能上肯定不好。

在ORACLE中不存在这样的问题，执行计划如下：

SQL> delete from testde1 where id in(select testde1.id from testde1,testde2 where testde1.id=testde2.id );

0 rows deleted.

Execution Plan

----------------------------------------------------------

Plan hash value: 2653154564

--------------------------------------------------------------------------------

---------------------------------------------------------------------------------

| 0 | DELETE STATEMENT | | 1 | 26 | 7 (15)| 00:00:01|

|* 2 | HASH JOIN SEMI | | 1 | 26 | 7 (15)| 00:00:01|

| 3 | TABLE ACCESS FULL | TESTDE1 | 5 | 65 | 2 (0)| 00:00:01|

| 4 | VIEW | VW_NSO_1 | 1 | 13 | 5 (20)| 00:00:01|

|* 5 | HASH JOIN | | 1 | 26 | 5 (20)| 00:00:01|

| 6 | TABLE ACCESS FULL| TESTDE2 | 1 | 13 | 2 (0)| 00:00:01|

| 7 | TABLE ACCESS FULL| TESTDE1 | 5 | 65 | 2 (0)| 00:00:01|

---------------------------------------------------------------------------------

先使用hash join将 TESTDE2 和 TESTDE1 建立为一个视图 VW_NSO_1，然后使用了 HASH JOIN SEMI的优化方式，明显用了到半连接优化

这也是为什么ORACLE比现在的MYSQL还是更加强劲的一个小例子，虽然都是作为一个整体，但是MYSQL已经用不到SEMI优化方式了，ORACLE

依然可以，但是可以预见不久的将来MYSQL肯定支持的。

最后总结一下：

1、.所以MYSQL中尽量使用in不要用exists。not in不能使用semi-join要小心使用，更不要用not exists

2、子查询DML应该修改关联DML(update delete)

3、ERROR 1093 (HY000)错误原因是某张表既是修改的对象也是信息来源的对象。需要使用algorithm=temptable或者

optimizer_switch = 'derived_merge=off'的方式。