DBHammer DaSE, ECNU

OceanBase连接顺序选择的优劣评估

一、目的

在OLAP场景中,多表连接是十分常见的,查询的执行效率跟它涉及的表的连接顺序息息相关。以A、B、C三张表为例,有一条查询:SELECT * FROM A, B, C WHERE …,那么这三张表的连接顺序可以是(A⋈B)⋈C(A⋈B)⋈C(A⋈C)⋈B等共6种连接顺序,我们将全部连接顺序称为搜索空间。不同的连接顺序是语义等价的,即能获得相同的结果集,但是对于查询效率有着非常大的影响。从搜索空间中选出性能最优的连接顺序是一个关键的DBMS优化问题,但是随着连接表数量的增加,搜索空间的大小呈指数级增长,这使得连接顺序选择成为一个NP-hard问题。本文主要评测OceanBase连接顺序选择策略的优劣,以分析OceanBase对多表连接查询的处理能力以及优化空间。

二、OceanBase查询计划分析

为了得到OceanBase执行某查询时选择的连接顺序,我们需要分析该查询的执行计划,具体方法如下。

2.1 EXPLAIN关键字

OceanBase可以利用EXPLAIN关键字得到查询的执行计划。

2.2 查询执行树

image

三、实验设计

3.1 实验流程

image

3.2 评价指标

四、实验配置

4.1 机器配置

4.3 实验数据

4.4 注意点

五、实验结果展示与分析

5.1 实验结果

5.2 案例分析

从图1中,我们可以看到有不少偏差较大的离散点,下面我们对其中两个点进行分析,探究偏差产生的原因。

5.2.1 案例一

| ======================================================
|ID|OPERATOR               |NAME    |EST. ROWS|COST  |
------------------------------------------------------
|0 |SCALAR GROUP BY        |        |1        |923737|
|1 | HASH JOIN             |        |353299   |856253|
|2 |  PX COORDINATOR       |        |7804     |19664 |
|3 |   EXCHANGE OUT DISTR  |:EX10000|7804     |18925 |
|4 |    TABLE SCAN         |table_9 |7804     |18925 |
|5 |  HASH JOIN            |        |32468    |606413|
|6 |   PX COORDINATOR      |        |710      |1777  |
|7 |    EXCHANGE OUT DISTR |:EX20000|710      |1710  |
|8 |     TABLE SCAN        |table_0 |710      |1710  |
|9 |   HASH JOIN           |        |32796    |573707|
|10|    TABLE SCAN         |table_4 |33127    |274978|
|11|    PX COORDINATOR     |        |82996    |203567|
|12|     EXCHANGE OUT DISTR|:EX30000|82996    |195711|
|13|      TABLE SCAN       |table_5 |82996    |195711|
====================================================== 

5.2.2 案例二

    select count(*) as result from table_2, table_14, table_10  
    	where table_2.col_4 < 1702600163 and table_14.col_1 <= 39586856.6599801245771715  
    	and table_10.col_8 > -633452491.72604654429895750  
    	and table_2.fk_0 = table_14.primaryKey  
    	and table_10.fk_0 = table_14.primaryKey;
obclient> EXPLAIN select count(*) as result from table_2, table_14, table_10 
	where table_2.col_4 < 1702600163 and table_14.col_1 <= 39586856.6599801245771715 
	and table_10.col_8 > -633452491.72604654429895750 
	and table_2.fk_0 = table_14.primaryKey 
	and table_10.fk_0 = table_14.primaryKey;

| ======================================================
|ID|OPERATOR             |NAME    |EST. ROWS|COST    |
------------------------------------------------------
|0 |SCALAR GROUP BY      |        |1        |12360781|
|1 | HASH JOIN           |        |14725630 |9548003 |
|2 |  PX COORDINATOR     |        |44555    |414608  |
|3 |   EXCHANGE OUT DISTR|:EX10000|44555    |406173  |
|4 |    HASH JOIN        |        |44555    |406173  |
|5 |     TABLE SCAN      |table_14|356      |874     |
|6 |     TABLE SCAN      |table_2 |45005    |363648  |
|7 |  TABLE SCAN         |table_10|118626   |287577  |
======================================================
obclient> EXPLAIN select /*+LEADING(table_2, table_14, table_10)*/ count(*) as result from table_2, table_14, table_10 
	where table_2.col_4 < 1702600163 and table_14.col_1 <= 39586856.6599801245771715 
	and table_10.col_8 > -633452491.72604654429895750 
	and table_2.fk_0 = table_14.primaryKey 
	and table_10.fk_0 = table_14.primaryKey;

| ======================================================
|ID|OPERATOR             |NAME    |EST. ROWS|COST    |
------------------------------------------------------
|0 |SCALAR GROUP BY      |        |1        |12360781|
|1 | HASH JOIN           |        |14725630 |9548003 |
|2 |  PX COORDINATOR     |        |44555    |414608  |
|3 |   EXCHANGE OUT DISTR|:EX10000|44555    |406173  |
|4 |    HASH JOIN        |        |44555    |406173  |
|5 |     TABLE SCAN      |table_2 |45005    |363648  |
|6 |     TABLE SCAN      |table_14|356      |874     |
|7 |  TABLE SCAN         |table_10|118626   |287577  |
======================================================

六、总结

经过上述实验,我们将OceanBase在连接顺序选择上的表现作以下总结:

Previous post
Dike:面向分布式事务型数据库评测基准及工具介绍
Next post
OceanBase对分布式事务的支持能力评测与分析