Group By Operator
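Performs grouped aggregation. Common attributes:
aggregations: the aggregate function(s) that the grouping serves, e.g. sum(sal)
mode: usually hash, meaning the operator hashes the keys to find each row's group
keys: the grouping columns; when this attribute is absent there is only one group
outputColumnNames: the temporary names given to the output columns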
For example:
explain select sum(sal) from tb_emp;
Its Group By Operator looks like this:
+---------------------------------------------------------------------------------+
| Explain                                                                         |
+---------------------------------------------------------------------------------+
| Group By Operator                                                               |
|   aggregations: sum(sal)                                                        |
|   mode: hash                                                                    |
|   outputColumnNames: _col0                                                      |
|   Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE |
+---------------------------------------------------------------------------------+
Another example:
explain select deptno,sum(sal) from tb_emp group by deptno;
Its Group By Operator looks like this:
+------------------------------------------------------------------------------------+
| Explain                                                                            |
+------------------------------------------------------------------------------------+
| Group By Operator                                                                  |
|   aggregations: sum(sal)                                                           |
|   keys: deptno (type: int)                                                         |
|   mode: hash                                                                       |
|   outputColumnNames: _col0, _col1                                                  |
|   Statistics: Num rows: 89 Data size: 718 Basic stats: COMPLETE Column stats: NONE |
+------------------------------------------------------------------------------------+
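The only difference between the two plans is the keys attribute. As a rough illustration of what hash-based grouping does with and without keys, here is a minimal Python sketch. It is not Hive's implementation: the function name hash_group_sum and the sample rows are hypothetical, and only sum is modelled.

```python
from collections import defaultdict

def hash_group_sum(rows, keys, value_col):
    """Sum value_col for each distinct combination of the key columns."""
    table = defaultdict(int)  # hash table: key tuple -> running sum
    for row in rows:
        group = tuple(row[k] for k in keys)  # the empty tuple () when keys is empty
        table[group] += row[value_col]
    return dict(table)

# Hypothetical sample rows standing in for tb_emp.
emp = [
    {"deptno": 10, "sal": 1000},
    {"deptno": 20, "sal": 800},
    {"deptno": 10, "sal": 500},
]

print(hash_group_sum(emp, [], "sal"))          # no keys: a single group
print(hash_group_sum(emp, ["deptno"], "sal"))  # one group per deptno
```

With an empty keys list every row lands in the same hash-table entry, which is why the first plan reports a single output row.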
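How GROUP BY executes
A GROUP BY query is converted into a MapReduce (MR) job as follows:
Map: generate key-value pairs, using the columns in the GROUP BY clause as the key and the result of the aggregate function as the value
Shuffle: hash each key and send the key-value pairs to different reducers according to the hash value
Reduce: perform the reduce according to the columns in the SELECT clause and the aggregate function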
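The three stages can be simulated in a few lines of Python. This is a simplified sketch under stated assumptions, not how Hive actually runs the job: the function names, the input splits, and the number of reducers are all hypothetical, and sum is the only aggregate shown.

```python
from collections import defaultdict

def map_phase(split, key_col, value_col):
    """One mapper: partially aggregate its split and emit (key, partial_sum) pairs."""
    partial = defaultdict(int)
    for row in split:
        partial[row[key_col]] += row[value_col]
    return list(partial.items())

def shuffle_phase(pairs, num_reducers):
    """Route each pair to the reducer selected by hashing its key."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        partitions[hash(key) % num_reducers][key].append(value)
    return partitions

def reduce_phase(partitions):
    """Each reducer merges the partial sums it received for each key."""
    result = {}
    for partition in partitions:
        for key, values in partition.items():
            result[key] = sum(values)
    return result

def group_by_sum(splits, key_col, value_col, num_reducers=2):
    """Run map, shuffle, and reduce in sequence over the input splits."""
    pairs = [p for split in splits for p in map_phase(split, key_col, value_col)]
    return reduce_phase(shuffle_phase(pairs, num_reducers))

# Hypothetical input splits, one list of rows per mapper.
splits = [
    [{"deptno": 10, "sal": 1000}, {"deptno": 20, "sal": 800}],
    [{"deptno": 10, "sal": 500}, {"deptno": 30, "sal": 700}],
]
print(group_by_sum(splits, "deptno", "sal"))
```

Because every pair with the same key hashes to the same reducer, each key's partial sums meet in exactly one place, so the final totals do not depend on how many reducers are used.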
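Summary
The Group By Operator has roughly four attributes. A query that does not use GROUP BY can still have a Group By Operator; in that case the whole data set is treated as a single group, that is, there are no keys.
Reference
Hive执行计划分析之group by执行计划分析_进击的数据小白-CSDN博客 (CSDN blog post on analyzing Hive GROUP BY execution plans)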