
Big Data Development, Hive Optimization Part 8: Hive Job Optimization


Note:

Hive version 2.1.1

Contents

Overview of Hive job optimization
1. Parallel execution
2. Local execution
3. Merging small input files
4. Merging small output files
5. Controlling the number of map/reduce tasks
5.1 Controlling the number of map tasks in a Hive job
5.1.1 Merging small files to reduce the number of maps
5.1.2 Increasing the number of maps when appropriate
5.2 Controlling the number of reduce tasks in a Hive job
References

Overview of Hive job optimization

In real-world development you will often run into Hive SQL that runs slowly: when you check the job information, the job is still running but results are slow to come back.

A Hive SQL job can be optimized from the following angles:

Parallel execution

The MR jobs generated by Hive run sequentially by default; jobs with no dependencies between them can run in parallel:

set hive.exec.parallel=true;

Local execution

Although Hive can use MR to process large data sets, in some scenarios the amount of data being processed is so small that the job can run locally instead of being submitted to the cluster.

Related parameters:

set hive.exec.mode.local.auto=true;

hive.exec.mode.local.auto.inputbytes.max (default 128 MB)

hive.exec.mode.local.auto.input.files.max (default 4)

Merging small input files

If the job input contains many small files, too many map tasks are created and efficiency suffers:

set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat; -- merge small files before the map phase

set mapred.max.split.size=256000000; -- maximum input size per map

set mapred.min.split.size.per.node=100000000; -- minimum split size on a single node

set mapred.min.split.size.per.rack=100000000; -- minimum split size within a single rack

Merging small output files

set hive.merge.mapfiles=true; -- merge small files at the end of a map-only job

set hive.merge.mapredfiles=true; -- merge small files produced by the reduce phase

set hive.merge.smallfiles.avgsize=256000000; -- when the average output file size is below this value, launch an extra job to merge the files

set hive.merge.size.per.task=64000000; -- target size of each file after merging

Controlling the number of map/reduce tasks

Controlling the number of map and reduce tasks controls the degree of parallelism of the job:

Num_Map_tasks = $inputsize / max($mapred.min.split.size, min($dfs.block.size, $mapred.max.split.size))
Num_Reduce_tasks = min($hive.exec.reducers.max, $inputsize / $hive.exec.reducers.bytes.per.reducer)
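
A rough worked example of these two formulas, using assumed round numbers rather than values from the test environment below (10 GB of input, mapred.min.split.size left at its tiny default, dfs.block.size = 128 MB, mapred.max.split.size = 256 MB, hive.exec.reducers.bytes.per.reducer = 1 GB):

Num_Map_tasks ≈ 10 GB / max(1 byte, min(128 MB, 256 MB)) = 10 GB / 128 MB = 80
Num_Reduce_tasks ≈ min(hive.exec.reducers.max, 10 GB / 1 GB) = 10 (assuming hive.exec.reducers.max > 10)

In practice CombineHiveInputFormat also merges splits per node and per rack, so the real map count can come out lower than this estimate, as the test records later in this article show.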

1. Parallel execution

The MR jobs generated by Hive run sequentially by default; jobs with no dependencies between them can run in parallel.

set hive.exec.parallel=true;

Code:

set hive.exec.parallel=false;
select count(*) from ods_fact_sale_orc
union all
select count(*) from ods_fact_sale_partion;

set hive.exec.parallel=true;
set hive.exec.parallel.thread.number = 8; -- the default degree of parallelism is 8
select count(*) from ods_fact_sale_orc
union all
select count(*) from ods_fact_sale_partion;

Because the local test environment has limited resources and cannot run MR tasks in parallel, the test record is omitted here.

2. Local execution

Although Hive can use MR to process large data sets, in some scenarios the amount of data being processed is so small that the job can run locally instead of being submitted to the cluster.

Related parameters:

set hive.exec.mode.local.auto=true;
hive.exec.mode.local.auto.inputbytes.max (default 128 MB)
hive.exec.mode.local.auto.input.files.max (default 4)

Code:

set hive.exec.mode.local.auto=false;
select * from emp where empno = 7369;

set hive.exec.mode.local.auto=true; -- enable local MR
set hive.exec.mode.local.auto.inputbytes.max=50000000; -- maximum input size for local MR; local mode is used when the input is smaller than this value
set hive.exec.mode.local.auto.tasks.max=10; -- maximum number of input files for local MR; local mode is used when there are fewer files than this value
set hive.exec.mode.local.auto.input.files.max = 50;
select * from emp where empno = 7369;

Test record:

As you can see, a simple query takes 15 seconds when submitted to the cluster but under 4 seconds when executed locally, a large performance gain.

hive> > set hive.exec.mode.local.auto=false;hive> select * from emp where empno = 7369;Query ID = root_0115095845_16ac6734-b0c2-457a-9c71-6c34655dd84eTotal jobs = 1Launching Job 1 out of 1Number of reduce tasks is set to 0 since there's no reduce operatorStarting Job = job_1610015767041_0038, Tracking URL = http://hp1:8088/proxy/application_1610015767041_0038/Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job -kill job_1610015767041_0038Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 0-01-15 09:58:53,165 Stage-1 map = 0%, reduce = 0%-01-15 09:58:59,419 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 6.78 secMapReduce Total cumulative CPU time: 6 seconds 780 msecEnded Job = job_1610015767041_0038MapReduce Jobs Launched: Stage-Stage-1: Map: 2 Cumulative CPU: 6.78 sec HDFS Read: 13532 HDFS Write: 232 HDFS EC Read: 0 SUCCESSTotal MapReduce CPU Time Spent: 6 seconds 780 msecOK7369 smith clerk 7902 1980-12-17800.00 NULL 20Time taken: 15.377 seconds, Fetched: 1 row(s)hive> > set hive.exec.mode.local.auto=true;hive> set hive.exec.mode.local.auto.inputbytes.max=50000000; hive> set hive.exec.mode.local.auto.tasks.max=10; hive> set hive.exec.mode.local.auto.input.files.max = 50;hive> select * from emp where empno = 7369;Automatically selecting local only mode for queryQuery ID = root_0115095924_96fedfe1-cd3e-4ec5-aea9-723ec60e416bTotal jobs = 1Launching Job 1 out of 1Number of reduce tasks is set to 0 since there's no reduce operator21/01/15 09:59:27 INFO mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-root/mapred/local/1610675964612/3.0.0-cdh6.3.1-mr-framework.tar.gz <- /root/mr-framework21/01/15 09:59:27 INFO mapred.LocalDistributedCacheManager: Localized hdfs://nameservice1/user/yarn/mapreduce/mr-framework/3.0.0-cdh6.3.1-mr-framework.tar.gz as file:/tmp/hadoop-root/mapred/local/1610675964612/3.0.0-cdh6.3.1-mr-framework.tar.gz21/01/15 09:59:27 INFO mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-root/mapred/local/1610675964613/libjars <- /root/libjars/*21/01/15 09:59:27 WARN mapred.LocalDistributedCacheManager: Failed to create symlink: /tmp/hadoop-root/mapred/local/1610675964613/libjars <- /root/libjars/*21/01/15 09:59:27 INFO mapred.LocalDistributedCacheManager: Localized file:/tmp/hadoop/mapred/staging/root1720005872/.staging/job_local1720005872_0002/libjars as file:/tmp/hadoop-root/mapred/local/1610675964613/libjarsJob running in-process (local Hadoop)21/01/15 09:59:27 INFO mapred.LocalJobRunner: OutputCommitter set in config org.apache.hadoop.hive.ql.io.HiveFileFormatUtils$NullOutputCommitter21/01/15 09:59:27 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.hive.ql.io.HiveFileFormatUtils$NullOutputCommitter21/01/15 09:59:27 INFO mapred.LocalJobRunner: Waiting for map tasks21/01/15 09:59:27 INFO mapred.LocalJobRunner: Starting task: attempt_local1720005872_0002_m_000000_021/01/15 09:59:27 INFO mapred.LocalJobRunner: 21/01/15 09:59:27 INFO mapred.LocalJobRunner: hdfs://nameservice1/user/hive/warehouse/test.db/emp/000000_0_copy_9:0+5321/01/15 09:59:27 INFO mapred.LocalJobRunner: Finishing task: attempt_local1720005872_0002_m_000000_021/01/15 09:59:27 INFO mapred.LocalJobRunner: Starting task: attempt_local1720005872_0002_m_000001_021/01/15 09:59:27 INFO mapred.LocalJobRunner: 21/01/15 09:59:27 INFO mapred.LocalJobRunner: hdfs://nameservice1/user/hive/warehouse/test.db/emp/000000_0_copy_8:0+4821/01/15 09:59:27 INFO mapred.LocalJobRunner: Finishing task: 
attempt_local1720005872_0002_m_000001_021/01/15 09:59:27 INFO mapred.LocalJobRunner: map task executor complete.-01-15 09:59:28,168 Stage-1 map = 100%, reduce = 0%Ended Job = job_local1720005872_0002MapReduce Jobs Launched: Stage-Stage-1: HDFS Read: 940220630 HDFS Write: 564850399 HDFS EC Read: 0 SUCCESSTotal MapReduce CPU Time Spent: 0 msecOK7369 smith clerk 7902 1980-12-17800.00 NULL 20Time taken: 3.794 seconds, Fetched: 1 row(s)hive>

3. Merging small input files

If the job input contains many small files, too many map tasks are created and efficiency suffers.

set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat; -- merge small files before the map phase

set mapred.max.split.size=256000000; -- maximum input size per map

set mapred.min.split.size.per.node=100000000; -- minimum split size on a single node

set mapred.min.split.size.per.rack=100000000; -- minimum split size within a single rack

The system default value of hive.input.format is already org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.
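
You can confirm the current value from the Hive CLI; running set with just a property name prints its current setting:

set hive.input.format;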

We start tuning from the parameter mapred.max.split.size.

Code:

set mapred.max.split.size=256000000;
select count(*) from ods_fact_sale;

set mapred.max.split.size=1024000000;
select count(*) from ods_fact_sale;

Test record:

As you can see, after increasing the maximum input size per map, more small files are merged and the query runs roughly twice as fast.

hive> set mapred.max.split.size=256000000; hive> select count(*) from ods_fact_sale;Query ID = root_0108095302_fc928195-30a0-4956-a201-06a81dfeb155Total jobs = 1Launching Job 1 out of 1Number of reduce tasks determined at compile time: 1In order to change the average load for a reducer (in bytes):set hive.exec.reducers.bytes.per.reducer=<number>In order to limit the maximum number of reducers:set hive.exec.reducers.max=<number>In order to set a constant number of reducers:set mapreduce.job.reduces=<number>Starting Job = job_1610015767041_0001, Tracking URL = http://hp1:8088/proxy/application_1610015767041_0001/Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job -kill job_1610015767041_0001Hadoop job information for Stage-1: number of mappers: 117; number of reducers: 1-01-08 09:53:13,719 Stage-1 map = 0%, reduce = 0%-01-08 09:53:24,058 Stage-1 map = 2%, reduce = 0%, Cumulative CPU 13.17 sec-01-08 09:53:30,232 Stage-1 map = 3%, reduce = 0%, Cumulative CPU 19.27 sec-01-08 09:53:37,447 Stage-1 map = 4%, reduce = 0%, Cumulative CPU 31.55 sec-01-08 09:53:38,475 Stage-1 map = 5%, reduce = 0%, Cumulative CPU 37.6 sec-01-08 09:53:43,625 Stage-1 map = 6%, reduce = 0%, Cumulative CPU 43.7 sec-01-08 09:53:45,681 Stage-1 map = 7%, reduce = 0%, Cumulative CPU 49.85 sec-01-08 09:53:49,801 Stage-1 map = 8%, reduce = 0%, Cumulative CPU 55.96 sec-01-08 09:53:52,881 Stage-1 map = 9%, reduce = 0%, Cumulative CPU 62.14 sec-01-08 09:53:59,036 Stage-1 map = 10%, reduce = 0%, Cumulative CPU 74.19 sec-01-08 09:54:03,149 Stage-1 map = 11%, reduce = 0%, Cumulative CPU 80.24 sec-01-08 09:54:06,224 Stage-1 map = 12%, reduce = 0%, Cumulative CPU 86.19 sec-01-08 09:54:10,339 Stage-1 map = 13%, reduce = 0%, Cumulative CPU 92.07 sec-01-08 09:54:13,430 Stage-1 map = 14%, reduce = 0%, Cumulative CPU 98.09 sec-01-08 09:54:16,511 Stage-1 map = 15%, reduce = 0%, Cumulative CPU 104.02 sec-01-08 09:54:23,690 Stage-1 map = 16%, reduce = 0%, Cumulative CPU 116.06 sec-01-08 09:54:25,738 Stage-1 map = 17%, reduce = 0%, Cumulative CPU 122.13 sec-01-08 09:54:30,859 Stage-1 map = 18%, reduce = 0%, Cumulative CPU 128.32 sec-01-08 09:54:31,882 Stage-1 map = 19%, reduce = 0%, Cumulative CPU 133.84 sec-01-08 09:54:36,999 Stage-1 map = 20%, reduce = 0%, Cumulative CPU 139.87 sec-01-08 09:54:39,041 Stage-1 map = 21%, reduce = 0%, Cumulative CPU 145.88 sec-01-08 09:54:45,187 Stage-1 map = 22%, reduce = 0%, Cumulative CPU 158.01 sec-01-08 09:54:50,312 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 164.21 sec-01-08 09:54:51,337 Stage-1 map = 24%, reduce = 0%, Cumulative CPU 170.44 sec-01-08 09:54:57,483 Stage-1 map = 25%, reduce = 0%, Cumulative CPU 176.53 sec-01-08 09:54:58,502 Stage-1 map = 26%, reduce = 0%, Cumulative CPU 182.63 sec-01-08 09:55:05,640 Stage-1 map = 27%, reduce = 0%, Cumulative CPU 194.87 sec-01-08 09:55:10,749 Stage-1 map = 28%, reduce = 0%, Cumulative CPU 200.93 sec-01-08 09:55:11,775 Stage-1 map = 29%, reduce = 0%, Cumulative CPU 206.88 sec-01-08 09:55:16,879 Stage-1 map = 30%, reduce = 0%, Cumulative CPU 212.95 sec-01-08 09:55:17,903 Stage-1 map = 31%, reduce = 0%, Cumulative CPU 218.94 sec-01-08 09:55:24,025 Stage-1 map = 32%, reduce = 0%, Cumulative CPU 225.02 sec-01-08 09:55:30,166 Stage-1 map = 33%, reduce = 0%, Cumulative CPU 237.62 sec-01-08 09:55:31,189 Stage-1 map = 34%, reduce = 0%, Cumulative CPU 243.58 sec-01-08 09:55:36,312 Stage-1 map = 35%, reduce = 0%, Cumulative CPU 249.74 sec-01-08 09:55:38,363 Stage-1 map = 36%, reduce = 0%, Cumulative CPU 255.66 sec-01-08 
09:55:43,481 Stage-1 map = 37%, reduce = 0%, Cumulative CPU 261.76 sec-01-08 09:55:45,526 Stage-1 map = 38%, reduce = 0%, Cumulative CPU 267.83 sec-01-08 09:55:52,680 Stage-1 map = 39%, reduce = 0%, Cumulative CPU 280.1 sec-01-08 09:55:57,794 Stage-1 map = 40%, reduce = 0%, Cumulative CPU 286.35 sec-01-08 09:55:58,820 Stage-1 map = 41%, reduce = 0%, Cumulative CPU 292.32 sec-01-08 09:56:04,955 Stage-1 map = 42%, reduce = 0%, Cumulative CPU 298.4 sec-01-08 09:56:05,977 Stage-1 map = 43%, reduce = 0%, Cumulative CPU 304.46 sec-01-08 09:56:11,079 Stage-1 map = 44%, reduce = 0%, Cumulative CPU 310.53 sec-01-08 09:56:17,212 Stage-1 map = 45%, reduce = 0%, Cumulative CPU 322.43 sec-01-08 09:56:18,238 Stage-1 map = 46%, reduce = 0%, Cumulative CPU 328.44 sec-01-08 09:56:23,351 Stage-1 map = 47%, reduce = 0%, Cumulative CPU 334.45 sec-01-08 09:56:25,398 Stage-1 map = 48%, reduce = 0%, Cumulative CPU 340.55 sec-01-08 09:56:29,489 Stage-1 map = 49%, reduce = 0%, Cumulative CPU 346.59 sec-01-08 09:56:31,544 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 352.58 sec-01-08 09:56:38,709 Stage-1 map = 51%, reduce = 0%, Cumulative CPU 363.88 sec-01-08 09:56:41,781 Stage-1 map = 52%, reduce = 0%, Cumulative CPU 370.0 sec-01-08 09:56:45,864 Stage-1 map = 53%, reduce = 0%, Cumulative CPU 376.15 sec-01-08 09:56:48,926 Stage-1 map = 54%, reduce = 0%, Cumulative CPU 382.14 sec-01-08 09:56:53,009 Stage-1 map = 55%, reduce = 0%, Cumulative CPU 388.37 sec-01-08 09:56:55,055 Stage-1 map = 56%, reduce = 0%, Cumulative CPU 394.4 sec-01-08 09:57:02,263 Stage-1 map = 57%, reduce = 0%, Cumulative CPU 406.9 sec-01-08 09:57:06,356 Stage-1 map = 58%, reduce = 0%, Cumulative CPU 412.94 sec-01-08 09:57:08,406 Stage-1 map = 59%, reduce = 0%, Cumulative CPU 418.95 sec-01-08 09:57:13,534 Stage-1 map = 61%, reduce = 0%, Cumulative CPU 431.11 sec-01-08 09:57:19,695 Stage-1 map = 62%, reduce = 0%, Cumulative CPU 437.12 sec-01-08 09:57:26,861 Stage-1 map = 64%, reduce = 0%, Cumulative CPU 455.35 sec-01-08 09:57:33,019 Stage-1 map = 65%, reduce = 0%, Cumulative CPU 461.72 sec-01-08 09:57:34,048 Stage-1 map = 66%, reduce = 0%, Cumulative CPU 467.72 sec-01-08 09:57:40,215 Stage-1 map = 68%, reduce = 0%, Cumulative CPU 480.15 sec-01-08 09:57:47,404 Stage-1 map = 69%, reduce = 0%, Cumulative CPU 492.39 sec-01-08 09:57:53,562 Stage-1 map = 70%, reduce = 0%, Cumulative CPU 498.58 sec-01-08 09:57:54,594 Stage-1 map = 71%, reduce = 0%, Cumulative CPU 504.62 sec-01-08 09:58:00,740 Stage-1 map = 73%, reduce = 0%, Cumulative CPU 516.89 sec-01-08 09:58:06,887 Stage-1 map = 74%, reduce = 0%, Cumulative CPU 529.17 sec-01-08 09:58:13,036 Stage-1 map = 75%, reduce = 0%, Cumulative CPU 535.33 sec-01-08 09:58:14,062 Stage-1 map = 76%, reduce = 0%, Cumulative CPU 541.35 sec-01-08 09:58:19,220 Stage-1 map = 77%, reduce = 0%, Cumulative CPU 547.59 sec-01-08 09:58:21,266 Stage-1 map = 78%, reduce = 0%, Cumulative CPU 553.51 sec-01-08 09:58:26,398 Stage-1 map = 79%, reduce = 0%, Cumulative CPU 559.61 sec-01-08 09:58:33,570 Stage-1 map = 80%, reduce = 0%, Cumulative CPU 571.67 sec-01-08 09:58:34,597 Stage-1 map = 81%, reduce = 0%, Cumulative CPU 577.74 sec-01-08 09:58:39,711 Stage-1 map = 82%, reduce = 0%, Cumulative CPU 583.74 sec-01-08 09:58:45,860 Stage-1 map = 83%, reduce = 0%, Cumulative CPU 589.8 sec-01-08 09:58:49,965 Stage-1 map = 83%, reduce = 28%, Cumulative CPU 590.58 sec-01-08 09:58:52,015 Stage-1 map = 84%, reduce = 28%, Cumulative CPU 596.77 sec-01-08 09:58:59,208 Stage-1 map = 85%, reduce = 28%, Cumulative CPU 603.02 sec-01-08 09:59:12,545 
Stage-1 map = 86%, reduce = 28%, Cumulative CPU 615.23 sec-01-08 09:59:13,573 Stage-1 map = 86%, reduce = 29%, Cumulative CPU 615.3 sec-01-08 09:59:18,696 Stage-1 map = 87%, reduce = 29%, Cumulative CPU 621.24 sec-01-08 09:59:25,866 Stage-1 map = 88%, reduce = 29%, Cumulative CPU 627.3 sec-01-08 09:59:30,975 Stage-1 map = 89%, reduce = 29%, Cumulative CPU 632.86 sec-01-08 09:59:37,112 Stage-1 map = 90%, reduce = 29%, Cumulative CPU 638.91 sec-01-08 09:59:38,137 Stage-1 map = 90%, reduce = 30%, Cumulative CPU 638.96 sec-01-08 09:59:44,277 Stage-1 map = 91%, reduce = 30%, Cumulative CPU 644.97 sec-01-08 09:59:58,593 Stage-1 map = 92%, reduce = 30%, Cumulative CPU 657.22 sec-01-08 10:00:02,681 Stage-1 map = 92%, reduce = 31%, Cumulative CPU 657.28 sec-01-08 10:00:05,748 Stage-1 map = 93%, reduce = 31%, Cumulative CPU 663.32 sec-01-08 10:00:11,877 Stage-1 map = 94%, reduce = 31%, Cumulative CPU 669.46 sec-01-08 10:00:18,009 Stage-1 map = 95%, reduce = 31%, Cumulative CPU 675.49 sec-01-08 10:00:20,055 Stage-1 map = 95%, reduce = 32%, Cumulative CPU 675.55 sec-01-08 10:00:24,152 Stage-1 map = 96%, reduce = 32%, Cumulative CPU 681.46 sec-01-08 10:00:30,288 Stage-1 map = 97%, reduce = 32%, Cumulative CPU 687.97 sec-01-08 10:00:43,594 Stage-1 map = 98%, reduce = 32%, Cumulative CPU 701.09 sec-01-08 10:00:44,619 Stage-1 map = 98%, reduce = 33%, Cumulative CPU 701.15 sec-01-08 10:00:50,744 Stage-1 map = 99%, reduce = 33%, Cumulative CPU 707.46 sec-01-08 10:00:57,906 Stage-1 map = 100%, reduce = 33%, Cumulative CPU 714.02 sec-01-08 10:00:58,932 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 716.38 secMapReduce Total cumulative CPU time: 11 minutes 56 seconds 380 msecEnded Job = job_1610015767041_0001MapReduce Jobs Launched: Stage-Stage-1: Map: 117 Reduce: 1 Cumulative CPU: 716.38 sec HDFS Read: 31436897886 HDFS Write: 109 HDFS EC Read: 0 SUCCESSTotal MapReduce CPU Time Spent: 11 minutes 56 seconds 380 msecOK767830000Time taken: 478.119 seconds, Fetched: 1 row(s)hive> > set mapred.max.split.size=1024000000; hive> select count(*) from ods_fact_sale;Query ID = root_0108100115_bf06e3b5-6ff1-4c29-ab4a-25c8b1559791Total jobs = 1Launching Job 1 out of 1Number of reduce tasks determined at compile time: 1In order to change the average load for a reducer (in bytes):set hive.exec.reducers.bytes.per.reducer=<number>In order to limit the maximum number of reducers:set hive.exec.reducers.max=<number>In order to set a constant number of reducers:set mapreduce.job.reduces=<number>Starting Job = job_1610015767041_0002, Tracking URL = http://hp1:8088/proxy/application_1610015767041_0002/Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job -kill job_1610015767041_0002Hadoop job information for Stage-1: number of mappers: 30; number of reducers: 1-01-08 10:01:23,394 Stage-1 map = 0%, reduce = 0%-01-08 10:01:38,815 Stage-1 map = 3%, reduce = 0%, Cumulative CPU 13.38 sec-01-08 10:01:39,837 Stage-1 map = 7%, reduce = 0%, Cumulative CPU 27.8 sec-01-08 10:01:52,127 Stage-1 map = 10%, reduce = 0%, Cumulative CPU 40.53 sec-01-08 10:01:53,151 Stage-1 map = 13%, reduce = 0%, Cumulative CPU 52.79 sec-01-08 10:02:05,431 Stage-1 map = 17%, reduce = 0%, Cumulative CPU 65.41 sec-01-08 10:02:06,458 Stage-1 map = 20%, reduce = 0%, Cumulative CPU 78.34 sec-01-08 10:02:18,762 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 91.39 sec-01-08 10:02:19,787 Stage-1 map = 27%, reduce = 0%, Cumulative CPU 104.75 sec-01-08 10:02:31,062 Stage-1 map = 30%, reduce = 0%, Cumulative CPU 117.2 sec-01-08 
10:02:32,093 Stage-1 map = 33%, reduce = 0%, Cumulative CPU 130.27 sec-01-08 10:02:44,385 Stage-1 map = 37%, reduce = 0%, Cumulative CPU 142.72 sec-01-08 10:02:45,410 Stage-1 map = 40%, reduce = 0%, Cumulative CPU 155.18 sec-01-08 10:02:57,675 Stage-1 map = 43%, reduce = 0%, Cumulative CPU 168.91 sec-01-08 10:02:58,701 Stage-1 map = 47%, reduce = 0%, Cumulative CPU 181.56 sec-01-08 10:03:03,809 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 186.53 sec-01-08 10:03:10,962 Stage-1 map = 53%, reduce = 0%, Cumulative CPU 198.51 sec-01-08 10:03:18,128 Stage-1 map = 57%, reduce = 0%, Cumulative CPU 211.03 sec-01-08 10:03:24,256 Stage-1 map = 60%, reduce = 0%, Cumulative CPU 224.14 sec-01-08 10:03:33,442 Stage-1 map = 63%, reduce = 0%, Cumulative CPU 240.3 sec-01-08 10:03:37,521 Stage-1 map = 67%, reduce = 0%, Cumulative CPU 252.98 sec-01-08 10:03:47,734 Stage-1 map = 70%, reduce = 0%, Cumulative CPU 266.97 sec-01-08 10:03:50,800 Stage-1 map = 73%, reduce = 0%, Cumulative CPU 279.74 sec-01-08 10:04:01,021 Stage-1 map = 77%, reduce = 0%, Cumulative CPU 292.45 sec-01-08 10:04:04,097 Stage-1 map = 80%, reduce = 0%, Cumulative CPU 305.39 sec-01-08 10:04:14,330 Stage-1 map = 83%, reduce = 0%, Cumulative CPU 318.52 sec-01-08 10:04:17,400 Stage-1 map = 87%, reduce = 0%, Cumulative CPU 331.82 sec-01-08 10:04:28,637 Stage-1 map = 87%, reduce = 29%, Cumulative CPU 332.58 sec-01-08 10:04:29,657 Stage-1 map = 90%, reduce = 29%, Cumulative CPU 345.6 sec-01-08 10:04:34,759 Stage-1 map = 90%, reduce = 30%, Cumulative CPU 345.75 sec-01-08 10:04:42,925 Stage-1 map = 93%, reduce = 30%, Cumulative CPU 358.71 sec-01-08 10:04:47,016 Stage-1 map = 93%, reduce = 31%, Cumulative CPU 358.79 sec-01-08 10:04:55,206 Stage-1 map = 97%, reduce = 31%, Cumulative CPU 371.67 sec-01-08 10:04:59,301 Stage-1 map = 97%, reduce = 32%, Cumulative CPU 371.73 sec-01-08 10:05:09,527 Stage-1 map = 100%, reduce = 32%, Cumulative CPU 385.23 sec-01-08 10:05:10,559 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 387.27 secMapReduce Total cumulative CPU time: 6 minutes 27 seconds 270 msecEnded Job = job_1610015767041_0002MapReduce Jobs Launched: Stage-Stage-1: Map: 30 Reduce: 1 Cumulative CPU: 387.27 sec HDFS Read: 31436522876 HDFS Write: 109 HDFS EC Read: 0 SUCCESSTotal MapReduce CPU Time Spent: 6 minutes 27 seconds 270 msecOK767830000Time taken: 237.197 seconds, Fetched: 1 row(s)hive> >

4. Merging small output files

Whether to launch a merge job after a normal map-reduce job to combine the files written by the reduce side; enabling this is recommended.

hive.merge.smallfiles.avgsize (default 16 MB)

For a non-partitioned table, a merge job is started when the average size of the table's output files is below this value. For a partitioned table, the average file size is computed per partition, and only partitions whose average is below this value are merged. This value only takes effect when hive.merge.mapfiles or hive.merge.mapredfiles is set to true.

hive.exec.reducers.bytes.per.reducer (default 1 GB; 64 MB by default in my test environment)

If the user does not explicitly set mapred.reduce.tasks, Hive computes the input summary size of all files read from the input directory and divides it by this value to work out the number of reducers:

reducers = (int) ((totalInputFileSize + bytesPerReducer - 1) / bytesPerReducer);
reducers = Math.max(1, reducers);
reducers = Math.min(maxReducers, reducers);
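
Plugging in the input size from the test records below (an HDFS Read of roughly 31.4 GB):

reducers ≈ 31.4 GB / 64 MB ≈ 491 (hive.exec.reducers.bytes.per.reducer = 64000000)
reducers ≈ 31.4 GB / 1 GB ≈ 31 (hive.exec.reducers.bytes.per.reducer = 1024000000)

These match the "Estimated from input data size" reducer counts of 491 and 31 reported for the merge_test1 and merge_test2 runs.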

hive.merge.size.per.task (default 256 MB)

The target size (targetSize) of each file after the merge job. Dividing the total size of the previous job's output files by this value determines the number of reducers for the merge job. The map side of the merge job is effectively an identity map; the data is shuffled to the reducers, and each reducer dumps one file. This is how the number and size of the output files are controlled.
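
As an illustration with assumed numbers (not taken from the test environment): if the previous job wrote about 2 GB of output spread across many small files and hive.merge.size.per.task is 256 MB, then:

merge job reducers ≈ 2 GB / 256 MB = 8, leaving roughly 8 output files of about 256 MB each.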

mapred.max.split.size (default 256 MB)

mapred.min.split.size.per.node (default 1 byte)

mapred.min.split.size.per.rack (default 1 byte)

These three parameters are used by CombineFileInputFormat. Hive's default InputFormat is CombineHiveInputFormat, whose calls (including the most important ones, getSplits and getRecordReader) are all delegated to CombineFileInputFormat, so it can be viewed as a wrapper around it. CombineFileInputFormat can combine many small files into the input of a single map and, for very large files, split them into the inputs of several maps. One CombineFileSplit corresponds to the input of one map and contains a group of paths (a list of HDFS paths), start offsets, lengths and locations (a list of hostnames where the files reside). mapred.max.split.size is the maximum size of a split, mapred.min.split.size.per.node is the minimum split size on a single node (datanode), and mapred.min.split.size.per.rack is the minimum split size within a single rack (rack locality). Tuning these three values determines how the CombineFileSplits are built, and users can increase mapred.max.split.size to reduce the number of map tasks.

Code:

set hive.exec.reducers.bytes.per.reducer = 64000000;
CREATE TABLE merge_test1(
 prod_name string,
 max_sale_nums int,
 min_sale_nums int
)
STORED AS textfile;
insert into merge_test1
select prod_name, max(sale_nums), min(sale_nums)
from ods_fact_sale
group by prod_name;

set hive.exec.reducers.bytes.per.reducer = 1024000000;
CREATE TABLE merge_test2(
 prod_name string,
 max_sale_nums int,
 min_sale_nums int
)
STORED AS textfile;
insert into merge_test2
select prod_name, max(sale_nums), min(sale_nums)
from ods_fact_sale
group by prod_name;

-- This query not only has many reducers but also many maps; increase the map-side parameter as well and observe the effect
set hive.exec.reducers.bytes.per.reducer = 1024000000;
set mapred.max.split.size=1024000000;
CREATE TABLE merge_test3(
 prod_name string,
 max_sale_nums int,
 min_sale_nums int
)
STORED AS textfile;
insert into merge_test3
select prod_name, max(sale_nums), min(sale_nums)
from ods_fact_sale
group by prod_name;

Test record:

As you can see, the number of reducers drops from 491 to 31, the reduce phase takes far less time, and the overall execution time is much shorter.

hive> > > set hive.exec.reducers.bytes.per.reducer = 64000000;hive> CREATE TABLE merge_test1( > prod_namestring, > max_sale_nums int,> min_sale_nums int> )> STORED AS textfile ;OKTime taken: 0.44 secondshive> insert into merge_test1> select prod_name,max(sale_nums),min(sale_nums)> from ods_fact_sale > group by prod_name;Query ID = root_0114190128_0ad3fd5e-a9be-48e6-a473-f2ceea28753bTotal jobs = 1Launching Job 1 out of 1Number of reduce tasks not specified. Estimated from input data size: 491In order to change the average load for a reducer (in bytes):set hive.exec.reducers.bytes.per.reducer=<number>In order to limit the maximum number of reducers:set hive.exec.reducers.max=<number>In order to set a constant number of reducers:set mapreduce.job.reduces=<number>Starting Job = job_1610015767041_0032, Tracking URL = http://hp1:8088/proxy/application_1610015767041_0032/Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job -kill job_1610015767041_0032Hadoop job information for Stage-1: number of mappers: 117; number of reducers: 491-01-14 19:01:40,009 Stage-1 map = 0%, reduce = 0%-01-14 19:01:51,572 Stage-1 map = 2%, reduce = 0%, Cumulative CPU 20.26 sec-01-14 19:02:00,900 Stage-1 map = 3%, reduce = 0%, Cumulative CPU 29.61 sec-01-14 19:02:10,192 Stage-1 map = 4%, reduce = 0%, Cumulative CPU 48.25 sec-01-14 19:02:11,215 Stage-1 map = 5%, reduce = 0%, Cumulative CPU 57.84 sec-01-14 19:02:19,444 Stage-1 map = 6%, reduce = 0%, Cumulative CPU 67.39 sec-01-14 19:02:20,474 Stage-1 map = 7%, reduce = 0%, Cumulative CPU 76.67 sec-01-14 19:02:28,691 Stage-1 map = 8%, reduce = 0%, Cumulative CPU 86.07 sec-01-14 19:02:29,715 Stage-1 map = 9%, reduce = 0%, Cumulative CPU 95.53 sec-01-14 19:02:38,952 Stage-1 map = 10%, reduce = 0%, Cumulative CPU 113.56 sec-01-14 19:02:47,187 Stage-1 map = 11%, reduce = 0%, Cumulative CPU 122.89 sec-01-14 19:02:48,213 Stage-1 map = 12%, reduce = 0%, Cumulative CPU 131.94 sec-01-14 19:02:55,421 Stage-1 map = 13%, reduce = 0%, Cumulative CPU 140.98 sec-01-14 19:02:56,448 Stage-1 map = 14%, reduce = 0%, Cumulative CPU 149.91 sec-01-14 19:03:04,672 Stage-1 map = 15%, reduce = 0%, Cumulative CPU 158.94 sec-01-14 19:03:13,917 Stage-1 map = 16%, reduce = 0%, Cumulative CPU 177.18 sec-01-14 19:03:14,943 Stage-1 map = 17%, reduce = 0%, Cumulative CPU 186.2 sec-01-14 19:03:23,142 Stage-1 map = 18%, reduce = 0%, Cumulative CPU 195.49 sec-01-14 19:03:24,169 Stage-1 map = 19%, reduce = 0%, Cumulative CPU 204.71 sec-01-14 19:03:32,392 Stage-1 map = 21%, reduce = 0%, Cumulative CPU 222.8 sec-01-14 19:03:41,632 Stage-1 map = 22%, reduce = 0%, Cumulative CPU 240.59 sec-01-14 19:03:50,882 Stage-1 map = 24%, reduce = 0%, Cumulative CPU 259.31 sec-01-14 19:04:00,123 Stage-1 map = 26%, reduce = 0%, Cumulative CPU 277.48 sec-01-14 19:04:09,362 Stage-1 map = 27%, reduce = 0%, Cumulative CPU 295.12 sec-01-14 19:04:17,569 Stage-1 map = 28%, reduce = 0%, Cumulative CPU 304.25 sec-01-14 19:04:18,598 Stage-1 map = 29%, reduce = 0%, Cumulative CPU 313.34 sec-01-14 19:04:26,809 Stage-1 map = 31%, reduce = 0%, Cumulative CPU 331.84 sec-01-14 19:04:36,035 Stage-1 map = 32%, reduce = 0%, Cumulative CPU 350.14 sec-01-14 19:04:45,285 Stage-1 map = 34%, reduce = 0%, Cumulative CPU 368.4 sec-01-14 19:04:54,527 Stage-1 map = 36%, reduce = 0%, Cumulative CPU 386.85 sec-01-14 19:05:02,755 Stage-1 map = 38%, reduce = 0%, Cumulative CPU 404.95 sec-01-14 19:05:11,991 Stage-1 map = 39%, reduce = 0%, Cumulative CPU 423.57 sec-01-14 19:05:21,196 Stage-1 map = 41%, reduce = 0%, 
Cumulative CPU 441.83 sec-01-14 19:05:30,419 Stage-1 map = 43%, reduce = 0%, Cumulative CPU 460.06 sec-01-14 19:05:39,645 Stage-1 map = 44%, reduce = 0%, Cumulative CPU 478.47 sec-01-14 19:05:47,844 Stage-1 map = 46%, reduce = 0%, Cumulative CPU 496.52 sec-01-14 19:05:57,057 Stage-1 map = 48%, reduce = 0%, Cumulative CPU 514.75 sec-01-14 19:06:06,287 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 533.26 sec-01-14 19:06:16,555 Stage-1 map = 51%, reduce = 0%, Cumulative CPU 550.72 sec-01-14 19:06:22,700 Stage-1 map = 52%, reduce = 0%, Cumulative CPU 560.75 sec-01-14 19:06:25,764 Stage-1 map = 53%, reduce = 0%, Cumulative CPU 570.45 sec-01-14 19:06:32,931 Stage-1 map = 54%, reduce = 0%, Cumulative CPU 579.77 sec-01-14 19:06:34,980 Stage-1 map = 55%, reduce = 0%, Cumulative CPU 589.18 sec-01-14 19:06:42,160 Stage-1 map = 56%, reduce = 0%, Cumulative CPU 598.64 sec-01-14 19:06:50,355 Stage-1 map = 57%, reduce = 0%, Cumulative CPU 617.02 sec-01-14 19:06:52,395 Stage-1 map = 58%, reduce = 0%, Cumulative CPU 626.5 sec-01-14 19:06:59,573 Stage-1 map = 59%, reduce = 0%, Cumulative CPU 635.5 sec-01-14 19:07:02,654 Stage-1 map = 60%, reduce = 0%, Cumulative CPU 644.2 sec-01-14 19:07:08,811 Stage-1 map = 61%, reduce = 0%, Cumulative CPU 653.34 sec-01-14 19:07:11,884 Stage-1 map = 62%, reduce = 0%, Cumulative CPU 662.47 sec-01-14 19:07:21,104 Stage-1 map = 63%, reduce = 0%, Cumulative CPU 680.52 sec-01-14 19:07:26,230 Stage-1 map = 64%, reduce = 0%, Cumulative CPU 689.71 sec-01-14 19:07:29,303 Stage-1 map = 65%, reduce = 0%, Cumulative CPU 698.82 sec-01-14 19:07:35,443 Stage-1 map = 66%, reduce = 0%, Cumulative CPU 708.11 sec-01-14 19:07:38,520 Stage-1 map = 67%, reduce = 0%, Cumulative CPU 717.54 sec-01-14 19:07:44,681 Stage-1 map = 68%, reduce = 0%, Cumulative CPU 726.63 sec-01-14 19:07:53,910 Stage-1 map = 69%, reduce = 0%, Cumulative CPU 744.91 sec-01-14 19:07:56,984 Stage-1 map = 70%, reduce = 0%, Cumulative CPU 753.6 sec-01-14 19:08:02,103 Stage-1 map = 71%, reduce = 0%, Cumulative CPU 762.52 sec-01-14 19:08:05,183 Stage-1 map = 72%, reduce = 0%, Cumulative CPU 771.53 sec-01-14 19:08:11,343 Stage-1 map = 73%, reduce = 0%, Cumulative CPU 780.5 sec-01-14 19:08:14,411 Stage-1 map = 74%, reduce = 0%, Cumulative CPU 789.43 sec-01-14 19:08:23,626 Stage-1 map = 75%, reduce = 0%, Cumulative CPU 807.77 sec-01-14 19:08:29,773 Stage-1 map = 76%, reduce = 0%, Cumulative CPU 816.92 sec-01-14 19:08:32,855 Stage-1 map = 77%, reduce = 0%, Cumulative CPU 826.07 sec-01-14 19:08:39,012 Stage-1 map = 78%, reduce = 0%, Cumulative CPU 835.04 sec-01-14 19:08:42,085 Stage-1 map = 79%, reduce = 0%, Cumulative CPU 844.31 sec-01-14 19:08:50,282 Stage-1 map = 80%, reduce = 0%, Cumulative CPU 862.28 sec-01-14 19:08:56,435 Stage-1 map = 81%, reduce = 0%, Cumulative CPU 871.3 sec-01-14 19:08:59,517 Stage-1 map = 82%, reduce = 0%, Cumulative CPU 880.27 sec-01-14 19:09:08,745 Stage-1 map = 83%, reduce = 0%, Cumulative CPU 889.25 sec-01-14 19:09:17,980 Stage-1 map = 84%, reduce = 0%, Cumulative CPU 899.29 sec-01-14 19:09:27,209 Stage-1 map = 85%, reduce = 0%, Cumulative CPU 908.46 sec-01-14 19:09:44,642 Stage-1 map = 86%, reduce = 0%, Cumulative CPU 926.91 sec-01-14 19:09:53,881 Stage-1 map = 87%, reduce = 0%, Cumulative CPU 936.23 sec-01-14 19:10:03,098 Stage-1 map = 88%, reduce = 0%, Cumulative CPU 945.51 sec-01-14 19:10:12,330 Stage-1 map = 89%, reduce = 0%, Cumulative CPU 954.95 sec-01-14 19:10:21,552 Stage-1 map = 90%, reduce = 0%, Cumulative CPU 964.34 sec-01-14 19:10:29,749 Stage-1 map = 91%, reduce = 0%, Cumulative CPU 
973.66 sec-01-14 19:10:48,196 Stage-1 map = 92%, reduce = 0%, Cumulative CPU 992.05 sec-01-14 19:10:57,412 Stage-1 map = 93%, reduce = 0%, Cumulative CPU 1001.08 sec-01-14 19:11:06,644 Stage-1 map = 94%, reduce = 0%, Cumulative CPU 1010.73 sec-01-14 19:11:14,839 Stage-1 map = 95%, reduce = 0%, Cumulative CPU 1019.97 sec-01-14 19:11:24,067 Stage-1 map = 96%, reduce = 0%, Cumulative CPU 1029.3 sec-01-14 19:11:33,310 Stage-1 map = 97%, reduce = 0%, Cumulative CPU 1038.48 sec-01-14 19:11:50,765 Stage-1 map = 98%, reduce = 0%, Cumulative CPU 1057.35 sec-01-14 19:12:00,020 Stage-1 map = 99%, reduce = 0%, Cumulative CPU 1066.54 sec-01-14 19:12:09,255 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1075.51 sec-01-14 19:12:15,439 Stage-1 map = 100%, reduce = 1%, Cumulative CPU 1082.5 sec-01-14 19:12:25,678 Stage-1 map = 100%, reduce = 2%, Cumulative CPU 1095.86 sec-01-14 19:12:34,879 Stage-1 map = 100%, reduce = 3%, Cumulative CPU 1109.58 sec-01-14 19:12:45,106 Stage-1 map = 100%, reduce = 4%, Cumulative CPU 1122.74 sec-01-14 19:12:55,334 Stage-1 map = 100%, reduce = 5%, Cumulative CPU 1136.11 sec-01-14 19:13:05,582 Stage-1 map = 100%, reduce = 6%, Cumulative CPU 1149.12 sec-01-14 19:13:13,776 Stage-1 map = 100%, reduce = 7%, Cumulative CPU 1159.81 sec-01-14 19:13:23,005 Stage-1 map = 100%, reduce = 8%, Cumulative CPU 1172.96 sec-01-14 19:13:33,248 Stage-1 map = 100%, reduce = 9%, Cumulative CPU 1186.41 sec-01-14 19:13:43,498 Stage-1 map = 100%, reduce = 10%, Cumulative CPU 1199.92 sec-01-14 19:13:53,741 Stage-1 map = 100%, reduce = 11%, Cumulative CPU 1213.52 sec-01-14 19:14:02,954 Stage-1 map = 100%, reduce = 12%, Cumulative CPU 1226.76 sec-01-14 19:14:13,196 Stage-1 map = 100%, reduce = 13%, Cumulative CPU 1239.9 sec-01-14 19:14:23,428 Stage-1 map = 100%, reduce = 14%, Cumulative CPU 1253.19 sec-01-14 19:14:33,671 Stage-1 map = 100%, reduce = 15%, Cumulative CPU 1266.5 sec-01-14 19:14:43,901 Stage-1 map = 100%, reduce = 16%, Cumulative CPU 1279.88 sec-01-14 19:14:53,123 Stage-1 map = 100%, reduce = 17%, Cumulative CPU 1293.37 sec-01-14 19:15:01,323 Stage-1 map = 100%, reduce = 18%, Cumulative CPU 1303.97 sec-01-14 19:15:11,564 Stage-1 map = 100%, reduce = 19%, Cumulative CPU 1316.96 sec-01-14 19:15:21,782 Stage-1 map = 100%, reduce = 20%, Cumulative CPU 1330.34 sec-01-14 19:15:32,004 Stage-1 map = 100%, reduce = 21%, Cumulative CPU 1343.36 sec-01-14 19:15:42,233 Stage-1 map = 100%, reduce = 22%, Cumulative CPU 1357.14 sec-01-14 19:15:51,448 Stage-1 map = 100%, reduce = 23%, Cumulative CPU 1370.58 sec-01-14 19:16:01,687 Stage-1 map = 100%, reduce = 24%, Cumulative CPU 1383.78 sec-01-14 19:16:11,950 Stage-1 map = 100%, reduce = 25%, Cumulative CPU 1397.07 sec-01-14 19:16:22,194 Stage-1 map = 100%, reduce = 26%, Cumulative CPU 1410.58 sec-01-14 19:16:31,415 Stage-1 map = 100%, reduce = 27%, Cumulative CPU 1423.93 sec-01-14 19:16:41,656 Stage-1 map = 100%, reduce = 28%, Cumulative CPU 1437.45 sec-01-14 19:16:49,855 Stage-1 map = 100%, reduce = 29%, Cumulative CPU 1448.27 sec-01-14 19:17:00,092 Stage-1 map = 100%, reduce = 30%, Cumulative CPU 1461.38 sec-01-14 19:17:10,324 Stage-1 map = 100%, reduce = 31%, Cumulative CPU 1475.12 sec-01-14 19:17:19,521 Stage-1 map = 100%, reduce = 32%, Cumulative CPU 1488.25 sec-01-14 19:17:29,759 Stage-1 map = 100%, reduce = 33%, Cumulative CPU 1501.32 sec-01-14 19:17:40,008 Stage-1 map = 100%, reduce = 34%, Cumulative CPU 1514.62 sec-01-14 19:17:50,259 Stage-1 map = 100%, reduce = 35%, Cumulative CPU 1528.17 sec-01-14 19:17:59,490 Stage-1 map = 100%, reduce = 36%, 
Cumulative CPU 1541.34 sec-01-14 19:18:09,743 Stage-1 map = 100%, reduce = 37%, Cumulative CPU 1554.44 sec-01-14 19:18:19,982 Stage-1 map = 100%, reduce = 38%, Cumulative CPU 1568.05 sec-01-14 19:18:30,207 Stage-1 map = 100%, reduce = 39%, Cumulative CPU 1581.4 sec-01-14 19:18:38,393 Stage-1 map = 100%, reduce = 40%, Cumulative CPU 1592.03 sec-01-14 19:18:47,619 Stage-1 map = 100%, reduce = 41%, Cumulative CPU 1605.4 sec-01-14 19:18:57,872 Stage-1 map = 100%, reduce = 42%, Cumulative CPU 1618.69 sec-01-14 19:19:08,116 Stage-1 map = 100%, reduce = 43%, Cumulative CPU 1632.45 sec-01-14 19:19:18,345 Stage-1 map = 100%, reduce = 44%, Cumulative CPU 1645.95 sec-01-14 19:19:28,590 Stage-1 map = 100%, reduce = 45%, Cumulative CPU 1659.07 sec-01-14 19:19:37,817 Stage-1 map = 100%, reduce = 46%, Cumulative CPU 1672.16 sec-01-14 19:19:48,050 Stage-1 map = 100%, reduce = 47%, Cumulative CPU 1685.57 sec-01-14 19:19:58,291 Stage-1 map = 100%, reduce = 48%, Cumulative CPU 1698.8 sec-01-14 19:20:08,544 Stage-1 map = 100%, reduce = 49%, Cumulative CPU 1711.89 sec-01-14 19:20:18,773 Stage-1 map = 100%, reduce = 50%, Cumulative CPU 1725.25 sec-01-14 19:20:26,955 Stage-1 map = 100%, reduce = 51%, Cumulative CPU 1736.15 sec-01-14 19:20:36,155 Stage-1 map = 100%, reduce = 52%, Cumulative CPU 1749.24 sec-01-14 19:20:46,404 Stage-1 map = 100%, reduce = 53%, Cumulative CPU 1762.65 sec-01-14 19:20:56,657 Stage-1 map = 100%, reduce = 54%, Cumulative CPU 1775.99 sec-01-14 19:21:06,879 Stage-1 map = 100%, reduce = 55%, Cumulative CPU 1789.3 sec-01-14 19:21:16,104 Stage-1 map = 100%, reduce = 56%, Cumulative CPU 1802.41 sec-01-14 19:21:26,332 Stage-1 map = 100%, reduce = 57%, Cumulative CPU 1815.6 sec-01-14 19:21:36,583 Stage-1 map = 100%, reduce = 58%, Cumulative CPU 1828.81 sec-01-14 19:21:46,819 Stage-1 map = 100%, reduce = 59%, Cumulative CPU 1842.02 sec-01-14 19:21:57,058 Stage-1 map = 100%, reduce = 60%, Cumulative CPU 1855.19 sec-01-14 19:22:06,269 Stage-1 map = 100%, reduce = 61%, Cumulative CPU 1868.47 sec-01-14 19:22:14,467 Stage-1 map = 100%, reduce = 62%, Cumulative CPU 1879.02 sec-01-14 19:22:24,703 Stage-1 map = 100%, reduce = 63%, Cumulative CPU 1892.26 sec-01-14 19:22:34,925 Stage-1 map = 100%, reduce = 64%, Cumulative CPU 1905.62 sec-01-14 19:22:45,166 Stage-1 map = 100%, reduce = 65%, Cumulative CPU 1919.13 sec-01-14 19:22:54,395 Stage-1 map = 100%, reduce = 66%, Cumulative CPU 1932.36 sec-01-14 19:23:04,643 Stage-1 map = 100%, reduce = 67%, Cumulative CPU 1945.42 sec-01-14 19:23:14,872 Stage-1 map = 100%, reduce = 68%, Cumulative CPU 1958.65 sec-01-14 19:23:25,109 Stage-1 map = 100%, reduce = 69%, Cumulative CPU 1971.72 sec-01-14 19:23:34,345 Stage-1 map = 100%, reduce = 70%, Cumulative CPU 1985.23 sec-01-14 19:23:45,620 Stage-1 map = 100%, reduce = 71%, Cumulative CPU 1998.72 sec-01-14 19:23:54,836 Stage-1 map = 100%, reduce = 72%, Cumulative CPU .85 sec-01-14 19:24:03,048 Stage-1 map = 100%, reduce = 73%, Cumulative CPU .39 sec-01-14 19:24:15,337 Stage-1 map = 100%, reduce = 74%, Cumulative CPU 2038.56 sec-01-14 19:24:22,497 Stage-1 map = 100%, reduce = 75%, Cumulative CPU 2049.15 sec-01-14 19:24:34,747 Stage-1 map = 100%, reduce = 76%, Cumulative CPU 2065.24 sec-01-14 19:24:42,947 Stage-1 map = 100%, reduce = 77%, Cumulative CPU 2076.07 sec-01-14 19:24:55,234 Stage-1 map = 100%, reduce = 78%, Cumulative CPU 2092.16 sec-01-14 19:25:03,437 Stage-1 map = 100%, reduce = 79%, Cumulative CPU 2102.85 sec-01-14 19:25:14,690 Stage-1 map = 100%, reduce = 80%, Cumulative CPU 2119.54 sec-01-14 19:25:22,873 
Stage-1 map = 100%, reduce = 81%, Cumulative CPU 2130.15 sec-01-14 19:25:35,166 Stage-1 map = 100%, reduce = 82%, Cumulative CPU 2146.6 sec-01-14 19:25:43,363 Stage-1 map = 100%, reduce = 83%, Cumulative CPU 2157.26 sec-01-14 19:25:51,543 Stage-1 map = 100%, reduce = 84%, Cumulative CPU 2168.07 sec-01-14 19:26:02,807 Stage-1 map = 100%, reduce = 85%, Cumulative CPU 2184.15 sec-01-14 19:26:10,993 Stage-1 map = 100%, reduce = 86%, Cumulative CPU 2195.08 sec-01-14 19:26:23,266 Stage-1 map = 100%, reduce = 87%, Cumulative CPU 2211.35 sec-01-14 19:26:32,495 Stage-1 map = 100%, reduce = 88%, Cumulative CPU 2222.08 sec-01-14 19:26:42,736 Stage-1 map = 100%, reduce = 89%, Cumulative CPU 2235.54 sec-01-14 19:26:51,971 Stage-1 map = 100%, reduce = 90%, Cumulative CPU 2248.66 sec-01-14 19:27:03,245 Stage-1 map = 100%, reduce = 91%, Cumulative CPU 2261.92 sec-01-14 19:27:12,488 Stage-1 map = 100%, reduce = 92%, Cumulative CPU 2275.14 sec-01-14 19:27:22,730 Stage-1 map = 100%, reduce = 93%, Cumulative CPU 2288.47 sec-01-14 19:27:31,945 Stage-1 map = 100%, reduce = 94%, Cumulative CPU 2301.79 sec-01-14 19:27:40,164 Stage-1 map = 100%, reduce = 95%, Cumulative CPU 2312.43 sec-01-14 19:27:51,448 Stage-1 map = 100%, reduce = 96%, Cumulative CPU 2325.67 sec-01-14 19:28:00,688 Stage-1 map = 100%, reduce = 97%, Cumulative CPU 2338.72 sec-01-14 19:28:10,933 Stage-1 map = 100%, reduce = 98%, Cumulative CPU 2352.3 sec-01-14 19:28:20,151 Stage-1 map = 100%, reduce = 99%, Cumulative CPU 2365.62 sec-01-14 19:28:36,544 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 2384.26 secMapReduce Total cumulative CPU time: 39 minutes 44 seconds 260 msecEnded Job = job_1610015767041_0032Loading data to table default.merge_test1MapReduce Jobs Launched: Stage-Stage-1: Map: 117 Reduce: 491 Cumulative CPU: 2384.26 sec HDFS Read: 31439317219 HDFS Write: 22465 HDFS EC Read: 0 SUCCESSTotal MapReduce CPU Time Spent: 39 minutes 44 seconds 260 msecOKTime taken: 1631.392 secondshive> > set hive.exec.reducers.bytes.per.reducer = 1024000000;hive> CREATE TABLE merge_test2( > prod_namestring, > max_sale_nums int,> min_sale_nums int> )> STORED AS textfile ;OKTime taken: 0.078 secondshive> insert into merge_test2> select prod_name,max(sale_nums),min(sale_nums)> from ods_fact_sale > group by prod_name;Query ID = root_0114192908_41f36835-549c-46bd-a897-c6a08aafab8fTotal jobs = 1Launching Job 1 out of 1Number of reduce tasks not specified. 
Estimated from input data size: 31In order to change the average load for a reducer (in bytes):set hive.exec.reducers.bytes.per.reducer=<number>In order to limit the maximum number of reducers:set hive.exec.reducers.max=<number>In order to set a constant number of reducers:set mapreduce.job.reduces=<number>Starting Job = job_1610015767041_0033, Tracking URL = http://hp1:8088/proxy/application_1610015767041_0033/Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job -kill job_1610015767041_0033Hadoop job information for Stage-1: number of mappers: 117; number of reducers: 31-01-14 19:29:16,912 Stage-1 map = 0%, reduce = 0%-01-14 19:29:29,259 Stage-1 map = 2%, reduce = 0%, Cumulative CPU 19.35 sec-01-14 19:29:38,504 Stage-1 map = 3%, reduce = 0%, Cumulative CPU 37.67 sec-01-14 19:29:46,710 Stage-1 map = 5%, reduce = 0%, Cumulative CPU 55.21 sec-01-14 19:29:55,943 Stage-1 map = 7%, reduce = 0%, Cumulative CPU 73.27 sec-01-14 19:30:05,169 Stage-1 map = 9%, reduce = 0%, Cumulative CPU 91.12 sec-01-14 19:30:14,390 Stage-1 map = 10%, reduce = 0%, Cumulative CPU 109.31 sec-01-14 19:30:22,567 Stage-1 map = 12%, reduce = 0%, Cumulative CPU 127.55 sec-01-14 19:30:31,776 Stage-1 map = 14%, reduce = 0%, Cumulative CPU 145.45 sec-01-14 19:30:40,993 Stage-1 map = 15%, reduce = 0%, Cumulative CPU 163.87 sec-01-14 19:30:50,204 Stage-1 map = 17%, reduce = 0%, Cumulative CPU 181.82 sec-01-14 19:30:59,400 Stage-1 map = 19%, reduce = 0%, Cumulative CPU 199.95 sec-01-14 19:31:07,590 Stage-1 map = 21%, reduce = 0%, Cumulative CPU 217.98 sec-01-14 19:31:16,793 Stage-1 map = 22%, reduce = 0%, Cumulative CPU 236.08 sec-01-14 19:31:25,977 Stage-1 map = 24%, reduce = 0%, Cumulative CPU 254.43 sec-01-14 19:31:35,171 Stage-1 map = 26%, reduce = 0%, Cumulative CPU 272.32 sec-01-14 19:31:44,396 Stage-1 map = 27%, reduce = 0%, Cumulative CPU 290.2 sec-01-14 19:31:53,607 Stage-1 map = 28%, reduce = 0%, Cumulative CPU 299.28 sec-01-14 19:31:54,632 Stage-1 map = 29%, reduce = 0%, Cumulative CPU 308.24 sec-01-14 19:32:02,821 Stage-1 map = 31%, reduce = 0%, Cumulative CPU 326.33 sec-01-14 19:32:11,010 Stage-1 map = 32%, reduce = 0%, Cumulative CPU 335.17 sec-01-14 19:32:20,232 Stage-1 map = 33%, reduce = 0%, Cumulative CPU 352.9 sec-01-14 19:32:21,249 Stage-1 map = 34%, reduce = 0%, Cumulative CPU 361.87 sec-01-14 19:32:29,417 Stage-1 map = 35%, reduce = 0%, Cumulative CPU 370.96 sec-01-14 19:32:30,439 Stage-1 map = 36%, reduce = 0%, Cumulative CPU 379.86 sec-01-14 19:32:38,604 Stage-1 map = 37%, reduce = 0%, Cumulative CPU 388.81 sec-01-14 19:32:39,629 Stage-1 map = 38%, reduce = 0%, Cumulative CPU 397.52 sec-01-14 19:32:47,806 Stage-1 map = 39%, reduce = 0%, Cumulative CPU 415.38 sec-01-14 19:32:55,995 Stage-1 map = 40%, reduce = 0%, Cumulative CPU 424.22 sec-01-14 19:32:57,018 Stage-1 map = 41%, reduce = 0%, Cumulative CPU 433.11 sec-01-14 19:33:05,196 Stage-1 map = 42%, reduce = 0%, Cumulative CPU 442.05 sec-01-14 19:33:06,219 Stage-1 map = 43%, reduce = 0%, Cumulative CPU 450.8 sec-01-14 19:33:14,396 Stage-1 map = 44%, reduce = 0%, Cumulative CPU 459.74 sec-01-14 19:33:23,578 Stage-1 map = 45%, reduce = 0%, Cumulative CPU 477.34 sec-01-14 19:33:24,602 Stage-1 map = 46%, reduce = 0%, Cumulative CPU 486.52 sec-01-14 19:33:32,783 Stage-1 map = 48%, reduce = 0%, Cumulative CPU 504.15 sec-01-14 19:33:41,972 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 522.16 sec-01-14 19:33:52,197 Stage-1 map = 51%, reduce = 0%, Cumulative CPU 539.2 sec-01-14 19:33:58,336 Stage-1 map = 52%, reduce = 
0%, Cumulative CPU 548.73 sec-01-14 19:34:02,427 Stage-1 map = 53%, reduce = 0%, Cumulative CPU 557.99 sec-01-14 19:34:07,545 Stage-1 map = 54%, reduce = 0%, Cumulative CPU 567.16 sec-01-14 19:34:11,632 Stage-1 map = 55%, reduce = 0%, Cumulative CPU 576.08 sec-01-14 19:34:16,749 Stage-1 map = 56%, reduce = 0%, Cumulative CPU 585.06 sec-01-14 19:34:25,935 Stage-1 map = 57%, reduce = 0%, Cumulative CPU 603.52 sec-01-14 19:34:30,026 Stage-1 map = 58%, reduce = 0%, Cumulative CPU 612.49 sec-01-14 19:34:35,140 Stage-1 map = 59%, reduce = 0%, Cumulative CPU 621.77 sec-01-14 19:34:38,210 Stage-1 map = 60%, reduce = 0%, Cumulative CPU 630.7 sec-01-14 19:34:43,317 Stage-1 map = 61%, reduce = 0%, Cumulative CPU 639.91 sec-01-14 19:34:47,410 Stage-1 map = 62%, reduce = 0%, Cumulative CPU 648.89 sec-01-14 19:34:56,615 Stage-1 map = 63%, reduce = 0%, Cumulative CPU 666.84 sec-01-14 19:35:01,727 Stage-1 map = 64%, reduce = 0%, Cumulative CPU 675.6 sec-01-14 19:35:05,819 Stage-1 map = 65%, reduce = 0%, Cumulative CPU 684.69 sec-01-14 19:35:10,933 Stage-1 map = 66%, reduce = 0%, Cumulative CPU 693.59 sec-01-14 19:35:15,021 Stage-1 map = 67%, reduce = 0%, Cumulative CPU 702.23 sec-01-14 19:35:20,133 Stage-1 map = 68%, reduce = 0%, Cumulative CPU 711.21 sec-01-14 19:35:28,322 Stage-1 map = 69%, reduce = 0%, Cumulative CPU 729.04 sec-01-14 19:35:32,415 Stage-1 map = 70%, reduce = 0%, Cumulative CPU 738.03 sec-01-14 19:35:37,526 Stage-1 map = 71%, reduce = 0%, Cumulative CPU 747.06 sec-01-14 19:35:41,611 Stage-1 map = 72%, reduce = 0%, Cumulative CPU 756.06 sec-01-14 19:35:46,719 Stage-1 map = 73%, reduce = 0%, Cumulative CPU 765.09 sec-01-14 19:35:50,806 Stage-1 map = 74%, reduce = 0%, Cumulative CPU 774.04 sec-01-14 19:36:00,012 Stage-1 map = 75%, reduce = 0%, Cumulative CPU 791.77 sec-01-14 19:36:05,123 Stage-1 map = 76%, reduce = 0%, Cumulative CPU 800.75 sec-01-14 19:36:09,214 Stage-1 map = 77%, reduce = 0%, Cumulative CPU 809.81 sec-01-14 19:36:14,328 Stage-1 map = 78%, reduce = 0%, Cumulative CPU 818.69 sec-01-14 19:36:18,415 Stage-1 map = 79%, reduce = 0%, Cumulative CPU 827.72 sec-01-14 19:36:26,584 Stage-1 map = 80%, reduce = 0%, Cumulative CPU 845.87 sec-01-14 19:36:31,694 Stage-1 map = 81%, reduce = 0%, Cumulative CPU 854.89 sec-01-14 19:36:35,785 Stage-1 map = 82%, reduce = 0%, Cumulative CPU 863.86 sec-01-14 19:36:44,993 Stage-1 map = 83%, reduce = 0%, Cumulative CPU 872.92 sec-01-14 19:36:47,043 Stage-1 map = 83%, reduce = 1%, Cumulative CPU 873.65 sec-01-14 19:36:54,202 Stage-1 map = 84%, reduce = 1%, Cumulative CPU 882.71 sec-01-14 19:37:02,383 Stage-1 map = 85%, reduce = 1%, Cumulative CPU 891.72 sec-01-14 19:37:20,770 Stage-1 map = 86%, reduce = 1%, Cumulative CPU 909.77 sec-01-14 19:37:29,964 Stage-1 map = 87%, reduce = 1%, Cumulative CPU 918.97 sec-01-14 19:37:39,160 Stage-1 map = 88%, reduce = 1%, Cumulative CPU 928.27 sec-01-14 19:37:48,352 Stage-1 map = 89%, reduce = 1%, Cumulative CPU 937.24 sec-01-14 19:37:56,530 Stage-1 map = 90%, reduce = 1%, Cumulative CPU 946.29 sec-01-14 19:38:05,734 Stage-1 map = 91%, reduce = 1%, Cumulative CPU 955.18 sec-01-14 19:38:24,140 Stage-1 map = 92%, reduce = 1%, Cumulative CPU 973.53 sec-01-14 19:38:33,352 Stage-1 map = 93%, reduce = 1%, Cumulative CPU 982.47 sec-01-14 19:38:41,529 Stage-1 map = 94%, reduce = 1%, Cumulative CPU 991.47 sec-01-14 19:38:50,735 Stage-1 map = 95%, reduce = 1%, Cumulative CPU 1000.3 sec-01-14 19:38:59,949 Stage-1 map = 96%, reduce = 1%, Cumulative CPU 1009.3 sec-01-14 19:39:09,158 Stage-1 map = 97%, reduce = 1%, Cumulative 
CPU 1018.28 sec-01-14 19:39:27,573 Stage-1 map = 98%, reduce = 1%, Cumulative CPU 1036.53 sec-01-14 19:39:35,741 Stage-1 map = 99%, reduce = 1%, Cumulative CPU 1045.53 sec-01-14 19:39:44,943 Stage-1 map = 100%, reduce = 1%, Cumulative CPU 1054.51 sec-01-14 19:39:45,966 Stage-1 map = 100%, reduce = 3%, Cumulative CPU 1056.5 sec-01-14 19:39:49,040 Stage-1 map = 100%, reduce = 6%, Cumulative CPU 1059.11 sec-01-14 19:39:50,063 Stage-1 map = 100%, reduce = 10%, Cumulative CPU 1061.79 sec-01-14 19:39:53,133 Stage-1 map = 100%, reduce = 13%, Cumulative CPU 1064.52 sec-01-14 19:39:54,154 Stage-1 map = 100%, reduce = 16%, Cumulative CPU 1067.05 sec-01-14 19:39:57,223 Stage-1 map = 100%, reduce = 19%, Cumulative CPU 1069.71 sec-01-14 19:39:58,246 Stage-1 map = 100%, reduce = 23%, Cumulative CPU 1072.25 sec-01-14 19:40:01,296 Stage-1 map = 100%, reduce = 26%, Cumulative CPU 1074.89 sec-01-14 19:40:02,318 Stage-1 map = 100%, reduce = 29%, Cumulative CPU 1077.5 sec-01-14 19:40:05,376 Stage-1 map = 100%, reduce = 32%, Cumulative CPU 1080.12 sec-01-14 19:40:06,400 Stage-1 map = 100%, reduce = 35%, Cumulative CPU 1082.7 sec-01-14 19:40:09,466 Stage-1 map = 100%, reduce = 39%, Cumulative CPU 1085.39 sec-01-14 19:40:10,490 Stage-1 map = 100%, reduce = 42%, Cumulative CPU 1087.95 sec-01-14 19:40:13,548 Stage-1 map = 100%, reduce = 45%, Cumulative CPU 1090.61 sec-01-14 19:40:14,572 Stage-1 map = 100%, reduce = 48%, Cumulative CPU 1093.35 sec-01-14 19:40:17,633 Stage-1 map = 100%, reduce = 52%, Cumulative CPU 1095.98 sec-01-14 19:40:18,650 Stage-1 map = 100%, reduce = 55%, Cumulative CPU 1098.62 sec-01-14 19:40:21,720 Stage-1 map = 100%, reduce = 58%, Cumulative CPU 1101.33 sec-01-14 19:40:22,737 Stage-1 map = 100%, reduce = 61%, Cumulative CPU 1103.94 sec-01-14 19:40:25,791 Stage-1 map = 100%, reduce = 65%, Cumulative CPU 1106.75 sec-01-14 19:40:26,815 Stage-1 map = 100%, reduce = 68%, Cumulative CPU 1109.47 sec-01-14 19:40:28,856 Stage-1 map = 100%, reduce = 71%, Cumulative CPU 1112.24 sec-01-14 19:40:30,909 Stage-1 map = 100%, reduce = 74%, Cumulative CPU 1115.08 sec-01-14 19:40:32,951 Stage-1 map = 100%, reduce = 77%, Cumulative CPU 1117.95 sec-01-14 19:40:33,977 Stage-1 map = 100%, reduce = 81%, Cumulative CPU 1120.63 sec-01-14 19:40:37,038 Stage-1 map = 100%, reduce = 84%, Cumulative CPU 1123.47 sec-01-14 19:40:38,061 Stage-1 map = 100%, reduce = 87%, Cumulative CPU 1126.17 sec-01-14 19:40:41,128 Stage-1 map = 100%, reduce = 90%, Cumulative CPU 1128.84 sec-01-14 19:40:42,149 Stage-1 map = 100%, reduce = 94%, Cumulative CPU 1131.45 sec-01-14 19:40:45,220 Stage-1 map = 100%, reduce = 97%, Cumulative CPU 1134.14 sec-01-14 19:40:46,243 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1136.69 secMapReduce Total cumulative CPU time: 18 minutes 56 seconds 690 msecEnded Job = job_1610015767041_0033Loading data to table default.merge_test2MapReduce Jobs Launched: Stage-Stage-1: Map: 117 Reduce: 31 Cumulative CPU: 1136.69 sec HDFS Read: 31437081538 HDFS Write: 1765 HDFS EC Read: 0 SUCCESSTotal MapReduce CPU Time Spent: 18 minutes 56 seconds 690 msecOKTime taken: 699.009 secondshive> > hive> set hive.exec.reducers.bytes.per.reducer = 1024000000;hive> set mapred.max.split.size=1024000000; hive> CREATE TABLE merge_test3( > prod_namestring, > max_sale_nums int,> min_sale_nums int> )> STORED AS textfile ;OKTime taken: 0.076 secondshive> > insert into merge_test3> select prod_name,max(sale_nums),min(sale_nums)> from ods_fact_sale > group by prod_name;Query ID = root_0114194538_e8f2b367-cdcd-42c4-825b-53a5595a14a1Total 
jobs = 1Launching Job 1 out of 1Number of reduce tasks not specified. Estimated from input data size: 31In order to change the average load for a reducer (in bytes):set hive.exec.reducers.bytes.per.reducer=<number>In order to limit the maximum number of reducers:set hive.exec.reducers.max=<number>In order to set a constant number of reducers:set mapreduce.job.reduces=<number>Starting Job = job_1610015767041_0034, Tracking URL = http://hp1:8088/proxy/application_1610015767041_0034/Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job -kill job_1610015767041_0034Hadoop job information for Stage-1: number of mappers: 30; number of reducers: 31-01-14 19:45:45,306 Stage-1 map = 0%, reduce = 0%-01-14 19:46:02,766 Stage-1 map = 1%, reduce = 0%, Cumulative CPU 15.31 sec-01-14 19:46:03,793 Stage-1 map = 2%, reduce = 0%, Cumulative CPU 30.5 sec-01-14 19:46:08,920 Stage-1 map = 3%, reduce = 0%, Cumulative CPU 36.79 sec-01-14 19:46:09,945 Stage-1 map = 4%, reduce = 0%, Cumulative CPU 42.99 sec-01-14 19:46:10,968 Stage-1 map = 5%, reduce = 0%, Cumulative CPU 45.27 sec-01-14 19:46:11,996 Stage-1 map = 7%, reduce = 0%, Cumulative CPU 47.59 sec-01-14 19:46:26,330 Stage-1 map = 8%, reduce = 0%, Cumulative CPU 62.73 sec-01-14 19:46:27,359 Stage-1 map = 9%, reduce = 0%, Cumulative CPU 77.85 sec-01-14 19:46:32,466 Stage-1 map = 10%, reduce = 0%, Cumulative CPU 84.08 sec-01-14 19:46:33,491 Stage-1 map = 11%, reduce = 0%, Cumulative CPU 90.32 sec-01-14 19:46:34,516 Stage-1 map = 12%, reduce = 0%, Cumulative CPU 92.67 sec-01-14 19:46:35,542 Stage-1 map = 13%, reduce = 0%, Cumulative CPU 95.09 sec-01-14 19:46:48,834 Stage-1 map = 14%, reduce = 0%, Cumulative CPU 110.26 sec-01-14 19:46:49,859 Stage-1 map = 16%, reduce = 0%, Cumulative CPU 125.56 sec-01-14 19:46:56,009 Stage-1 map = 17%, reduce = 0%, Cumulative CPU 137.94 sec-01-14 19:46:57,026 Stage-1 map = 19%, reduce = 0%, Cumulative CPU 139.66 sec-01-14 19:46:58,050 Stage-1 map = 20%, reduce = 0%, Cumulative CPU 141.64 sec-01-14 19:47:12,378 Stage-1 map = 21%, reduce = 0%, Cumulative CPU 156.7 sec-01-14 19:47:13,397 Stage-1 map = 22%, reduce = 0%, Cumulative CPU 171.78 sec-01-14 19:47:18,504 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 177.95 sec-01-14 19:47:19,523 Stage-1 map = 24%, reduce = 0%, Cumulative CPU 184.12 sec-01-14 19:47:20,547 Stage-1 map = 25%, reduce = 0%, Cumulative CPU 186.18 sec-01-14 19:47:21,571 Stage-1 map = 27%, reduce = 0%, Cumulative CPU 188.04 sec-01-14 19:47:34,873 Stage-1 map = 28%, reduce = 0%, Cumulative CPU 203.22 sec-01-14 19:47:35,891 Stage-1 map = 29%, reduce = 0%, Cumulative CPU 218.4 sec-01-14 19:47:41,020 Stage-1 map = 30%, reduce = 0%, Cumulative CPU 224.63 sec-01-14 19:47:42,043 Stage-1 map = 31%, reduce = 0%, Cumulative CPU 230.9 sec-01-14 19:47:43,068 Stage-1 map = 32%, reduce = 0%, Cumulative CPU 232.75 sec-01-14 19:47:44,090 Stage-1 map = 33%, reduce = 0%, Cumulative CPU 234.97 sec-01-14 19:47:58,399 Stage-1 map = 34%, reduce = 0%, Cumulative CPU 250.1 sec-01-14 19:47:59,422 Stage-1 map = 36%, reduce = 0%, Cumulative CPU 265.39 sec-01-14 19:48:05,548 Stage-1 map = 37%, reduce = 0%, Cumulative CPU 277.83 sec-01-14 19:48:06,569 Stage-1 map = 39%, reduce = 0%, Cumulative CPU 279.85 sec-01-14 19:48:07,593 Stage-1 map = 40%, reduce = 0%, Cumulative CPU 281.98 sec-01-14 19:48:20,881 Stage-1 map = 41%, reduce = 0%, Cumulative CPU 297.11 sec-01-14 19:48:22,927 Stage-1 map = 42%, reduce = 0%, Cumulative CPU 312.34 sec-01-14 19:48:27,019 Stage-1 map = 43%, reduce = 0%, Cumulative CPU 
318.57 sec-01-14 19:48:28,040 Stage-1 map = 44%, reduce = 0%, Cumulative CPU 324.73 sec-01-14 19:48:29,060 Stage-1 map = 45%, reduce = 0%, Cumulative CPU 326.87 sec-01-14 19:48:31,106 Stage-1 map = 47%, reduce = 0%, Cumulative CPU 329.14 sec-01-14 19:48:38,246 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 336.81 sec-01-14 19:48:44,370 Stage-1 map = 51%, reduce = 0%, Cumulative CPU 352.0 sec-01-14 19:48:50,502 Stage-1 map = 52%, reduce = 0%, Cumulative CPU 358.16 sec-01-14 19:48:52,552 Stage-1 map = 53%, reduce = 0%, Cumulative CPU 360.16 sec-01-14 19:48:53,575 Stage-1 map = 54%, reduce = 0%, Cumulative CPU 375.37 sec-01-14 19:48:59,710 Stage-1 map = 55%, reduce = 0%, Cumulative CPU 381.54 sec-01-14 19:49:01,748 Stage-1 map = 57%, reduce = 0%, Cumulative CPU 384.38 sec-01-14 19:49:07,885 Stage-1 map = 58%, reduce = 0%, Cumulative CPU 399.4 sec-01-14 19:49:17,082 Stage-1 map = 60%, reduce = 0%, Cumulative CPU 408.73 sec-01-14 19:49:18,104 Stage-1 map = 61%, reduce = 0%, Cumulative CPU 423.95 sec-01-14 19:49:23,212 Stage-1 map = 62%, reduce = 0%, Cumulative CPU 430.2 sec-01-14 19:49:26,275 Stage-1 map = 63%, reduce = 0%, Cumulative CPU 432.97 sec-01-14 19:49:31,380 Stage-1 map = 64%, reduce = 0%, Cumulative CPU 448.18 sec-01-14 19:49:37,508 Stage-1 map = 65%, reduce = 0%, Cumulative CPU 454.34 sec-01-14 19:49:39,545 Stage-1 map = 67%, reduce = 0%, Cumulative CPU 456.52 sec-01-14 19:49:41,590 Stage-1 map = 68%, reduce = 0%, Cumulative CPU 471.7 sec-01-14 19:49:47,722 Stage-1 map = 69%, reduce = 0%, Cumulative CPU 477.89 sec-01-14 19:49:49,767 Stage-1 map = 70%, reduce = 0%, Cumulative CPU 480.0 sec-01-14 19:49:54,880 Stage-1 map = 71%, reduce = 0%, Cumulative CPU 495.11 sec-01-14 19:50:00,997 Stage-1 map = 72%, reduce = 0%, Cumulative CPU 501.3 sec-01-14 19:50:02,020 Stage-1 map = 73%, reduce = 0%, Cumulative CPU 503.17 sec-01-14 19:50:05,084 Stage-1 map = 74%, reduce = 0%, Cumulative CPU 518.31 sec-01-14 19:50:10,196 Stage-1 map = 75%, reduce = 0%, Cumulative CPU 524.53 sec-01-14 19:50:12,241 Stage-1 map = 77%, reduce = 0%, Cumulative CPU 526.52 sec-01-14 19:50:17,349 Stage-1 map = 78%, reduce = 0%, Cumulative CPU 541.63 sec-01-14 19:50:23,493 Stage-1 map = 79%, reduce = 0%, Cumulative CPU 547.87 sec-01-14 19:50:25,531 Stage-1 map = 80%, reduce = 0%, Cumulative CPU 549.97 sec-01-14 19:50:27,571 Stage-1 map = 81%, reduce = 0%, Cumulative CPU 565.14 sec-01-14 19:50:33,695 Stage-1 map = 82%, reduce = 0%, Cumulative CPU 571.29 sec-01-14 19:50:35,732 Stage-1 map = 83%, reduce = 0%, Cumulative CPU 573.29 sec-01-14 19:50:40,828 Stage-1 map = 84%, reduce = 0%, Cumulative CPU 588.33 sec-01-14 19:50:46,963 Stage-1 map = 85%, reduce = 0%, Cumulative CPU 594.58 sec-01-14 19:50:49,007 Stage-1 map = 87%, reduce = 0%, Cumulative CPU 596.57 sec-01-14 19:50:51,050 Stage-1 map = 87%, reduce = 1%, Cumulative CPU 597.28 sec-01-14 19:51:03,318 Stage-1 map = 88%, reduce = 1%, Cumulative CPU 612.73 sec-01-14 19:51:12,513 Stage-1 map = 90%, reduce = 1%, Cumulative CPU 621.48 sec-01-14 19:51:26,818 Stage-1 map = 91%, reduce = 1%, Cumulative CPU 636.67 sec-01-14 19:51:32,952 Stage-1 map = 92%, reduce = 1%, Cumulative CPU 643.0 sec-01-14 19:51:34,994 Stage-1 map = 93%, reduce = 1%, Cumulative CPU 644.94 sec-01-14 19:51:50,317 Stage-1 map = 94%, reduce = 1%, Cumulative CPU 660.14 sec-01-14 19:51:55,433 Stage-1 map = 95%, reduce = 1%, Cumulative CPU 666.44 sec-01-14 19:51:57,482 Stage-1 map = 97%, reduce = 1%, Cumulative CPU 668.28 sec-01-14 19:52:12,821 Stage-1 map = 98%, reduce = 1%, Cumulative CPU 683.5 sec-01-14 
19:52:18,953 Stage-1 map = 99%, reduce = 1%, Cumulative CPU 689.74 sec-01-14 19:52:20,999 Stage-1 map = 100%, reduce = 2%, Cumulative CPU 691.85 sec-01-14 19:52:22,021 Stage-1 map = 100%, reduce = 3%, Cumulative CPU 693.77 sec-01-14 19:52:25,090 Stage-1 map = 100%, reduce = 6%, Cumulative CPU 696.17 sec-01-14 19:52:26,114 Stage-1 map = 100%, reduce = 10%, Cumulative CPU 698.63 sec-01-14 19:52:29,177 Stage-1 map = 100%, reduce = 13%, Cumulative CPU 701.07 sec-01-14 19:52:30,200 Stage-1 map = 100%, reduce = 16%, Cumulative CPU 703.66 sec-01-14 19:52:33,257 Stage-1 map = 100%, reduce = 19%, Cumulative CPU 706.24 sec-01-14 19:52:34,281 Stage-1 map = 100%, reduce = 23%, Cumulative CPU 708.46 sec-01-14 19:52:37,345 Stage-1 map = 100%, reduce = 26%, Cumulative CPU 710.83 sec-01-14 19:52:38,367 Stage-1 map = 100%, reduce = 29%, Cumulative CPU 713.2 sec-01-14 19:52:41,428 Stage-1 map = 100%, reduce = 32%, Cumulative CPU 715.63 sec-01-14 19:52:42,453 Stage-1 map = 100%, reduce = 35%, Cumulative CPU 717.97 sec-01-14 19:52:44,497 Stage-1 map = 100%, reduce = 39%, Cumulative CPU 720.34 sec-01-14 19:52:46,538 Stage-1 map = 100%, reduce = 42%, Cumulative CPU 722.84 sec-01-14 19:52:48,576 Stage-1 map = 100%, reduce = 45%, Cumulative CPU 725.32 sec-01-14 19:52:49,600 Stage-1 map = 100%, reduce = 48%, Cumulative CPU 727.7 sec-01-14 19:52:52,664 Stage-1 map = 100%, reduce = 52%, Cumulative CPU 730.23 sec-01-14 19:52:53,686 Stage-1 map = 100%, reduce = 55%, Cumulative CPU 732.5 sec-01-14 19:52:56,769 Stage-1 map = 100%, reduce = 58%, Cumulative CPU 734.87 sec-01-14 19:52:57,791 Stage-1 map = 100%, reduce = 61%, Cumulative CPU 737.09 sec-01-14 19:53:00,854 Stage-1 map = 100%, reduce = 65%, Cumulative CPU 739.72 sec-01-14 19:53:01,877 Stage-1 map = 100%, reduce = 68%, Cumulative CPU 742.27 sec-01-14 19:53:04,939 Stage-1 map = 100%, reduce = 71%, Cumulative CPU 744.84 sec-01-14 19:53:05,958 Stage-1 map = 100%, reduce = 74%, Cumulative CPU 747.31 sec-01-14 19:53:09,013 Stage-1 map = 100%, reduce = 77%, Cumulative CPU 749.87 sec-01-14 19:53:10,030 Stage-1 map = 100%, reduce = 81%, Cumulative CPU 752.43 sec-01-14 19:53:13,095 Stage-1 map = 100%, reduce = 84%, Cumulative CPU 755.09 sec-01-14 19:53:14,115 Stage-1 map = 100%, reduce = 87%, Cumulative CPU 757.62 sec-01-14 19:53:17,176 Stage-1 map = 100%, reduce = 90%, Cumulative CPU 760.0 sec-01-14 19:53:18,192 Stage-1 map = 100%, reduce = 94%, Cumulative CPU 762.38 sec-01-14 19:53:21,260 Stage-1 map = 100%, reduce = 97%, Cumulative CPU 764.78 sec-01-14 19:53:22,282 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 767.18 secMapReduce Total cumulative CPU time: 12 minutes 47 seconds 180 msecEnded Job = job_1610015767041_0034Loading data to table default.merge_test3MapReduce Jobs Launched: Stage-Stage-1: Map: 30 Reduce: 31 Cumulative CPU: 767.18 sec HDFS Read: 31436674813 HDFS Write: 1765 HDFS EC Read: 0 SUCCESSTotal MapReduce CPU Time Spent: 12 minutes 47 seconds 180 msecOKTime taken: 465.782 secondshive>

五.Controlling the number of Map/Reduce tasks

During Hive development we often run into jobs that spawn far too many map or reduce tasks, which makes the job run for a very long time and drags performance down.

In such cases we can tune the number of map and reduce tasks to optimize the Hive job.

5.1 Controlling the number of map tasks in a Hive job

Normally a job produces one or more map tasks based on its input directory.

The main determining factors are: the total number of input files, the size of the input files, and the HDFS block size configured on the cluster (currently 128 MB; it can be viewed in Hive with set dfs.block.size; and cannot be modified by the user).

Examples:

Suppose the input directory contains a single file a of 780 MB. Hadoop splits it into 7 blocks (six 128 MB blocks and one 12 MB block), so 7 map tasks are produced. Suppose instead the input directory contains three files a, b and c of 10 MB, 20 MB and 130 MB. Hadoop splits them into 4 blocks (10 MB, 20 MB, 128 MB and 2 MB), so 4 map tasks are produced.

In other words, a file larger than the block size (128 MB) is split up, while a file smaller than the block size is treated as a single block and therefore a single map task.
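This block-splitting rule can be sketched in a few lines of Python. It is only an illustration of the rule described above, not Hive's actual split computation (which also depends on the split-size settings discussed in 5.1.1 below); the block size and file sizes are taken from the two examples.

import math

BLOCK_SIZE_MB = 128  # dfs.block.size on this cluster

def estimate_map_tasks(file_sizes_mb):
    # One map task per block-sized piece; a file smaller than a block
    # still costs one map task of its own.
    return sum(max(1, math.ceil(size / BLOCK_SIZE_MB)) for size in file_sizes_mb)

print(estimate_map_tasks([780]))          # 7: six 128 MB blocks + one 12 MB block
print(estimate_map_tasks([10, 20, 130]))  # 4: 10 MB, 20 MB, 128 MB, 2 MB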

Is a larger number of map tasks always better?

No. If a job has many small files (far smaller than the 128 MB block size), each small file is treated as its own split and handled by its own map task. Starting and initializing a map task takes far longer than its actual processing logic, so this wastes a great deal of resources. In addition, the number of map tasks that can run concurrently is limited.

Does keeping each map close to a 128 MB block mean we can stop worrying?

Not necessarily. For example, a 127 MB file would normally be handled by a single map task, but if it has only one or two small columns yet tens of millions of rows, and the map logic is fairly complex, doing all of the work with a single map task will certainly be slow.

5.1.1 Merging small files to reduce the number of maps

Suppose we have a SQL task:

select count(*) from ods_fact_sale;

The task's input directory contains 117 files, many of them far smaller than 128 MB, with a total size of about 31 GB. Executed as-is, the task uses 117 map tasks.

The following settings merge small files before the map phase and reduce the number of maps:

set mapred.max.split.size=1024000000;

set mapred.min.split.size.per.node=100000000;

set mapred.min.split.size.per.rack=100000000;

set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;

Re-running the statement above now uses only 30 map tasks.

For this simple SQL task the execution time may be about the same, but more than half of the compute resources are saved.

set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat; #merge small files before the map phase

set mapred.max.split.size=1024000000; #maximum input size per map

set mapred.min.split.size.per.node=100000000; #minimum split size on one node

set mapred.min.split.size.per.rack=100000000; #minimum split size within one rack
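The rough effect of these settings can be illustrated with a simplified model: once CombineHiveInputFormat is in use, small files are packed together into splits no larger than mapred.max.split.size. The sketch below is only a toy model (it ignores the per-node and per-rack grouping), and the 117 file sizes are invented; it merely shows why a ~31 GB input made of many small files ends up as roughly 30 splits instead of 117.

import math

MAX_SPLIT_BYTES = 1_024_000_000  # mapred.max.split.size used above

def combined_split_count(file_sizes_bytes, max_split=MAX_SPLIT_BYTES):
    # Toy model: greedily pack files into combined splits capped at max_split;
    # files larger than max_split are cut into max_split-sized pieces.
    splits, current = 0, 0
    for size in sorted(file_sizes_bytes):
        if size >= max_split:
            splits += math.ceil(size / max_split)
            continue
        if current + size > max_split:
            splits += 1
            current = 0
        current += size
    return splits + (1 if current else 0)

# 117 hypothetical files totalling roughly 31 GB: 110 small files plus 7 big ones
files = [100_000_000] * 110 + [2_900_000_000] * 7
print(combined_split_count(files))  # about 32 combined splits, close to the ~30 maps observed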

5.1.2 Increasing the number of maps when appropriate

Code:

create table ods_factsale_tmp1 as select * from ods_fact_sale where sale_date = '-08-16 00:00:00.0';

select prod_name,
       count(*),
       count(distinct id),
       min(sale_date),
       max(sale_date),
       sum(case when id > 100000 then 1 else 0 end),
       sum(case when prod_name = 'PROD4' then 1 else 0 end),
       sum(case when prod_name = 'PROD5' then 1 else 0 end),
       sum(case when prod_name = 'PROD6' then 1 else 0 end),
       sum(case when prod_name = 'PROD7' then 1 else 0 end),
       sum(case when prod_name = 'PROD8' then 1 else 0 end),
       sum(case when prod_name = 'PROD9' then 1 else 0 end),
       sum(case when prod_name = 'PROD10' then 1 else 0 end)
  from ods_factsale_tmp1
 group by prod_name;

set mapred.reduce.tasks=10;
create table ods_factsale_tmp2 as select * from ods_factsale_tmp1 distribute by rand(123);
set mapred.reduce.tasks=-1; -- restore the default

select prod_name,
       count(*),
       count(distinct id),
       min(sale_date),
       max(sale_date),
       sum(case when id > 100000 then 1 else 0 end),
       sum(case when prod_name = 'PROD4' then 1 else 0 end),
       sum(case when prod_name = 'PROD5' then 1 else 0 end),
       sum(case when prod_name = 'PROD6' then 1 else 0 end),
       sum(case when prod_name = 'PROD7' then 1 else 0 end),
       sum(case when prod_name = 'PROD8' then 1 else 0 end),
       sum(case when prod_name = 'PROD9' then 1 else 0 end),
       sum(case when prod_name = 'PROD10' then 1 else 0 end)
  from ods_factsale_tmp2
 group by prod_name;

Test record:

As shown below, after redistributing the table's data across more files with distribute by, the mapper count of the aggregation query goes from 1 to 2 and its execution time drops from about 34 seconds to about 28 seconds.

hive> > > select prod_name,> count(*),> count(distinct id),> min(sale_date),> max(sale_date), > sum(case when id > 100000 then 1 else 0 end),> sum(case when prod_name = 'PROD4' then 1 else 0 end),> sum(case when prod_name = 'PROD5' then 1 else 0 end),> sum(case when prod_name = 'PROD6' then 1 else 0 end),> sum(case when prod_name = 'PROD7' then 1 else 0 end),> sum(case when prod_name = 'PROD8' then 1 else 0 end),> sum(case when prod_name = 'PROD9' then 1 else 0 end),> sum(case when prod_name = 'PROD10' then 1 else 0 end)> from ods_factsale_tmp1> group by prod_name;Query ID = root_0115160619_cc2f311c-e387-4030-bf62-8934aa00f7e4Total jobs = 1Launching Job 1 out of 1Number of reduce tasks not specified. Estimated from input data size: 1In order to change the average load for a reducer (in bytes):set hive.exec.reducers.bytes.per.reducer=<number>In order to limit the maximum number of reducers:set hive.exec.reducers.max=<number>In order to set a constant number of reducers:set mapreduce.job.reduces=<number>Starting Job = job_1610015767041_0094, Tracking URL = http://hp1:8088/proxy/application_1610015767041_0094/Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job -kill job_1610015767041_0094Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1-01-15 16:06:29,326 Stage-1 map = 0%, reduce = 0%-01-15 16:06:43,798 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 13.9 sec-01-15 16:06:53,086 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 22.71 secMapReduce Total cumulative CPU time: 22 seconds 710 msecEnded Job = job_1610015767041_0094MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 22.71 sec HDFS Read: 34999959 HDFS Write: 962 HDFS EC Read: 0 SUCCESSTotal MapReduce CPU Time Spent: 22 seconds 710 msecOKPROD10 94470 94470 -08-16 00:00:00.0 -08-16 00:00:00.0 94456 0 0 0 0 0 0 94470PROD2 94495 94495 -08-16 00:00:00.0 -08-16 00:00:00.0 94479 0 0 0 0 0 0 0PROD3 96743 96743 -08-16 00:00:00.0 -08-16 00:00:00.0 96730 0 0 0 0 0 0 0PROD4 94378 94378 -08-16 00:00:00.0 -08-16 00:00:00.0 94366 94378 0 0 0 0 0 0PROD5 96994 96994 -08-16 00:00:00.0 -08-16 00:00:00.0 96980 0 96994 0 0 0 0 0PROD6 91746 91746 -08-16 00:00:00.0 -08-16 00:00:00.0 91735 0 0 91746 0 0 0 0PROD7 95815 95815 -08-16 00:00:00.0 -08-16 00:00:00.0 95804 0 0 0 95815 0 0 0PROD8 95109 95109 -08-16 00:00:00.0 -08-16 00:00:00.0 95087 0 0 0 0 95109 0 0PROD9 95148 95148 -08-16 00:00:00.0 -08-16 00:00:00.0 95138 0 0 0 0 0 95148 0Time taken: 34.765 seconds, Fetched: 9 row(s)hive> set mapred.reduce.tasks=10;hive> create table ods_factsale_tmp2 as select * from ods_factsale_tmp1 distribute by rand(123);Query ID = root_0115160707_f02b478a-2754-43e0-83fa-59378a8c56f9Total jobs = 1Launching Job 1 out of 1Number of reduce tasks not specified. 
Defaulting to jobconf value of: 10In order to change the average load for a reducer (in bytes):set hive.exec.reducers.bytes.per.reducer=<number>In order to limit the maximum number of reducers:set hive.exec.reducers.max=<number>In order to set a constant number of reducers:set mapreduce.job.reduces=<number>Starting Job = job_1610015767041_0095, Tracking URL = http://hp1:8088/proxy/application_1610015767041_0095/Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job -kill job_1610015767041_0095Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 10-01-15 16:07:15,539 Stage-1 map = 0%, reduce = 0%-01-15 16:07:23,798 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 6.66 sec-01-15 16:07:29,972 Stage-1 map = 100%, reduce = 10%, Cumulative CPU 10.39 sec-01-15 16:07:30,998 Stage-1 map = 100%, reduce = 20%, Cumulative CPU 14.03 sec-01-15 16:07:33,053 Stage-1 map = 100%, reduce = 30%, Cumulative CPU 17.55 sec-01-15 16:07:35,107 Stage-1 map = 100%, reduce = 40%, Cumulative CPU 21.14 sec-01-15 16:07:38,190 Stage-1 map = 100%, reduce = 50%, Cumulative CPU 25.41 sec-01-15 16:07:40,248 Stage-1 map = 100%, reduce = 60%, Cumulative CPU 29.55 sec-01-15 16:07:43,324 Stage-1 map = 100%, reduce = 70%, Cumulative CPU 34.19 sec-01-15 16:07:44,349 Stage-1 map = 100%, reduce = 80%, Cumulative CPU 37.74 sec-01-15 16:07:47,436 Stage-1 map = 100%, reduce = 90%, Cumulative CPU 42.06 sec-01-15 16:07:48,464 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 45.61 secMapReduce Total cumulative CPU time: 45 seconds 610 msecEnded Job = job_1610015767041_0095Moving data to directory hdfs://nameservice1/user/hive/warehouse/test.db/ods_factsale_tmp2MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Reduce: 10 Cumulative CPU: 45.61 sec HDFS Read: 35027561 HDFS Write: 34984665 HDFS EC Read: 0 SUCCESSTotal MapReduce CPU Time Spent: 45 seconds 610 msecOKTime taken: 42.892 secondshive> set mapred.reduce.tasks=-1;hive> select prod_name,> count(*),> count(distinct id),> min(sale_date),> max(sale_date), > sum(case when id > 100000 then 1 else 0 end),> sum(case when prod_name = 'PROD4' then 1 else 0 end),> sum(case when prod_name = 'PROD5' then 1 else 0 end),> sum(case when prod_name = 'PROD6' then 1 else 0 end),> sum(case when prod_name = 'PROD7' then 1 else 0 end),> sum(case when prod_name = 'PROD8' then 1 else 0 end),> sum(case when prod_name = 'PROD9' then 1 else 0 end),> sum(case when prod_name = 'PROD10' then 1 else 0 end)> from ods_factsale_tmp2> group by prod_name;Query ID = root_0115160934_909dd909-4761-4fe2-bb76-ba7b15b55b76Total jobs = 1Launching Job 1 out of 1Number of reduce tasks not specified. 
Estimated from input data size: 1In order to change the average load for a reducer (in bytes):set hive.exec.reducers.bytes.per.reducer=<number>In order to limit the maximum number of reducers:set hive.exec.reducers.max=<number>In order to set a constant number of reducers:set mapreduce.job.reduces=<number>Starting Job = job_1610015767041_0097, Tracking URL = http://hp1:8088/proxy/application_1610015767041_0097/Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job -kill job_1610015767041_0097Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 1-01-15 16:09:41,196 Stage-1 map = 0%, reduce = 0%-01-15 16:09:49,419 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 7.88 sec-01-15 16:09:53,541 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 21.59 sec-01-15 16:10:01,760 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 29.81 secMapReduce Total cumulative CPU time: 29 seconds 810 msecEnded Job = job_1610015767041_0097MapReduce Jobs Launched: Stage-Stage-1: Map: 2 Reduce: 1 Cumulative CPU: 29.81 sec HDFS Read: 35008642 HDFS Write: 962 HDFS EC Read: 0 SUCCESSTotal MapReduce CPU Time Spent: 29 seconds 810 msecOKPROD10 94470 94470 -08-16 00:00:00.0 -08-16 00:00:00.0 94456 0 0 0 0 0 0 94470PROD2 94495 94495 -08-16 00:00:00.0 -08-16 00:00:00.0 94479 0 0 0 0 0 0 0PROD3 96743 96743 -08-16 00:00:00.0 -08-16 00:00:00.0 96730 0 0 0 0 0 0 0PROD4 94378 94378 -08-16 00:00:00.0 -08-16 00:00:00.0 94366 94378 0 0 0 0 0 0PROD5 96994 96994 -08-16 00:00:00.0 -08-16 00:00:00.0 96980 0 96994 0 0 0 0 0PROD6 91746 91746 -08-16 00:00:00.0 -08-16 00:00:00.0 91735 0 0 91746 0 0 0 0PROD7 95815 95815 -08-16 00:00:00.0 -08-16 00:00:00.0 95804 0 0 0 95815 0 0 0PROD8 95109 95109 -08-16 00:00:00.0 -08-16 00:00:00.0 95087 0 0 0 0 95109 0 0PROD9 95148 95148 -08-16 00:00:00.0 -08-16 00:00:00.0 95138 0 0 0 0 0 95148 0Time taken: 28.758 seconds, Fetched: 9 row(s)hive>
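Why does distribute by rand(123) create more mappers downstream? Assuming Hive's usual behavior of hash-partitioning rows by the DISTRIBUTE BY expression, each row is routed to one of the 10 forced reducers, so data that previously sat in a single file is rewritten as 10 roughly equal files, which later yield more input splits. The sketch below is a rough, illustrative model only; the row count is taken from the test output above.

import random
from collections import Counter

random.seed(123)
NUM_REDUCERS = 10        # set mapred.reduce.tasks=10;
ROWS = 854_898           # total rows of ods_factsale_tmp1 in the test above

# Each row's rand() value is hashed to pick a reducer, so rows spread out evenly.
rows_per_reducer = Counter(hash(random.random()) % NUM_REDUCERS for _ in range(ROWS))
print(sorted(rows_per_reducer.values()))  # roughly 85,000 rows (one output file) per reducer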

5.2 Controlling the number of reduce tasks in a Hive job

How Hive determines the number of reducers:

The reducer count has a large impact on job efficiency. When it is not specified explicitly, Hive estimates a reducer count based on the following two settings:

hive.exec.reducers.bytes.per.reducer (the amount of data each reducer processes; default 1000^3 bytes, i.e. about 1 GB)

hive.exec.reducers.max (the maximum number of reducers per job; default 999)

The formula for computing the reducer count is simple: N = min(parameter 2, total input size / parameter 1).

In other words, if the total reduce input (the map output) does not exceed 1 GB, there will be only one reduce task.
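A minimal sketch of this estimate, using the default values stated above. The first and third input sizes are hypothetical; the 31 GB case matches the 31 reducers estimated in the earlier merge test.

import math

def estimate_reducers(input_bytes,
                      bytes_per_reducer=1_000_000_000,  # hive.exec.reducers.bytes.per.reducer
                      max_reducers=999):                # hive.exec.reducers.max
    # N = min(hive.exec.reducers.max, total input size / bytes per reducer)
    return max(1, min(max_reducers, math.ceil(input_bytes / bytes_per_reducer)))

print(estimate_reducers(800_000_000))        # under 1 GB of input -> 1 reducer
print(estimate_reducers(31_000_000_000))     # ~31 GB of input     -> 31 reducers
print(estimate_reducers(5_000_000_000_000))  # 5 TB of input       -> capped at 999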

How to adjust the number of reducers:

Method 1: adjust the value of hive.exec.reducers.bytes.per.reducer, for example:

set hive.exec.reducers.bytes.per.reducer=500000000; -- 500 MB

Method 2: set the reducer count directly:

set mapred.reduce.tasks = 15;
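For a hypothetical 9 GB reduce input (the figure is invented for illustration), the estimate above shows what each knob does:

import math

input_bytes = 9_000_000_000  # hypothetical reduce input

default_n = min(999, math.ceil(input_bytes / 1_000_000_000))  # default 1 GB per reducer
lowered_n = min(999, math.ceil(input_bytes /   500_000_000))  # after lowering to 500 MB
print(default_n, lowered_n)  # 9 -> 18 reducers

# "set mapred.reduce.tasks=15;" bypasses the estimate entirely and forces 15 reducers.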

Is a larger number of reducers always better?

As with maps, starting and initializing reducers also costs time and resources.

Furthermore, each reducer produces one output file. If a job generates many small files and those files then serve as the input of the next job, the small-file problem appears again.

When does a job end up with only one reducer?

You will often find that, no matter how large the data is and regardless of whether you set the reducer-count parameters, a job keeps running with a single reduce task.

Besides the case where the data volume is smaller than hive.exec.reducers.bytes.per.reducer, a single reducer is also caused by the following:

1. An aggregation without group by, for example writing

select sale_date, count(*) from ods_fact_sale_orc where sale_date = '-08-16 00:00:00.0' group by sale_date;

as

select count(*) from ods_fact_sale_orc where sale_date = '-08-16 00:00:00.0';

This is very common, and such queries are worth rewriting wherever possible.

2. Using order by.

3. Cartesian products.

In these cases there is usually no good fix other than finding a workaround or avoiding the operation, because these operations are global and Hadoop has to complete them with a single reducer.

Likewise, keep these two principles in mind when setting the reducer count: give a large data volume an appropriate number of reducers, and give each individual reducer an appropriate amount of data to process.

Test record:

The reducer count drops from 33 to 1, and performance also improves.

hive> > select sale_date,count(*) from ods_fact_sale_orc where sale_date = '-08-16 00:00:00.0' group by sale_date;Query ID = root_0115171744_3ddbe697-6889-4f8b-9126-4fc59cff6c26Total jobs = 1Launching Job 1 out of 1Number of reduce tasks not specified. Estimated from input data size: 33In order to change the average load for a reducer (in bytes):set hive.exec.reducers.bytes.per.reducer=<number>In order to limit the maximum number of reducers:set hive.exec.reducers.max=<number>In order to set a constant number of reducers:set mapreduce.job.reduces=<number>Starting Job = job_1610015767041_0100, Tracking URL = http://hp1:8088/proxy/application_1610015767041_0100/Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job -kill job_1610015767041_0100Hadoop job information for Stage-1: number of mappers: 9; number of reducers: 33-01-15 17:17:52,451 Stage-1 map = 0%, reduce = 0%-01-15 17:18:03,759 Stage-1 map = 22%, reduce = 0%, Cumulative CPU 18.51 sec-01-15 17:18:10,940 Stage-1 map = 33%, reduce = 0%, Cumulative CPU 24.4 sec-01-15 17:18:14,017 Stage-1 map = 44%, reduce = 0%, Cumulative CPU 33.6 sec-01-15 17:18:20,167 Stage-1 map = 56%, reduce = 0%, Cumulative CPU 43.26 sec-01-15 17:18:23,241 Stage-1 map = 67%, reduce = 0%, Cumulative CPU 52.4 sec-01-15 17:18:29,388 Stage-1 map = 78%, reduce = 0%, Cumulative CPU 61.63 sec-01-15 17:18:31,432 Stage-1 map = 89%, reduce = 0%, Cumulative CPU 70.53 sec-01-15 17:18:38,601 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 79.59 sec-01-15 17:18:40,648 Stage-1 map = 100%, reduce = 3%, Cumulative CPU 82.66 sec-01-15 17:18:42,691 Stage-1 map = 100%, reduce = 6%, Cumulative CPU 85.6 sec-01-15 17:18:44,740 Stage-1 map = 100%, reduce = 9%, Cumulative CPU 88.2 sec-01-15 17:18:46,793 Stage-1 map = 100%, reduce = 12%, Cumulative CPU 91.17 sec-01-15 17:18:48,841 Stage-1 map = 100%, reduce = 15%, Cumulative CPU 94.05 sec-01-15 17:18:50,885 Stage-1 map = 100%, reduce = 18%, Cumulative CPU 97.02 sec-01-15 17:18:52,931 Stage-1 map = 100%, reduce = 21%, Cumulative CPU 100.09 sec-01-15 17:18:54,975 Stage-1 map = 100%, reduce = 24%, Cumulative CPU 103.16 sec-01-15 17:18:57,018 Stage-1 map = 100%, reduce = 27%, Cumulative CPU 106.04 sec-01-15 17:18:59,068 Stage-1 map = 100%, reduce = 30%, Cumulative CPU 108.96 sec-01-15 17:19:01,117 Stage-1 map = 100%, reduce = 33%, Cumulative CPU 111.83 sec-01-15 17:19:03,162 Stage-1 map = 100%, reduce = 36%, Cumulative CPU 114.92 sec-01-15 17:19:04,183 Stage-1 map = 100%, reduce = 39%, Cumulative CPU 117.84 sec-01-15 17:19:06,230 Stage-1 map = 100%, reduce = 42%, Cumulative CPU 120.77 sec-01-15 17:19:08,293 Stage-1 map = 100%, reduce = 45%, Cumulative CPU 123.67 sec-01-15 17:19:10,335 Stage-1 map = 100%, reduce = 48%, Cumulative CPU 126.6 sec-01-15 17:19:12,383 Stage-1 map = 100%, reduce = 52%, Cumulative CPU 129.5 sec-01-15 17:19:14,429 Stage-1 map = 100%, reduce = 55%, Cumulative CPU 132.47 sec-01-15 17:19:16,481 Stage-1 map = 100%, reduce = 58%, Cumulative CPU 135.11 sec-01-15 17:19:18,520 Stage-1 map = 100%, reduce = 61%, Cumulative CPU 138.08 sec-01-15 17:19:20,571 Stage-1 map = 100%, reduce = 64%, Cumulative CPU 140.95 sec-01-15 17:19:22,616 Stage-1 map = 100%, reduce = 67%, Cumulative CPU 143.83 sec-01-15 17:19:24,664 Stage-1 map = 100%, reduce = 70%, Cumulative CPU 146.8 sec-01-15 17:19:26,713 Stage-1 map = 100%, reduce = 73%, Cumulative CPU 149.69 sec-01-15 17:19:28,761 Stage-1 map = 100%, reduce = 76%, Cumulative CPU 152.46 sec-01-15 17:19:30,802 Stage-1 map = 100%, reduce = 79%, Cumulative 
CPU 155.36 sec-01-15 17:19:32,846 Stage-1 map = 100%, reduce = 82%, Cumulative CPU 158.32 sec-01-15 17:19:34,894 Stage-1 map = 100%, reduce = 85%, Cumulative CPU 161.2 sec-01-15 17:19:36,937 Stage-1 map = 100%, reduce = 88%, Cumulative CPU 164.13 sec-01-15 17:19:38,977 Stage-1 map = 100%, reduce = 91%, Cumulative CPU 166.99 sec-01-15 17:19:41,024 Stage-1 map = 100%, reduce = 94%, Cumulative CPU 169.92 sec-01-15 17:19:43,069 Stage-1 map = 100%, reduce = 97%, Cumulative CPU 172.8 sec-01-15 17:19:45,116 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 175.63 secMapReduce Total cumulative CPU time: 2 minutes 55 seconds 630 msecEnded Job = job_1610015767041_0100MapReduce Jobs Launched: Stage-Stage-1: Map: 9 Reduce: 33 Cumulative CPU: 175.63 sec HDFS Read: 1170223064 HDFS Write: 2912 HDFS EC Read: 0 SUCCESSTotal MapReduce CPU Time Spent: 2 minutes 55 seconds 630 msecOK-08-16 00:00:00.0 854898Time taken: 121.334 seconds, Fetched: 1 row(s)hive> select count(*) from ods_fact_sale_orc where sale_date = '-08-16 00:00:00.0'; Query ID = root_0115172041_b4730546-abca-4c9c-8032-bb70888fb011Total jobs = 1Launching Job 1 out of 1Number of reduce tasks determined at compile time: 1In order to change the average load for a reducer (in bytes):set hive.exec.reducers.bytes.per.reducer=<number>In order to limit the maximum number of reducers:set hive.exec.reducers.max=<number>In order to set a constant number of reducers:set mapreduce.job.reduces=<number>Starting Job = job_1610015767041_0101, Tracking URL = http://hp1:8088/proxy/application_1610015767041_0101/Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job -kill job_1610015767041_0101Hadoop job information for Stage-1: number of mappers: 9; number of reducers: 1-01-15 17:20:48,562 Stage-1 map = 0%, reduce = 0%-01-15 17:20:59,860 Stage-1 map = 11%, reduce = 0%, Cumulative CPU 8.93 sec-01-15 17:21:00,887 Stage-1 map = 22%, reduce = 0%, Cumulative CPU 17.94 sec-01-15 17:21:06,027 Stage-1 map = 33%, reduce = 0%, Cumulative CPU 23.75 sec-01-15 17:21:08,081 Stage-1 map = 44%, reduce = 0%, Cumulative CPU 32.82 sec-01-15 17:21:15,257 Stage-1 map = 56%, reduce = 0%, Cumulative CPU 42.24 sec-01-15 17:21:17,297 Stage-1 map = 67%, reduce = 0%, Cumulative CPU 51.22 sec-01-15 17:21:24,447 Stage-1 map = 78%, reduce = 0%, Cumulative CPU 60.42 sec-01-15 17:21:26,497 Stage-1 map = 89%, reduce = 0%, Cumulative CPU 69.58 sec-01-15 17:21:33,661 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 78.48 sec-01-15 17:21:34,685 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 81.46 secMapReduce Total cumulative CPU time: 1 minutes 21 seconds 460 msecEnded Job = job_1610015767041_0101MapReduce Jobs Launched: Stage-Stage-1: Map: 9 Reduce: 1 Cumulative CPU: 81.46 sec HDFS Read: 1170021233 HDFS Write: 106 HDFS EC Read: 0 SUCCESSTotal MapReduce CPU Time Spent: 1 minutes 21 seconds 460 msecOK854898Time taken: 54.258 seconds, Fetched: 1 row(s)hive>

References

1./archives//04/92.htm

2./panfelix/article/details/107583723

3./archives//04/15.htm
