1200字范文,内容丰富有趣,写作的好帮手!
1200字范文 > hive concat_ws列转行排序问题

hive concat_ws列转行排序问题

时间:2024-05-08 14:27:48

相关推荐

hive concat_ws列转行排序问题

hive concat_ws列转行排序问题

最近同事开发一功能,用到列转行并且排序的时候发现有部分乱序问题。这里记录一下解决方案

文章目录

hive concat_ws列转行排序问题1. 问题描述2. 解决方案2.1 先在子查询中有序排列再在外层行转列2.2 将排序号与habbit预先拼接,这样就会按照排序号前缀排序2.3 将排序号左补零为4位,再拼接habbit

1. 问题描述

有如下一张表test_concat

spark-sql> desc test_concat;user_name string NULLhabbits string NULLorder_num int NULL

spark-sql> select * from test_concat order by user_name ,order_num ;jack games 1jack music 2jack movie 3jack sing 4jack 3c 5jack 5g 6jack phone 7jack pc 8jack xbox 9jack swing 10Jack xxxxx 11jack walk 12jim games 1jim movie 2jim dance 3jim music 4tom movie 1tom games 2tom music 3

功能需要按照用户将习惯根据排序合并,核心就是列转行。

同事的代码直接是concat_ws进行列转行:

select user_name ,concat_ws(',',collect_list(tbb.habbits)) as habbit from test_concat tbb group by tbb.user_name;

结果发现habbit新的habbit列没有按照order_num进行排序

2. 解决方案

下面提供几种解决方案

2.1 先在子查询中有序排列再在外层行转列

select user_name ,concat_ws(',',collect_set(tbb.habbits)) as habbit from (select * from test_concat order by order_num) tbb group by tbb.user_name;

这里发现仍然是乱序,因为collect_set是乱序的,需要使用collect_list

select user_name ,concat_ws(',',collect_list(tbb.habbits)) as habbit from (select * from test_concat order by order_num) tbb group by tbb.user_name;

但是这种方式也有风险。在测试的时候是正常排序,同事上生产之后发现还是会有乱序

2.2 将排序号与habbit预先拼接,这样就会按照排序号前缀排序

select tbb.user_name,concat_ws(',',sort_array(collect_set(tbb.habbit))) as habbit from(select user_name ,concat(cast(order_num as string),'_',habbits) as habbit,order_num from test_concat order by user_name,order_num)tbb group by tbb.user_name;

这里发现一个新的问题,concat_ws拼接的字段是string类型,1与10放到了一起,排序仍然有问题

2.3 将排序号左补零为4位,再拼接habbit

select tbb.user_name,concat_ws(',',sort_array(collect_set(tbb.habbit))) as habbit from(select user_name ,concat(cast(lpad(order_num,4,'0') as string),'_',habbits) as habbit,order_num from test_concat order by user_name,order_num)tbb group by tbb.user_name;

这种情况下,habbit一定是按照order_num的顺序进行拼接。

本内容不代表本网观点和政治立场,如有侵犯你的权益请联系我们处理。
网友评论
网友评论仅供其表达个人看法,并不表明网站立场。