hive 行列转换案例
0 stu表数据:
stu:
id name
hello,you zm2008
hello,me zm2015
1 实现单词计数: (列转行) ---> split切分+explode(炸开)
1.0 数据拆分成数组
select split(id,',') from stu; 得到数组
[hello,you]
[hello,me]
1.1 继续将数组拆分(hive explode函数会将数组继续拆分成单个字符)
select explode(split(id,',')) from stu; 窗体函数
hello
you
hello
me
1.2 分组统计:
select t1.c1, count(1) from (select explode(split(id,',')) as c1 from stu) t1 group by t1.c1;
hello 2
you 1
me 1
上述语句中, t1表是1.1的结果
2 行转列: 明确以哪个字段分组,分组后将列转成行使用
---> 多列间隔符concat_ws+封装多列组合体 collect_set+ 分组字段 grtoup by
公式: concat_ws(",",collect_set(要转成行的列)) group by 分组列
2.0 表数据
列转换行:
user
id name
1 zhangsan
2 lisi
3 wangwu
address
name addr
zhangsan beijing
zhangsan shanghai
lisi tianjin
wangwu nanjing
期待结果:
1 zhangsan beijing,shanghai
2 lisi tianjin
3 wangwu nanjing
2.1 函数介绍:
collect_set(x) 列转行函数---没有重复, 组装多列的数据的结构体
collect_list(x) 列转行函数---可以有重复,组装多列的数据的结构体
concat_ws 拼接函数, 用于多列转成同一行字段后,间隔符
2.2 操作步骤:
2.2.0:
select user.id, user.name, address.addr from user join address on user.name = address.name;
1 zhangsan beijing
2 zhangsan shanghai
3 lisi tianjin
4 wangwu nanjing
2.2.1:
select max(user.id), user.name, collect_set(address.addr) from user join address on user.name = address.name group by user.name;
2 lisi [tianjin]
3 wangwu [nanjing]
1 zhangsan [shanghai,beijing]
2.2.2: 以name分组,将name相同下的 addr列转变成行存储,并且分组后列为多个下下以,做间隔
select max(user.id) as id, user.name, concat_ws(",",collect_set(address.addr)) from user join address on user.name = address.name group by user.name order by id;
1 zhangsan shanghai,beijing
2 lisi tianjin
3 wangwu nanjing
2.3 多行转换一列案例2:
一、问题 hive如何将 a b 1 a b 2 a b 3 c d 4 c d 5 c d 6 变为: a b 1,2,3 c d 4,5,6 二、数据 test.txt a b 1 a b 2 a b 3 c d 4 c d 5 c d 6 三、答案 1.建表 drop table tmp_jiangzl_test; create table tmp_jiangzl_test ( col1 string, col2 string, col3 string ) row format delimited fields terminated by '\t' stored as textfile; load data local inpath '/home/jiangzl/shell/test.txt' into table tmp_jiangzl_test; 2.处理 select col1,col2,concat_ws(',',collect_set(col3)) from tmp_jiangzl_test group by col1,col2;
3 行转换列: 查询多个字段下 later view的使用
4 行转换列: 单表下写法
hive如何将
a b 1
a b 2
a b 3
c d 4
c d 5
c d 6
变为:
a b 1,2,3
c d 4,5,6
select col1,col2,concat_ws(',',collect_set(col3))
from tmp_jiangzl_test
group by col1,col2; ----------------》 已经验证过 OK