Hive 窗口函数
表数据如下
select * from business; business.name business.orderdate business.cost jack 2017-01-01 10 tony 2017-01-02 15 jack 2017-02-03 23 tony 2017-01-04 29 jack 2017-01-05 46 jack 2017-04-06 42 tony 2017-01-07 50 jack 2017-01-08 55 mart 2017-04-08 62 mart 2017-04-09 68 neil 2017-05-10 12 mart 2017-04-11 75 neil 2017-06-12 80 mart 2017-04-13 94
一、聚合函数开窗
1.全局范围
查询顾客姓名及总人数(多次购买只算一人)
select name,count(*) over() from business group by name; name count_window_0 jack 4 mart 4 neil 4 tony 4
2.排序后的范围
对所有人的消费明细,将 cost 按照日期进行累加
select *,sum(cost) over(order by orderdate) from business; business.name business.orderdate business.cost sum_window_0 jack 2017-01-01 10 10 tony 2017-01-02 15 25 tony 2017-01-04 29 54 jack 2017-01-05 46 100 tony 2017-01-07 50 150 jack 2017-01-08 55 205 jack 2017-02-03 23 228 jack 2017-04-06 42 270 mart 2017-04-08 62 332 mart 2017-04-09 68 400 mart 2017-04-11 75 475 mart 2017-04-13 94 569 neil 2017-05-10 12 581 neil 2017-06-12 80 661
3.分区加排序后的范围。分区排序搭配:distribute by + sort by或者partition by + order by。
将不同人的消费明细,按日期累加
select *,sum(cost) over(distribute by name sort by orderdate) from business; business.name business.orderdate business.cost sum_window_0 jack 2017-01-01 10 10 jack 2017-01-05 46 56 jack 2017-01-08 55 111 jack 2017-02-03 23 134 jack 2017-04-06 42 176 mart 2017-04-08 62 62 mart 2017-04-09 68 130 mart 2017-04-11 75 205 mart 2017-04-13 94 299 neil 2017-05-10 12 12 neil 2017-06-12 80 92 tony 2017-01-02 15 15 tony 2017-01-04 29 44 tony 2017-01-07 50 94
4.其他范围
①开始到当前行
over(partition by name order by orderdate rows between UNBOUNDED PRECEDING and current row )
②当前行到最后
over(partition by name order by orderdate rows between current row and UNBOUNDED FOLLOWING)
③前一行、当前行和下一行
over(partition by name order by orderdate rows between 1 PRECEDING AND 1 FOLLOWING)
二、其他函数开窗
1.lag()函数
查询每个顾客的购买明细及上一次购买时间
select *,lag(orderdate,1,‘1970-01-01‘) over(distribute by name sort by orderdate) from business; business.name business.orderdate business.cost lag_window_0 jack 2017-01-01 10 1970-01-01 jack 2017-01-05 46 2017-01-01 jack 2017-01-08 55 2017-01-05 jack 2017-02-03 23 2017-01-08 jack 2017-04-06 42 2017-02-03 mart 2017-04-08 62 1970-01-01 mart 2017-04-09 68 2017-04-08 mart 2017-04-11 75 2017-04-09 mart 2017-04-13 94 2017-04-11 neil 2017-05-10 12 1970-01-01 neil 2017-06-12 80 2017-05-10 tony 2017-01-02 15 1970-01-01 tony 2017-01-04 29 2017-01-02 tony 2017-01-07 50 2017-01-04
2.lead()函数
查询每个顾客的购买明细及下一次购买时间
select *,lead(orderdate,1,‘9999-99-99‘) over(distribute by name sort by orderdate) from business; jack 2017-01-01 10 2017-01-05 jack 2017-01-05 46 2017-01-08 jack 2017-01-08 55 2017-02-03 jack 2017-02-03 23 2017-04-06 jack 2017-04-06 42 9999-99-99 mart 2017-04-08 62 2017-04-09 mart 2017-04-09 68 2017-04-11 mart 2017-04-11 75 2017-04-13 mart 2017-04-13 94 9999-99-99 neil 2017-05-10 12 2017-06-12 neil 2017-06-12 80 9999-99-99 tony 2017-01-02 15 2017-01-04 tony 2017-01-04 29 2017-01-07 tony 2017-01-07 50 9999-99-99
3.ntile()函数
查询前20%时间的订单信息
select * from (select *,ntile(5) over(order by orderdate) nt from business) t1 where t1.nt = 1; t1.name t1.orderdate t1.cost t1.nt jack 2017-01-01 10 1 tony 2017-01-02 15 1 tony 2017-01-04 29 1
相关推荐
archive 2020-07-30
成长之路 2020-07-28
eternityzzy 2020-07-19
taisenki 2020-07-05
tugangkai 2020-07-05
SignalDu 2020-07-05
zlsdmx 2020-07-05
tomson 2020-07-05
tugangkai 2020-07-04
tomson 2020-07-05
Zhangdragonfly 2020-06-28
genshengxiao 2020-06-26
成长之路 2020-06-26
tomson 2020-06-26
蜗牛之窝 2020-06-26
成长之路 2020-06-25