Hive Window Functions
1. row_number() over(): sort within groups and number the rows
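Suppose we have the data below and need the records of the two oldest people of each sex. How do we do this? Grouping may be the first idea, but grouping alone only gives us the top 1 (for example, max(age) per sex). How can we get the top 2 or the top 3? It would be ideal if, after grouping, we could sort each group by age and number its rows.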
Columns: id, age, name, sex
1,18,xiaoli,male
2,19,wang,male
3,22,liu,female
4,16,dawei,male
5,30,erbao,male
6,26,xiao,female
7,18,chengua,male
To get the top 2 by age for each sex from the data above, we can do the following.
Create the table, load the data, and then number the rows within each sex:
create table rownumber(id string,age int,name string,sex string)
row format delimited
fields terminated by ',';
load data local inpath '/root/mytest/rowover.dat' into table rownumber;
select id,age,name,sex,
row_number() over(partition by sex order by age desc) as rownumber
from rownumber;
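As you can see, row_number() over(partition by sex order by age desc) as rownumber effectively adds a sequence-number column. Inside over(), partition by sex splits the rows into groups by sex, and order by age desc sorts each group by age in descending order. row_number() then assigns 1, 2, 3, and so on to the rows of each group. Rows with the same age (xiaoli and chengua are both 18) still get distinct numbers, and which of them comes first is not guaranteed.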
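To make these semantics concrete, here is a minimal sketch in plain Python (not Hive) that emulates row_number() over(partition by sex order by age desc) on the sample records. The function name row_number_over and the in-memory ROWS list are illustrative assumptions. Hive does the partitioning and sorting itself during query execution.

```python
# Plain-Python emulation (an assumption, not Hive code) of:
#   row_number() over(partition by sex order by age desc) as rownumber
from itertools import groupby
from typing import List, Tuple

Row = Tuple[str, int, str, str]  # (id, age, name, sex)

# The same records as rowover.dat
ROWS: List[Row] = [
    ("1", 18, "xiaoli", "male"),
    ("2", 19, "wang", "male"),
    ("3", 22, "liu", "female"),
    ("4", 16, "dawei", "male"),
    ("5", 30, "erbao", "male"),
    ("6", 26, "xiao", "female"),
    ("7", 18, "chengua", "male"),
]


def row_number_over(rows: List[Row]) -> List[Tuple[str, int, str, str, int]]:
    """Append a 1-based row number per sex partition, ordered by age descending."""
    # partition by sex: groupby only merges adjacent rows, so sort by the key first
    by_sex = sorted(rows, key=lambda r: r[3])
    result = []
    for _sex, group in groupby(by_sex, key=lambda r: r[3]):
        # order by age desc inside the partition
        ordered = sorted(group, key=lambda r: r[1], reverse=True)
        # row_number(): 1, 2, 3, ... with no gaps, even when ages are equal
        for n, row in enumerate(ordered, start=1):
            result.append(row + (n,))
    return result


if __name__ == "__main__":
    for row in row_number_over(ROWS):
        print(row)
```

Python's sort is stable, so the tied rows xiaoli and chengua keep their input order here and get 3 and 4 respectively. Hive makes no such guarantee, so in Hive those two numbers may come out either way.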
To keep only the top 2 of each group, use this query as a subquery and filter on the row number:
select id,age,name,sex
from
(select id,age,name,sex,
row_number() over(partition by sex order by age desc) as rownumber
from rownumber ) temp
where rownumber<3;
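The outer query's filter can be sketched the same way, again as a plain-Python emulation under assumptions rather than Hive itself. top_n_per_group is a hypothetical helper. Keeping the first n rows of each sorted partition plays the role of where rownumber < n + 1, which is rownumber<3 for n = 2.

```python
# Plain-Python emulation (an assumption, not Hive code) of the outer query:
#   select ... from (... row_number() over(partition by sex order by age desc) ...) temp
#   where rownumber < 3
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, int, str, str]  # (id, age, name, sex)

ROWS: List[Row] = [
    ("1", 18, "xiaoli", "male"),
    ("2", 19, "wang", "male"),
    ("3", 22, "liu", "female"),
    ("4", 16, "dawei", "male"),
    ("5", 30, "erbao", "male"),
    ("6", 26, "xiao", "female"),
    ("7", 18, "chengua", "male"),
]


def top_n_per_group(rows: List[Row], n: int) -> List[Row]:
    """Keep the n oldest rows of each sex, like filtering rownumber < n + 1."""
    partitions: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        partitions[row[3]].append(row)  # partition by sex
    result: List[Row] = []
    for sex in sorted(partitions):
        # order by age desc within the partition
        ordered = sorted(partitions[sex], key=lambda r: r[1], reverse=True)
        # positions 1..n are exactly the rows with rownumber < n + 1
        result.extend(ordered[:n])
    return result


if __name__ == "__main__":
    for row in top_n_per_group(ROWS, 2):  # corresponds to rownumber < 3
        print(row)
```

Passing n = 3 instead corresponds to filtering with rownumber<4, that is, the top 3 per group.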
Reposted from: https://blog.csdn.net/weixin_39043567/article/details/90612526