[Hive]编写non-deterministic的UDF时遇到的谓词下推(Predicate
遇到这样一个问题:
hive> desc ljn001;
OK
name string
value int
hive> select * from ljn001;
OK
wang5 92
zhang3 87
li4 73
查询SQL如下:
select name
from
(select name,row_number(1) as rn
from
(select *
from ljn001 order by value
) a
) a
where rn = 1;
但是结果却意想不到:
OK
wang5
竟然把value最大的给取出来了!不筛选rn再看一下:
select name,rn
from
(select name,row_number(1) as rn
from
(select *
from ljn001 order by value
) a
) a;
OK
li4 1
zhang3 2
wang5 3
明明是li4对应的1,为什么筛选rn = 1却得到的是wang5 ?
看一下执行计划,豁然开朗了:
explain select name
from
(select name,row_number(1) as rn
from
(select *
from ljn001 order by value
) a
) a
where rn = 1;
OK
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 is a root stage
STAGE PLANS:
Stage: Stage-1
Map Reduce
Alias -> Map Operator Tree:
a:a:ljn001
TableScan
alias: ljn001
Filter Operator
predicate:
expr: (row_number(1) = 1)
type: boolean
Select Operator
expressions:
expr: name
type: string
expr: value
type: int
outputColumnNames: _col0, _col1
Reduce Output Operator
key expressions:
expr: _col1
type: int
sort order: +
tag: -1
value expressions:
expr: _col0
type: string
expr: _col1
type: int
Reduce Operator Tree:
Extract
Select Operator
expressions:
expr: _col0
type: string
expr: row_number(1)
type: bigint
outputColumnNames: _col0, _col1
Filter Operator
predicate:
expr: (_col1 = 1)
type: boolean
Select Operator
expressions:
expr: _col0
type: string
outputColumnNames: _col0
File Output Operator
compressed: true
GlobalTableId: 0
table:
input format: org.apache.Hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Stage: Stage-0
Fetch Operator
limit: -1
Hive 的详细介绍:请点这里
Hive 的下载地址:请点这里
相关阅读: