Hive: Comprehensive Study Notes
1. Hive CLI (the hive command line)
Command-line options:
-d k=v (define a variable), -e "<hql>" (run the quoted query), -f filename (run a script file), -h host, -p port, -v (echo the executed HQL to the console)
Interactive mode:
set; lists all variables known to Hive, e.g. set mapred.reduce.tasks=32;
set k=v : setting a key k that does not exist raises no error
! shell_command : run a shell command from the Hive prompt, e.g. ! echo aa
dfs command : run hadoop fs commands from the Hive prompt; same usage as the hadoop fs command
set : prints the variables that have been set in Hive
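A quick sketch of these options in combination (the script path and variable are illustrative; -d defines a hivevar, referenced here via ${hivevar:...} substitution):
hive -d ds=2008-06-08 -v -e 'SELECT * FROM page_view WHERE dt = "${hivevar:ds}" LIMIT 10;'
hive -f /tmp/daily_report.hql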
Data types:
Integers
TINYINT - 1 byte integer
SMALLINT - 2 byte integer
INT - 4 byte integer
BIGINT - 8 byte integer
Boolean type
BOOLEAN - TRUE/FALSE
Floating point numbers
FLOAT - single precision
DOUBLE - double precision
String type
STRING - sequence of characters in a specified character set
Complex types
Structs: e.g. STRUCT{a INT; b INT}; a field is accessed as c.a. struct_type: STRUCT<col_name : data_type [COMMENT col_comment], ...>
Maps: accessed as M['group']
Arrays: the elements in the array have to be of the same type, e.g. ['a','b','c']; A[1] returns 'b'.
Union: union_type: UNIONTYPE<data_type, data_type, ...>
SELECT create_union(0, key), create_union(if(key < 100, 0, 1), 2.0, value), create_union(1, "a", struct(2, "b")) FROM src LIMIT 2;
TIMESTAMP (Note: only available starting with Hive 0.8.0)
BINARY (Note: only available starting with Hive 0.8.0)
Operators (built-in operators):
Relational operators
A LIKE B : _ and % wildcards, same as in SQL
A RLIKE B : regular-expression LIKE, e.g. 'foo' rlike 'f.*' returns true; same as A REGEXP B
Arithmetic operators
A % B, A & B (bitwise AND), A | B (bitwise OR), A ^ B (bitwise XOR), ~A (bitwise NOT)
Logical operators
A AND B, A OR B, !A, NOT A
Operators on complex types
A[n] : A is an Array and n is an int
M[key] : M is a Map<K, V> and key has type K
S.x : S is a struct
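A minimal sketch of these operators together (the table tbl with columns arr ARRAY<INT>, m MAP<STRING,INT>, s STRUCT<x:INT> is hypothetical):
SELECT arr[0], m['k1'], s.x, 'foo' RLIKE 'f.*', 7 % 3 FROM tbl;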
Functions:
round(double a) : BIGINT
floor(double a) : BIGINT
ceil(double a) : BIGINT
rand(), rand(int seed) : double
concat(string A, string B, ...)
substr(string A, int start, int length)
upper(string A)
lower(string A)
trim(string A)
ltrim(string A)
rtrim(string A)
regexp_replace(string A, string B, string C)
size(Map<K.V>) : returns the number of elements in the map type
size(Array<T>)
cast(<expr> as <type>) : same as in MySQL
from_unixtime(int unixtime)
to_date(string timestamp) : to_date("1970-01-01 00:00:00") = "1970-01-01"
year(string date) : year("1970-01-01") = 1970
month(string date) : month("1970-11-01") = 11
day(string date) : day("1970-11-01") = 1, equivalent to dayofmonth()
hour() / minute() / second()
weekofyear(string date)
get_json_object(string json_string, string path)
Aggregate functions
count(*), count(expr), count(DISTINCT expr[, expr_.])
sum(col), sum(DISTINCT col)
avg(col), avg(DISTINCT col)
min(col)
max(col)
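A hedged example combining several of these (pv_users stands in for any table with gender, userid, and age columns, as in the examples further below):
SELECT gender, count(DISTINCT userid), avg(age), max(age) FROM pv_users GROUP BY gender;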
Creating tables:
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] page_view(viewTime INT COMMENT '', userid BIGINT,
page_url STRING, referrer_url STRING,
friends ARRAY<BIGINT>, properties MAP<STRING, STRING>,
ip STRING COMMENT 'IP Address of the User')
COMMENT 'This is the page view table'
PARTITIONED BY(dt STRING COMMENT '', country STRING)
CLUSTERED BY(userid) SORTED BY(viewTime) INTO 32 BUCKETS (the table is hashed into 32 buckets by userid; within each bucket, rows are sorted by viewTime)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '1'
COLLECTION ITEMS TERMINATED BY '2'
MAP KEYS TERMINATED BY '3'
STORED AS SEQUENCEFILE/TEXTFILE/RCFILE/INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname
[LOCATION hdfs_path]
[AS select_statement];
Notes:
1. The EXTERNAL keyword lets you create a table and provide a LOCATION so that Hive does not use a default location for this table. When dropping an EXTERNAL table, data in the table is NOT deleted from the filesystem.
2. Tables or partitions can be bucketed using CLUSTERED BY columns, and data can be sorted within each bucket via SORT BY columns. This can improve performance on certain kinds of queries.
3. Table names and column names are case insensitive, but SerDe and property names are case sensitive.
4. The CLUSTERED BY and SORTED BY creation commands do not affect how data is inserted into a table, only how it is read. This means that users must be careful to insert data correctly by specifying the number of reducers to be equal to the number of buckets, and by using CLUSTER BY and SORT BY commands in their query.
Inserting data into a bucketed table:
set hive.enforce.bucketing = true;
FROM user_id
INSERT OVERWRITE TABLE user_info_bucketed
PARTITION (ds='2009-02-25')
SELECT userid, firstname, lastname WHERE ds='2009-02-25';
Altering tables: Alter Table/Partition statements
Altering partitions:
ALTER TABLE table_name ADD [IF NOT EXISTS] PARTITION partition_spec [LOCATION 'location1'] partition_spec [LOCATION 'location2'] ... (in this version, this syntax cannot actually add multiple partitions in one statement)
partition_spec: (partition_col = partition_col_value, partition_col = partition_col_value, ...)
ALTER TABLE table_name DROP [IF EXISTS] partition_spec, partition_spec, ...
ALTER TABLE table_name [PARTITION partitionSpec] SET LOCATION "new location"
Alter Table/Partition protections:
ALTER TABLE table_name [PARTITION partition_spec] ENABLE|DISABLE NO_DROP;
ALTER TABLE table_name [PARTITION partition_spec] ENABLE|DISABLE OFFLINE;
Altering table columns:
ALTER TABLE table_name CHANGE [COLUMN] col_old_name col_new_name column_type [COMMENT col_comment] [FIRST|AFTER column_name]
ALTER TABLE table_name ADD|REPLACE COLUMNS (col_name data_type [COMMENT col_comment], ...)
Alter Table properties (attach your own metadata to a table):
ALTER TABLE table_name SET TBLPROPERTIES table_properties
Alter Table (Un)Archive:
ALTER TABLE table_name ARCHIVE PARTITION partition_spec;
ALTER TABLE table_name UNARCHIVE PARTITION partition_spec;
Renaming a partition:
ALTER TABLE table_name PARTITION partition_spec RENAME TO PARTITION partition_spec;
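Concrete instances of the statements above, reusing the page_view table from the create-table example (the location, new column name, and property are illustrative):
ALTER TABLE page_view ADD IF NOT EXISTS PARTITION (dt='2008-06-08', country='US') LOCATION '/user/data/pv/2008-06-08/US';
ALTER TABLE page_view CHANGE COLUMN userid user_id BIGINT COMMENT 'renamed' AFTER viewTime;
ALTER TABLE page_view SET TBLPROPERTIES ('owner' = 'data-team');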
Views:
CREATE VIEW [IF NOT EXISTS] view_name [(column_name [COMMENT column_comment], ...)]
[COMMENT view_comment]
[TBLPROPERTIES (property_name = property_value, ...)]
AS SELECT ...
DROP VIEW [IF EXISTS] view_name
ALTER VIEW view_name SET TBLPROPERTIES table_properties
Notes:
1. Views are purely logical; materialized views are not currently supported.
2. If a view's underlying table is dropped, the view's schema is not changed, and queries against the view will fail.
3. Views are read-only.
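A minimal sketch (the view name is illustrative; user_active is the table populated in the load-data examples below):
CREATE VIEW IF NOT EXISTS active_user_ids COMMENT 'ids of currently active users' AS SELECT id FROM user_active;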
Functions (UDFs):
ADD JAR/ADD FILE ... : adds a jar to Hive's classpath
CREATE TEMPORARY FUNCTION function_name AS class_name (the class used must be on the classpath)
DROP TEMPORARY FUNCTION [IF EXISTS] function_name
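A hedged end-to-end sketch (the jar path and class name are hypothetical):
ADD JAR /tmp/my_udfs.jar;
CREATE TEMPORARY FUNCTION my_lower AS 'com.example.hive.udf.MyLower';
SELECT my_lower(page_url) FROM page_view LIMIT 5;
DROP TEMPORARY FUNCTION IF EXISTS my_lower;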
Indexes:
CREATE INDEX index_name
ON TABLE base_table_name (col_name, ...)
AS index_type
[WITH DEFERRED REBUILD]
[IDXPROPERTIES (property_name = property_value, ...)]
[IN TABLE index_table_name]
[
[ROW FORMAT ...] STORED AS ...
| STORED BY ...
]
[LOCATION hdfs_path]
[TBLPROPERTIES (...)]
[COMMENT "index comment"]
DROP INDEX [IF EXISTS] index_name ON table_name
For details see: https://cwiki.apache.org/confluence/display/Hive/IndexDev#CREATE_INDEX
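A sketch using the built-in compact index handler (index name is illustrative; WITH DEFERRED REBUILD means the index is only populated once you issue REBUILD):
CREATE INDEX pv_userid_idx ON TABLE page_view (userid) AS 'COMPACT' WITH DEFERRED REBUILD;
ALTER INDEX pv_userid_idx ON page_view REBUILD;
DROP INDEX IF EXISTS pv_userid_idx ON page_view;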
Show/Describe statements:
show databases/tables [like 'RegExp'] (the pattern here is a regular expression)
show partitions tableName [PARTITION(ds='2010-03-03')]
show tblproperties tableName (as of Hive 0.10.0)
SHOW FUNCTIONS "a.*"
SHOW [FORMATTED] (INDEX|INDEXES) ON table_with_index [(FROM|IN) db_name]
SHOW COLUMNS (FROM|IN) table_name [(FROM|IN) db_name] : lists all columns, including partition columns (as of Hive 0.10)
desc database xl_netdisk_ods;
Loading data:
1. Loading data into a table:
Standard syntax:
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)] (without OVERWRITE, existing data is kept as long as file names do not collide; otherwise it is replaced)
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1 FROM from_statement;
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 FROM from_statement;
Hive extension (multiple inserts):
FROM from_statement
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1
[INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2]
[INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2] ...;
Hive extension (dynamic partition inserts):
INSERT OVERWRITE TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement;
INSERT INTO TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement;
Notes:
1. INSERT OVERWRITE will overwrite any existing data in the table or partition unless IF NOT EXISTS is provided for a partition (as of Hive 0.9.0).
INSERT INTO will append to the table or partition, keeping the existing data intact. (Note: INSERT INTO syntax is only available starting in version 0.8.)
2. Loading data into a directory:
Standard syntax:
INSERT OVERWRITE [LOCAL] DIRECTORY directory1 SELECT ... FROM ...
Hive extension (multiple inserts):
FROM from_statement
INSERT OVERWRITE [LOCAL] DIRECTORY directory1 select_statement1
[INSERT OVERWRITE [LOCAL] DIRECTORY directory2 select_statement2] ...
3. Examples:
LOAD DATA LOCAL INPATH '/tmp/pv_2008-06-08_us.txt' INTO TABLE page_view PARTITION(date='2008-06-08', country='US')
LOAD DATA INPATH '/user/data/pv_2008-06-08_us.txt' INTO TABLE page_view PARTITION(date='2008-06-08', country='US')
INSERT OVERWRITE TABLE user_active SELECT user.* FROM user WHERE user.active = 1;
In order to check the existence of a key in another table, the user can use LEFT SEMI JOIN, as illustrated by the following example:
INSERT OVERWRITE TABLE pv_users
SELECT u.*
FROM user u LEFT SEMI JOIN page_view pv ON (pv.userid = u.id)
WHERE pv.date = '2008-03-03';
However, the following query is not allowed (a SELECT may contain only one DISTINCT expression):
INSERT OVERWRITE TABLE pv_gender_agg
SELECT pv_users.gender, count(DISTINCT pv_users.userid), count(DISTINCT pv_users.ip)
FROM pv_users
GROUP BY pv_users.gender;
Replacing IN/EXISTS subqueries in Hive:
SELECT a.key, a.value FROM a WHERE a.key IN (SELECT b.key FROM B);
can be rewritten as:
SELECT a.key, a.val FROM a LEFT SEMI JOIN b ON (a.key = b.key)
Left Semi Join: when the second (bottom) input has a matching row, the Left Semi Join operator returns each row from the first (top) input. If no join predicate exists in the Argument column, every row is a matching row.
Multi Table/File Inserts
FROM pv_users
INSERT OVERWRITE TABLE pv_gender_sum
SELECT pv_users.gender, count_distinct(pv_users.userid)
GROUP BY pv_users.gender
INSERT OVERWRITE DIRECTORY '/user/data/tmp/pv_age_sum'
SELECT pv_users.age, count_distinct(pv_users.userid)
GROUP BY pv_users.age;
FROM page_view_stg pvs
INSERT OVERWRITE TABLE page_view PARTITION(dt='2008-06-08', country='US')
SELECT pvs.viewTime, pvs.userid, pvs.page_url, pvs.referrer_url, null, null, pvs.ip WHERE pvs.country = 'US'
INSERT OVERWRITE TABLE page_view PARTITION(dt='2008-06-08', country='CA')
SELECT pvs.viewTime, pvs.userid, pvs.page_url, pvs.referrer_url, null, null, pvs.ip WHERE pvs.country = 'CA'
INSERT OVERWRITE TABLE page_view PARTITION(dt='2008-06-08', country='UK')
SELECT pvs.viewTime, pvs.userid, pvs.page_url, pvs.referrer_url, null, null, pvs.ip WHERE pvs.country = 'UK';
Dynamic-partition insert (partitions are created automatically at insert time, which removes the need to know partition values in advance; inserting into many partitions no longer needs one job per partition; available since Hive 0.6):
FROM page_view_stg pvs
INSERT OVERWRITE TABLE page_view PARTITION(dt='2008-06-08', country) -- note that country is given no value here: partitions are created automatically and rows are inserted into them
SELECT pvs.viewTime, pvs.userid, pvs.page_url, pvs.referrer_url, null, null, pvs.ip, pvs.country
Dynamic partition columns must be the last of the partition columns, i.e. the trailing sub-partitions; (dt, country='US') is not allowed. Dynamic partition columns must appear in the SELECT list; static ones need not. If a target partition already exists, its data is overwritten; partitions that receive no rows are left untouched. If a partition column value is NULL or '', the row is inserted into the default partition __HIVE_DEFAULT_PARTITION__ and is considered bad data.
Note: each node (mapper or reducer) may create at most 100 partitions; a single DML statement may create at most 1000 partitions in total and at most 100k files. All of these limits can be changed via configuration. By default Hive does not allow all partition columns to be dynamic; set hive.exec.dynamic.partition.mode=nonstrict to change that:
hive> set hive.exec.dynamic.partition.mode=nonstrict;
hive> FROM page_view_stg pvs
INSERT OVERWRITE TABLE page_view PARTITION(dt, country)
SELECT pvs.viewTime, pvs.userid, pvs.page_url, pvs.referrer_url, null, null, pvs.ip,
from_unixtime(pvs.viewTime, 'yyyy-MM-dd') ds, pvs.country
DISTRIBUTE BY ds, country; -- if a single mapper or reducer would create more than 100 partitions, use DISTRIBUTE BY to balance the partitions across reducers
This query will generate a MapReduce job rather than a Map-only job. The SELECT-clause will be converted to a plan for the mappers, and the output will be distributed to the reducers based on the value of (ds, country) pairs. The INSERT-clause will be converted to the plan in the reducer which writes to the dynamic partitions.
Queries:
Notes:
1. sort by vs. order by: sort by only guarantees ordering within a single reducer, while order by guarantees total ordering; total ordering is bought at the cost of performance, since the number of reducers becomes 1.
2. Sorting follows the column's type: if numbers are stored as strings, cast them to a numeric type before sorting.
3. To send all rows with the same key to the same reducer, use cluster by ..., or distribute by ... sort by ...; the former is shorthand for the latter. Typically the distribute by columns are a prefix of the sort by columns (see the sketch after this list).
4. Partitions are used as in MySQL: the partition is chosen automatically. Partition pruning works on conditions in WHERE and in the ON clause of a join.
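A hedged sketch of note 3, reusing the page_view table from above:
SELECT * FROM page_view DISTRIBUTE BY userid SORT BY userid, viewTime;
-- when the distribute and sort columns coincide, this is equivalent to:
SELECT * FROM page_view CLUSTER BY userid;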
Joins:
1. Multiple joins can collapse into one map/reduce job. Hive converts joins over multiple tables into a single map/reduce job if for every table the same column is used in the join clauses, e.g. SELECT a.val, b.val, c.val FROM a JOIN b ON (a.key = b.key1) JOIN c ON (c.key = b.key1)
2. In every map/reduce stage of a join, the earlier tables are buffered and the last table is streamed through; with multiple jobs, the previous result is buffered while the later table streams into the reducers. To change which table is streamed, give a hint,
e.g.: SELECT /*+ STREAMTABLE(a) */ a.val, b.val, c.val FROM a JOIN b ON (a.key = b.key1) JOIN c ON (c.key = b.key1); without the hint the last table would be streamed, with it table a is streamed instead.
3. Joins occur BEFORE WHERE CLAUSES. Mind left/right/full outer joins: a condition placed in ON versus in WHERE gives different results. Once you know that ON filtering happens first, then the join, then WHERE, plus what an outer join means, the results are easy to understand:
the conditions after ON mainly filter the non-preserved side and have no effect on the preserved side, because an outer join always emits the preserved side's rows. Filters on the preserved table belong in WHERE; if the preserved table contains too much data that will never match, pre-filter it with a subquery.
4. A join can be done with a single map, with restrictions. SELECT /*+ MAPJOIN(b) */ a.key, a.value FROM a JOIN b ON a.key = b.key : for each mapper over a, table b is read fully into memory, so the join completes with maps alone. The restriction is that a FULL/RIGHT OUTER JOIN b cannot be performed.
If both joined tables are large but bucketed on the join column, and the bucket count of one is a multiple of the other's, a map join is still possible (see the sketch below).
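A hedged sketch of enabling that bucketed map join (a and b stand for two large tables bucketed on key; hive.optimize.bucketmapjoin is the relevant setting):
set hive.optimize.bucketmapjoin = true;
SELECT /*+ MAPJOIN(b) */ a.key, a.value FROM a JOIN b ON a.key = b.key;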
Lateral view: used to turn columns into rows
lateralView: LATERAL VIEW udtf(expression) tableAlias AS columnAlias (',' columnAlias)*
fromClause: FROM baseTable (lateralView)*
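A sketch in the style of the Hive wiki (pageAds, with an adid_list ARRAY column, is a hypothetical table):
SELECT pageid, adid FROM pageAds LATERAL VIEW explode(adid_list) adTable AS adid;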
Sampling data: tablesample
table_sample: TABLESAMPLE(BUCKET x OUT OF y [ON colname]) : divides the data into y buckets and returns bucket x; if the table was not bucketed at creation time, the whole table is scanned and bucketed on the fly; colname may be rand()
block_sample: TABLESAMPLE(n PERCENT) : returns n% of the data size, not n% of the rows
SELECT * FROM source TABLESAMPLE(BUCKET 3 OUT OF 32 ON rand()) s;
SELECT * FROM source TABLESAMPLE(0.1 PERCENT) s;
Virtual columns:
INPUT__FILE__NAME, BLOCK__OFFSET__INSIDE__FILE
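A quick sketch of the virtual columns (page_view as defined above):
SELECT INPUT__FILE__NAME, BLOCK__OFFSET__INSIDE__FILE, userid FROM page_view LIMIT 5;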
Viewing a statement's execution plan:
explain [extended] query
Generating sampled data:
INSERT OVERWRITE TABLE pv_gender_sum_sample
SELECT pv_gender_sum.*
FROM pv_gender_sum TABLESAMPLE(BUCKET 3 OUT OF 32 [ON userid]);
Union all: omitted here.
Array operations
Array columns in tables can only be created programmatically currently.
SELECT pv.friends[2] FROM page_views pv; SELECT pv.userid, size(pv.friends) FROM page_view pv;
Map (associative array) operations
Maps provide collections similar to associative arrays. Such structures can only be created programmatically currently.
INSERT OVERWRITE TABLE page_views_map
SELECT pv.userid, pv.properties['pagetype'] FROM page_views pv; SELECT size(pv.properties) FROM page_view pv;
Difference between distribute by and cluster by:
Distribute By and Sort By: instead of specifying "cluster by", the user can specify "distribute by" and "sort by", so the partition columns and sort columns can be different.
Altering tables
ALTER TABLE tab1 ADD COLUMNS (c1 INT COMMENT 'a new int column', c2 STRING DEFAULT 'def val');
Dropping tables and partitions
ALTER TABLE pv_users DROP PARTITION (ds='2008-08-08')
Hive built-in functions:
Listing functions: show functions; desc function [extended] fname;
Operators:
A [NOT] BETWEEN B AND C (as of version 0.9.0)
Functions
1. Mathematical functions:
round(double a[, int d]) : int/double, pow(a, b), sqrt(), bin(a) : returns a in binary form, hex() : hexadecimal form, conv(BIGINT num, int from_base, int to_base)
abs(), pmod(int a, int b) : returns the positive remainder, whereas a % b returns a negative remainder if the remainder is negative; degrees(double a) converts radians to degrees, radians(double a), e(), pi(), sign() sign function
std(), stddev()
2. Collection functions
size(Map<K.V>), size(Array<T>), map_keys(Map<K.V>), map_values(Map<K.V>), array_contains(Array<T>, value), sort_array(Array<T>) : sorts in natural order (as of version 0.9.0)
array(n0, n1...) - creates an array with the given elements
3. Type conversion functions
cast(expr as <type>)
4. Date functions
from_unixtime(bigint unixtime[, string format])
unix_timestamp(string date, string pattern)
weekofyear(string date)
datediff(string enddate, string startdate)
date_add(string startdate, int days)
date_sub(string startdate, int days)
from_utc_timestamp(timestamp, string timezone)
to_utc_timestamp(timestamp, string timezone)
date_format
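A few hedged examples (the results shown assume the session's default time zone is UTC; src is a placeholder one-row table):
SELECT from_unixtime(0, 'yyyy-MM-dd'),        -- '1970-01-01'
       datediff('2009-03-01', '2009-02-27'),  -- 2
       date_add('2008-12-31', 1)              -- '2009-01-01'
FROM src LIMIT 1;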
5. Conditional functions
if(boolean testCondition, T valueTrue, T valueFalseOrNull)
COALESCE(a1, a2, ...) - returns the first non-null argument, or NULL if all arguments are null. Pronounced [kəʊəles]; can be used in place of ifnull.
CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END : when a = b, returns c; when a = d, returns e; else returns f
CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END : when a = true, returns b; when c = true, returns d; else returns e
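A combined sketch (the users table and its columns are hypothetical):
SELECT if(age >= 18, 'adult', 'minor'),
       COALESCE(nickname, firstname, 'unknown'),
       CASE gender WHEN 'M' THEN 'male' WHEN 'F' THEN 'female' ELSE 'other' END
FROM users;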
6. String functions
concat_ws(string SEP, string A, string B...) : concat with a custom separator
find_in_set(string str, string strList) : find_in_set('ab', 'abc,b,ab,c,def') returns 3
format_number(number x, int d) : formats the number X to a format like '#,###,###.##', rounded to D decimal places (as of Hive 0.10.0)
get_json_object(string json_string, string path) : the root object is referenced as $
json_tuple(jsonStr, p1, p2, ..., pn) - like get_json_object, but it takes multiple names and returns a tuple. All the input parameters and output column types are string.
in_file(string str, string filename)
instr(string str, string substr), locate(string substr, string str[, int pos])
lpad(string str, int len, string pad) : lpad('a', 3, 'b') = 'bba'; rpad(), ltrim(), rtrim(), trim()
ngrams(array<array<string>>, int N, int K, int pf)
parse_url(string urlString, string partToExtract[, string keyToExtract]) : returns the specified part from the URL. Valid values for partToExtract include HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, and USERINFO.
parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'HOST') : 'facebook.com', parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'QUERY', 'k1') returns 'v1'.
parse_url_tuple(url, partname1, partname2, ..., partnameN) - extracts N (N >= 1) parts from a URL.
SELECT b.* FROM src LATERAL VIEW parse_url_tuple(fullurl, 'HOST', 'PATH', 'QUERY', 'QUERY:id') b AS host, path, query, query_id LIMIT 1;
SELECT parse_url_tuple(a.fullurl, 'HOST', 'PATH', 'QUERY', 'REF', 'PROTOCOL', 'FILE', 'AUTHORITY', 'USERINFO', 'QUERY:k1') as (ho, pa, qu, re, pr, fi, au, us, qk1) from src a;
printf(String format, Obj... args) (as of Hive 0.9.0)
regexp_extract('foothebar', 'foo(.*?)(bar)', n) : extracts group n, e.g. regexp_extract('foothebar', 'foo(.*?)(bar)', 2) returns 'bar'
regexp_replace(string INITIAL_STRING, string PATTERN, string REPLACEMENT)
repeat(string str, int n) : repeats str n times
reverse(string A)
sentences(string str, string lang, string locale) : array<array<string>>
space(int n)
split(string str, string pat) : array. Splits str around pat (pat is a regular expression)
str_to_map(text[, delimiter1, delimiter2]) : splits text into key-value pairs using two delimiters. Delimiter1 separates text into K-V pairs, and Delimiter2 splits each K-V pair. Default delimiters are ',' for delimiter1 and '=' for delimiter2 (see the example after this list).
substr(string|binary A, int start, int len) or substring(string|binary A, int start, int len)
translate(string input, string from, string to)
upper(), lower()
group_concat()
map_keys()
map_values()
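A hedged sketch of two of the functions above (src is a placeholder one-row table):
SELECT parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'PATH'),  -- '/path1/p.php'
       str_to_map('k1:v1#k2:v2', '#', ':')['k2']                               -- 'v2'
FROM src LIMIT 1;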
Misc. functions
java_method(class, method[, arg1[, arg2..]]) : return type varies; synonym for reflect (as of Hive 0.9.0)
reflect(class, method[, arg1[, arg2..]]) : return type varies; use this UDF to call Java methods by matching the argument signature (uses reflection). (as of Hive 0.7.0)
XPath functions (extract data from XML):
xpath, xpath_short, xpath_int, xpath_long, xpath_float, xpath_double, xpath_number, xpath_string
Built-in aggregate functions (UDAF)
max(), min(), count(), avg(), sum()
double variance(col), var_pop(col) : returns the variance of a numeric column in the group
double var_samp(col) : returns the unbiased sample variance of a numeric column in the group
double stddev_pop(col) : returns the standard deviation of a numeric column in the group
double stddev_samp(col) : returns the unbiased sample standard deviation of a numeric column in the group
double covar_pop(col1, col2) : returns the population covariance of a pair of numeric columns in the group
double covar_samp(col1, col2) : returns the sample covariance of a pair of numeric columns in the group
Built-in table-generating functions (UDTF)
explode(array<TYPE> a) : for each element in a, explode() generates a row containing that element
No other expressions are allowed in SELECT:
SELECT pageid, explode(adid_list) AS myCol ... is not supported
UDTFs can't be nested:
SELECT explode(explode(adid_list)) AS myCol ... is not supported
GROUP BY / CLUSTER BY / DISTRIBUTE BY / SORT BY is not supported:
SELECT explode(adid_list) AS myCol ... GROUP BY myCol is not supported
stack(INT n, v_1, v_2, ..., v_k) : breaks up v_1, ..., v_k into n rows. Each row will have k/n columns. n must be constant.
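A hedged sketch of both UDTFs (src is a placeholder one-row table; to combine a UDTF with other columns, use LATERAL VIEW as described above):
SELECT explode(array(1, 2, 3)) AS n FROM src;
SELECT stack(2, 'A', 10, 'B', 20) AS (col0, col1) FROM src;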