Hive Comprehensive Study Notes

1. Hive CLI (the hive command line)

Command-line options:

-d k=v (define a variable), -e "<hql>" (run a query string), -f <filename> (run a script file), -h <host>, -p <port>, -v (echo the executed HQL to the console)

Interactive mode:

set; shows all Hive variables; set a specific one with e.g. set mapred.reduce.tasks=32;

set k=v : no error is raised if the key k does not already exist

! shell_command : run a shell command from the Hive prompt, e.g. ! echo aa

dfs command : run a Hadoop filesystem command from the Hive prompt, same as hadoop fs

set : print the variables Hive has set

Data types:

Integers

TINYINT - 1-byte integer

SMALLINT - 2-byte integer

INT - 4-byte integer

BIGINT - 8-byte integer

Boolean type

BOOLEAN - TRUE/FALSE

Floating point numbers

FLOAT - single precision

DOUBLE - double precision

String type

STRING - sequence of characters in a specified character set

Complex types

Structs: STRUCT {a INT; b INT}, accessed as c.a; struct_type: STRUCT<col_name : data_type [COMMENT col_comment], ...>

Maps: accessed as M['group']

Arrays: all elements in an array must have the same type; given ['a', 'b', 'c'], A[1] returns 'b'

Unions: UNIONTYPE<data_type, data_type, ...>

SELECT create_union(0, key), create_union(IF(key < 100, 0, 1), 2.0, value), create_union(1, "a", struct(2, "b")) FROM src LIMIT 2;

TIMESTAMP (Note: only available starting with Hive 0.8.0)

BINARY (Note: only available starting with Hive 0.8.0)

Operators (built-in):

Relational operators

A LIKE B : uses the _ and % wildcards, same as in SQL

A RLIKE B : regular-expression LIKE, e.g. 'foo' RLIKE 'f.*' returns true; same as A REGEXP B

Arithmetic operators

A % B, A & B (bitwise AND), A | B (bitwise OR), A ^ B (bitwise XOR), ~A (bitwise NOT)

Logical operators

A AND B, A OR B, !A, NOT A

Operators on complex types

A[n] : A is an array and n is an int

M[key] : M is a Map<K, V> and key has type K

S.x : S is a struct

Functions:

round(double a) : BIGINT

floor(double a) : BIGINT

ceil(double a) : BIGINT

rand(), rand(int seed) : double

concat(string A, string B, ...)

substr(string A, int start, int length)

upper(string A)

lower(string A)

trim(string A)

ltrim(string A)

rtrim(string A)

regexp_replace(string A, string B, string C)

size(Map<K.V>) : returns the number of elements in the map

size(Array<T>)

cast(<expr> as <type>) : same as in MySQL

from_unixtime(int unixtime)

to_date(string timestamp) : to_date("1970-01-01 00:00:00") = "1970-01-01"

year(string date) : year("1970-01-01") = 1970

month(string date) : month("1970-11-01") = 11

day(string date) : day("1970-11-01") = 1, equivalent to dayofmonth()

hour() / minute() / second()

weekofyear(string date)

get_json_object(string json_string, string path)
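As a quick sketch of get_json_object: the path argument starts at the root object $, and nested fields and array elements are addressed with dots and brackets. The table and column names below are hypothetical.

```sql
-- Hypothetical table src with a STRING column json_col holding:
-- '{"store": {"fruit": [{"type": "apple"}]}, "owner": "amy"}'
SELECT get_json_object(json_col, '$.owner')               AS owner,        -- 'amy'
       get_json_object(json_col, '$.store.fruit[0].type') AS first_fruit   -- 'apple'
FROM src;
```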

Aggregate functions

count(*), count(expr), count(DISTINCT expr[, expr_.])

sum(col), sum(DISTINCT col)

avg(col), avg(DISTINCT col)

min(col)

max(col)

Creating tables:

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] page_view(
    viewTime INT COMMENT '',
    userid BIGINT,
    page_url STRING,
    referrer_url STRING,
    friends ARRAY<BIGINT>,
    properties MAP<STRING, STRING>,
    ip STRING COMMENT 'IP Address of the User')
COMMENT 'This is the page view table'
PARTITIONED BY (dt STRING COMMENT '', country STRING)
CLUSTERED BY (userid) SORTED BY (viewTime) INTO 32 BUCKETS
ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '1'
    COLLECTION ITEMS TERMINATED BY '2'
    MAP KEYS TERMINATED BY '3'
STORED AS SEQUENCEFILE/TEXTFILE/RCFILE/INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname
[LOCATION hdfs_path]
[AS select_statement];

The CLUSTERED BY clause hashes the table into 32 buckets by userid; within each bucket, rows are sorted by viewTime.

Notes:

1. The EXTERNAL keyword lets you create a table and provide a LOCATION so that Hive does not use a default location for this table. When dropping an EXTERNAL table, data in the table is NOT deleted from the filesystem.

2. Tables or partitions can be bucketed using CLUSTERED BY columns, and data can be sorted within that bucket via SORT BY columns. This can improve performance on certain kinds of queries.

3. Table names and column names are case insensitive but SerDe and property names are case sensitive.

4. The CLUSTERED BY and SORTED BY creation commands do not affect how data is inserted into a table, only how it is read. This means that users must be careful to insert data correctly by specifying the number of reducers to be equal to the number of buckets, and using CLUSTER BY and SORT BY commands in their query.

Inserting data into a bucketed table:

set hive.enforce.bucketing = true;

FROM user_id
INSERT OVERWRITE TABLE user_info_bucketed
PARTITION (ds = '2009-02-25')
SELECT userid, firstname, lastname WHERE ds = '2009-02-25';

Altering tables: ALTER TABLE/PARTITION statements

Altering partitions:

ALTER TABLE table_name ADD [IF NOT EXISTS] PARTITION partition_spec [LOCATION 'location1'] partition_spec [LOCATION 'location2'] ... (note: despite the syntax, one statement cannot add multiple partitions)

partition_spec: (partition_col = partition_col_value, partition_col = partition_col_value, ...)

ALTER TABLE table_name DROP [IF EXISTS] partition_spec, partition_spec, ...

ALTER TABLE table_name [PARTITION partition_spec] SET LOCATION "new location"
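A concrete sketch of the partition statements above, reusing the page_view table from the CREATE TABLE example (the LOCATION path is hypothetical):

```sql
-- Add a daily partition with an explicit HDFS location, then drop it.
ALTER TABLE page_view ADD IF NOT EXISTS
    PARTITION (dt = '2008-06-08', country = 'US')
    LOCATION '/user/data/page_view/dt=2008-06-08/country=US';

ALTER TABLE page_view DROP IF EXISTS
    PARTITION (dt = '2008-06-08', country = 'US');
```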

Alter table/partition protections

ALTER TABLE table_name [PARTITION partition_spec] ENABLE|DISABLE NO_DROP;

ALTER TABLE table_name [PARTITION partition_spec] ENABLE|DISABLE OFFLINE;

Altering table columns:

ALTER TABLE table_name CHANGE [COLUMN] col_old_name col_new_name column_type [COMMENT col_comment] [FIRST|AFTER column_name]

ALTER TABLE table_name ADD|REPLACE COLUMNS (col_name data_type [COMMENT col_comment], ...)

Alter table properties (attach your own metadata to a table):

ALTER TABLE table_name SET TBLPROPERTIES table_properties

Renaming a partition:

ALTER TABLE table_name PARTITION partition_spec RENAME TO PARTITION partition_spec;

Views:

CREATE VIEW [IF NOT EXISTS] view_name [(column_name [COMMENT column_comment], ...)]
[COMMENT view_comment]
[TBLPROPERTIES (property_name = property_value, ...)]
AS SELECT ...

DROP VIEW [IF EXISTS] view_name

ALTER VIEW view_name SET TBLPROPERTIES table_properties

Notes:

1. Views are purely logical; materialized views are not currently supported.

2. If a view's underlying table is dropped, the view's schema is unchanged, but queries against the view will fail.

3. Views are read-only.
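A minimal sketch of the view syntax, again assuming the page_view table from earlier (the view name is hypothetical):

```sql
-- A logical view restricting page_view to US traffic.
CREATE VIEW IF NOT EXISTS us_page_view
COMMENT 'US page views only'
AS SELECT viewTime, userid, page_url
   FROM page_view
   WHERE country = 'US';
```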

User-defined functions:

add files ... : add jars to Hive's classpath

CREATE TEMPORARY FUNCTION function_name AS class_name (the class must be on the classpath)

DROP TEMPORARY FUNCTION [IF EXISTS] function_name

Indexes:

CREATE INDEX index_name
ON TABLE base_table_name (col_name, ...)
AS index_type
[WITH DEFERRED REBUILD]
[IDXPROPERTIES (property_name = property_value, ...)]
[IN TABLE index_table_name]
[
    [ROW FORMAT ...] STORED AS ...
    | STORED BY ...
]
[LOCATION hdfs_path]
[TBLPROPERTIES (...)]
[COMMENT "index comment"]

DROP INDEX [IF EXISTS] index_name ON table_name

For details, see https://cwiki.apache.org/confluence/display/Hive/IndexDev#CREATE_INDEX
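A sketch of the syntax above using the built-in compact index handler; the index name is hypothetical, and WITH DEFERRED REBUILD means the index is empty until it is explicitly rebuilt:

```sql
CREATE INDEX page_view_userid_idx
ON TABLE page_view (userid)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
WITH DEFERRED REBUILD;

-- Populate (or refresh) the index.
ALTER INDEX page_view_userid_idx ON page_view REBUILD;
```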

SHOW/DESCRIBE statements

SHOW DATABASES/TABLES [LIKE 'RegExp'] - the pattern here is a regular expression

SHOW PARTITIONS tableName [PARTITION (ds = '2010-03-03')]

SHOW TBLPROPERTIES tableName (as of Hive 0.10.0)

SHOW FUNCTIONS "a.*"

SHOW [FORMATTED] (INDEX|INDEXES) ON table_with_index [(FROM|IN) db_name]

SHOW COLUMNS (FROM|IN) table_name [(FROM|IN) db_name] - lists all columns, including partition columns (as of Hive 0.10)

DESC DATABASE xl_netdisk_ods;

Loading data:

1. Loading data into a table:

Standard syntax:

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)] - without OVERWRITE, existing data is kept as long as file names do not collide; with OVERWRITE, it is replaced

INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1 FROM from_statement;

INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 FROM from_statement;

Hive extension (multiple inserts):

FROM from_statement
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1
[INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2]
[INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2] ...;

Hive extension (dynamic partition inserts):

INSERT OVERWRITE TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement;

INSERT INTO TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement;

Notes:

1. INSERT OVERWRITE will overwrite any existing data in the table or partition unless IF NOT EXISTS is provided for a partition (as of Hive 0.9.0).

INSERT INTO will append to the table or partition, keeping the existing data intact. (Note: INSERT INTO syntax is only available starting in version 0.8.)

2. Writing data into the filesystem:

Standard syntax:

INSERT OVERWRITE [LOCAL] DIRECTORY directory1 SELECT ... FROM ...

Hive extension (multiple inserts):

FROM from_statement
INSERT OVERWRITE [LOCAL] DIRECTORY directory1 select_statement1
[INSERT OVERWRITE [LOCAL] DIRECTORY directory2 select_statement2] ...

3. Examples:

LOAD DATA LOCAL INPATH '/tmp/pv_2008-06-08_us.txt' INTO TABLE page_view PARTITION (date='2008-06-08', country='US')

LOAD DATA INPATH '/user/data/pv_2008-06-08_us.txt' INTO TABLE page_view PARTITION (date='2008-06-08', country='US')

INSERT OVERWRITE TABLE user_active SELECT user.* FROM user WHERE user.active = 1;

In order to check the existence of a key in another table, the user can use LEFT SEMI JOIN, as illustrated by the following example.

INSERT OVERWRITE TABLE pv_users
SELECT u.*
FROM user u LEFT SEMI JOIN page_view pv ON (pv.userid = u.id)
WHERE pv.date = '2008-03-03';

However, the following query is not allowed, because a single SELECT may contain only one DISTINCT:

INSERT OVERWRITE TABLE pv_gender_agg
SELECT pv_users.gender, count(DISTINCT pv_users.userid), count(DISTINCT pv_users.ip)
FROM pv_users
GROUP BY pv_users.gender;

Replacing IN/EXISTS subqueries in Hive:

SELECT a.key, a.value FROM a WHERE a.key IN (SELECT b.key FROM b);

can be rewritten as:

SELECT a.key, a.val FROM a LEFT SEMI JOIN b ON (a.key = b.key)

Left semi join: when the second (bottom) input has a matching row, LEFT SEMI JOIN returns the corresponding row from the first (top) input. If no join predicate exists in the Argument column, every row is a match.

Multi table/file inserts

FROM pv_users
INSERT OVERWRITE TABLE pv_gender_sum
    SELECT pv_users.gender, count(DISTINCT pv_users.userid)
    GROUP BY pv_users.gender
INSERT OVERWRITE DIRECTORY '/user/data/tmp/pv_age_sum'
    SELECT pv_users.age, count(DISTINCT pv_users.userid)
    GROUP BY pv_users.age;

FROM page_view_stg pvs
INSERT OVERWRITE TABLE page_view PARTITION (dt='2008-06-08', country='US')
    SELECT pvs.viewTime, pvs.userid, pvs.page_url, pvs.referrer_url, null, null, pvs.ip WHERE pvs.country = 'US'
INSERT OVERWRITE TABLE page_view PARTITION (dt='2008-06-08', country='CA')
    SELECT pvs.viewTime, pvs.userid, pvs.page_url, pvs.referrer_url, null, null, pvs.ip WHERE pvs.country = 'CA'
INSERT OVERWRITE TABLE page_view PARTITION (dt='2008-06-08', country='UK')
    SELECT pvs.viewTime, pvs.userid, pvs.page_url, pvs.referrer_url, null, null, pvs.ip WHERE pvs.country = 'UK';

Dynamic-partition insert (partitions are created automatically, so partition values need not be known in advance, and inserting into many partitions no longer requires one job per partition; available in versions after 0.6):

FROM page_view_stg pvs
INSERT OVERWRITE TABLE page_view PARTITION (dt='2008-06-08', country)
SELECT pvs.viewTime, pvs.userid, pvs.page_url, pvs.referrer_url, null, null, pvs.ip, pvs.country

Note that country is given no value here; partitions are created automatically as rows are inserted.

Dynamic partition columns must be the trailing partition columns: a mix like (dt, country='US') is not allowed. Dynamic partition columns must appear in the SELECT list; static ones need not. If a target partition already exists, its data is overwritten; otherwise the partition is created. If a dynamic partition column's value is NULL or '', the row goes to the default partition __HIVE_DEFAULT_PARTITION__ and is considered bad data.

Note: each node (mapper or reducer) may create at most 100 partitions, a single DML statement may create at most 1000 partitions in total and at most 100,000 files; all of these limits are configurable. By default Hive does not allow all partition columns to be dynamic; set hive.exec.dynamic.partition.mode=nonstrict to change this.

hive> set hive.exec.dynamic.partition.mode=nonstrict;
hive> FROM page_view_stg pvs
      INSERT OVERWRITE TABLE page_view PARTITION (dt, country)
      SELECT pvs.viewTime, pvs.userid, pvs.page_url, pvs.referrer_url, null, null, pvs.ip,
             from_unixtime(pvs.viewTime, 'yyyy-MM-dd') ds, pvs.country
      DISTRIBUTE BY ds, country;

If a single mapper or reducer would otherwise create more than 100 partitions, use DISTRIBUTE BY to balance the partition values across reducers.

This query will generate a MapReduce job rather than a Map-only job. The SELECT-clause will be converted to a plan for the mappers, and the output will be distributed to the reducers based on the value of (ds, country) pairs. The INSERT-clause will be converted to the plan in the reducer which writes to the dynamic partitions.

Queries:

Notes:

1. SORT BY vs ORDER BY: SORT BY only guarantees ordering within each reducer; ORDER BY guarantees total ordering, at the cost of performance, because the number of reducers is forced to 1.

2. Sorting follows the column's type; if numbers are stored as strings, cast them to a numeric type before sorting.

3. To get all rows with the same key into the same reducer, use CLUSTER BY ..., or DISTRIBUTE BY ... SORT BY ...; the former is shorthand for the latter, and normally the DISTRIBUTE BY columns are a prefix of the SORT BY columns.

4. Partitions work as in MySQL: partition pruning is automatic. Partition columns can be used in the WHERE clause and in the ON clause of a join.
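A sketch of note 3, against the page_view table used throughout these notes:

```sql
-- Both queries route all rows with the same userid to one reducer.
SELECT userid, page_url FROM page_view CLUSTER BY userid;

-- Equivalent distribution, but additionally sorted by viewTime
-- within each userid (DISTRIBUTE BY columns prefix the SORT BY columns).
SELECT userid, page_url FROM page_view
DISTRIBUTE BY userid SORT BY userid, viewTime;
```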

Joins:

1. Multiple joins can collapse into one map/reduce job. Hive converts joins over multiple tables into a single map/reduce job if for every table the same column is used in the join clauses, e.g. SELECT a.val, b.val, c.val FROM a JOIN b ON (a.key = b.key1) JOIN c ON (c.key = b.key1)

2. In every map/reduce stage of a join, the earlier tables are buffered and the last table's rows are streamed through; with multiple stages, the previous stage's result is buffered while the next table streams into the reducers. To change which table is streamed, give a hint, e.g. SELECT /*+ STREAMTABLE(a) */ a.val, b.val, c.val FROM a JOIN b ON (a.key = b.key1) JOIN c ON (c.key = b.key1) - by default b would be streamed; the hint streams a instead.

3. Joins occur BEFORE WHERE clauses. Beware of left, right, and full outer joins: a condition placed in ON versus in WHERE gives different results. ON conditions are applied first, then the join, then WHERE. In an outer join, ON conditions effectively filter only the non-preserved side; the preserved table's rows are always emitted, so filters on the preserved table belong in WHERE. If the preserved table has many rows that need no join, pre-filter it with a subquery.

4. A join can be done with mappers only, with restrictions: SELECT /*+ MAPJOIN(b) */ a.key, a.value FROM a JOIN b ON a.key = b.key. For each mapper of a, table b is read fully into memory, so the join completes without a reduce phase. The restriction is that a FULL/RIGHT OUTER JOIN b cannot be performed. If both tables are large but bucketed on the join column, and one table's bucket count is a multiple of the other's, a map join is still possible.

Lateral view (turns columns into rows):

lateralView: LATERAL VIEW udtf(expression) tableAlias AS columnAlias (',' columnAlias)*

fromClause: FROM baseTable (lateralView)*
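Instantiating that grammar with explode() over the friends ARRAY<BIGINT> column of page_view (the aliases are hypothetical):

```sql
-- One output row per (userid, friend) pair.
SELECT pv.userid, f.friend_id
FROM page_view pv
LATERAL VIEW explode(pv.friends) f AS friend_id;
```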

Sampling data: TABLESAMPLE

table_sample: TABLESAMPLE (BUCKET x OUT OF y [ON colname]) - divides the data into y buckets and returns bucket x. If the table was not bucketed at creation time, the whole table is scanned and bucketed on the fly; colname may be rand()

block_sample: TABLESAMPLE (n PERCENT) - returns n percent of the data volume, not n percent of the rows

SELECT * FROM source TABLESAMPLE (BUCKET 3 OUT OF 32 ON rand()) s;

SELECT * FROM source TABLESAMPLE (0.1 PERCENT) s;

Virtual columns:

INPUT__FILE__NAME, BLOCK__OFFSET__INSIDE__FILE
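A small sketch of querying the virtual columns, useful for tracing which input file and byte offset a row came from:

```sql
SELECT INPUT__FILE__NAME, BLOCK__OFFSET__INSIDE__FILE, userid
FROM page_view;
```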

Inspecting query execution plans:

EXPLAIN [EXTENDED] query

Generating sampled data:

INSERT OVERWRITE TABLE pv_gender_sum_sample
SELECT pv_gender_sum.*
FROM pv_gender_sum TABLESAMPLE (BUCKET 3 OUT OF 32 [ON userid]);

UNION ALL: omitted

Array operations

Array columns in tables can currently only be created programmatically.

SELECT pv.friends[2] FROM page_views pv; SELECT pv.userid, size(pv.friends) FROM page_view pv;

Map (associative array) operations

Maps provide collections similar to associative arrays. Such structures can currently only be created programmatically.

INSERT OVERWRITE TABLE page_views_map
SELECT pv.userid, pv.properties['pagetype'] FROM page_views pv; SELECT size(pv.properties) FROM page_view pv;

The difference between DISTRIBUTE BY and CLUSTER BY:

Instead of specifying "cluster by", the user can specify "distribute by" and "sort by", so the partition columns and sort columns can be different.

Altering tables

ALTER TABLE tab1 ADD COLUMNS (c1 INT COMMENT 'a new int column', c2 STRING DEFAULT 'def val');

Dropping tables and partitions

ALTER TABLE pv_users DROP PARTITION (ds = '2008-08-08')

Hive built-in functions:

Listing functions: SHOW FUNCTIONS; DESC FUNCTION [EXTENDED] fname;

Operators:

A [NOT] BETWEEN B AND C (as of version 0.9.0)

Functions

1. Mathematical functions:

round(double a [, int d]): int/double, pow(a, b), sqrt(), bin(a): binary representation, hex(): hexadecimal representation, conv(BIGINT num, int from_base, int to_base)

abs(), pmod(int a, int b): returns the positive remainder, whereas a % b returns a negative remainder when the remainder is negative, degrees(double a): radians to degrees, radians(double a), e(), pi(), sign()

std(), stddev()

2. Collection functions

size(Map<K.V>), size(Array<T>), map_keys(Map<K.V>), map_values(Map<K.V>), array_contains(Array<T>, value), sort_array(Array<T>): natural order (as of version 0.9.0)

array(n0, n1, ...) - creates an array with the given elements

3. Type conversion functions

cast(expr as <type>)

4. Date functions

from_unixtime(bigint unixtime [, string format])

unix_timestamp(string date, string pattern)

weekofyear(string date)

datediff(string enddate, string startdate)

date_add(string startdate, int days)

date_sub(string startdate, int days)

from_utc_timestamp(timestamp, string timezone)

to_utc_timestamp(timestamp, string timezone)

date_format
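A quick sketch of the date arithmetic functions (values follow the standard Hive documentation examples):

```sql
SELECT datediff('2009-03-01', '2009-02-27'),  -- 2
       date_add('2008-12-31', 1),             -- '2009-01-01'
       date_sub('2009-01-01', 1)              -- '2008-12-31'
FROM src LIMIT 1;
```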

5. Conditional functions

if(boolean testCondition, T valueTrue, T valueFalseOrNull)

COALESCE(a1, a2, ...) - returns the first non-null argument, or NULL if all arguments are NULL; can be used in place of ifnull

CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END - when a = b, returns c; when a = d, returns e; else returns f

CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END - when a is true, returns b; when c is true, returns d; else returns e

6. String functions

concat_ws(string SEP, string A, string B, ...) - concat with a custom separator

find_in_set(string str, string strList) - find_in_set('ab', 'abc,b,ab,c,def') returns 3

format_number(number x, int d) - formats the number X to a format like '#,###,###.##', rounded to D decimal places (as of Hive 0.10.0)

get_json_object(string json_string, string path) - the root object is referenced as $

json_tuple(jsonStr, p1, p2, ..., pn) - like get_json_object, but it takes multiple names and returns a tuple. All the input parameters and output column types are string.

in_file(string str, string filename)

instr(string str, string substr), locate(string substr, string str [, int pos])

lpad(string str, int len, string pad) - lpad('a', 3, 'b') = 'bba'; also rpad(), ltrim(), rtrim(), trim()

ngrams(array<array<string>>, int N, int K, int pf)

parse_url(string urlString, string partToExtract [, string keyToExtract]) - returns the specified part from the URL. Valid values for partToExtract include HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, and USERINFO.

parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'HOST') : 'facebook.com'; parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'QUERY', 'k1') returns 'v1'.

parse_url_tuple(url, partname1, partname2, ..., partnameN) - extracts N (N >= 1) parts from a URL.

SELECT b.* FROM src LATERAL VIEW parse_url_tuple(fullurl, 'HOST', 'PATH', 'QUERY', 'QUERY:id') b AS host, path, query, query_id LIMIT 1;

SELECT parse_url_tuple(a.fullurl, 'HOST', 'PATH', 'QUERY', 'REF', 'PROTOCOL', 'FILE', 'AUTHORITY', 'USERINFO', 'QUERY:k1') AS (ho, pa, qu, re, pr, fi, au, us, qk1) FROM src a;

printf(String format, Obj... args) (as of Hive 0.9.0)

regexp_extract('foothebar', 'foo(.*?)(bar)', n) - extracts group n, e.g. regexp_extract('foothebar', 'foo(.*?)(bar)', 2) returns 'bar'

regexp_replace(string INITIAL_STRING, string PATTERN, string REPLACEMENT)

repeat(string str, int n) - repeats str n times

reverse(string A)

sentences(string str, string lang, string locale): array<array<string>>

space(int n)

split(string str, string pat): array - splits str around pat (pat is a regular expression)

str_to_map(text [, delimiter1, delimiter2]) - splits text into key-value pairs using two delimiters. Delimiter1 separates text into K-V pairs, and delimiter2 splits each K-V pair. Default delimiters are ',' for delimiter1 and '=' for delimiter2.

substr(string|binary A, int start, int len) or substring(string|binary A, int start, int len)

translate(string input, string from, string to)

upper(), lower()

group_concat()

map_keys()

map_values()

Misc. functions

varies java_method(class, method [, arg1 [, arg2 ...]]) - synonym for reflect (as of Hive 0.9.0)

varies reflect(class, method [, arg1 [, arg2 ...]]) - use this UDF to call Java methods by matching the argument signature (uses reflection). (As of Hive 0.7.0.)

XPath functions (extract data from XML):

xpath, xpath_short, xpath_int, xpath_long, xpath_float, xpath_double, xpath_number, xpath_string

Built-in aggregate functions (UDAF)

max(), min(), count(), avg(), sum()

double variance(col), var_pop(col) - returns the variance of a numeric column in the group

double var_samp(col) - returns the unbiased sample variance of a numeric column in the group

double stddev_pop(col) - returns the standard deviation of a numeric column in the group

double stddev_samp(col) - returns the unbiased sample standard deviation of a numeric column in the group

double covar_pop(col1, col2) - returns the population covariance of a pair of numeric columns in the group

double covar_samp(col1, col2) - returns the sample covariance of a pair of numeric columns in the group

Built-in table-generating functions (UDTF)

explode(ARRAY<TYPE> a) - for each element in a, explode() generates a row containing that element

No other expressions are allowed in SELECT: SELECT pageid, explode(adid_list) AS myCol ... is not supported

UDTFs can't be nested: SELECT explode(explode(adid_list)) AS myCol ... is not supported

GROUP BY / CLUSTER BY / DISTRIBUTE BY / SORT BY is not supported: SELECT explode(adid_list) AS myCol ... GROUP BY myCol is not supported

stack(INT n, v_1, v_2, ..., v_k) - breaks up v_1, ..., v_k into n rows. Each row will have k/n columns. n must be constant.
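A minimal sketch of stack(): four literal values broken into two rows of two columns (src stands in for any table with at least one row):

```sql
SELECT stack(2, 'a', 1, 'b', 2) AS (col1, col2) FROM src;
-- Produces two rows: ('a', 1) and ('b', 2)
```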
