A First Look at Hive

1. Download the Hive installation package

2. Upload it to the Linux server

3. Copy the package into the hadoop user's home directory

4. Change the package's owner and group to hadoop. Which commands do that? (chown hadoop apache-hive-0.13.1-bin and chgrp hadoop apache-hive-0.13.1-bin)

5. Add the hive-site.xml file. Copying hive-default.xml.template is usually enough: cp hive-default.xml.template hive-site.xml

6. Add the Hive environment variables: set HIVE_HOME in .bashrc, then run source .bashrc
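For step 6, the .bashrc addition might look like the following sketch (the install path is an assumption based on the package name mentioned above):

```shell
# Appended to ~/.bashrc of the hadoop user.
# /home/hadoop/apache-hive-0.13.1-bin is an assumed install location.
export HIVE_HOME=/home/hadoop/apache-hive-0.13.1-bin
export PATH=$PATH:$HIVE_HOME/bin
```

After editing the file, run source .bashrc so the current shell picks up the new variables.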

Test data file: /home/hadoop/hive_testdata

CREATE TABLE test_1 (id INT, name STRING, city STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

load data local inpath '/home/hadoop/hive_testdata' overwrite into table test_1;
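The test data file must be tab-delimited to match the FIELDS TERMINATED BY '\t' clause above. A minimal sketch for creating it (using /tmp as a stand-in for /home/hadoop so it runs anywhere):

```shell
# Create a tab-delimited file matching the test_1 schema (id, name, city).
# /tmp/hive_testdata is a stand-in path for /home/hadoop/hive_testdata.
printf '1\taa\t1\n2\tbb\t1\n3\tcc\t1\n' > /tmp/hive_testdata
cat /tmp/hive_testdata
```

Point LOAD DATA LOCAL INPATH at wherever you actually write the file.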

Running MapReduce

Hive supports SQL much like MySQL, but in general Hive only supports querying and inserting (load), not updating. When you run select *, Hive reads the data straight out of HDFS and prints it without running MapReduce; for any other query it first runs a MapReduce job on the Hadoop cluster and then displays the result.

Hadoop consists of two parts: storage (HDFS) and computation (MapReduce).

First of all, Hive's source data is always stored on HDFS.

For example:

[hadoop@localhost ~]$ hadoop fs -ls /user/hive

Found 1 items

drwxr-xr-x   - hadoop supergroup          0 2015-01-08 21:04 /user/hive/warehouse

[hadoop@localhost ~]$ hadoop fs -ls /user/hive/warehouse

Found 1 items

drwxr-xr-x   - hadoop supergroup          0 2015-01-08 21:04 /user/hive/warehouse/test_1

[hadoop@localhost ~]$ hadoop fs -ls /user/hive/warehouse/test_1/

Found 1 items

-rw-r--r--   1 hadoop supergroup         21 2015-01-08 21:04 /user/hive/warehouse/test_1/hive_testdata

[hadoop@localhost ~]$ hadoop fs -ls /user/hive/warehouse/test_1/hive_testdata/

Found 1 items

-rw-r--r--   1 hadoop supergroup         21 2015-01-08 21:04 /user/hive/warehouse/test_1/hive_testdata

[hadoop@localhost ~]$ hadoop fs -cat /user/hive/warehouse/test_1/hive_testdata/

1    aa    1

2    bb    1

3    cc    1

So for a select * query, Hive does something very simple: it merges the data files under the table's HDFS directory, does minimal processing, and prints the output, without running MapReduce.
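This select * behavior can be mimicked with plain Unix tools: conceptually, Hive just concatenates the files under the table's warehouse directory. A local sketch, where the /tmp paths stand in for the HDFS directory /user/hive/warehouse/test_1:

```shell
# Simulate a table directory containing two data files.
mkdir -p /tmp/warehouse/test_1
printf '1\taa\t1\n2\tbb\t1\n' > /tmp/warehouse/test_1/part-0
printf '3\tcc\t1\n' > /tmp/warehouse/test_1/part-1
# "select * from test_1" is, roughly, "merge the files and print them":
cat /tmp/warehouse/test_1/*
```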

For other, more complex queries, Hive decomposes the statement into MapReduce jobs and submits them to the cluster, where the JobTracker node distributes tasks to the TaskTrackers.

MapReduce itself also runs as a cluster and is mainly made up of two roles: the JobTracker and the TaskTrackers. The JobTracker distributes tasks, and each TaskTracker executes the tasks it is assigned.

hive> select * from test_1;

OK

1    aa    1

2    bb    1

3    cc    1

Time taken: 0.781 seconds, Fetched: 3 row(s)

hive> select id from test_1 group by id;

Total jobs = 1

Launching Job 1 out of 1

Number of reduce tasks not specified. Estimated from input data size: 1

In order to change the average load for a reducer (in bytes):

  set hive.exec.reducers.bytes.per.reducer=<number>

In order to limit the maximum number of reducers:

  set hive.exec.reducers.max=<number>

In order to set a constant number of reducers:

  set mapred.reduce.tasks=<number>

Starting Job = job_201412251631_0001, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201412251631_0001

Kill Command = /home/hadoop/hadoop-1.2.1/libexec/../bin/hadoop job -kill job_201412251631_0001

Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1

2015-01-08 21:09:01,496 Stage-1 map = 0%, reduce = 0%

2015-01-08 21:09:03,531 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.48 sec

2015-01-08 21:09:11,625 Stage-1 map = 100%, reduce = 33%, Cumulative CPU 1.48 sec

2015-01-08 21:09:12,639 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 4.17 sec

MapReduce Total cumulative CPU time: 4 seconds 170 msec

Ended Job = job_201412251631_0001

MapReduce Jobs Launched:

Job 0: Map: 1  Reduce: 1  Cumulative CPU: 4.17 sec  HDFS Read: 237  HDFS Write: 6  SUCCESS

Total MapReduce CPU Time Spent: 4 seconds 170 msec

OK

1

2

3

Time taken: 21.549 seconds, Fetched: 3 row(s)
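The group-by job above can be sketched with Unix tools to see why it needs a reduce phase: the mappers emit each id, the shuffle sorts them, and the reducer collapses duplicate keys. A local analogy (the /tmp file is a stand-in for the table's HDFS data):

```shell
# Recreate the tab-delimited test data locally.
printf '1\taa\t1\n2\tbb\t1\n3\tcc\t1\n' > /tmp/hive_testdata
# map: emit the id column; shuffle: sort; reduce: collapse duplicate keys.
cut -f1 /tmp/hive_testdata | sort -u
```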
