A first look at Hive
1. Download the Hive release tarball
2. Upload it to the Linux server
3. Copy the tarball into the hadoop user's home directory
4. Change the owner and group of the unpacked directory to hadoop: chown hadoop apache-hive-0.13.1-bin and chgrp hadoop apache-hive-0.13.1-bin
5. Add a hive-site.xml; copying the shipped template is usually enough: cp hive-default.xml.template hive-site.xml
6. Add the Hive environment variables: set HIVE_HOME in .bashrc, then run source .bashrc
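The steps above can be sketched as a shell session. This is a sketch, not a verbatim install script: it runs in a scratch directory so it is safe to execute anywhere, uses a stand-in for the real unpacked distribution, and points chown/chgrp at the current user instead of hadoop.

```shell
# Sketch of steps 3-6; everything happens in a throwaway directory.
# On the real server the target user/group would be hadoop and the
# files would live under the hadoop user's home directory.
SCRATCH=$(mktemp -d)
cd "$SCRATCH"

# 3-4. stand-in for the unpacked Hive distribution; fix owner and group
mkdir -p apache-hive-0.13.1-bin/conf
chown "$(id -un)" apache-hive-0.13.1-bin   # on the real box: chown hadoop apache-hive-0.13.1-bin
chgrp "$(id -gn)" apache-hive-0.13.1-bin   # on the real box: chgrp hadoop apache-hive-0.13.1-bin

# 5. hive-site.xml starts life as a copy of the shipped template
touch apache-hive-0.13.1-bin/conf/hive-default.xml.template
cp apache-hive-0.13.1-bin/conf/hive-default.xml.template \
   apache-hive-0.13.1-bin/conf/hive-site.xml

# 6. what would normally go into ~/.bashrc before running source .bashrc
export HIVE_HOME="$SCRATCH/apache-hive-0.13.1-bin"
export PATH="$HIVE_HOME/bin:$PATH"
```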
Sample data file: /home/hadoop/hive_testdata
CREATE TABLE test_1 (id INT, name STRING, city STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
LOAD DATA LOCAL INPATH '/home/hadoop/hive_testdata' OVERWRITE INTO TABLE test_1;
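For reference, a hypothetical way to produce the test data file: from the listings later in these notes we can infer three tab-separated rows totalling 21 bytes, matching the (id, name, city) schema and the '\t' delimiter. It is written to a temp path here rather than /home/hadoop/hive_testdata so the snippet runs anywhere.

```shell
# 3 rows x 7 bytes ("1<TAB>aa<TAB>1<NL>") = 21 bytes, the size
# reported by hadoop fs -ls in the warehouse listing
DATA=$(mktemp)
printf '1\taa\t1\n2\tbb\t1\n3\tcc\t1\n' > "$DATA"
wc -c < "$DATA"
```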
Running MapReduce
Hive supports MySQL-like SQL, but it generally offers only queries and inserts (LOAD); there are no updates. A plain SELECT * simply dumps the data from HDFS without running MapReduce; anything more complex is first executed as a MapReduce job on the Hadoop cluster, and the result is then displayed.
Hadoop has two parts: storage (HDFS) and computation (MapReduce).
First of all, Hive's underlying data is always stored on HDFS.
For example:
[hadoop@localhost ~]$ hadoop fs -ls /user/hive
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2015-01-08 21:04 /user/hive/warehouse
[hadoop@localhost ~]$ hadoop fs -ls /user/hive/warehouse
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2015-01-08 21:04 /user/hive/warehouse/test_1
[hadoop@localhost ~]$ hadoop fs -ls /user/hive/warehouse/test_1/
Found 1 items
-rw-r--r--   1 hadoop supergroup         21 2015-01-08 21:04 /user/hive/warehouse/test_1/hive_testdata
[hadoop@localhost ~]$ hadoop fs -ls /user/hive/warehouse/test_1/hive_testdata/
Found 1 items
-rw-r--r--   1 hadoop supergroup         21 2015-01-08 21:04 /user/hive/warehouse/test_1/hive_testdata
[hadoop@localhost ~]$ hadoop fs -cat /user/hive/warehouse/test_1/hive_testdata/
1	aa	1
2	bb	1
3	cc	1
So at query time, if the statement is a SELECT *, Hive's job is simple: it merges the files under the table's HDFS directory, does some light processing, and prints the output; no MapReduce is involved.
For more complex queries, Hive decomposes the statement into MapReduce tasks and submits them to the cluster, where the JobTracker node distributes the tasks to the TaskTrackers.
MapReduce itself also runs as a cluster. It consists of two kinds of nodes: the JobTracker, which hands out tasks, and the TaskTrackers, which execute the tasks assigned to them.
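The classic way to picture that map/reduce split is a Unix pipeline. A toy sketch (nothing like Hive's real execution engine) of how a query such as select id from test_1 group by id decomposes over the raw tab-separated file:

```shell
# map:     project the id column out of each input row
# shuffle: sort so that equal keys end up adjacent
# reduce:  emit one output row per distinct key
result=$(printf '1\taa\t1\n2\tbb\t1\n3\tcc\t1\n' \
  | cut -f1 \
  | sort \
  | uniq)
echo "$result"
```

The actual cluster does the same thing, except that map, shuffle, and reduce each run distributed across TaskTrackers.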
hive> select * from test_1;
OK
1	aa	1
2	bb	1
3	cc	1
Time taken: 0.781 seconds, Fetched: 3 row(s)
hive> select id from test_1 group by id;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=
In order to set a constant number of reducers:
  set mapred.reduce.tasks=
Starting Job = job_201412251631_0001, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201412251631_0001
Kill Command = /home/hadoop/hadoop-1.2.1/libexec/../bin/hadoop job -kill job_201412251631_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2015-01-08 21:09:01,496 Stage-1 map = 0%, reduce = 0%
2015-01-08 21:09:03,531 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.48 sec
2015-01-08 21:09:11,625 Stage-1 map = 100%, reduce = 33%, Cumulative CPU 1.48 sec
2015-01-08 21:09:12,639 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 4.17 sec
MapReduce Total cumulative CPU time: 4 seconds 170 msec
Ended Job = job_201412251631_0001
MapReduce Jobs Launched:
Job 0: Map: 1  Reduce: 1   Cumulative CPU: 4.17 sec   HDFS Read: 237 HDFS Write: 6 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 170 msec
OK
1
2
3
Time taken: 21.549 seconds, Fetched: 3 row(s)
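The "Estimated from input data size: 1" line comes from a simple calculation driven by the two knobs the log mentions. A sketch of that arithmetic, where the default values are assumptions (roughly 1 GB per reducer and a cap of 999 reducers for Hive of this era; the exact logic varies by version):

```shell
input_bytes=237              # the HDFS Read figure reported by the job above
bytes_per_reducer=1000000000 # assumed default for hive.exec.reducers.bytes.per.reducer
max_reducers=999             # assumed default for hive.exec.reducers.max

# ceil(input_bytes / bytes_per_reducer), clamped to the range [1, max_reducers]
est=$(( (input_bytes + bytes_per_reducer - 1) / bytes_per_reducer ))
if [ "$est" -lt 1 ]; then est=1; fi
if [ "$est" -gt "$max_reducers" ]; then est="$max_reducers"; fi
echo "$est"
```

With only 237 bytes of input, the estimate lands on a single reducer, which matches the "number of reducers: 1" line in the log.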