Installing Hive
This post installs the current stable release of Hive, built against Hadoop 2.2.0. I have previously written about installing Hive 0.8 on Hadoop 1.x; this time the version is Hive 0.13, which can be downloaded from the Hive website as a pre-built binary package, so no source compilation is needed. Hive depends on an underlying Hadoop environment, so before installing it, make sure your Hadoop cluster is up and working.
Download address for the stable Hive 0.13 release:
http://apache.fayea.com/apache-mirror/hive/stable/
On building a distributed Hadoop 2.2.0 cluster:
http://qindongliang1922.iteye.com/blog/2078423
On installing MySQL:
http://qindongliang1922.iteye.com/blog/1987199
With the package downloaded, here are the installation steps:
No. | Step | Description
1 | Set up the Hadoop 2.2.0 cluster | The underlying dependency
2 | Download the Hive 0.13 binary package and unpack it | The Hive package
3 | Set the HIVE_HOME environment variable | Needed for the environment
4 | Configure hive-env.sh | Points at the Hadoop directory and Hive's conf directory
5 | Configure hive-site.xml | Hive properties, plus MySQL integration for metadata storage
6 | Copy the MySQL JDBC jar into Hive's lib directory | Metadata is stored in MySQL
7 | Start the bin/hive service | Test that Hive starts
8 | Create a database and tables, test Hive | Verify that Hive works correctly
9 | Exit the Hive client | Run the exit command
First, run the following four commands to turn the template files shipped with Hive into the files Hive actually reads:
cp hive-default.xml.template hive-site.xml
cp hive-env.sh.template hive-env.sh
cp hive-exec-log4j.properties.template hive-exec-log4j.properties
cp hive-log4j.properties.template hive-log4j.properties
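Step 5 in the table above stores Hive's metadata in MySQL. A minimal sketch of the metastore-related fragment of conf/hive-site.xml (the connection URL, database name, user, and password here are placeholders; adjust them for your own MySQL instance):

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hive</value>
</property>

For the driver class above to be found, the MySQL JDBC jar must be in Hive's lib directory (step 6 in the table); the jar version below is an assumption:

cp mysql-connector-java-5.1.31.jar /home/search/hive/lib/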
Setting Hive's environment variables:
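A minimal sketch, assuming Hive is unpacked at /home/search/hive (the path that appears in the startup log below); add to the shell profile:

export HIVE_HOME=/home/search/hive
export PATH=$HIVE_HOME/bin:$PATH

In hive-env.sh, point Hive at the Hadoop installation and at its own conf directory (the Hadoop path matches the one in the job logs further down):

export HADOOP_HOME=/home/search/hadoop
export HIVE_CONF_DIR=/home/search/hive/conf

With the configuration in place, start the Hive shell and check the startup log: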
[search@h1 hive]$ bin/hive
14/07/30 04:18:08 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/07/30 04:18:08 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
14/07/30 04:18:08 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
14/07/30 04:18:08 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
14/07/30 04:18:08 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
14/07/30 04:18:08 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
14/07/30 04:18:08 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
14/07/30 04:18:08 INFO Configuration.deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed
14/07/30 04:18:09 WARN conf.HiveConf: DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.
Logging initialized using configuration in file:/home/search/hive/conf/hive-log4j.properties
hive>
Next, run the table-creation command and load some data.
Create the table:
create table info (name string, count int) row format delimited fields terminated by '#' stored as textfile;
Load the data:
LOAD DATA LOCAL INPATH '/home/search/abc1.txt' OVERWRITE INTO TABLE info;
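Because the table's fields are delimited by '#', each line of /home/search/abc1.txt is expected to hold a name and a count separated by that character. A few illustrative lines, modeled on the query output further down (the exact file contents are an assumption):

英的国#999999
中的国#999997
美的国#999996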
Now run a query that sorts the output in descending order. Note that in HiveQL the ORDER BY clause must come before LIMIT; writing LIMIT first fails with a parse error, as the first attempt in the session below shows:
Time taken: 0.837 seconds, Fetched: 5 row(s)
hive> select * from info limit 5 order by count desc;
FAILED: ParseException line 1:27 missing EOF at 'order' near '5'
hive> select * from info order by count desc limit 5;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1406660797211_0003, Tracking URL = http://h1:8088/proxy/application_1406660797211_0003/
Kill Command = /home/search/hadoop/bin/hadoop job -kill job_1406660797211_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2014-07-30 04:26:13,538 Stage-1 map = 0%, reduce = 0%
2014-07-30 04:26:26,398 Stage-1 map = 67%, reduce = 0%, Cumulative CPU 5.41 sec
2014-07-30 04:26:27,461 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 6.64 sec
2014-07-30 04:26:39,177 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 10.02 sec
MapReduce Total cumulative CPU time: 10 seconds 20 msec
Ended Job = job_1406660797211_0003
MapReduce Jobs Launched:
Job 0: Map: 1  Reduce: 1  Cumulative CPU: 10.02 sec  HDFS Read: 143906707  HDFS Write: 85  SUCCESS
Total MapReduce CPU Time Spent: 10 seconds 20 msec
OK
英的国  999999
中的国  999997
美的国  999996
中的国  999993
英的国  999992
Time taken: 37.892 seconds, Fetched: 5 row(s)
hive>
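The same query can also be run non-interactively with the CLI's -e option, which is handy for scripting (a minimal sketch; results go to standard output):

bin/hive -e "select * from info order by count desc limit 5;"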
Usage of some of the Hive shell's interactive commands:
quit, exit : exit the interactive shell
reset : reset the configuration to the default values
set <key>=<value> : set the value of a particular variable (no error is reported if the variable name is misspelled)
set : print the Hive configuration variables overridden by the user
set -v : print all Hadoop and Hive configuration variables
add FILE[S] *, add JAR[S] *, add ARCHIVE[S] * : add one or more files, jars, or archives to the distributed cache
list FILE[S], list JAR[S], list ARCHIVE[S] : list the resources already added to the distributed cache
list FILE[S] *, list JAR[S] *, list ARCHIVE[S] * : check whether the given resources have been added to the distributed cache
delete FILE[S] *, delete JAR[S] *, delete ARCHIVE[S] * : remove the given resources from the distributed cache
! <command> : run a shell command from the Hive shell
dfs <dfs command> : run a dfs command from the Hive shell
<query string> : run a Hive query and print the results to standard output
source FILE <filepath> : run a Hive script file inside the CLI
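A few of these commands in a hypothetical session (the jar and script paths are assumptions for illustration):

hive> set hive.exec.reducers.max=10;
hive> add JAR /home/search/myudf.jar;
hive> list JARS;
hive> dfs -ls /;
hive> ! pwd;
hive> source /home/search/test.hql;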
To start Hive in debug mode: hive -hiveconf hive.root.logger=DEBUG,console
With that, our Hive installation is complete and running normally.