Hive索引

索引是标准的数据库技术,hive 0.7版本之后支持索引。hive索引采用的不是'one size fites all'的索引实现方式,而是提供插入式接口,并且提供一个具体的索引实现作为参考。
hive索引具有以下特点:
1.索引key冗余存储,提供基于key的数据视图
2.存储设计以优化查询&检索性能
3.对于某些查询减少IO,从而提高性能。

hive索引创建语句:

CREATE INDEX index_name
ON TABLE base_table_name (col_name, ...)
AS 'index.handler.class.name'
[WITH DEFERRED REBUILD]
[IDXPROPERTIES (property_name=property_value, ...)]
[IN TABLE index_table_name]
[PARTITIONED BY (col_name, ...)]
[
  [ ROW FORMAT ...] STORED AS ...
  | STORED BY ...
]
[LOCATION hdfs_path]
[TBLPROPERTIES (...)]
[COMMENT "index comment"]

注意:
1.index的partition默认和数据表一致
2.视图上不能创建index
3. index可以通过stored as配置存储格式

重建索引:

ALTER INDEX index_name ON table_name [PARTITION (...)] REBUILD

假如在创建索引是我们使用“ WITH DEFERRED REBUILD”语句,则索引创建是为空,可以通过“Alter index ... REBUILD”在一个partition上或所有partition上构建索引。
注意:
1.当hive数据更新时,必须调用该语句更新索引。
2. index rebuild操作时一个原子操作,因此,当rebuild失败时,先前构建的索引也无法使用

删除索引:

DROP INDEX index_name ON table_name

示例:
创建表&索引

hive> create table index_test(id INT, name STRING)
    > partitioned by (dt STRING)
    > row format delimited fields terminated by ',';
OK
Time taken: 32.446 seconds
hive> create table index_tmp(id INT, name STRING, dt STRING)
    > row format delimited fields terminated by ',';
OK
Time taken: 0.157 seconds
hive> load data local inpath '/home/work/data/alter_test.txt' into table index_tmp;
Copying data from file:/home/work/data/alter_test.txt
Copying file: file:/home/work/data/alter_test.txt
Loading data to table default.index_tmp
OK
Time taken: 1.587 seconds
hive> set hive.exec.dynamic.partition.mode=nonstrict;
hive> set hive.exec.dynamic.partition=true; 
hive> insert overwrite table index_test partition(dt)
    > select id,name,dt
    > from index_tmp;
Total MapReduce jobs = 2
......
OK
Time taken: 56.103 seconds
hive> create index index1_index_test on table index_test(id) AS  'org.apache.Hadoop.hive.ql.index.compact.CompactIndexHandler' WITH DEFERRED REBUILD ; 
OK
Time taken: 0.764 seconds
hive> alter index index1_index_test on index_test rebuild;
Total MapReduce jobs = 3
......
OK
Time taken: 138.589 seconds
hive> show index on index_test;
OK
index1_index_test       index_test              id                      default__index_test_index1_index_test__ compact             
Time taken: 0.163 seconds
hive> show  partitions index_test; 
OK
dt=2012-08-10
dt=2012-08-11
dt=2012-08-12
Time taken: 0.17 seconds

相关推荐