hive分区表
内部表和外部表
内部表:create table ,copy数据到warehouse,删除表时数据也会删除
外部表:create external table ,不copy数据到warehouse,删除表时数据不会删除
表的分区
分区的好处:如果不建立分区的话,则会全表扫描
数据通过目录划分分区,分区字段是特殊字段
目录结构:/pub/{dt}/{customer_id}/
添加分区:ALTER TABLE fs ADD PATITION(dt='20150305',customer_id) location '/pub/20150315/0001'
使用分区进行查询:
select * from fc where dt='20150315' and customer_id='0001'
日期字段(字符串类型)可以进行比较:t.dt>'20150413'
分区表的创建:
create table hive_patition(name string,salary float,gender string,level string)
partitioned by(dt string,dep string)
row format delimited fields treminated by ','
stored as textfile;
desc hive_partition;
show partitions hive_partition;//查看分区表的分区信息
导入数据:
load data local inpath '/opt/20150402/party.log' into table hive_partition partition(dt='2015-04015',dep='dev')
修改表增加分区:
alter table hive_partition add partition(dt='2015-04-01',dep='dev1') location '/pub/20150401/0001' partition(dt='2015-04-02',dep='dev2') location '/pub/20150402/0001'
修改表删除分区:
alter table hive_partition drop partition(dt='2015-04-02',dep='dev2')
修改表重命名表:
alter table hive_partition rename to partition_test
修改表中的字段:
alter table partition_test change name userName
添加字段:
alter table partition_test add columns(testColumn string)
嵌套查询:
from (select name,salary from partition_table) e select e.name,e.salary where e.salary>10000;
case when then语句
select name,salary,case when salary>1 and salary<5000 then 'L1' when salary>=5000 and salary<10000 then 'L2' when salary>=10000 and salary<150000 then 'L3' when salary>10000 then 'L4' else 'L0' end as salaryLevel from
partition_table;
having的用法:
select gender,sum(salary) as totalSalary from partition_table group by gender having totalSalary>50000;