Importing a CSV file into a Hive table

1  The CSV format (the default file format when exporting MySQL table data with SQLyog)

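CSV stands for Comma-Separated Values. The format is sometimes also called character-separated values, because the delimiter does not have to be a comma (for example, the default delimiter of the export described in 1.1 is a tab rather than a comma). A CSV file stores tabular data (numbers and text) as plain text. It consists of any number of records separated by line breaks; each record consists of fields, and the fields are separated by some other character or string, most commonly a comma or a tab. Usually, all records have exactly the same sequence of fields.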
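To make the roles of the delimiter and the enclosing character concrete, here is a minimal sketch using Python's standard csv module. The sample text and the function name parse_delimited are made up for illustration; they are not taken from the exported file.

```python
import csv
import io

# Hypothetical sample: tab-separated fields, each value enclosed in single quotes.
sample = "'12'\t'23G'\t'15589836997'\n'12'\t'23G'\t'18661866329'\n"

def parse_delimited(text, delimiter="\t", quotechar="'"):
    """Parse delimited text into a list of records (each a list of string fields)."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter, quotechar=quotechar)
    return [row for row in reader]

records = parse_delimited(sample)
for record in records:
    print(record)
```

The same function handles an ordinary comma-separated file if you pass delimiter="," and quotechar='"'.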

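1.1 When exporting, you can specify the field delimiter (the default is \t) and the character used to enclose field values (the enclosing character is optional), as shown in the figure below: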


[Figure: SQLyog CSV export options (field delimiter and enclosing character)]


2 Hive can import data in .csv format. The steps are as follows:

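a) After exporting, check what the file looks like. Open it as plain text so that the delimiter is visible; if you open it in Excel, you cannot tell whether the fields are separated by the character you specified or by the default \t.

Opened as plain text, my exported file looks like this (the values are not enclosed in ''):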

12,1.71301E+15,23G,15589836997,20141201,2,532,13606343566,1,532,0,0,0,1,91,2
12,1.71207E+15,23G,18661866329,20141201,1,25,18952082990,3,25,0,2,0,1,31,1
12,1.71307E+15,23G,13026513953,20141201,1,530,15269099707,1,530,1,1,0,2,667,12
12,3.20812E+15,23G,13061276785,20141201,1,532,13954223917,1,532,0,0,0,1,18,1
12,3.21009E+15,23G,15653208256,20141201,1,532,15864736958,1,532,0,0,0,1,15,1
12,1.71312E+15,23G,13256887098,20141201,1,532,15264276875,1,532,0,0,0,1,45,1
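Before loading, it can help to confirm that every line has as many fields as the table has columns (16 in the table created in step b). The following is a rough sketch of such a check, not part of the original procedure; the function name find_bad_lines is hypothetical, and the sample is the first two lines shown above.

```python
import csv
import io

EXPECTED_FIELDS = 16  # number of columns in the table created in step b)

# First two lines of the exported file shown above.
sample = (
    "12,1.71301E+15,23G,15589836997,20141201,2,532,13606343566,1,532,0,0,0,1,91,2\n"
    "12,1.71207E+15,23G,18661866329,20141201,1,25,18952082990,3,25,0,2,0,1,31,1\n"
)

def find_bad_lines(text, delimiter=",", expected=EXPECTED_FIELDS):
    """Return (line_number, field_count) for every record with an unexpected field count."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    bad = []
    for number, row in enumerate(reader, start=1):
        if len(row) != expected:
            bad.append((number, len(row)))
    return bad

print(find_bad_lines(sample))
```

An empty list means every record matched the expected width; for a real file you would read its contents from disk instead of using the inline sample.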

b) Create the table in Hive:

create table cvs
(
month_id string,
user_no string,
net_type string,
device_number string,
start_date string,
org_trm_id string,
other_home_code string,
oppose_number string,
oppose_number_type string,
other_roam_code string,
roam_type string,
long_type string,
call_hour_seg string,
cdr_num string,
call_time string,
fee_number string
)
row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
with serdeproperties (
  "separatorChar" = ",",
  "escapeChar"    = "\\"
)
stored as textfile;

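The most complete form of the Hive statement for creating a table in this format sets all three SerDe properties (separatorChar, quoteChar, escapeChar):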

CREATE TABLE csv_table (a string, b string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = "\t",
  "quoteChar"     = "'",
  "escapeChar"    = "\\"
)
STORED AS TEXTFILE;
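The three properties correspond to the usual CSV notions of field separator, enclosing character, and escape character. As an illustration of these notions only (it does not reproduce OpenCSVSerde's exact parsing rules), the sketch below writes and re-reads a record with Python's csv module using the same three characters. The sample values and the function names are hypothetical.

```python
import csv
import io

# Same three characters as in the statement above: tab, single quote, backslash.
FORMAT = dict(delimiter="\t", quotechar="'", escapechar="\\",
              doublequote=False, lineterminator="\n")

def write_rows(rows):
    """Serialize rows with every value enclosed in the quote character."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, **FORMAT)
    writer.writerows(rows)
    return buffer.getvalue()

def read_rows(text):
    """Parse text that was written with the same separator, quote, and escape."""
    return list(csv.reader(io.StringIO(text), **FORMAT))

rows = [["a", "it's"], ["1", "x\ty"]]  # a quote and a tab inside values
text = write_rows(rows)
print(text)
print(read_rows(text) == rows)
```

The quote inside a value is preceded by the escape character on output, and the tab inside a value survives because the value is enclosed in quotes.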

c) Upload the exported file to the Linux host, then load it from the local file system into the Hive table:

load data  local inpath  'yuyin.csv' into table cvs;

d) Query the table:

hive (default)> select *  from cvs limit 10;
OK
cvs.month_id    cvs.user_no     cvs.net_type    cvs.device_number       cvs.start_date  cvs.org_trm_id  cvs.other_home_code     cvs.oppose_number       cvs.oppose_number_type  cvs.other_roam_code     cvs.roam_type   cvs.long_type   cvs.call_hour_seg       cvs.cdr_num     cvs.call_time   cvs.fee_number
12      1.71E+15        23G     15589836997     20141201        2       532    13606343566      1       532     0       0       0       1       91      2
12      1.71E+15        23G     18661866329     20141201        1       25     18952082990      3       25      0       2       0       1       31      1
12      1.71E+15        23G     13026513953     20141201        1       530    15269099707      1       530     1       1       0       2       667     12
12      3.21E+15        23G     13061276785     20141201        1       532    13954223917      1       532     0       0       0       1       18      1
12      3.21E+15        23G     15653208256     20141201        1       532    15864736958      1       532     0       0       0       1       15      1
12      1.71E+15        23G     13256887098     20141201        1       532    15264276875      1       532     0       0       0       1       45      1
12      3.21E+15        23G     15692326467     20141201        2       532    15969838768      1       532     0       0       0       1       7       1
12      3.71E+15        23G     18561738929     20141201        1       535    17862806081      1       535     1       0       0       1       12      1
12      1.71E+15        23G     13127055909     20141201        1       530    13573075730      1       530     0       1       0       1       48      1
12      2.21E+15        23G     15689487889     20141201        1       532    15063978623      1       532     0       0       0       1       39      1
Time taken: 2.042 seconds, Fetched: 10 row(s)