Hive学习之WordCount单词统计

ydbjason

2013-04-16

关注关注

单词统计相当于编程开始的HELLO WORLD。应该都跑过。假设这里有一个文档，里面有两行这样的话：

Hello World Bye World

Hello Hadoop GoodBye Hadoop

最终要显示的结果如下：

程序如下：

Map：

public class MapClass extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {

private final static IntWritable one = new IntWritable(1);
private Text word = new Text();

public void map(LongWritable key, Text value,
OutputCollector<Text, IntWritable> output,
Reporter reporter) throws IOException {
String line = value.toString();
StringTokenizer itr = new StringTokenizer(line);
while (itr.hasMoreTokens()) {
word.set(itr.nextToken());
output.collect(word, one);
}
}
}

Reduce：

public class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {

public void reduce(Text key, Iterator<IntWritable> values,
OutputCollector<Text, IntWritable> output,
Reporter reporter) throws IOException {
int sum = 0;
while (values.hasNext()) {
sum += values.next().get();
}
output.collect(key, new IntWritable(sum));
}

}

客户端：

public class WordCount {
public static void main(String[] args) throws Exception {
JobConf conf = new JobConf(WordCount.class);
conf.setJobName("wordcount");

conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);

conf.setMapperClass(Map.class);
conf.setCombinerClass(Reduce.class);
conf.setReducerClass(Reduce.class);

conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);

FileInputFormat.setInputPaths(conf, new Path(args[0]));
FileOutputFormat.setOutputPath(conf, new Path(args[1]));
JobClient.runJob(conf);
}
}

以上是传统的MR程序。现在，我们可以利用hive来做这样的事。

hive 单词

安科网

Hive学习之WordCount单词统计

ydbjason

ydbjason

相关推荐

3（Hive）

Hive函数大全-完整版

hdfs、hive、hbase的搭建总结

hive函数之~hive当中的lateral view 与 explode

hive函数之~窗口函数与分析函数

hive函数之~reflect函数

hive函数之~条件函数

hive函数之~日期函数

hive函数之~字符串函数

hive函数之~关系运算

Hive使用

Hive的安装与启动

Hive llap服务安装说明及测试（二）

Hive学习之路（二）Hive安装

Hadoop

（一）hive远程模式搭建

Hive学习(二)【数据类型、类型转换】

Hive1.2.2（一）

hive开窗开窗函数进阶

数据仓库 ODS原始数据层操作

ydbjason