A Simple Hadoop Example

把握经济动脉

2015-12-22


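1. The concept of cloud computing

In the narrow sense, cloud computing is a delivery and usage model for IT infrastructure: the required resources (hardware, platform, software) are obtained over the network, on demand and in an easily scalable way.

In the broad sense, cloud computing is a delivery and usage model for services: the required services are obtained over the network, on demand and in an easily scalable way. These services may be related to IT, software, or the Internet, or they may be any other kind of service.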

2. The three-layer model

SaaS: more

PaaS: Hadoop

IaaS: OpenStack

3. Google vs. Hadoop

Google concept	Hadoop concept
MapReduce	Hadoop MapReduce
GFS	HDFS
Bigtable	HBase
Chubby	Zookeeper

4. Writing map and reduce functions in Hadoop

4.1 The map function

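// Nested inside the WordCount class. Needs java.io.IOException, java.util.StringTokenizer,
// org.apache.hadoop.io.IntWritable, org.apache.hadoop.io.Text, org.apache.hadoop.mapreduce.Mapper.
public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split the input line on whitespace
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);   // emit key = word, value = 1
        }
    }
}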

Note: the map output key and value types must match the reduce input key and value types.
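To make the mapper's output concrete, here is a minimal plain-Java sketch of the map phase. It does not use any Hadoop classes: `MapPhaseSketch` and its `map` method are hypothetical stand-ins for `TokenizerMapper`, and a list of `(String, Integer)` entries stands in for the `(Text, IntWritable)` pairs written to the context.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Plain-Java stand-in for the map phase; no Hadoop classes are involved.
class MapPhaseSketch {

    // Mimics TokenizerMapper.map: emit (token, 1) for every whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            out.add(new SimpleEntry<>(itr.nextToken(), 1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Repeated words are emitted once per occurrence; summing them is left to the reducer.
        for (Map.Entry<String, Integer> pair : map("hello hadoop hello")) {
            System.out.println(pair.getKey() + "\t" + pair.getValue());
        }
    }
}
```

The mapper does no counting of its own: a word that appears twice in a line is simply emitted twice with the value 1.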

4.2 The reduce function

// Nested inside the WordCount class. Needs org.apache.hadoop.mapreduce.Reducer
// in addition to the imports used by the mapper.
public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();   // aggregate all counts for this key
        }
        result.set(sum);
        context.write(key, result);
    }
}

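Note: the map output key and value types must match the reduce input key and value types. Here TokenizerMapper emits (Text, IntWritable), which are exactly the first two type parameters of Reducer<Text, IntWritable, Text, IntWritable>.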
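The grouping that happens between map and reduce can be sketched in plain Java as follows. This is only an illustration under simplifying assumptions: `ReducePhaseSketch`, `shuffle`, and `wordCount` are hypothetical names, and a sorted in-memory map stands in for the framework's shuffle, which groups all values of one key before calling the reducer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java stand-in for shuffle + reduce; no Hadoop classes are involved.
class ReducePhaseSketch {

    // Mimics IntSumReducer.reduce: sum every count that arrived for one key.
    static int reduce(String key, Iterable<Integer> values) {
        int sum = 0;
        for (int val : values) {
            sum += val;
        }
        return sum;
    }

    // Stand-in for the shuffle: collect one "1" per occurrence under each word.
    static Map<String, List<Integer>> shuffle(List<String> words) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String w : words) {
            grouped.computeIfAbsent(w, k -> new ArrayList<>()).add(1);
        }
        return grouped;
    }

    // Run the reducer once per key, as the framework would.
    static Map<String, Integer> wordCount(List<String> words) {
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(words).entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {hadoop=1, hello=2}
        System.out.println(wordCount(Arrays.asList("hello", "hadoop", "hello")));
    }
}
```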

4.3 Job configuration

public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Strip generic Hadoop options (-D, -fs, ...) and keep the application arguments
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
        System.err.println("Usage: wordcount <in> <out>");
        System.exit(2);
    }
    // Deprecated since Hadoop 2.x; there, use Job.getInstance(conf, "word count")
    Job job = new Job(conf, "word count");       // job name
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);   // the reducer doubles as the combiner
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));    // input path
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));  // output path
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}
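The job configuration registers IntSumReducer as both combiner and reducer. The plain-Java sketch below illustrates why that is safe for word count: addition is associative and commutative, so pre-summing within each input split and then summing the partial results gives the same totals. `CombinerSketch`, `combine`, and `reduce` are hypothetical names, and the two word lists are made-up input splits.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of combiner + reducer; no Hadoop classes are involved.
class CombinerSketch {

    // Local pre-aggregation over the words of one input split (the combiner's role).
    static Map<String, Integer> combine(List<String> splitWords) {
        Map<String, Integer> partial = new TreeMap<>();
        for (String w : splitWords) {
            partial.merge(w, 1, Integer::sum);
        }
        return partial;
    }

    // Final aggregation over the partial counts of all splits (the reducer's role).
    static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            for (Map.Entry<String, Integer> e : partial.entrySet()) {
                total.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> split1 = combine(Arrays.asList("a", "b", "a"));
        Map<String, Integer> split2 = combine(Arrays.asList("b", "a"));
        // prints {a=3, b=2}
        System.out.println(reduce(Arrays.asList(split1, split2)));
    }
}
```

An operation such as computing an average could not reuse the reducer as a combiner in this way, because averaging partial averages does not in general give the overall average.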

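5. Running from the command line

Steps:

a. Package the MapReduce code as wordcount.jar, with WordCount as the main class

b. Change to the Hadoop installation directory

c. Run: hadoop jar <local jar path> <class name> <HDFS input directory> <HDFS output directory>

For example: hadoop jar /home/deke/wordcount.jar WordCount <HDFS input directory> <HDFS output directory>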

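6. Eclipse configuration

Steps:

a. Download Eclipse

b. Copy contrib/eclipse-plugin/hadoop-*-eclipse-plugin.jar from the Hadoop folder into the plugins folder under the Eclipse folder

c. Start Eclipse

d. Set the path of the Hadoop installation folder

Window -> Preferences -> Hadoop Map/Reduce: set the location of Hadoop on the Linux machine, e.g. /usr/hadoop

e. Window -> Show View -> Other -> MapReduce Tool -> Map/Reduce Location. Right-click an empty area of the Map/Reduce Location view and choose "New Map/Reduce Location"; in the dialog that opens, fill in the ports according to core-site.xml and mapred-site.xml.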
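If it is unclear which ports to enter, they can be read out of the configuration files. The sketch below is one way to do that with the JDK's XML parser, assuming a Hadoop 1.x-style core-site.xml in which the HDFS address is stored under fs.default.name (mapred-site.xml stores the JobTracker address under mapred.job.tracker in the same property format). `SiteXmlSketch` and `property` are hypothetical names, and the host and port in the sample XML are made up.

```java
import java.io.ByteArrayInputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Reads one property out of a Hadoop-style *-site.xml document given as a string.
class SiteXmlSketch {

    // Returns the <value> of the <property> whose <name> matches, or null if absent.
    static String property(String xml, String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element p = (Element) props.item(i);
                String n = p.getElementsByTagName("name").item(0).getTextContent().trim();
                if (n.equals(name)) {
                    return p.getElementsByTagName("value").item(0).getTextContent().trim();
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse configuration XML", e);
        }
    }

    public static void main(String[] args) {
        // Made-up core-site.xml content; a real file lives in the Hadoop conf directory.
        String coreSite = "<configuration><property><name>fs.default.name</name>"
                + "<value>hdfs://localhost:9000</value></property></configuration>";
        URI dfs = URI.create(property(coreSite, "fs.default.name"));
        // prints localhost 9000
        System.out.println(dfs.getHost() + " " + dfs.getPort());
    }
}
```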

Reposted from: Hadoop Basics (Part 1)
