MapReduce基本认识

stoneechogx

2019-11-05

MapReduce是一种编程模型，用于大规模数据集（大于1TB）的并行运算。

主要由Split、Map、Partition、Sort、Combine（需要自己写）、Merge、Reduce组成，一般来说Split、Partition、Sort、Merge不需要工程师编程但是可以改写，主要是写出Map和Reduce中对数据的操作。

概念"Map（映射）"和"Reduce（归约）"，是它们的主要思想，都是从函数式编程语言里借来的，还有从矢量编程语言里借来的特性。它极大地方便了编程人员在不会分布式并行编程的情况下，将自己的程序运行在分布式系统上。当前的软件实现是指定一个Map（映射）函数，用来把一组键值对映射成一组新的键值对，指定并发的Reduce（归约）函数，用来保证所有映射的键值对中的每一个共享相同的键组。

统计单词个数

有Combine

MapReduce基本认识

无Combine

MapReduce基本认识

代码：

WordCount.java

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;


public class WordCount {

	/**
	 * @param args
	 * @throws IOException 
	 * @throws InterruptedException 
	 * @throws ClassNotFoundException 
	 */
	public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
		// TODO Auto-generated method stub
        Configuration conf = new Configuration();
        Job job=Job.getInstance(conf,"WordCount");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(WordMapper.class);
        //job.setCombinerClass(WordCount)
        job.setReducerClass(WordReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        
        FileInputFormat.addInputPath(job,new Path("/input"));
        FileOutputFormat.setOutputPath(job, new Path("/output"));
        
        System.exit(job.waitForCompletion(true)?0:1);
	}
    
	public static class WordMapper extends Mapper<Object ,Text, Text, IntWritable>{
		protected void map(Object key, Text value ,Mapper<Object ,Text, Text, IntWritable>.Context context) throws IOException, InterruptedException{
			String[] words = value.toString().split(" ");
			for (String word:words){
				context.write(new Text(word),new IntWritable(1));
			}
		}
	}
	
	public static class WordReducer extends Reducer<Text, IntWritable,Text, IntWritable>{
		protected void reducer(Text key, Iterable<IntWritable> nums ,Reducer<Text, IntWritable,Text, IntWritable>.Context context) throws IOException, InterruptedException{
			int sum=0;
			for (IntWritable num:nums){
				sum+=num.get();
			}
			context.write(key,new IntWritable(sum));
		}
	}
	
}

mapreduce hadoop apache

安科网

MapReduce基本认识

stoneechogx

stoneechogx

相关推荐

hadoop框架三大组件hdfs、mapreduce、yarn 内容

JStorm介绍

mapreduce求topN

HBase与MapReduce交互

分布式计算框架——MapReduce

Hadoop之MapReduce学习(三)之ip去重、MaxScore示例、TotalScoreMapper示例

HBase和MapReduce

MapReduce(分布式计算)_01

MapReduce编程模型

bitmap、Trie、数据库索引、倒排索引、外排序、Mapreduce

MapReduce的使用

Hadoop基础---MapReduce对数据进行排序

Hadoop——MapReduce介绍

Hadoop架构及集群

MapReduce编程实战（2）-词频统计结果存入mysql数据库

MapReduce和Hive学习文档链接学习顺序

Hadoop学习之路(7)MapReduce自定义排序

python| MongoDB聚合(count、distinct、group、MapReduce)

Hadoop

hadoop自带性能测试

stoneechogx