Homework - Running Hadoop WordCount Examples
Hadoop workshop homework.
Since I am an IntelliJ IDEA guy now (I switched from Eclipse to IntelliJ IDEA several months ago because IntelliJ IDEA is much, much better than Eclipse these days), and IntelliJ currently doesn't have any Hadoop plugins, I package the output into a jar file and then copy the jar (cdh4-examples.jar) to the Hadoop cluster with scp.
[root@n1 hadoop-examples]# scp [email protected]:~/prog/hadoop/cdh4-examples/cdh4-examples.jar .
Password:
cdh4-examples.jar
The first example is WordCount01. The code is as follows:
package wc;

import java.io.IOException;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCount01 extends Configured implements Tool {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        //conf.set("mapred.job.tracker", "192.168.1.201:9001");

        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount01.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.waitForCompletion(true);

        System.out.println("Task name: " + job.getJobName());
        System.out.println("Task success? " + (job.isSuccessful() ? "Yes" : "No"));
        return job.isSuccessful() ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        DateFormat formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        Date start = new Date();
        int res = ToolRunner.run(new Configuration(), new WordCount01(), args);
        Date end = new Date();
        float time = (float) ((end.getTime() - start.getTime()) / 60000.0);
        System.out.println("Task start: " + formatter.format(start));
        System.out.println("Task end: " + formatter.format(end));
        System.out.println("Time elapsed: " + String.valueOf(time) + " minutes.");
        System.exit(res);
    }
}
Output:
[root@n1 hadoop-examples]# hadoop jar cdh4-examples.jar wc.WordCount01 Shakespeare.txt output
13/07/12 23:02:56 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/07/12 23:02:57 INFO input.FileInputFormat: Total input paths to process : 1
13/07/12 23:02:59 INFO mapred.JobClient: Running job: job_201307122107_0006
13/07/12 23:03:00 INFO mapred.JobClient:  map 0% reduce 0%
13/07/12 23:03:32 INFO mapred.JobClient:  map 26% reduce 0%
13/07/12 23:03:36 INFO mapred.JobClient:  map 100% reduce 0%
13/07/12 23:03:49 INFO mapred.JobClient:  map 100% reduce 100%
13/07/12 23:03:56 INFO mapred.JobClient: Job complete: job_201307122107_0006
13/07/12 23:03:56 INFO mapred.JobClient: Counters: 32
13/07/12 23:03:56 INFO mapred.JobClient:   File System Counters
13/07/12 23:03:56 INFO mapred.JobClient:     FILE: Number of bytes read=2151353
13/07/12 23:03:56 INFO mapred.JobClient:     FILE: Number of bytes written=2933308
13/07/12 23:03:56 INFO mapred.JobClient:     FILE: Number of read operations=0
13/07/12 23:03:56 INFO mapred.JobClient:     FILE: Number of large read operations=0
13/07/12 23:03:56 INFO mapred.JobClient:     FILE: Number of write operations=0
13/07/12 23:03:56 INFO mapred.JobClient:     HDFS: Number of bytes read=10185958
13/07/12 23:03:56 INFO mapred.JobClient:     HDFS: Number of bytes written=707043
13/07/12 23:03:56 INFO mapred.JobClient:     HDFS: Number of read operations=2
13/07/12 23:03:56 INFO mapred.JobClient:     HDFS: Number of large read operations=0
13/07/12 23:03:56 INFO mapred.JobClient:     HDFS: Number of write operations=1
13/07/12 23:03:56 INFO mapred.JobClient:   Job Counters
13/07/12 23:03:56 INFO mapred.JobClient:     Launched map tasks=1
13/07/12 23:03:56 INFO mapred.JobClient:     Launched reduce tasks=1
13/07/12 23:03:56 INFO mapred.JobClient:     Data-local map tasks=1
13/07/12 23:03:56 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=32808
13/07/12 23:03:56 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=9469
13/07/12 23:03:56 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/07/12 23:03:56 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/07/12 23:03:56 INFO mapred.JobClient:   Map-Reduce Framework
13/07/12 23:03:56 INFO mapred.JobClient:     Map input records=417884
13/07/12 23:03:56 INFO mapred.JobClient:     Map output records=1612019
13/07/12 23:03:56 INFO mapred.JobClient:     Map output bytes=15218645
13/07/12 23:03:56 INFO mapred.JobClient:     Input split bytes=117
13/07/12 23:03:56 INFO mapred.JobClient:     Combine input records=1852684
13/07/12 23:03:56 INFO mapred.JobClient:     Combine output records=306113
13/07/12 23:03:56 INFO mapred.JobClient:     Reduce input groups=65448
13/07/12 23:03:56 INFO mapred.JobClient:     Reduce shuffle bytes=470288
13/07/12 23:03:56 INFO mapred.JobClient:     Reduce input records=65448
13/07/12 23:03:56 INFO mapred.JobClient:     Reduce output records=65448
13/07/12 23:03:56 INFO mapred.JobClient:     Spilled Records=371561
13/07/12 23:03:56 INFO mapred.JobClient:     CPU time spent (ms)=9970
13/07/12 23:03:56 INFO mapred.JobClient:     Physical memory (bytes) snapshot=288149504
13/07/12 23:03:56 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=3245924352
13/07/12 23:03:56 INFO mapred.JobClient:     Total committed heap usage (bytes)=142344192
Task name: word count
Task success? Yes
Task start: 2013-07-12 23:02:53
Task end: 2013-07-12 23:03:56
Time elapsed: 1.046 minutes.
Note that in this example we registered a Combiner class, and it is the same class as the Reducer. That reuse works here because summing counts is associative and commutative; in general the combiner does not have to be the same class as the reducer, but its input and output key/value types must match the map output types. The combiner acts as a local aggregator that shrinks the data copied from the mappers to the reducers. The difference shows up when comparing WordCount01 with WordCount02 below, where the combiner is removed.
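To see why "same class as the reducer" is a property of this particular job rather than a rule, consider a hypothetical job (not part of this homework; MeanReducer is an illustrative name) that computes the mean value per key. Its reducer cannot double as a combiner: averaging partial averages gives the wrong answer, and the value types would not line up either. A minimal sketch:

import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// A reducer that emits the mean value per key. Unlike IntSumReducer,
// registering this class as the combiner would be wrong twice over:
// the mean of partial means is not the overall mean, and its output
// value type (DoubleWritable) differs from the map output value type
// (IntWritable) that a combiner is required to emit.
public class MeanReducer
        extends Reducer<Text, IntWritable, Text, DoubleWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        int count = 0;
        for (IntWritable val : values) {
            sum += val.get();
            count++;
        }
        context.write(key, new DoubleWritable((double) sum / count));
    }
}

For WordCount, by contrast, a sum of partial sums equals the total sum, which is exactly what makes IntSumReducer safe in both roles.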
The second example:
package wc;

import java.io.IOException;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCount02 extends Configured implements Tool {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        //conf.set("mapred.job.tracker", "192.168.1.201:9001");

        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount02.class);
        job.setMapperClass(TokenizerMapper.class);
        // job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.waitForCompletion(true);

        System.out.println("Task name: " + job.getJobName());
        System.out.println("Task success? " + (job.isSuccessful() ? "Yes" : "No"));
        return job.isSuccessful() ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        DateFormat formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        Date start = new Date();
        int res = ToolRunner.run(new Configuration(), new WordCount02(), args);
        Date end = new Date();
        float time = (float) ((end.getTime() - start.getTime()) / 60000.0);
        System.out.println("Task start: " + formatter.format(start));
        System.out.println("Task end: " + formatter.format(end));
        System.out.println("Time elapsed: " + String.valueOf(time) + " minutes.");
        System.exit(res);
    }
}
Output:
[root@n1 hadoop-examples]# hadoop jar cdh4-examples.jar wc.WordCount02 Shakespeare.txt output
13/07/12 23:16:20 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/07/12 23:16:20 INFO input.FileInputFormat: Total input paths to process : 1
13/07/12 23:16:24 INFO mapred.JobClient: Running job: job_201307122107_0007
13/07/12 23:16:25 INFO mapred.JobClient:  map 0% reduce 0%
13/07/12 23:16:42 INFO mapred.JobClient:  map 100% reduce 0%
13/07/12 23:17:03 INFO mapred.JobClient:  map 100% reduce 67%
13/07/12 23:17:06 INFO mapred.JobClient:  map 100% reduce 100%
13/07/12 23:17:12 INFO mapred.JobClient: Job complete: job_201307122107_0007
13/07/12 23:17:12 INFO mapred.JobClient: Counters: 32
13/07/12 23:17:12 INFO mapred.JobClient:   File System Counters
13/07/12 23:17:12 INFO mapred.JobClient:     FILE: Number of bytes read=3871328
13/07/12 23:17:12 INFO mapred.JobClient:     FILE: Number of bytes written=5564882
13/07/12 23:17:12 INFO mapred.JobClient:     FILE: Number of read operations=0
13/07/12 23:17:12 INFO mapred.JobClient:     FILE: Number of large read operations=0
13/07/12 23:17:12 INFO mapred.JobClient:     FILE: Number of write operations=0
13/07/12 23:17:12 INFO mapred.JobClient:     HDFS: Number of bytes read=10185958
13/07/12 23:17:12 INFO mapred.JobClient:     HDFS: Number of bytes written=707043
13/07/12 23:17:12 INFO mapred.JobClient:     HDFS: Number of read operations=2
13/07/12 23:17:12 INFO mapred.JobClient:     HDFS: Number of large read operations=0
13/07/12 23:17:12 INFO mapred.JobClient:     HDFS: Number of write operations=1
13/07/12 23:17:12 INFO mapred.JobClient:   Job Counters
13/07/12 23:17:12 INFO mapred.JobClient:     Launched map tasks=1
13/07/12 23:17:12 INFO mapred.JobClient:     Launched reduce tasks=1
13/07/12 23:17:12 INFO mapred.JobClient:     Data-local map tasks=1
13/07/12 23:17:12 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=20788
13/07/12 23:17:12 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=20552
13/07/12 23:17:12 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/07/12 23:17:12 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/07/12 23:17:12 INFO mapred.JobClient:   Map-Reduce Framework
13/07/12 23:17:12 INFO mapred.JobClient:     Map input records=417884
13/07/12 23:17:12 INFO mapred.JobClient:     Map output records=1612019
13/07/12 23:17:12 INFO mapred.JobClient:     Map output bytes=15218645
13/07/12 23:17:12 INFO mapred.JobClient:     Input split bytes=117
13/07/12 23:17:12 INFO mapred.JobClient:     Combine input records=0
13/07/12 23:17:12 INFO mapred.JobClient:     Combine output records=0
13/07/12 23:17:12 INFO mapred.JobClient:     Reduce input groups=65448
13/07/12 23:17:12 INFO mapred.JobClient:     Reduce shuffle bytes=1382689
13/07/12 23:17:12 INFO mapred.JobClient:     Reduce input records=1612019
13/07/12 23:17:12 INFO mapred.JobClient:     Reduce output records=65448
13/07/12 23:17:12 INFO mapred.JobClient:     Spilled Records=4836057
13/07/12 23:17:12 INFO mapred.JobClient:     CPU time spent (ms)=11730
13/07/12 23:17:12 INFO mapred.JobClient:     Physical memory (bytes) snapshot=275365888
13/07/12 23:17:12 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=2282356736
13/07/12 23:17:12 INFO mapred.JobClient:     Total committed heap usage (bytes)=91426816
Task name: word count
Task success? Yes
Task start: 2013-07-12 23:16:18
Task end: 2013-07-12 23:17:12
Time elapsed: 0.8969 minutes.
Note the difference between WordCount02 and WordCount01: in WordCount02 the combiner was removed, so the number of reduce input records increased, as the job output shows:
Reduce input records=1612019
In WordCount01, which ran with the combiner:
Reduce input records=65448
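The counters quantify what the combiner saves. With it, each map task pre-aggregates locally and only 65,448 records (Reduce shuffle bytes=470288) travel to the reducer; without it, all 1,612,019 map output records must be shuffled (Reduce shuffle bytes=1382689), roughly a 25x increase in records and 3x in bytes, and Spilled Records jump from 371,561 to 4,836,057. Also note that in WordCount01 the Combine input records (1,852,684) exceed the Map output records (1,612,019): the combiner may run on a record more than once, for example again when map-side spill files are merged.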
The third example:
package wc;

import java.io.IOException;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCount03 extends Configured implements Tool {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        //conf.set("mapred.job.tracker", "192.168.1.201:9001");

        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount03.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setNumReduceTasks(2);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.waitForCompletion(true);

        System.out.println("Task name: " + job.getJobName());
        System.out.println("Task success? " + (job.isSuccessful() ? "Yes" : "No"));
        return job.isSuccessful() ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        DateFormat formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        Date start = new Date();
        int res = ToolRunner.run(new Configuration(), new WordCount03(), args);
        Date end = new Date();
        float time = (float) ((end.getTime() - start.getTime()) / 60000.0);
        System.out.println("Task start: " + formatter.format(start));
        System.out.println("Task end: " + formatter.format(end));
        System.out.println("Time elapsed: " + String.valueOf(time) + " minutes.");
        System.exit(res);
    }
}
Output:
[root@n1 hadoop-examples]# hadoop jar cdh4-examples.jar wc.WordCount03 Shakespeare.txt output
13/07/12 23:18:53 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/07/12 23:18:53 INFO input.FileInputFormat: Total input paths to process : 1
13/07/12 23:18:55 INFO mapred.JobClient: Running job: job_201307122107_0008
13/07/12 23:18:56 INFO mapred.JobClient:  map 0% reduce 0%
13/07/12 23:19:16 INFO mapred.JobClient:  map 70% reduce 0%
13/07/12 23:19:19 INFO mapred.JobClient:  map 100% reduce 0%
13/07/12 23:19:33 INFO mapred.JobClient:  map 100% reduce 50%
13/07/12 23:19:34 INFO mapred.JobClient:  map 100% reduce 100%
13/07/12 23:19:39 INFO mapred.JobClient: Job complete: job_201307122107_0008
13/07/12 23:19:39 INFO mapred.JobClient: Counters: 32
13/07/12 23:19:39 INFO mapred.JobClient:   File System Counters
13/07/12 23:19:39 INFO mapred.JobClient:     FILE: Number of bytes read=3042226
13/07/12 23:19:39 INFO mapred.JobClient:     FILE: Number of bytes written=3277040
13/07/12 23:19:39 INFO mapred.JobClient:     FILE: Number of read operations=0
13/07/12 23:19:39 INFO mapred.JobClient:     FILE: Number of large read operations=0
13/07/12 23:19:39 INFO mapred.JobClient:     FILE: Number of write operations=0
13/07/12 23:19:39 INFO mapred.JobClient:     HDFS: Number of bytes read=10185958
13/07/12 23:19:39 INFO mapred.JobClient:     HDFS: Number of bytes written=707043
13/07/12 23:19:39 INFO mapred.JobClient:     HDFS: Number of read operations=2
13/07/12 23:19:39 INFO mapred.JobClient:     HDFS: Number of large read operations=0
13/07/12 23:19:39 INFO mapred.JobClient:     HDFS: Number of write operations=2
13/07/12 23:19:39 INFO mapred.JobClient:   Job Counters
13/07/12 23:19:39 INFO mapred.JobClient:     Launched map tasks=1
13/07/12 23:19:39 INFO mapred.JobClient:     Launched reduce tasks=2
13/07/12 23:19:39 INFO mapred.JobClient:     Data-local map tasks=1
13/07/12 23:19:39 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=25064
13/07/12 23:19:39 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=21033
13/07/12 23:19:39 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/07/12 23:19:39 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/07/12 23:19:39 INFO mapred.JobClient:   Map-Reduce Framework
13/07/12 23:19:39 INFO mapred.JobClient:     Map input records=417884
13/07/12 23:19:39 INFO mapred.JobClient:     Map output records=1612019
13/07/12 23:19:39 INFO mapred.JobClient:     Map output bytes=15218645
13/07/12 23:19:39 INFO mapred.JobClient:     Input split bytes=117
13/07/12 23:19:39 INFO mapred.JobClient:     Combine input records=1852684
13/07/12 23:19:39 INFO mapred.JobClient:     Combine output records=306113
13/07/12 23:19:39 INFO mapred.JobClient:     Reduce input groups=65448
13/07/12 23:19:39 INFO mapred.JobClient:     Reduce shuffle bytes=503734
13/07/12 23:19:39 INFO mapred.JobClient:     Reduce input records=65448
13/07/12 23:19:39 INFO mapred.JobClient:     Reduce output records=65448
13/07/12 23:19:39 INFO mapred.JobClient:     Spilled Records=371561
13/07/12 23:19:39 INFO mapred.JobClient:     CPU time spent (ms)=13840
13/07/12 23:19:39 INFO mapred.JobClient:     Physical memory (bytes) snapshot=395960320
13/07/12 23:19:39 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=3918856192
13/07/12 23:19:39 INFO mapred.JobClient:     Total committed heap usage (bytes)=162201600
Task name: word count
Task success? Yes
Task start: 2013-07-12 23:18:50
Task end: 2013-07-12 23:19:39
Time elapsed: 0.80191666 minutes.
In the third example, the number of reduce tasks was changed to 2, so two reduce output files are produced:
-rw-r--r--   3 root supergroup     353110 2013-07-12 23:19 output/part-r-00000
-rw-r--r--   3 root supergroup     353933 2013-07-12 23:19 output/part-r-00001
In WordCount01 and WordCount02, only one reduce output file was generated. In my cluster mapred.tasktracker.reduce.tasks.maximum is set to 1, but that property only limits how many reduce tasks a single TaskTracker can run simultaneously; the number of reduce tasks a job actually gets comes from mapred.reduce.tasks, which defaults to 1. Specifying the number explicitly, with job.setNumReduceTasks() as in WordCount03 or by setting mapred.reduce.tasks on the org.apache.hadoop.conf.Configuration, overrides that default.
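Which words end up in part-r-00000 versus part-r-00001 is decided by the job's partitioner; since this job sets none, Hadoop uses its default hash partitioner. The sketch below (MyHashPartitioner is an illustrative name) mirrors the logic of the built-in org.apache.hadoop.mapreduce.lib.partition.HashPartitioner:

import org.apache.hadoop.mapreduce.Partitioner;

// Default-style partitioner: hash the key, mask off the sign bit so the
// modulo result is non-negative, and take the remainder modulo the number
// of reduce tasks. With setNumReduceTasks(2), every word therefore lands
// deterministically in either part-r-00000 or part-r-00001.
public class MyHashPartitioner<K, V> extends Partitioner<K, V> {
    @Override
    public int getPartition(K key, V value, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}

Each part file is sorted by key internally, but their concatenation is not globally sorted, since hashing scatters the keys; if a single local file is needed, hadoop fs -getmerge output merged.txt will concatenate the parts.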