Compiling, Packaging, and Running Your Own MapReduce Program from the Command Line (Hadoop 2.4.1)
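Most MapReduce WordCount tutorials online barely mention how to compile WordCount.java, and the ones that do mostly describe the approach for old releases such as 0.20 and 1.x, namely:
javac -classpath /usr/local/Hadoop/hadoop-1.0.1/hadoop-core-1.0.1.jar WordCount.java
Newer 2.x releases no longer ship a hadoop-core*.jar file, so compiling and packaging your own MapReduce program works differently than in older versions.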
This article uses the WordCount example on Hadoop 2.4.1 to show how to compile your own MapReduce program under 2.x.
Dependency jars in Hadoop 2.x
In Hadoop 2.x the classes are no longer bundled into a single hadoop-core*.jar; they are split across several jars. The WordCount example needs the following three:
- $HADOOP_HOME/share/hadoop/common/hadoop-common-2.4.1.jar
- $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.4.1.jar
- $HADOOP_HOME/share/hadoop/common/lib/commons-cli-1.2.jar
Compiling and packaging a Hadoop MapReduce program
Add these jars to the CLASSPATH:
export CLASSPATH="$HADOOP_HOME/share/hadoop/common/hadoop-common-2.4.1.jar:$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.4.1.jar:$HADOOP_HOME/share/hadoop/common/lib/commons-cli-1.2.jar:$CLASSPATH"
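The export line simply joins the three jar paths with `:`. As an illustration only, the Java sketch below builds the same string from a given HADOOP_HOME and also reports which of the jars are missing, which helps catch a wrong version number in a path. The class name HadoopClasspath is hypothetical, and the jar names are hard-coded from the list above (Hadoop 2.4.1, commons-cli 1.2); other releases ship different version numbers.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: rebuilds the classpath from the `export CLASSPATH=...` line
// for a given HADOOP_HOME. Jar names are hard-coded from this article's list
// (Hadoop 2.4.1, commons-cli 1.2) and will differ in other releases.
final class HadoopClasspath {

    // The three jars needed for WordCount, in the same order as the export line.
    static List<String> requiredJars(String hadoopHome, String hadoopVersion) {
        List<String> jars = new ArrayList<>();
        jars.add(hadoopHome + "/share/hadoop/common/hadoop-common-" + hadoopVersion + ".jar");
        jars.add(hadoopHome + "/share/hadoop/mapreduce/hadoop-mapreduce-client-core-"
                + hadoopVersion + ".jar");
        jars.add(hadoopHome + "/share/hadoop/common/lib/commons-cli-1.2.jar");
        return jars;
    }

    // Joins the jar paths with the platform path separator (':' on Linux).
    static String build(String hadoopHome, String hadoopVersion) {
        return String.join(File.pathSeparator, requiredJars(hadoopHome, hadoopVersion));
    }

    // Returns the jars that do not exist as regular files, e.g. because of a version typo.
    static List<String> missing(String hadoopHome, String hadoopVersion) {
        List<String> result = new ArrayList<>();
        for (String jar : requiredJars(hadoopHome, hadoopVersion)) {
            if (!new File(jar).isFile()) {
                result.add(jar);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String home = System.getenv("HADOOP_HOME");
        if (home == null) {
            System.err.println("HADOOP_HOME is not set");
            System.exit(1);
        }
        System.out.println(build(home, "2.4.1"));
        for (String jar : missing(home, "2.4.1")) {
            System.err.println("missing: " + jar);
        }
    }
}
```

With HADOOP_HOME set, `java HadoopClasspath` prints the classpath on stdout and any jar it cannot find on stderr.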
Now WordCount.java can be compiled (this uses the WordCount.java from the 2.4.1 source tree; the full source is at the end of this article):
javac WordCount.java
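javac prints a warning here (most likely a note that WordCount.java uses a deprecated API), which can be ignored. After compiling you should see several .class files: WordCount.class plus WordCount$TokenizerMapper.class and WordCount$IntSumReducer.class for the two nested classes.
Next, package the .class files into a jar so that Hadoop can run them: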
jar -cvf WordCount.jar ./WordCount*.class
Once the jar is built, try running it. First create a few input files:
mkdir input
echo "echo of the rainbow" > ./input/file0
echo "the waiting game" > ./input/file1
Run the job:
/usr/local/hadoop/bin/hadoop jar WordCount.jar WordCount input output
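At this point you may see the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: WordCount
This is because WordCount.java declares the package org.apache.hadoop.examples, so the command must use the fully qualified class name: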
/usr/local/hadoop/bin/hadoop jar WordCount.jar org.apache.hadoop.examples.WordCount input output
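Running with the short name fails because the WordCount.class file actually defines org.apache.hadoop.examples.WordCount, and a class with that name is looked up in a jar at org/apache/hadoop/examples/WordCount.class. The sketch below (a hypothetical helper, JarCheck, using only java.util.jar) lists the .class entries of a jar and checks whether a given fully qualified name sits at that path. Since the commands above compile and package without `-d`, the class files most likely end up at the jar root, so the check may report the packaged name as not found. If so, the run above probably still succeeds because Hadoop's own classpath includes the stock hadoop-mapreduce-examples jar, which contains a class of the same name; if you modify WordCount, giving it a different package (or none) keeps your own class from being shadowed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper: shows where the .class files ended up inside a jar and
// whether a fully qualified class name can be found at the path the JVM expects.
final class JarCheck {

    // "org.apache.hadoop.examples.WordCount" -> "org/apache/hadoop/examples/WordCount.class"
    static String entryPathFor(String className) {
        return className.replace('.', '/') + ".class";
    }

    // Lists every .class entry in the jar, e.g. "WordCount.class" or
    // "org/apache/hadoop/examples/WordCount.class".
    static List<String> classEntries(String jarPath) {
        List<String> names = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (name.endsWith(".class")) {
                    names.add(name);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    // True if the jar has an entry at the path that matches the class's package.
    static boolean containsClass(String jarPath, String className) {
        return classEntries(jarPath).contains(entryPathFor(className));
    }

    public static void main(String[] args) {
        String jar = args.length > 0 ? args[0] : "WordCount.jar";
        String cls = args.length > 1 ? args[1] : "org.apache.hadoop.examples.WordCount";
        for (String name : classEntries(jar)) {
            System.out.println(name);
        }
        System.out.println(cls + (containsClass(jar, cls) ? " found at " : " NOT found at ")
                + entryPathFor(cls));
    }
}
```

For example: `java JarCheck WordCount.jar org.apache.hadoop.examples.WordCount`.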
After a successful run, the output looks like this:
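In local (standalone) mode, which the locally created input directory above suggests, the job writes its result as plain text under output/: one or more files named part-r-00000, part-r-00001, and so on, each holding one `word<TAB>count` line per key, plus an empty _SUCCESS marker. Assuming that layout, the sketch below (hypothetical class ReadWordCountOutput) reads the counts back into a sorted map; on HDFS you would read the files with `hadoop fs -cat output/*` instead.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: reads WordCount results from a local output directory.
// Assumes TextOutputFormat's default layout: part-* files with "word<TAB>count" lines.
final class ReadWordCountOutput {

    static Map<String, Integer> read(Path outputDir) {
        Map<String, Integer> counts = new TreeMap<>();
        // Only part-* files hold results; the glob skips _SUCCESS and hidden .crc files.
        try (DirectoryStream<Path> parts = Files.newDirectoryStream(outputDir, "part-*")) {
            for (Path part : parts) {
                for (String line : Files.readAllLines(part, StandardCharsets.UTF_8)) {
                    int tab = line.lastIndexOf('\t');
                    if (tab < 0) {
                        continue; // not a key/value line
                    }
                    String word = line.substring(0, tab);
                    int count = Integer.parseInt(line.substring(tab + 1).trim());
                    counts.merge(word, count, Integer::sum);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }

    public static void main(String[] args) {
        Path dir = Paths.get(args.length > 0 ? args[0] : "output");
        for (Map.Entry<String, Integer> e : read(dir).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```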
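Going further: compiling and running MapReduce programs with Eclipse
Compiling and running MapReduce programs from the command line is tedious: every change means compiling and packaging by hand all over again. Compiling and running them from Eclipse is much more convenient.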
WordCount.java source
The file is located in hadoop-2.4.1-src\hadoop-mapreduce-project\hadoop-mapreduce-examples\src\main\java\org\apache\hadoop\examples:
<span class="com">/**</span>
<span class="com">* Licensed to the Apache Software Foundation (ASF) under one</span>
<span class="com">* or more contributor license agreements. See the NOTICE file</span>
<span class="com">* distributed with this work for additional information</span>
<span class="com">* regarding copyright ownership. The ASF licenses this file</span>
<span class="com">* to you under the Apache License, Version 2.0 (the</span>
<span class="com">* "License"); you may not use this file except in compliance</span>
<span class="com">* with the License. You may obtain a copy of the License at</span>
<span class="com">*</span>
<span class="com">* http://www.apache.org/licenses/LICENSE-2.0</span>
<span class="com">*</span>
<span class="com">* Unless required by applicable law or agreed to in writing, software</span>
<span class="com">* distributed under the License is distributed on an "AS IS" BASIS,</span>
<span class="com">* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.</span>
<span class="com">* See the License for the specific language governing permissions and</span>
<span class="com">* limitations under the License.</span>
<span class="com">*/</span>
<span class="kwd">package</span><span class="pln"> org</span><span class="pun">.</span><span class="pln">apache</span><span class="pun">.</span><span class="pln">hadoop</span><span class="pun">.</span><span class="pln">examples</span><span class="pun">;</span>
<span class="kwd">import</span><span class="pln"> java</span><span class="pun">.</span><span class="pln">io</span><span class="pun">.</span><span class="typ">IOException</span><span class="pun">;</span>
<span class="kwd">import</span><span class="pln"> java</span><span class="pun">.</span><span class="pln">util</span><span class="pun">.</span><span class="typ">StringTokenizer</span><span class="pun">;</span>
<span class="kwd">import</span><span class="pln"> org</span><span class="pun">.</span><span class="pln">apache</span><span class="pun">.</span><span class="pln">hadoop</span><span class="pun">.</span><span class="pln">conf</span><span class="pun">.</span><span class="typ">Configuration</span><span class="pun">;</span>
<span class="kwd">import</span><span class="pln"> org</span><span class="pun">.</span><span class="pln">apache</span><span class="pun">.</span><span class="pln">hadoop</span><span class="pun">.</span><span class="pln">fs</span><span class="pun">.</span><span class="typ">Path</span><span class="pun">;</span>
<span class="kwd">import</span><span class="pln"> org</span><span class="pun">.</span><span class="pln">apache</span><span class="pun">.</span><span class="pln">hadoop</span><span class="pun">.</span><span class="pln">io</span><span class="pun">.</span><span class="typ">IntWritable</span><span class="pun">;</span>
<span class="kwd">import</span><span class="pln"> org</span><span class="pun">.</span><span class="pln">apache</span><span class="pun">.</span><span class="pln">hadoop</span><span class="pun">.</span><span class="pln">io</span><span class="pun">.</span><span class="typ">Text</span><span class="pun">;</span>
<span class="kwd">import</span><span class="pln"> org</span><span class="pun">.</span><span class="pln">apache</span><span class="pun">.</span><span class="pln">hadoop</span><span class="pun">.</span><span class="pln">mapreduce</span><span class="pun">.</span><span class="typ">Job</span><span class="pun">;</span>
<span class="kwd">import</span><span class="pln"> org</span><span class="pun">.</span><span class="pln">apache</span><span class="pun">.</span><span class="pln">hadoop</span><span class="pun">.</span><span class="pln">mapreduce</span><span class="pun">.</span><span class="typ">Mapper</span><span class="pun">;</span>
<span class="kwd">import</span><span class="pln"> org</span><span class="pun">.</span><span class="pln">apache</span><span class="pun">.</span><span class="pln">hadoop</span><span class="pun">.</span><span class="pln">mapreduce</span><span class="pun">.</span><span class="typ">Reducer</span><span class="pun">;</span>
<span class="kwd">import</span><span class="pln"> org</span><span class="pun">.</span><span class="pln">apache</span><span class="pun">.</span><span class="pln">hadoop</span><span class="pun">.</span><span class="pln">mapreduce</span><span class="pun">.</span><span class="pln">lib</span><span class="pun">.</span><span class="pln">input</span><span class="pun">.</span><span class="typ">FileInputFormat</span><span class="pun">;</span>
<span class="kwd">import</span><span class="pln"> org</span><span class="pun">.</span><span class="pln">apache</span><span class="pun">.</span><span class="pln">hadoop</span><span class="pun">.</span><span class="pln">mapreduce</span><span class="pun">.</span><span class="pln">lib</span><span class="pun">.</span><span class="pln">output</span><span class="pun">.</span><span class="typ">FileOutputFormat</span><span class="pun">;</span>
<span class="kwd">import</span><span class="pln"> org</span><span class="pun">.</span><span class="pln">apache</span><span class="pun">.</span><span class="pln">hadoop</span><span class="pun">.</span><span class="pln">util</span><span class="pun">.</span><span class="typ">GenericOptionsParser</span><span class="pun">;</span>
<span class="kwd">public</span><span class="kwd">class</span><span class="typ">WordCount</span><span class="pun">{</span>
<span class="kwd">public</span><span class="kwd">static</span><span class="kwd">class</span><span class="typ">TokenizerMapper</span>
<span class="kwd">extends</span><span class="typ">Mapper</span><span class="pun"><</span><span class="typ">Object</span><span class="pun">,</span><span class="typ">Text</span><span class="pun">,</span><span class="typ">Text</span><span class="pun">,</span><span class="typ">IntWritable</span><span class="pun">>{</span>
<span class="kwd">private</span><span class="kwd">final</span><span class="kwd">static</span><span class="typ">IntWritable</span><span class="pln"> one </span><span class="pun">=</span><span class="kwd">new</span><span class="typ">IntWritable</span><span class="pun">(</span><span class="lit">1</span><span class="pun">);</span>
<span class="kwd">private</span><span class="typ">Text</span><span class="pln"> word </span><span class="pun">=</span><span class="kwd">new</span><span class="typ">Text</span><span class="pun">();</span>
<span class="kwd">public</span><span class="kwd">void</span><span class="pln"> map</span><span class="pun">(</span><span class="typ">Object</span><span class="pln"> key</span><span class="pun">,</span><span class="typ">Text</span><span class="pln"> value</span><span class="pun">,</span><span class="typ">Context</span><span class="pln"> context</span>
<span class="pun">)</span><span class="kwd">throws</span><span class="typ">IOException</span><span class="pun">,</span><span class="typ">InterruptedException</span><span class="pun">{</span>
<span class="typ">StringTokenizer</span><span class="pln"> itr </span><span class="pun">=</span><span class="kwd">new</span><span class="typ">StringTokenizer</span><span class="pun">(</span><span class="pln">value</span><span class="pun">.</span><span class="pln">toString</span><span class="pun">());</span>
<span class="kwd">while</span><span class="pun">(</span><span class="pln">itr</span><span class="pun">.</span><span class="pln">hasMoreTokens</span><span class="pun">())</span><span class="pun">{</span>
<span class="pln">word</span><span class="pun">.</span><span class="pln">set</span><span class="pun">(</span><span class="pln">itr</span><span class="pun">.</span><span class="pln">nextToken</span><span class="pun">());</span>
<span class="pln">context</span><span class="pun">.</span><span class="pln">write</span><span class="pun">(</span><span class="pln">word</span><span class="pun">,</span><span class="pln"> one</span><span class="pun">);</span>
<span class="pun">}</span>
<span class="pun">}</span>
<span class="pun">}</span>
<span class="kwd">public</span><span class="kwd">static</span><span class="kwd">class</span><span class="typ">IntSumReducer</span>
<span class="kwd">extends</span><span class="typ">Reducer</span><span class="pun"><</span><span class="typ">Text</span><span class="pun">,</span><span class="typ">IntWritable</span><span class="pun">,</span><span class="typ">Text</span><span class="pun">,</span><span class="typ">IntWritable</span><span class="pun">></span><span class="pun">{</span>
<span class="kwd">private</span><span class="typ">IntWritable</span><span class="pln"> result </span><span class="pun">=</span><span class="kwd">new</span><span class="typ">IntWritable</span><span class="pun">();</span>
<span class="kwd">public</span><span class="kwd">void</span><span class="pln"> reduce</span><span class="pun">(</span><span class="typ">Text</span><span class="pln"> key</span><span class="pun">,</span><span class="typ">Iterable</span><span class="pun"><</span><span class="typ">IntWritable</span><span class="pun">></span><span class="pln"> values</span><span class="pun">,</span>
<span class="typ">Context</span><span class="pln"> context</span>
<span class="pun">)</span><span class="kwd">throws</span><span class="typ">IOException</span><span class="pun">,</span><span class="typ">InterruptedException</span><span class="pun">{</span>
<span class="kwd">int</span><span class="pln"> sum </span><span class="pun">=</span><span class="lit">0</span><span class="pun">;</span>
<span class="kwd">for</span><span class="pun">(</span><span class="typ">IntWritable</span><span class="pln"> val </span><span class="pun">:</span><span class="pln"> values</span><span class="pun">)</span><span class="pun">{</span>
<span class="pln">sum </span><span class="pun">+=</span><span class="pln"> val</span><span class="pun">.</span><span class="pln">get</span><span class="pun">();</span>
<span class="pun">}</span>
<span class="pln">result</span><span class="pun">.</span><span class="pln">set</span><span class="pun">(</span><span class="pln">sum</span><span class="pun">);</span>
<span class="pln">context</span><span class="pun">.</span><span class="pln">write</span><span class="pun">(</span><span class="pln">key</span><span class="pun">,</span><span class="pln"> result</span><span class="pun">);</span>
<span class="pun">}</span>
<span class="pun">}</span>
<span class="kwd">public</span><span class="kwd">static</span><span class="kwd">void</span><span class="pln"> main</span><span class="pun">(</span><span class="typ">String</span><span class="pun">[]</span><span class="pln"> args</span><span class="pun">)</span><span class="kwd">throws</span><span class="typ">Exception</span><span class="pun">{</span>
<span class="typ">Configuration</span><span class="pln"> conf </span><span class="pun">=</span><span class="kwd">new</span><span class="typ">Configuration</span><span class="pun">();</span>
<span class="typ">String</span><span class="pun">[]</span><span class="pln"> otherArgs </span><span class="pun">=</span><span class="kwd">new</span><span class="typ">GenericOptionsParser</span><span class="pun">(</span><span class="pln">conf</span><span class="pun">,</span><span class="pln"> args</span><span class="pun">).</span><span class="pln">getRemainingArgs</span><span class="pun">();</span>
<span class="kwd">if</span><span class="pun">(</span><span class="pln">otherArgs</span><span class="pun">.</span><span class="pln">length </span><span class="pun">!=</span><span class="lit">2</span><span class="pun">)</span><span class="pun">{</span>
<span class="typ">System</span><span class="pun">.</span><span class="pln">err</span><span class="pun">.</span><span class="pln">println</span><span class="pun">(</span><span class="str">"Usage: wordcount <in> <out>"</span><span class="pun">);</span>
<span class="typ">System</span><span class="pun">.</span><span class="pln">exit</span><span class="pun">(</span><span class="lit">2</span><span class="pun">);</span>
<span class="pun">}</span>
<span class="typ">Job</span><span class="pln"> job </span><span class="pun">=</span><span class="kwd">new</span><span class="typ">Job</span><span class="pun">(</span><span class="pln">conf</span><span class="pun">,</span><span class="str">"word count"</span><span class="pun">);</span>
<span class="pln">job</span><span class="pun">.</span><span class="pln">setJarByClass</span><span class="pun">(</span><span class="typ">WordCount</span><span class="pun">.</span><span class="kwd">class</span><span class="pun">);</span>
<span class="pln">job</span><span class="pun">.</span><span class="pln">setMapperClass</span><span class="pun">(</span><span class="typ">TokenizerMapper</span><span class="pun">.</span><span class="kwd">class</span><span class="pun">);</span>
<span class="pln">job</span><span class="pun">.</span><span class="pln">setCombinerClass</span><span class="pun">(</span><span class="typ">IntSumReducer</span><span class="pun">.</span><span class="kwd">class</span><span class="pun">);</span>
<span class="pln">job</span><span class="pun">.</span><span class="pln">setReducerClass</span><span class="pun">(</span><span class="typ">IntSumReducer</span><span class="pun">.</span><span class="kwd">class</span><span class="pun">);</span>
<span class="pln">job</span><span class="pun">.</span><span class="pln">setOutputKeyClass</span><span class="pun">(</span><span class="typ">Text</span><span class="pun">.</span><span class="kwd">class</span><span class="pun">);</span>
<span class="pln">job</span><span class="pun">.</span><span class="pln">setOutputValueClass</span><span class="pun">(</span><span class="typ">IntWritable</span><span class="pun">.</span><span class="kwd">class</span><span class="pun">);</span>
<span class="typ">FileInputFormat</span><span class="pun">.</span><span class="pln">addInputPath</span><span class="pun">(</span><span class="pln">job</span><span class="pun">,</span><span class="kwd">new</span><span class="typ">Path</span><span class="pun">(</span><span class="pln">otherArgs</span><span class="pun">[</span><span class="lit">0</span><span class="pun">]));</span>
<span class="typ">FileOutputFormat</span><span class="pun">.</span><span class="pln">setOutputPath</span><span class="pun">(</span><span class="pln">job</span><span class="pun">,</span><span class="kwd">new</span><span class="typ">Path</span><span class="pun">(</span><span class="pln">otherArgs</span><span class="pun">[</span><span class="lit">1</span><span class="pun">]));</span>
<span class="typ">System</span><span class="pun">.</span><span class="pln">exit</span><span class="pun">(</span><span class="pln">job</span><span class="pun">.</span><span class="pln">waitForCompletion</span><span class="pun">(</span><span class="kwd">true</span><span class="pun">)</span><span class="pun">?</span><span class="lit">0</span><span class="pun">:</span><span class="lit">1</span><span class="pun">);</span>
<span class="pun">}</span>
<span class="pun">}</span>