Hadoop HelloWorld Examples - 单表连接

forliberty

2013-08-27

应该是那本"Hadoop 实战"的第4个demo了，单表连接。给出一对对的children和parents的名字，然后输出所有的grandchildren和grandparents对。

相关阅读：

输入数据（第一列child，第二列 parent）

Tom Lucy
Tom Jack
Jone Lucy
Jone Jack
Lucy Mary
Lucy Ben
Jack Alice
Jack Jesse
Terry Alice
Terry Jesse
Philip Terry
Philip Alma
Mark Terry
Mark Alma

输出数据(第一列grandchild,第二列grandparents)

Tom Jesse
Tom Alice
Jone Jesse
Jone Alice
Jone Ben
Jone Mary
Tom Ben
Tom Mary
Philip Alice
Philip Jesse
Mark Alice
Mark Jesse

不知到是"Hadoop 实战" 还是那个虾皮工作室写的解法，把这个问题说成是单表联接。个人觉得其实没有这么深奥，说白了非常的简单，map输入：如上所说，key是child's name, value是parent's name； map的输出：key是名字，value是“标识+名字”，其中标识'0'后面接他的child's name，'1'接他的parent's name。比如第一个记录Tom Lucy，那么key是"Tom", value是"1Lucy"，还有key是lucy，value是"0Tom" 。 reducer的输入：对于每一个人名(key)，它的一系列values是他的children或者parents name(通过标识来识别是children还是parents)，这样子得到key的children集合和parents集合（说白了就是知道了key的所有children和parents的名字）。然后求这两个集合的乘积（或者称为笛卡尔积）即可。。

具体代码如下：

import java.util.*;
import java.io.*;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;

public class oneTableConnect {

public static class tableMapper extends Mapper<Text, Text, Text, Text>
{
@Override
public void map(Text key, Text value, Context context) throws IOException, InterruptedException
{
context.write(key, new Text( "1" + value.toString()));
context.write(value, new Text("0" + key.toString()));
}
}

public static class tableReducer extends Reducer<Text, Text, Text, Text>
{
@Override
public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException
{
String[] gChildren = new String[10];
String[] gParents = new String[10];
int cNum = 0;
int pNum = 0;
for(Text val : values)
{
if(val.toString().charAt(0) == '0')// the key's child.
{
gChildren[cNum++] = val.toString().substring(1);
}
else//the key's parent.
{
gParents[pNum++] = val.toString().substring(1);
}
}

for(int i=0; i<cNum; i++)
for(int j=0;j<pNum;j++)
{
context.write(new Text(gChildren[i]), new Text(gParents[j]));
}
}
}

public static void main(String[] args) throws Exception
{
Configuration conf = new Configuration();
conf.set("mapreduce.input.keyvaluelinerecordreader.key.value.separator", " ");

Job job = new Job(conf, "tableConnect");
job.setJarByClass(oneTableConnect.class);

job.setMapperClass(tableMapper.class);
job.setReducerClass(tableReducer.class);

job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);

job.setInputFormatClass(KeyValueTextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);

FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));

System.exit( job.waitForCompletion(true) ? 0 : 1);
}
}

apache hadoop

安科网

Hadoop HelloWorld Examples - 单表连接

forliberty

forliberty

相关推荐

为什么Java仍将是未来的主流语言？

.NET Core下使用Kafka的方法步骤

解决PHPstudy Apache无法启动的问题【亲测有效】

Web安全：文件解析漏洞

终于有人把Nginx说清楚了，图文详解！

如何使用Apache Web服务器来安装和配置网站？

CentOS 8 Apache 安装后 SSL 重定向提示证书错误

如何使用 Apache Directory Studio 连接 JumpCloud

初学者和专业技术人员使用的十大机器学习软件

每个Java开发人员都应该知道的10大Github仓库

漫话：应用程序被拖慢？罪魁祸首竟然是Log4j！

JSP动态网页开发原理详解

centos8使用Apache httpd2.4.37安装web服务器的步骤详解

Tomcat启动springboot项目war包报错：启动子级时出错的问题

如何通过Apache在本地配置多个虚拟主机

Apache Shiro 反序列化(CVE-2016-4437)复现

Apache Shiro 反序列化(CVE-2016-4437)复现

Apache DolphinScheduler 诞生记

【Shiro】05 自定义Realm认证实现

Web容器Web服务器及常见的Web容器有哪些？

forliberty