Hadoop in Practice


Before installing Hadoop you first need:
1. Java 1.6.x, preferably Sun's JDK (1.5.x also works)
2. ssh

Install ssh:

$ sudo apt-get install ssh 
$ sudo apt-get install rsync 
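
Before going further, you can check that a suitable JDK is already on the PATH (the exact version string depends on your installation):

$ java -version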


Download Hadoop
Download the most recent release from http://hadoop.apache.org/core/releases.html

It is best to create a dedicated user for Hadoop, e.g. a user named hadoop in a group named hadoop:

$ sudo addgroup hadoop 
$ sudo adduser --ingroup hadoop hadoop 
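
To confirm the user and group were created (the uid/gid values will differ per system):

$ id hadoop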

Unpack the downloaded Hadoop archive into /home/hadoop and name the directory hadoop.
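For example, assuming the archive you downloaded is hadoop-0.20.2.tar.gz (substitute whatever version you actually fetched):

$ tar xzf hadoop-0.20.2.tar.gz
$ sudo mv hadoop-0.20.2 /home/hadoop/hadoop
$ sudo chown -R hadoop:hadoop /home/hadoop/hadoop
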
Configure JAVA_HOME:

gedit ~/hadoop/conf/hadoop-env.sh 

Change

# The java implementation to use.  Required. 
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun 

to point at your Java installation directory (mine is /usr/lib/jvm/java-6-sun-1.6.0.15):

# The java implementation to use. Required. 
export JAVA_HOME=/usr/lib/jvm/java-6-sun-1.6.0.15 


Now you can run Hadoop in standalone (single-node) mode; the example below greps the copied config files for strings matching the regex 'dfs[a-z.]+':

$ cd hadoop 
$ mkdir input 
$ cp conf/*.xml input 
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+' 
$ cat output/* 

Running in pseudo-distributed mode:

Configure ssh

$ su - hadoop 
$ ssh-keygen -t rsa -P "" 
Generating public/private rsa key pair. 
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa): 
Created directory '/home/hadoop/.ssh'. 
Your identification has been saved in /home/hadoop/.ssh/id_rsa. 
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub. 
The key fingerprint is: 
9d:47:ab:d7:22:54:f0:f9:b9:3b:64:93:12:75:81:27 hadoop@ubuntu 


Enable login without a password:

hadoop@ubuntu:~$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys 
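
If ssh still prompts for a password after this, directory permissions are the usual culprit; sshd ignores keys whose files are group- or world-writable:

$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/authorized_keys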

Then use:

$ ssh localhost 

to check that it logs you in without asking for a password. (The very first connection will prompt you to accept the host key; answer yes, and subsequent logins should go straight through.)


Hadoop configuration files:
conf/core-site.xml 

<?xml version="1.0"?> 
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?> 

<!-- Put site-specific property overrides in this file. --> 

<configuration> 
   <property> 
    <name>hadoop.tmp.dir</name> 
        <value>/home/hadoop/hadoop-datastore/hadoop-${user.name}</value> 
   </property> 
   <property> 
    <name>fs.default.name</name> 
    <value>hdfs://localhost:9000</value> 
   </property> 
</configuration> 

Set hadoop.tmp.dir to whatever path you prefer; ${user.name} is expanded automatically to the name of the user running Hadoop.
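
Hadoop will create hadoop.tmp.dir on first use, but creating the base directory up front (as the hadoop user) avoids permission surprises:

$ mkdir -p /home/hadoop/hadoop-datastore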

conf/hdfs-site.xml 

<configuration> 
  <property> 
    <name>dfs.replication</name> 
    <value>1</value> 
  </property> 
</configuration> 

dfs.replication sets the default number of replicas per block; 1 is appropriate for a single-node setup.
conf/mapred-site.xml 

<configuration> 
  <property> 
    <name>mapred.job.tracker</name> 
    <value>localhost:9001</value> 
  </property> 
</configuration> 

Execution

Format the distributed filesystem (note that this wipes any existing HDFS data):

$ bin/hadoop namenode -format 

Start Hadoop:

$ bin/start-all.sh 
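
If the daemons started cleanly, jps (shipped with the Sun JDK) should list NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker alongside Jps itself:

$ jps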

You can view the NameNode and JobTracker web interfaces at:

NameNode - http://localhost:50070/
JobTracker - http://localhost:50030/

Run the example:


$ bin/hadoop fs -put conf input 
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+' 

Fetch the output from HDFS and inspect it:
$ bin/hadoop fs -get output output 
$ cat output/* 
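
Alternatively, you can read the result directly from HDFS without copying it out, and stop all the daemons once you are done:

$ bin/hadoop fs -cat output/*
$ bin/stop-all.sh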


References:
1. http://hadoop.apache.org/common/docs/current/quickstart.html
2. http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Single-Node_Cluster%29
