Installing a Hadoop Cluster

Prerequisites
1. Every host already has the relevant software uploaded and unpacked, and JDK 1.6 or later installed; see the previous article for the detailed steps.

2. Configure SSH trust between the hosts; essentially this means appending the local .ssh/id_rsa.pub to .ssh/authorized_keys on the local host and on every remote host
 2.1 Configure passwordless login from the master to the other hosts; in theory this step alone is sufficient
[hadoop@linux1 ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
/home/hadoop/.ssh/id_rsa already exists.
Overwrite (y/n)? y
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
24:be:da:90:5e:e3:ff:be:d1:4a:ce:f0:3c:55:01:3b hadoop@linux1
[hadoop@linux1 ~]$ cd .ssh
[hadoop@linux1 .ssh]$ ls
authorized_keys  id_dsa  id_dsa.pub  id_rsa  id_rsa.pub  known_hosts
[hadoop@linux1 .ssh]$ cp  id_rsa.pub authorized_keys
[hadoop@linux1 .ssh]$ ssh linux1
The authenticity of host 'linux1 (172.16.251.11)' can't be established.
RSA key fingerprint is ed:1a:0b:46:f2:08:75:c6:e5:05:25:d0:7b:25:c6:61.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'linux1,172.16.251.11' (RSA) to the list of known hosts.
Last login: Mon Dec 17 09:21:37 2012 from dtydb6

[hadoop@linux1 .ssh]$ scp authorized_keys linux2:/home/hadoop/.ssh/
[hadoop@linux1 .ssh]$ scp authorized_keys linux3:/home/hadoop/.ssh/

With this in place, ssh from linux1 to linux2 or linux3 no longer prompts for a password.
  2.2 Configure passwordless login from the other hosts back; run the following on linux2 and linux3 respectively
    2.2.1 Run ssh-keygen -t rsa to generate the id_rsa.pub file
    2.2.2 scp id_rsa.pub linux1:/home/hadoop/.ssh/id_rsa.pub_linux2
          (on linux3, name the copy id_rsa.pub_linux3 instead)
    2.2.3 On linux1, append both keys:
          cat id_rsa.pub_linux2 >> authorized_keys
          cat id_rsa.pub_linux3 >> authorized_keys
    2.2.4 scp the merged file to all the other hosts
          scp authorized_keys  linux2:/home/hadoop/.ssh/
          scp authorized_keys  linux3:/home/hadoop/.ssh/
2.3 Verify that passwordless SSH works in every direction
ssh linux2 date
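
A quick way to check every pair at once: run this loop on each host in turn; every iteration should print the remote date with no password prompt (hostnames as used in this article):

for h in linux1 linux2 linux3; do
    # a password prompt here means that host's key exchange is incomplete
    ssh "$h" date
done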


3. Install Hadoop
tar -zxvf hadoop-1.0.4.tar.gz
Set the environment variables, e.g. in ~/.bash_profile:
export JAVA_HOME=/usr/java/jdk1.7.0_07
PATH=$PATH:$HOME/bin:/monitor/apache-flume-1.2.0/bin:/hadoop/hadoop-1.0.4/bin
The default parameters live in src/core/core-default.xml, src/hdfs/hdfs-default.xml and src/mapred/mapred-default.xml inside the distribution; site-specific overrides go into the corresponding files under the conf directory.
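
To check a default before overriding it, you can grep the bundled defaults directly; for example (file layout as in the hadoop-1.0.4 tarball, GNU grep assumed):

[hadoop@linux1 hadoop-1.0.4]$ grep -A 1 '<name>dfs.replication</name>' src/hdfs/hdfs-default.xml

This should show the shipped default replication factor of 3, which we override below.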
3.1 conf/hadoop-env.sh configures the runtime environment of the Hadoop daemons
Set JAVA_HOME:
export JAVA_HOME=/usr/java/jdk1.7.0_07
3.2 conf/core-site.xml sets the NameNode URI
[hadoop@linux1 hadoop-1.0.4]$ vi conf/core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://linux1:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/hadoop-1.0.4/var</value>
    </property>
</configuration>
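
Once the cluster is formatted and running, clients resolve relative HDFS paths against this URI, so the following two commands are equivalent:

hadoop fs -ls /
hadoop fs -ls hdfs://linux1:9000/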

3.3 conf/mapred-site.xml sets the JobTracker address (the host:port that TaskTrackers and job clients connect to)

[hadoop@linux1 hadoop-1.0.4]$ vi conf/mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>linux1:9001</value>
    </property>
</configuration>
3.4 HDFS configuration; mainly the storage paths for the NameNode metadata and the DataNode blocks
[hadoop@linux1 hadoop-1.0.4]$ vi conf/hdfs-site.xml

<configuration>
    <property>
        <name>dfs.name.dir</name>
        <value>/home/hadoop/name1,/home/hadoop/name2</value>
        <description>Comma-separated directories where the NameNode keeps redundant copies of its metadata; avoid spaces after the commas, as they become part of the path.</description>
    </property>
    <property>
        <name>dfs.data.dir</name>
        <value>/home/hadoop/data1,/home/hadoop/data2</value>
        <description>Comma-separated directories where the DataNodes store HDFS blocks.</description>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
</configuration>
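
The format step creates the name directories and the DataNodes create their data directories on startup, but pre-creating them surfaces permission problems early; a minimal sketch using the paths above (name dirs only on the master, data dirs only on the slaves):

[hadoop@linux1 ~]$ mkdir -p /home/hadoop/name1 /home/hadoop/name2
[hadoop@linux1 ~]$ ssh linux2 mkdir -p /home/hadoop/data1 /home/hadoop/data2
[hadoop@linux1 ~]$ ssh linux3 mkdir -p /home/hadoop/data1 /home/hadoop/data2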

3.5 Configure the masters and slaves files (in Hadoop 1.x the masters file actually lists the SecondaryNameNode host(s); the slaves file lists the DataNode/TaskTracker hosts)
[hadoop@linux1 conf]$ vi masters

linux1

[hadoop@linux1 conf]$ vi slaves

linux2
linux3

3.6 Distribute the finished configuration files to the other hosts
[hadoop@linux1 conf]$ scp * linux2:/home/hadoop/hadoop-1.0.4/conf
[hadoop@linux1 conf]$ scp * linux3:/home/hadoop/hadoop-1.0.4/conf
Alternatively, pack up the whole hadoop directory and ship it to the other hosts, as sketched below.
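
A minimal sketch of shipping the entire installation with tar piped over ssh (directory layout as used in this article):

[hadoop@linux1 ~]$ cd /home/hadoop
# stream a gzipped tarball to each slave and unpack it in place
[hadoop@linux1 ~]$ tar czf - hadoop-1.0.4 | ssh linux2 'tar xzf - -C /home/hadoop'
[hadoop@linux1 ~]$ tar czf - hadoop-1.0.4 | ssh linux3 'tar xzf - -C /home/hadoop'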
