Hadoop 2.7.2 HA Cluster Deployment
1. Set the hostname: edit the value in /etc/hostname, then run the hostname command to check that the new hostname has taken effect.
2. Configure the host mappings for all cluster nodes in /etc/hosts (shown here on the master node):
[root@masternode CentOS]# cat /etc/hosts
#127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
#::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.10.10.3 slavenode1.novalocal slavenode1
10.10.10.4 masternode.novalocal masternode
10.10.10.5 slavenode2.novalocal slavenode2
10.10.10.6 slavenode3.novalocal slavenode3
To change the hostname (on CentOS 6 it is stored in /etc/sysconfig/network rather than /etc/hostname):
vi /etc/sysconfig/network
HOSTNAME=masternode
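As a minimal sketch (assuming a CentOS host and the node name masternode used throughout this guide), the new name can be applied immediately and persisted like this:
hostname masternode                 # apply for the current session
echo masternode > /etc/hostname     # persist across reboots (CentOS 7 style)
hostname                            # verify the result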
3. Set up passwordless SSH login on the master node
1) First, check that SSH is installed on the master machine:
[root@masternode ~]# rpm -qa |grep ssh
libssh2-1.4.2-1.el6.x86_64
openssh-5.3p1-104.el6_6.1.x86_64
openssh-server-5.3p1-104.el6_6.1.x86_64
openssh-clients-5.3p1-104.el6_6.1.x86_64
2) Generate the key pair
[root@masternode ~]# cd .ssh/
[root@masternode .ssh]# ls
authorized_keys
[root@masternode .ssh]# cd /
[root@masternode /]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/opt/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /opt/.ssh/id_rsa.
Your public key has been saved in /opt/.ssh/id_rsa.pub.
The key fingerprint is:
e8:3d:75:11:0b:6a:a9:f5:39:e5:04:71:2e:94:21:94 root@masternode.novalocal
The key's randomart image is:
+--[ RSA 2048]----+
| .o.*+o |
| E=.= o |
| = . * |
| = . * . |
| o S = o |
| . . . o |
| . o |
| . |
| |
+-----------------+
[root@masternode /]# cd
[root@masternode ~]# cd .ssh/
[root@masternode .ssh]# ls
authorized_keys id_rsa id_rsa.pub    # the newly generated key pair
[root@masternode .ssh]# cat id_rsa.pub >> authorized_keys
3) Copy the key to the other node machines
(1) Use the ssh-copy-id command to send the public key to a remote host (slavenode3 is used as the example here).
[root@masternode ~]# ssh-copy-id root@slavenode3
(2) If ssh-copy-id cannot be found ("ssh-copy-id: Command not found"), the installed SSH may be too old (this can happen on some Red Hat systems). The workaround is to copy the local public key to the remote server manually:
cat ~/.ssh/id_rsa.pub | ssh root@slavenode3 'cat >> ~/.ssh/authorized_keys'
This command is equivalent to the following two steps:
① On the local machine, run: scp ~/.ssh/id_rsa.pub root@slavenode3:~/
② On the remote machine, run: cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
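Unlike ssh-copy-id, the manual copy does not create ~/.ssh or tighten its permissions, and sshd silently rejects keys whose permissions are too loose. A minimal sketch (assuming the root account, as used elsewhere in this guide):
ssh root@slavenode3 'mkdir -p ~/.ssh && chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys'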
[root@masternode .ssh]# scp authorized_keys [email protected]:/opt/.ssh
The authenticity of host '125.208.30.89(125.208.30.89)' can't be established.
RSA key fingerprint is e3:97:c0:29:e4:fa:0d:41:31:6e:df:fe:0c:6b:c7:08.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '125.208.30.89' (RSA) to the list of known hosts.
Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
lost connection
[root@masternode .ssh]# vi authorized_keys
[root@masternode .ssh]# cat id_rsa.pub >> authorized_keys
[root@masternode .ssh]# scp authorized_keys root@slavenode1:~/.ssh/
[root@masternode .ssh]# scp authorized_keys root@slavenode2:~/.ssh/
[root@masternode .ssh]# scp authorized_keys root@slavenode3:~/.ssh/
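A quick check (a sketch using the hostnames configured above) that passwordless login now works from the master to every node:
for h in slavenode1 slavenode2 slavenode3; do
  ssh root@$h hostname    # should print the remote hostname without asking for a password
done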
4. Copy the master's hosts file to the other slave machines
[root@masternode centos]# scp /etc/hosts root@slavenode1:/etc/hosts
hosts 100% 332 0.3KB/s 00:00
[root@masternode centos]# scp /etc/hosts root@slavenode2:/etc/hosts
hosts 100% 332 0.3KB/s 00:00
[root@masternode centos]# scp /etc/hosts root@slavenode3:/etc/hosts
5. Install Java
1) Unpack the JDK package
cd /usr/java/
tar -xvf jdk-7u79-linux-x64.tar.gz
chown -R hadoop:hadoop jdk1.7.0_79
2) Edit the environment variables
Edit the /etc/profile file and append the JAVA_HOME, CLASSPATH, and PATH settings shown below:
vi /etc/profile
export JAVA_HOME=/usr/java/jdk1.7.0_79
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
#export JAVA_LIBRARY_PATH='/opt/hadoop/hadoop-2.7.2/lib/native'
export PATH=$PATH:$JAVA_HOME/bin
#set hadoop path
export HADOOP_HOME=/opt/hadoop/hadoop-2.7.2
#export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
#export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_COMMON_LIB_NATIVE_DIR"
export PATH=$PATH:$HADOOP_HOME/bin
#set hive
export HIVE_HOME=/opt/hadoop/hive-2.0.0
export PATH=$PATH:$HIVE_HOME/bin
#set zookeeper
export ZOOKEEPER_HOME=/opt/hadoop/zookeeper
export PATH=$PATH:$ZOOKEEPER_HOME/bin:$ZOOKEEPER_HOME/conf
#set hbase
export HBASE_HOME=/opt/hadoop/hbase-1.1.5
export PATH=$PATH:$HBASE_HOME/bin
#set scala
export SCALA_HOME=/opt/hadoop/scala
export PATH=$PATH:$SCALA_HOME/bin
#set spark
export SPARK_HOME=/opt/hadoop/spark
export PATH=$PATH:$SPARK_HOME/bin
3) Apply the configuration
Save and exit, then run the following command to make the settings take effect immediately:
source /etc/profile    (or: . /etc/profile)
vi ~/.bash_profile
export JAVA_HOME=/usr/java/jdk1.7.0_79/
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
#set hadoop path
export HADOOP_HOME=/opt/hadoop/hadoop-2.7.2
export PATH=$PATH:$HADOOP_HOME/bin
source ~/.bash_profile
4) Verify the installation
Once the configuration is in effect, use the following command to check that Java was installed successfully:
[root@masternode java]# java -version
java version "1.7.0_79"
Java(TM) SE Runtime Environment (build 1.7.0_79-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.79-b02, mixed mode)
5) Install Java on the remaining machines
Copy the /usr/java/ directory to the other slaves with scp, then repeat the environment-variable configuration and verification steps above on each slave (slavenode2 is used as the example):
[root@masternode java]# scp -r jdk1.7.0_79/ root@slavenode1:/usr/java/
[root@masternode java]# scp -r jdk1.7.0_79/ root@slavenode2:/usr/java/
[root@masternode java]# scp -r jdk1.7.0_79/ root@slavenode3:/usr/java/
Do not install a lower JDK version on any node: every node in the cluster must run the same JDK version.
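A small sketch to confirm that every node reports the same JDK version:
for h in masternode slavenode1 slavenode2 slavenode3; do
  echo "== $h =="
  ssh root@$h '/usr/java/jdk1.7.0_79/bin/java -version'
done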
Installing and configuring ZooKeeper for the Hadoop cluster (this module is not needed if you rely on the ZooKeeper instance that HBase manages itself)
1. Download URL:
http://mirror.bjtu.edu.cn/apache/zookeeper/zookeeper-3.4.8/zookeeper-3.4.8.tar.gz
2. Install ZooKeeper
① Upload and unpack the package
Place the zookeeper-3.4.8.tar.gz file in the /opt/hadoop directory and unpack it:
[root@masternode hadoop]# tar -zxvf zookeeper-3.4.8.tar.gz
mv zookeeper-3.4.8 zookeeper
--- If the cluster runs under its own dedicated user account, use the following commands instead:
hadoop@Ubuntu:~$ sudo tar -zxvf zookeeper-3.4.8.tar.gz
hadoop@ubuntu:~$ chown -R hadoop:hadoop zookeeper-3.4.8
mv zookeeper-3.4.8 zookeeper
② Set the environment variables: vi /etc/profile
#set zookeeper
export ZOOKEEPER_HOME=/opt/hadoop/zookeeper
export PATH=$PATH:$ZOOKEEPER_HOME/bin
vi ~/.bash_profile
#set zookeeper
export ZOOKEEPER_HOME=/opt/hadoop/zookeeper
export PATH=$PATH:$ZOOKEEPER_HOME/bin
[root@masternode hadoop]# source /etc/profile
[root@masternode hadoop]# cd zookeeper/conf/
[root@masternode conf]# pwd
/opt/hadoop/zookeeper/conf
③ Configure the zoo.cfg file
The configuration file lives under $ZOOKEEPER_HOME/conf/. Rename zoo_sample.cfg to zoo.cfg and edit it so that it contains the following:
[root@masternode conf]# mv zoo_sample.cfg zoo.cfg
vi zoo.cfg
clientPort=2181
dataDir=/opt/hadoop/zookeeper/data/
dataLogDir=/opt/hadoop/zookeeper/log/
server.1=slavenode1:2888:3888
server.2=masternode:2888:3888
server.3=slavenode2:2888:3888
server.4=slavenode3:2888:3888
Configuration notes:
tickTime: the basic time unit (in milliseconds) for heartbeats between ZooKeeper servers and between clients and servers; one heartbeat is sent every tickTime.
dataDir: the directory where ZooKeeper stores its data; by default the transaction log is written here as well.
clientPort: the port on which ZooKeeper listens for client connections.
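A replicated ensemble also expects tickTime, initLimit, and syncLimit to be set. If they are missing from your zoo.cfg, a sketch like the following (values borrowed from zoo_sample.cfg) appends them:
cat >> /opt/hadoop/zookeeper/conf/zoo.cfg <<'EOF'
tickTime=2000
initLimit=10
syncLimit=5
EOF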
④ Create two directories
mkdir /opt/hadoop/zookeeper/data/
mkdir /opt/hadoop/zookeeper/log/
Create a file named myid under /opt/hadoop/zookeeper/data/:
touch /opt/hadoop/zookeeper/data/myid
[root@masternode data]# echo 2 > myid
[root@masternode data]# cat myid
2
⑤ Copy the zookeeper directory and the environment-variable files to the other machines
[root@masternode hadoop]# scp -r zookeeper root@slavenode1:/opt/hadoop/
[root@masternode hadoop]# scp -r zookeeper root@slavenode3:/opt/hadoop/
[root@masternode hadoop]# scp -r zookeeper root@slavenode2:/opt/hadoop/
[root@masternode conf]# scp ~/.bash_profile root@slavenode2:~/.bash_profile
[root@masternode conf]# scp ~/.bash_profile root@slavenode3:~/.bash_profile
[root@masternode conf]# scp ~/.bash_profile root@slavenode1:~/.bash_profile
⑥ Adjust this file on the other machines (each node's myid must match its server.N line in zoo.cfg):
[root@masternode centos]# cat /opt/hadoop/zookeeper/data/myid
2
[root@slavenode3 centos]# cat /opt/hadoop/zookeeper/data/myid
4
[root@slavenode2 centos]# cat /opt/hadoop/zookeeper/data/myid
3
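Equivalently, the myid of every node can be set from the master in one pass (a sketch; the numbers follow the server.N entries in zoo.cfg above):
ssh root@slavenode1 'echo 1 > /opt/hadoop/zookeeper/data/myid'
echo 2 > /opt/hadoop/zookeeper/data/myid                          # masternode itself
ssh root@slavenode2 'echo 3 > /opt/hadoop/zookeeper/data/myid'
ssh root@slavenode3 'echo 4 > /opt/hadoop/zookeeper/data/myid'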
⑦ Start ZooKeeper (it must be started on every configured machine)
cd /opt/hadoop/zookeeper/bin
./zkServer.sh start
⑧ Verify
[root@masternode zookeeper]# jps
16159 HQuorumPeer
14438 DataNode
16397 Jps
14549 NodeManager
Check the status:
[root@slavenode3 centos]# /opt/hadoop/zookeeper/bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /opt/hadoop/zookeeper/bin/../conf/zoo.cfg
Mode: leader
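To confirm that every ensemble member is up (one node should report leader and the rest follower), a loop like this sketch can be run from any node:
for h in masternode slavenode1 slavenode2 slavenode3; do
  echo "== $h =="
  ssh root@$h /opt/hadoop/zookeeper/bin/zkServer.sh status
done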
Connect to ZooKeeper with the command-line client:
[root@slavenode3 centos]# /opt/hadoop/zookeeper/bin/zkCli.sh -server slavenode3:2181
WATCHER::
WatchedEvent state:SyncConnected type:None path:null
[zk: slavenode3:2181(CONNECTED) 0] ls /
[hbase, zookeeper]
[zk: slavenode3:2181(CONNECTED) 1] ls /hbase
[meta-region-server, backup-masters, region-in-transition, table, draining, table-lock, running, master, namespace, hbaseid, online-snapshot, replication, splitWAL, recovering-regions, rs, flush-table-proc]
Installing Hadoop
1. Install the Hadoop package
① Unpack the Hadoop archive
cd /opt/hadoop/
tar -xvf hadoop-2.7.2.tar.gz
[root@masternode hadoop]# ls
hadoop-2.7.2 jdk1.7.0_79 lost+found
hadoop-2.7.2.tar.gz jdk-7u79-linux-x64.tar.gz
[root@masternode hadoop]# cd hadoop-2.7.2
[root@masternode hadoop-2.7.2]# ls
bin include libexec NOTICE.txt sbin
etc lib LICENSE.txt README.txt share
② Set the environment variables
Add the Hadoop installation path to /etc/profile: append the following lines to the end of the file and make them take effect (. /etc/profile):
#set hadoop path
export HADOOP_HOME=/opt/hadoop/hadoop-2.7.2
export PATH=$PATH:$HADOOP_HOME/bin
[root@masternode conf]# hadoop version
Hadoop 2.7.2
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r b165c4fe8a74265c792ce23f546c64604acf0e41
Compiled by jenkins on 2016-01-26T00:08Z
Compiled with protoc 2.5.0
From source with checksum d0fda26633fa762bff87ec759ebe689c
This command was run using /opt/hadoop/hadoop-2.7.2/share/hadoop/common/hadoop-common-2.7.2.jar
③ Create the working directories
[root@masternode hadoop-2.7.2]# mkdir -p /opt/hadoop/tmp
[root@masternode hadoop-2.7.2]# mkdir -p /opt/hadoop/hdfs/data
[root@masternode hadoop-2.7.2]# mkdir -p /opt/hadoop/hdfs/name
mkdir /opt/hadoop/hadoop-2.7.2/pids
mkdir /opt/hadoop/tmp/journal
chmod 755 /opt/hadoop/hdfs/data
chmod 755 /opt/hadoop/hdfs/name
chmod 755 /opt/hadoop/tmp
chmod 755 /opt/hadoop/tmp/journal
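The same directories are needed on every node, and copying the Hadoop directory later does not create them. A sketch that prepares the slaves from the master:
for h in slavenode1 slavenode2 slavenode3; do
  ssh root@$h 'mkdir -p /opt/hadoop/tmp/journal /opt/hadoop/hdfs/data /opt/hadoop/hdfs/name;
               chmod 755 /opt/hadoop/tmp /opt/hadoop/tmp/journal /opt/hadoop/hdfs/data /opt/hadoop/hdfs/name'
done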
④ Configure Hadoop
Edit the configuration files
The configuration files that need to be modified are mainly the following:
./core-site.xml
./hdfs-site.xml
./mapred-site.xml (this file is not in the fresh package; copy ./mapred-site.xml.template and rename the copy to mapred-site.xml)
./yarn-site.xml
./yarn-env.sh
./hadoop-env.sh
./slaves
[root@masternode hadoop-2.7.2]# pwd
/opt/hadoop/hadoop-2.7.2
[root@masternode hadoop-2.7.2]# ls
bin include libexec NOTICE.txt sbin
etc lib LICENSE.txt README.txt share
[root@masternode hadoop-2.7.2]# cd etc/hadoop/
/opt/hadoop/hadoop-2.7.2/etc/hadoop
(1) Configure hadoop-env.sh
Edit the hadoop-env.sh file --> set JAVA_HOME:
export JAVA_HOME=/usr/java/jdk1.7.0_79/
export HADOOP_HOME=/opt/hadoop/hadoop-2.7.2
export HADOOP_PID_DIR=$HADOOP_HOME/pids
export PATH=$PATH:$HADOOP_HOME/bin
(2) Configure yarn-env.sh
Edit the yarn-env.sh file --> set JAVA_HOME:
# some Java parameters
export JAVA_HOME=/usr/java/jdk1.7.0_79/
export YARN_PID_DIR=/opt/hadoop/hadoop-2.7.2/pids
(3) Configure slaves
Edit the slaves file --> add the slave nodes:
slavenode2
slavenode3
(4) Configure core-site.xml
<configuration>
<property>
<name>fs.trash.interval</name>
<value>1440</value>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://cluster-ha</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/opt/hadoop/tmp/journal</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<!-- Hadoop temporary directory -->
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/hadoop/tmp</value>
</property>
<!-- ZooKeeper quorum addresses -->
<property>
<name>ha.zookeeper.quorum</name>
<value>slavenode1:2181,slavenode3:2181,slavenode2:2181</value>
</property>
<property>
<name>ha.zookeeper.session-timeout.ms</name>
<value>300000</value>
</property>
</configuration>
(5) Configure hdfs-site.xml --> add the HDFS settings (NameNode/DataNode addresses and directory locations)
<configuration>
<property>
<name>dfs.nameservices</name>
<value>cluster-ha</value>
</property>
<property>
<name>dfs.ha.namenodes.cluster-ha</name>
<value>nn,snn</value>
</property>
<property>
<name>dfs.namenode.rpc-address.cluster-ha.nn</name>
<value>slavenode1:9000</value>
</property>
<property>
<name>dfs.namenode.http-address.cluster-ha.nn</name>
<value>slavenode1:50070</value>
</property>
<property>
<name>dfs.namenode.rpc-address.cluster-ha.snn</name>
<value>masternode:9000</value>
</property>
<property>
<name>dfs.namenode.http-address.cluster-ha.snn</name>
<value>masternode:50070</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://slavenode1:8485;slavenode2:8485;slavenode3:8485;masternode:8485/cluster-ha</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.cluster-ha</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/opt/hadoop/tmp/journal</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/opt/hadoop/hdfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/opt/hadoop/hdfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>slavenode1:2181,slavenode3:2181,slavenode2:2181</value>
</property>
<property>
<name>dfs.namenode.handler.count</name>
<value>600</value>
<description>The number of server threads for the namenode.</description>
</property>
<property>
<name>dfs.datanode.handler.count</name>
<value>600</value>
<description>The number of server threads for the datanode.</description>
</property>
<property>
<name>dfs.client.socket-timeout</name>
<value>600000</value>
</property>
<property>
<name>dfs.datanode.max.transfer.threads</name>
<value>409600</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence
shell(/bin/true)</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/hadoop/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>30000</value>
</property>
</configuration>
(6) Configure mapred-site.xml
--> add the MapReduce settings (run MapReduce on the YARN framework; the JobHistory server address and web address can also be set here)
[root@masternode hadoop]# mv mapred-site.xml.template mapred-site.xml
vi mapred-site.xml
<configuration>
<!-- Run MapReduce on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.job.maps</name>
<value>12</value>
</property>
<property>
<name>mapreduce.job.reduces</name>
<value>12</value>
</property>
</configuration>
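The JobHistory addresses mentioned above are left at their defaults in this minimal mapred-site.xml (the history web UI defaults to port 19888). As a sketch, once the cluster is up the history server can be started with the stock script:
/opt/hadoop/hadoop-2.7.2/sbin/mr-jobhistory-daemon.sh start historyserver   # run on whichever node should host the JobHistory server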
(7) Configure yarn-site.xml --> enable the YARN (ResourceManager HA) features
<configuration>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>259200</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>slavenode1:2181,slavenode3:2181,slavenode2:2181</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>cluster-yarn</value>
</property>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>slavenode1</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>masternode</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address.rm1</name>
<value>slavenode1:8030</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address.rm2</name>
<value>masternode:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address.rm1</name>
<value>slavenode1:8031</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address.rm2</name>
<value>masternode:8031</value>
</property>
<property>
<name>yarn.resourcemanager.address.rm1</name>
<value>slavenode1:8032</value>
</property>
<property>
<name>yarn.resourcemanager.address.rm2</name>
<value>masternode:8032</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address.rm1</name>
<value>slavenode1:8033</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address.rm2</name>
<value>masternode:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>slavenode1:8088</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>masternode:8088</value>
</property>
<property>
<name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.ha.automatic-failover.zk-base-path</name>
<value>/yarn-leader-election</value>
</property>
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
[root@slavenode1 hadoop-2.7.2]# vi etc/hadoop/log4j.properties
log4j.logger.org.apache.hadoop.util.NativeCodeLoader=ERROR
This setting suppresses the warning that would otherwise be printed when the daemons start (explained further in the start/stop section below).
The Hadoop configuration on the master machine is now finished; what remains is to configure Hadoop on the slave machines.
The simplest way is to copy the already-configured Hadoop folder /opt/hadoop/hadoop-2.7.2 from the master into the /opt/hadoop directory of every slave (the slaves file is not actually needed on the slave machines, but copying it does no harm). Use the command format below. (Note: this can be done as a normal user or as root.)
7. Copy the configured Hadoop directory to the other slave machines
[root@masternode hadoop]# scp -r hadoop-2.7.2/ root@slavenode1:/opt/hadoop/
[root@masternode hadoop]# scp -r hadoop-2.7.2/ root@slavenode2:/opt/hadoop/
[root@masternode hadoop]# scp -r hadoop-2.7.2/ root@slavenode3:/opt/hadoop/
Then, on each slave, edit the /etc/profile file, append the following lines at the end, and make them take effect (source /etc/profile):
export HADOOP_HOME=/opt/hadoop/hadoop-2.7.2
export PATH=$PATH:$HADOOP_HOME/bin
Save and exit, then run the following command to make the configuration take effect immediately:
source /etc/profile    (or: . /etc/profile)
8. Startup and verification
1. Format the ZKFC state in ZooKeeper:
[root@masternode hadoop]# pwd
/opt/hadoop
[root@masternode hadoop]# cd hadoop-2.7.2/bin
hdfs zkfc -formatZK    # creates the cluster-ha parent znode in ZooKeeper
2. Start the NameNode edit-log synchronization service (JournalNode) on all four machines:
/opt/hadoop/hadoop-2.7.2/sbin/hadoop-daemon.sh start journalnode
3. Format and start the NameNode, sync the standby NameNode, and start the standby NameNode
① Format the NameNode on the masternode node:
# hadoop namenode -format
Note: formatting a second time may leave the DataNodes unable to start because the namespaceID/clusterID no longer matches. The fix is to locate the VERSION file with the mismatched ID and correct it, or simply delete the hdfs/data directory on the DataNodes.
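A sketch for checking whether the IDs still match (paths as configured in hdfs-site.xml above):
grep clusterID /opt/hadoop/hdfs/name/current/VERSION   # NameNode side
grep clusterID /opt/hadoop/hdfs/data/current/VERSION   # DataNode side; the two values must be identical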
② Start the NameNode on the masternode (active) node:
cd /opt/hadoop/hadoop-2.7.2/sbin
# hadoop-daemon.sh start namenode
③ On slavenode1, sync the metadata from masternode:
# hdfs namenode -bootstrapStandby
④ Format the ZKFC on masternode:
# hdfs zkfc -formatZK
⑤ Start the HDFS cluster from masternode:
# start-dfs.sh
⑥ Start the YARN cluster on masternode:
# start-yarn.sh
⑦ Start the second ResourceManager on slavenode1:
# yarn-daemon.sh start resourcemanager
⑧ From now on the cluster can also be started and stopped with start-all.sh and stop-all.sh.
[root@masternode hadoop-2.7.2]# ./bin/hdfs namenode -format
4. Start Hadoop
[root@masternode centos]# cd /opt/hadoop/hadoop-2.7.2
[root@masternode hadoop-2.7.2]# sbin/start-all.sh
[root@slavenode1 hadoop-2.7.2]# sbin/stop-all.sh
This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
16/09/19 01:40:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable    <-- note this warning
Stopping namenodes on [slavenode1]
slavenode1: stopping namenode
slavenode3: stopping datanode
slavenode2: stopping datanode
masternode: stopping datanode
Stopping secondary namenodes [slavenode1]
slavenode1: stopping secondarynamenode
16/09/19 01:40:29 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
stopping yarn daemons
stopping resourcemanager
slavenode2: stopping nodemanager
masternode: stopping nodemanager
slavenode3: stopping nodemanager
no proxyserver to stop
[root@slavenode1 hadoop-2.7.2]# ldd lib/native/libhadoop.so.1.0.0
lib/native/libhadoop.so.1.0.0: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by lib/native/libhadoop.so.1.0.0)
linux-vdso.so.1 => (0x00007fff63bff000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007fc1127c5000)
libc.so.6 => /lib64/libc.so.6 (0x00007fc112430000)
/lib64/ld-linux-x86-64.so.2 (0x00007fc112bef000)
[root@slavenode1 hadoop-2.7.2]# ldd --version
ldd (GNU libc) 2.12
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.
It turns out that the glibc preinstalled on the system is version 2.12, while the Hadoop native library expects 2.14, which is why the warning is printed.
There are two options. The first is to build and install glibc 2.14 just for Hadoop, which is somewhat risky.
The second is simply to suppress the warning via log4j:
[root@slavenode1 hadoop-2.7.2]# vi etc/hadoop/log4j.properties
log4j.logger.org.apache.hadoop.util.NativeCodeLoader=ERROR
9. Verify Hadoop
(1) Method 1: the jps command
On the master, inspect the running processes with jps, the small tool that ships with the JDK.
[root@slavenode1 ~]# jps
19495 Jps
18849 NameNode
19051 SecondaryNameNode
19228 ResourceManager
[root@masternode centos]# jps
11664 Jps
11418 DataNode
11519 NodeManager
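In an HA deployment it is also worth confirming which NameNode and which ResourceManager are currently active (a sketch; the IDs nn/snn and rm1/rm2 come from hdfs-site.xml and yarn-site.xml above):
hdfs haadmin -getServiceState nn      # prints active or standby
hdfs haadmin -getServiceState snn
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2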
2) The firewall must be turned off
[root@masternode hadoop-2.7.2]# service iptables stop
http://125.208.30.88:50070/dfshealth.html#tab-datanode
3) Disable SELinux
Edit the /etc/selinux/config file and set SELINUX=disabled,
or run setenforce 0 (which takes effect immediately but does not survive a reboot).
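A sketch (CentOS 6 style, matching the service iptables command used above) that makes both changes permanent:
service iptables stop                                          # stop the firewall now
chkconfig iptables off                                         # keep it off after reboot
setenforce 0                                                   # SELinux permissive for the current session
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config   # disable SELinux permanently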
10. View the cluster in a browser
(1) Visit http://10.10.10.3:50070 or http://10.10.10.4:50070.
After Hadoop is installed there are two web management interfaces, reachable through the URLs below. For YARN, open http://10.10.10.4:8088/ in a browser (the address is the master's IP):
For HDFS, open http://10.10.10.4:50070/ (again the master node's IP):
http://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-common/ClusterSetup.html
11. Using HDFS
11.1 Help
If the environment variables are configured, running the hdfs command with no arguments prints the HDFS usage help; hdfs -help does the same.
[root@masternode hadoop]# hdfs -help
Usage: hdfs [--config confdir] [--loglevel loglevel] COMMAND
where COMMAND is one of:
dfs run a filesystem command on the file systems supported in Hadoop.
classpath prints the classpath
As shown above, the dfs subcommand runs a filesystem command against the file systems supported by Hadoop. Its help can be explored further; the listing below shows some of the filesystem operations supported by hdfs dfs, and the detailed usage of each operation can be read from the same help output.
[root@masternode hadoop]# hdfs dfs -help
hadoop dfsadmin -report    # show DataNode information; this command can be scripted to monitor the health of DFS
# hadoop fs -ls hdfs://cluster-ha/    # access HDFS through an explicit URI (cluster-ha is the nameservice configured above)
# hadoop fs -ls /    # list the files and directories under the HDFS root
# hadoop fs -lsr /    # list directories recursively
# hadoop fs -mkdir /test    # create the test directory
# hadoop fs -put /root/test.txt /test/test.txt    # upload a file into the test directory
# hadoop fs -cat /test/test.txt    # show the file contents
# hadoop fs -du /test/test.txt    # show the file size
# hadoop fs -rm /test/test.txt    # delete the file
# hadoop fs -rmr /test    # recursively delete a directory or file