Hadoop 2: Building an HA Cluster with Automatic Failover, Plus a YARN Cluster
After HDFS and YARN have started successfully, the web UIs are reachable at:
http://h2master:8088/cluster (YARN)
http://h2master:50070/ (HDFS)
0) Overview of the automatic failover flow:
FailoverController Active
FailoverController Standby
These act as the ZooKeeper ensemble's agents on the NameNode pair.
When an agent senses that a NameNode has failed,
the ZooKeeper ensemble, through the FailoverController,
switches the standby NameNode to the active state.
The automatic-failover HA cluster works as follows:
the FailoverController (ZKFC) service is ZooKeeper's agent; it runs alongside each NameNode and is monitored by ZooKeeper.
When one NameNode goes down, ZooKeeper has the FailoverController promote the standby NameNode to active (fencing the failed node, here via ssh).
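Once everything below is running, you can query the state that ZooKeeper/ZKFC has assigned to each NameNode directly from the command line. A minimal sketch, assuming the NameNode IDs h2master and h2master2 that are configured in hdfs-site.xml later in this guide:
bin/hdfs haadmin -getServiceState h2master       # prints active or standby
bin/hdfs haadmin -getServiceState h2master2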
1) Roles of each node:
namenode:h2master h2master2
datanode:h2sliver112 h2sliver113 h2sliver114
journalnode:h2master h2master2 h2sliver112
zookeeper: h2master h2master2 h2sliver112
2) Starting from the HDFS HA cluster with manual failover built earlier:
a) Stop all running daemons.
b) On every machine, delete the contents of /usr/local/hadoop2.5/tmp and /usr/local/hadoop2.5/logs (a scripted sketch follows this list).
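A possible way to run the cleanup of step b) in one go from h2master, assuming password-less ssh to every node (the host list mirrors section 1):
for h in h2master h2master2 h2sliver112 h2sliver113 h2sliver114; do
  ssh $h 'rm -rf /usr/local/hadoop2.5/tmp/* /usr/local/hadoop2.5/logs/*'
done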
3) ZooKeeper cluster installation:
a) Install ZooKeeper on h2master
a.1) Rename conf/zoo_sample.cfg to conf/zoo.cfg:
     mv zoo_sample.cfg zoo.cfg
a.2) Edit conf/zoo.cfg:
     1) dataDir=/usr/local/zookeeper/data
     2) Add the following entries (the id in server.N is that host's number in the ZooKeeper ensemble; 2888:3888 are the data-communication ports):
        server.1=h2master:2888:3888      ---> id 1 is h2master's number in the ZooKeeper ensemble
        server.2=h2master2:2888:3888     ---> id 2 is h2master2's number in the ZooKeeper ensemble
        server.3=h2sliver112:2888:3888   ---> id 3 is h2sliver112's number in the ZooKeeper ensemble
a.3) Create the data directory zookeeper/data:
     [root@h2master zookeeper]# mkdir data
a.4) Write the id 1 into zookeeper/data/myid so that this host (h2master) is tied to server.1:
     [root@h2master zookeeper]# echo 1 > data/myid
b) Copy the zookeeper directory to h2master2 and h2sliver112:
   scp -rq zookeeper h2master2:/usr/local
   scp -rq zookeeper h2sliver112:/usr/local
c) Write the ids on the remaining nodes:
   On h2master2 run: echo 2 > zookeeper/data/myid
   On h2sliver112 run: echo 3 > zookeeper/data/myid
d) Start and verify:
   On h2master, h2master2, and h2sliver112 run zookeeper/bin/zkServer.sh start
   Then run zookeeper/bin/zkServer.sh status on each node to see which one is the leader and which are followers.
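For reference, the resulting conf/zoo.cfg on all three ZooKeeper nodes looks roughly like the sketch below; tickTime, initLimit, syncLimit and clientPort are the defaults carried over from zoo_sample.cfg and are assumptions here (clientPort 2181 matches the ha.zookeeper.quorum setting in section 4):
tickTime=2000
initLimit=10
syncLimit=5
clientPort=2181
dataDir=/usr/local/zookeeper/data
server.1=h2master:2888:3888
server.2=h2master2:2888:3888
server.3=h2sliver112:2888:3888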
4) HDFS configuration files (hadoop-env.sh, core-site.xml, hdfs-site.xml, slaves):
4.1) hadoop-env.sh
export JAVA_HOME=/usr/local/jdk1.7
4.2) core-site.xml
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://cluster1</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/usr/local/hadoop2.5/tmp</value>
</property>
<property>
  <name>ha.zookeeper.quorum</name>      ------ the ZooKeeper ensemble
  <value>h2master:2181,h2master2:2181,h2sliver112:2181</value>
</property>
4.3) hdfs-site.xml
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
<property>
  <name>dfs.nameservices</name>         ---- each nameservice corresponds to one HDFS cluster
  <value>cluster1</value>
</property>
<property>
  <name>dfs.ha.namenodes.cluster1</name>
  <value>h2master,h2master2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.cluster1.h2master</name>
  <value>h2master:9000</value>
</property>
<property>
  <name>dfs.namenode.http-address.cluster1.h2master</name>
  <value>h2master:50070</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.cluster1.h2master2</name>
  <value>h2master2:9000</value>
</property>
<property>
  <name>dfs.namenode.http-address.cluster1.h2master2</name>
  <value>h2master2:50070</value>
</property>
<property>
  <name>dfs.ha.automatic-failover.enabled.cluster1</name>
  <value>true</value>
</property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://h2master:8485;h2master2:8485;h2sliver112:8485/cluster1</value>
</property>
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/usr/local/hadoop2.5/tmp/journal</value>
</property>
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/root/.ssh/id_rsa</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.cluster1</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
4.4) slaves      -----> lists the DataNode and NodeManager hosts
h2sliver112
h2sliver113
h2sliver114
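Because fs.defaultFS points at the logical nameservice cluster1 rather than a physical host, HDFS clients go through the ConfiguredFailoverProxyProvider and always reach whichever NameNode is currently active. For example, once the cluster is started in section 6:
bin/hdfs dfs -ls hdfs://cluster1/
bin/hdfs dfs -ls /        # equivalent, since fs.defaultFS is hdfs://cluster1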
5) Delete the hadoop folder on the other nodes, then copy the finished configuration over to them:
scp -r /usr/local/hadoop2.5 h2master2:/usr/local/
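The same copy has to reach every node listed in section 1; a possible way to push it from h2master, assuming the old /usr/local/hadoop2.5 has already been removed on each target:
for h in h2master2 h2sliver112 h2sliver113 h2sliver114; do
  scp -rq /usr/local/hadoop2.5 $h:/usr/local/
done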
6) Start the Hadoop 2 HDFS cluster:
6.1) Format the ZK cluster
     On h2master run: hadoop2.5/bin/hdfs zkfc -formatZK
     This only registers the cluster with ZooKeeper:
     15/01/11 18:14:20 INFO ha.ActiveStandbyElector: Successfully created /hadoop-ha/cluster1 in ZK.
6.2) Start the JournalNode cluster
     On h2master, h2master2, and h2sliver112 run: hadoop/sbin/hadoop-daemon.sh start journalnode
6.3) Format and start the NameNodes
     On h2master run: bin/hdfs namenode -format
     On h2master run: sbin/hadoop-daemon.sh start namenode
     On h2master2 run: bin/hdfs namenode -bootstrapStandby
     On h2master2 run: sbin/hadoop-daemon.sh start namenode
6.4) Start the DataNodes
     On h2master run: hadoop/sbin/hadoop-daemons.sh start datanode   (starts every DataNode)
     At this point http://h2master:50070/ and http://h2master2:50070/ both show a NameNode in standby state.
6.5) Start ZKFC (FailoverController)
     It must be started on the NameNode hosts; ZooKeeper then decides which NameNode becomes active.
     On h2master and h2master2 run: sbin/hadoop-daemon.sh start zkfc
     Now http://h2master:50070/ and http://h2master2:50070/ show:
     Overview 'h2master:9000' (active)
     Overview 'h2master2:9000' (standby)
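As a sanity check after 6.5, running jps on each machine should show the roles from section 1. A sketch of the expected process names (PIDs omitted):
[root@h2master ~]# jps      -> NameNode, JournalNode, DFSZKFailoverController, QuorumPeerMain (ZooKeeper)
[root@h2sliver112 ~]# jps   -> DataNode, JournalNode, QuorumPeerMain
[root@h2sliver113 ~]# jps   -> DataNode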
7) Verify automatic failover:
Kill the NameNode process on h2master (a command-line sketch follows this section), then refresh
http://h2master:50070/ and http://h2master2:50070/. The result:
Overview 'h2master2:9000' (active) ----> ZooKeeper automatically switched h2master2 to the active state
http://h2master:50070/ can no longer be reached.
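One hedged way to run this test from the shell: kill the NameNode process on h2master, then ask the surviving NameNode for its state (the ID h2master2 is the one defined in hdfs-site.xml; <NameNode-PID> is a placeholder):
[root@h2master ~]# jps | grep NameNode            # note the PID
[root@h2master ~]# kill -9 <NameNode-PID>
[root@h2master2 hadoop2.5]# bin/hdfs haadmin -getServiceState h2master2   # should now print active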
8) Summary: what automatic failover adds on top of manual failover
(1) Configuration: core-site.xml gains the ha.zookeeper.quorum entry (the ZooKeeper ensemble);
    in hdfs-site.xml, dfs.ha.automatic-failover.enabled.cluster1 is set to true.
(2) Operations: format ZK with bin/hdfs zkfc -formatZK;
    start zkfc with sbin/hadoop-daemon.sh start zkfc.
The following steps do not require shutting everything down; only the services involved in YARN need to be stopped and restarted.
9) Setting up the YARN cluster: essentially only the ResourceManager location needs to be configured.
To run MapReduce programs, each NodeManager must load the shuffle service at startup. The shuffle service is essentially a Jetty/Netty server from which Reduce Tasks remotely copy the intermediate results produced by Map Tasks on the individual NodeManagers. That is what the yarn.nodemanager.aux-services setting configures.
On every node in the cluster:
Edit yarn-site.xml:
<property>
  <name>yarn.resourcemanager.hostname</name>   ---- which node runs the ResourceManager
  <value>h2master</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.log-aggregation-enable</name>      ---- enable log aggregation so logs show up in the 8088 UI
  <value>true</value>
</property>
Edit mapred-site.xml:
<property>
  <name>mapreduce.framework.name</name>         --- run MapReduce on YARN
  <value>yarn</value>
</property>
Start YARN by running hadoop/sbin/start-yarn.sh on h2master, then visit:
http://h2master:8088/cluster
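To confirm the YARN setup end to end, you can submit one of the example jobs that ship with Hadoop and watch it appear on http://h2master:8088/cluster; the exact example-jar version string below depends on your build and is an assumption here:
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar pi 2 10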
10) Starting the historyserver on YARN (so that the aggregated logs from every node a job ran on can be viewed in one web page)
1. Configure in mapred-site.xml:
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>h2master:10020</value>
  <description>MapReduce JobHistory Server host:port. Default port is 10020.</description>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>            --- web address of the historyserver
  <value>h2master:19888</value>
  <description>MapReduce JobHistory Server Web UI host:port. Default port is 19888.</description>
</property>
<property>
  <name>mapreduce.jobhistory.intermediate-done-dir</name>     --- where the historyserver stores the logs collected from the nodes
  <value>/usr/local/hadoop2.5/tmp/mr_history</value>
  <description>Directory where history files are written by MapReduce jobs.</description>
</property>
<property>
  <name>mapreduce.jobhistory.done-dir</name>
  <value>/usr/local/hadoop2.5/tmp/mr_history</value>
  <description>Directory where history files are managed by the MR JobHistory Server.</description>
</property>
2. Configure in yarn-site.xml:
<property>
  <name>yarn.log-aggregation-enable</name>                    --- enable log aggregation so the per-node logs can be viewed in one place on the web UI
  <value>true</value>
</property>
3. Copy the configuration files to the other nodes in the cluster.
4. Restart YARN on h2master:
   sbin/stop-yarn.sh
   sbin/start-yarn.sh
   Finally run sbin/mr-jobhistory-daemon.sh start historyserver
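A quick check that the history server is up, assuming the configuration above: jps on h2master should now list a JobHistoryServer process, and the web UI should answer on the configured port:
[root@h2master hadoop2.5]# jps | grep JobHistoryServer
Then open http://h2master:19888 in a browser to see the aggregated job history.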