Deploying Hadoop with Ansible on a Docker-created cluster
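Base environment
MBP, Parallels Desktop, CentOS 7
Keywords
docker, ansible, hadoop
Cluster architecture
The cluster consists of four "virtual hosts", each created as a Docker container rather than as a separate virtual machine, which keeps the setup simple.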
OS | hostname | IP |
---|---|---|
Centos7 | cluster-master | 172.18.0.2 |
Centos7 | cluster-slave1 | 172.18.0.3 |
Centos7 | cluster-slave2 | 172.18.0.4 |
Centos7 | cluster-slave3 | 172.18.0.5 |
Docker
Installing Docker
Log in to the CentOS 7 host machine and install Docker:
[root@centos-linux ~]# yum -y install docker.x86_64
Start the Docker service:
[root@centos-linux ~]# systemctl start docker
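Pulling the CentOS image
Pulling images from docker.io is very slow from inside China, so a mirror registry was used instead.
The mirror used here is http://hub.daocloud.io. Enter centos in its search box and search.
The detail page of the image shows the pull command on the right; run it as given.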
[root@centos-linux ~]# docker pull daocloud.io/library/centos:latest
After the pull completes, the image is listed:
[root@centos-linux ~]# docker images
REPOSITORY                   TAG      IMAGE ID       CREATED       SIZE
daocloud.io/library/centos   latest   328edcd84f1b   3 weeks ago   192.5 MB
Creating the containers
The cluster layout requires every container to have a fixed IP, so first create a Docker network with a fixed subnet:
[root@centos-linux ~]# docker network create --subnet=172.18.0.0/16 netgroup
Once the network exists, the containers can be created with fixed IPs:
[root@centos-linux ~]# docker run -d --privileged -ti -v /sys/fs/cgroup:/sys/fs/cgroup --name cluster-master -h cluster-master --net netgroup --ip 172.18.0.2 daocloud.io/library/centos /usr/sbin/init
[root@centos-linux ~]# docker run -d --privileged -ti -v /sys/fs/cgroup:/sys/fs/cgroup --name cluster-slave1 -h cluster-slave1 --net netgroup --ip 172.18.0.3 daocloud.io/library/centos /usr/sbin/init
[root@centos-linux ~]# docker run -d --privileged -ti -v /sys/fs/cgroup:/sys/fs/cgroup --name cluster-slave2 -h cluster-slave2 --net netgroup --ip 172.18.0.4 daocloud.io/library/centos /usr/sbin/init
[root@centos-linux ~]# docker run -d --privileged -ti -v /sys/fs/cgroup:/sys/fs/cgroup --name cluster-slave3 -h cluster-slave3 --net netgroup --ip 172.18.0.5 daocloud.io/library/centos /usr/sbin/init
On CentOS 7, containers created the simple way failed to start sshd. That is why the commands add --privileged and -v /sys/fs/cgroup:/sys/fs/cgroup and run /usr/sbin/init at startup.
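The four docker run commands above differ only in container name and IP, so they can be generated from a single node list. The following is a minimal sketch under that assumption. The DRY_RUN switch and the container_cmd and create_all helper names are inventions of this sketch, not part of Docker, and by default the script only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: build the four `docker run` commands from one node list.
# Names, IPs, image and flags are copied from the commands above.
# DRY_RUN, container_cmd and create_all are assumptions of this sketch.
IMAGE=daocloud.io/library/centos
NETWORK=netgroup
DRY_RUN=${DRY_RUN:-1}

NODES="cluster-master:172.18.0.2
cluster-slave1:172.18.0.3
cluster-slave2:172.18.0.4
cluster-slave3:172.18.0.5"

container_cmd() {
    # $1 = container name (also used as hostname), $2 = fixed IP
    echo "docker run -d --privileged -ti -v /sys/fs/cgroup:/sys/fs/cgroup --name $1 -h $1 --net $NETWORK --ip $2 $IMAGE /usr/sbin/init"
}

create_all() {
    for node in $NODES; do
        name=${node%%:*}   # text before the first colon
        ip=${node#*:}      # text after the first colon
        cmd=$(container_cmd "$name" "$ip")
        if [ "$DRY_RUN" = "1" ]; then
            echo "$cmd"
        else
            $cmd           # unquoted on purpose: split into words and run
        fi
    done
}

create_all
```

Running it with DRY_RUN=0 on the Docker host would execute the commands; the netgroup network must already exist.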
openssh
The CentOS containers do not ship with an SSH service, so it has to be installed by hand inside every container.
Logging in to a container
[root@centos-linux ~]# docker ps -a
CONTAINER ID   IMAGE                        COMMAND            CREATED       STATUS       PORTS   NAMES
328913df0d52   daocloud.io/library/centos   "/usr/sbin/init"   1 hours ago   Up 1 hours           cluster-slave3
83361d4bf079   daocloud.io/library/centos   "/usr/sbin/init"   1 hours ago   Up 1 hours           cluster-slave2
35d1e5732340   daocloud.io/library/centos   "/usr/sbin/init"   1 hours ago   Up 1 hours           cluster-slave1
37fbcc24b0e3   daocloud.io/library/centos   "/usr/sbin/init"   1 hours ago   Up 1 hours           cluster-master
[root@centos-linux ~]# docker exec -it 37fbcc24b0e3 /bin/bash
Install openssh:
[root@cluster-master /]# yum -y install openssh openssh-server openssh-clients
Start the ssh service:
[root@cluster-master /]# systemctl start sshd
Automatically accepting new host keys
On the master, make ssh add new hosts to known_hosts automatically: edit /etc/ssh/ssh_config with vi and set StrictHostKeyChecking to no.
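Editing the file by hand in vi has to be repeated whenever the master container is rebuilt, so the same change can be scripted. This is a minimal sketch, assuming an active StrictHostKeyChecking line should be overwritten if present and appended otherwise. The function name set_strict_no and the temporary demo file are hypothetical; on the master the target would be /etc/ssh/ssh_config.

```shell
#!/bin/sh
# Set "StrictHostKeyChecking no" in an ssh client config file.
# set_strict_no is a hypothetical helper name used only in this sketch.
set_strict_no() {
    cfg=$1
    if grep -Eq '^[[:space:]]*StrictHostKeyChecking[[:space:]]' "$cfg"; then
        # An active (uncommented) setting exists: rewrite that line.
        sed 's/^[[:space:]]*StrictHostKeyChecking[[:space:]].*/StrictHostKeyChecking no/' "$cfg" > "$cfg.tmp" &&
            mv "$cfg.tmp" "$cfg"
    else
        # No active setting: append one. Commented-out lines are left alone.
        printf 'StrictHostKeyChecking no\n' >> "$cfg"
    fi
}

# Demo on a throwaway file whose content imitates a stock ssh_config.
demo=$(mktemp)
printf 'Host *\n#   StrictHostKeyChecking ask\n' > "$demo"
set_strict_no "$demo"
cat "$demo"
```

Disabling host key checking is acceptable for a throwaway learning cluster like this one, but it removes protection against host impersonation.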
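Configuring hosts on cluster-master
/etc/hosts is rewritten whenever a container starts, so direct edits do not survive a restart. To make the cluster hosts available again after a restart, the file is rewritten after the container has started.
Append the following commands to ~/.bashrc: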
:>/etc/hosts
cat >>/etc/hosts<<EOF
127.0.0.1 localhost
172.18.0.2 cluster-master
172.18.0.3 cluster-slave1
172.18.0.4 cluster-slave2
172.18.0.5 cluster-slave3
EOF
Run source ~/.bashrc to apply it.
/etc/hosts now has the required content:
[root@cluster-master /]# cat /etc/hosts
127.0.0.1 localhost
172.18.0.2 cluster-master
172.18.0.3 cluster-slave1
172.18.0.4 cluster-slave2
172.18.0.5 cluster-slave3
Distributing the cluster-master public key
On the master, run ssh-keygen -t rsa and press Enter at every prompt. This creates the ~/.ssh directory containing id_rsa (the private key) and id_rsa.pub (the public key). Then redirect id_rsa.pub into the file authorized_keys:
[root@cluster-master /]# cd ~/.ssh
[root@cluster-master .ssh]# cat id_rsa.pub > authorized_keys
Once the file exists, use scp to distribute it to the slave hosts:
[root@cluster-master /]# ssh root@cluster-slave1 'mkdir ~/.ssh'
[root@cluster-master /]# scp ~/.ssh/authorized_keys root@cluster-slave1:~/.ssh
[root@cluster-master /]# ssh root@cluster-slave2 'mkdir ~/.ssh'
[root@cluster-master /]# scp ~/.ssh/authorized_keys root@cluster-slave2:~/.ssh
[root@cluster-master /]# ssh root@cluster-slave3 'mkdir ~/.ssh'
[root@cluster-master /]# scp ~/.ssh/authorized_keys root@cluster-slave3:~/.ssh
After distribution, test that login works without a password. This experiment uses the root user throughout; to set up passwordless login for another user, make sure that user has the necessary permissions on ~/.ssh/authorized_keys.
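The per-host ssh and scp pairs above can be folded into a loop. The sketch below is one way to do that, not the author's script: the run and distribute_key helpers and the DRY_RUN switch are hypothetical, and the chmod 700 and chmod 600 steps are an addition that tightens the permissions the note above mentions. By default it only prints the commands, so the quoting of the remote command is not shown in the output.

```shell
#!/bin/sh
# Distribute authorized_keys to every slave in a loop.
# run, distribute_key and DRY_RUN are assumptions of this sketch.
SLAVES="cluster-slave1 cluster-slave2 cluster-slave3"
DRY_RUN=${DRY_RUN:-1}

run() {
    # Print the command in dry-run mode, execute it otherwise.
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

distribute_key() {
    host=$1
    # mkdir -p does not fail when ~/.ssh already exists.
    run ssh "root@$host" 'mkdir -p ~/.ssh && chmod 700 ~/.ssh'
    run scp "$HOME/.ssh/authorized_keys" "root@$host:~/.ssh"
    run ssh "root@$host" 'chmod 600 ~/.ssh/authorized_keys'
}

for host in $SLAVES; do
    distribute_key "$host"
done
```

With DRY_RUN=0 each slave still asks for the root password until its authorized_keys file is in place.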
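Ansible
Installing ansible from source
Use git to place the source under /opt on cluster-master. Ansible needs no agent, so installing it on a single control machine is enough to manage the cluster. The CentOS containers do not ship with git, so install it first: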
[root@cluster-master /]# yum -y install git
Then, in /opt, run:
[root@cluster-master opt]# git clone https://github.com/ansible/ansible.git --recursive
Before running ansible's env-setup, install a few packages that the container does not ship with:
[root@cluster-master opt]# yum -y install python-setuptools
[root@cluster-master opt]# easy_install pip
[root@cluster-master opt]# pip install paramiko PyYAML Jinja2 httplib2 six
Enter the ansible directory and run the setup script:
[root@cluster-master opt]# cd ./ansible
[root@cluster-master ansible]# source ./hacking/env-setup
Configuring ansible hosts
Create /etc/ansible/hosts and list the cluster's hosts in it, organized by group:
[cluster]
cluster-master
cluster-slave1
cluster-slave2
cluster-slave3

[master]
cluster-master

[slaves]
cluster-slave1
cluster-slave2
cluster-slave3
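Before moving on, it can help to confirm that ansible reaches every host in the inventory. The sketch below writes the same inventory to a file and, only when asked, runs ansible's ping module against the cluster group. The INVENTORY and RUN_PING variables and the /tmp path are assumptions of this sketch; the article itself uses /etc/ansible/hosts.

```shell
#!/bin/sh
# Write the inventory shown above to a file and optionally ping every host.
# INVENTORY and RUN_PING are variables invented for this sketch.
INVENTORY=${INVENTORY:-/tmp/cluster-hosts}
RUN_PING=${RUN_PING:-0}

cat > "$INVENTORY" <<'EOF'
[cluster]
cluster-master
cluster-slave1
cluster-slave2
cluster-slave3

[master]
cluster-master

[slaves]
cluster-slave1
cluster-slave2
cluster-slave3
EOF

if [ "$RUN_PING" = "1" ]; then
    # The ping module checks SSH login and a usable Python on each host.
    ansible -i "$INVENTORY" cluster -m ping
else
    echo "inventory written to $INVENTORY"
fi
```

On cluster-master, RUN_PING=1 INVENTORY=/etc/ansible/hosts would check the real inventory, at the cost of overwriting that file with the content above.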
Hadoop
Installing openjdk across the cluster
Use ansible to install openjdk on every host in the cluster:
[root@cluster-master ansible]# ansible cluster -m yum -a "name=java-1.8.0-openjdk,java-1.8.0-openjdk-devel state=latest"
Installing hadoop on cluster-master
Download the hadoop package into /opt:
[root@cluster-master opt]# wget http://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-2.7.4/hadoop-2.7.4.tar.gz
After the download finishes, unpack the archive and create a symlink:
[root@cluster-master opt]# tar -xzvf hadoop-2.7.4.tar.gz
[root@cluster-master opt]# ln -s hadoop-2.7.4 hadoop
Set the java and hadoop environment variables (.bashrc):
# java
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.141-1.b16.el7_3.x86_64
export PATH=$JAVA_HOME/bin:$PATH
# hadoop
export HADOOP_HOME=/opt/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
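The JAVA_HOME value above hard-codes one specific OpenJDK build, and the directory name changes when yum installs a newer package (the ansible task earlier uses state=latest). One possible alternative, sketched here and not taken from the original setup, is to derive JAVA_HOME from whichever javac is on PATH. The helper name java_home_from is hypothetical, and readlink -f assumes GNU coreutils, as found on CentOS.

```shell
#!/bin/sh
# Derive JAVA_HOME from the javac on PATH instead of hard-coding the build.
# java_home_from is a hypothetical helper name used only in this sketch.
java_home_from() {
    # $1 = fully resolved path to javac, e.g. /usr/lib/jvm/<jdk>/bin/javac
    dirname "$(dirname "$1")"
}

if command -v javac >/dev/null 2>&1; then
    # readlink -f follows the /usr/bin and /etc/alternatives symlinks.
    JAVA_HOME=$(java_home_from "$(readlink -f "$(command -v javac)")")
    export JAVA_HOME
    export PATH="$JAVA_HOME/bin:$PATH"
    echo "JAVA_HOME=$JAVA_HOME"
else
    echo "javac not found on PATH; JAVA_HOME left unchanged"
fi
```

javac comes from the java-1.8.0-openjdk-devel package installed earlier, so the lookup should succeed on every node once that step has run.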
Editing the slaves file
[root@cluster-master opt]# vi /opt/hadoop/etc/hadoop/slaves
cluster-slave1
cluster-slave2
cluster-slave3
Editing the configuration files hadoop needs
core-site.xml
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <!-- file system properties -->
    <property>
        <name>fs.default.name</name>
        <value>hdfs://cluster-master:9000</value>
    </property>
</configuration>
hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
</configuration>
mapred-site.xml
<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>http://cluster-master:9001</value>
    </property>
</configuration>
yarn-site.xml
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>cluster-master:18040</value>
    </property>
</configuration>
Packaging the hadoop files
Pack the hadoop symlink and the hadoop-2.7.4 directory into a single archive so that ansible can distribute it to the slave hosts:
[root@cluster-master opt]# tar -cvf hadoop-dis.tar hadoop hadoop-2.7.4
Distributing .bashrc and hadoop-dis.tar to the slave hosts with ansible-playbook
---
- hosts: cluster
  tasks:
    - name: copy .bashrc to slaves
      copy: src=~/.bashrc dest=~/
      notify:
        - exec source
    - name: mkdir /home/hadoop/tmp
      shell: mkdir -p /home/hadoop/tmp
    - name: copy hadoop-dis.tar to slaves
      unarchive: src=/opt/hadoop-dis.tar dest=/opt
  handlers:
    - name: exec source
      shell: source ~/.bashrc
Save the YAML above as hadoop-dis.yaml and run it:
[root@cluster-master opt]# ansible-playbook hadoop-dis.yaml
hadoop-dis.tar is unpacked automatically into /opt on each slave host.
Formatting the namenode
[root@cluster-master opt]# hadoop namenode -format
Starting the hadoop cluster
At this point hadoop is ready to use. Starting it is simple: $HADOOP_HOME/sbin contains the start and stop scripts listed below.
[root@cluster-master opt]# cd $HADOOP_HOME/sbin
[root@cluster-master sbin]# ls -l
total 120
-rwxr-xr-x. 1 20415 101 2752 Aug 1 00:35 distribute-exclude.sh
-rwxr-xr-x. 1 20415 101 6452 Aug 1 00:35 hadoop-daemon.sh
-rwxr-xr-x. 1 20415 101 1360 Aug 1 00:35 hadoop-daemons.sh
-rwxr-xr-x. 1 20415 101 1640 Aug 1 00:35 hdfs-config.cmd
-rwxr-xr-x. 1 20415 101 1427 Aug 1 00:35 hdfs-config.sh
-rwxr-xr-x. 1 20415 101 2291 Aug 1 00:35 httpfs.sh
-rwxr-xr-x. 1 20415 101 3128 Aug 1 00:35 kms.sh
-rwxr-xr-x. 1 20415 101 4080 Aug 1 00:35 mr-jobhistory-daemon.sh
-rwxr-xr-x. 1 20415 101 1648 Aug 1 00:35 refresh-namenodes.sh
-rwxr-xr-x. 1 20415 101 2145 Aug 1 00:35 slaves.sh
-rwxr-xr-x. 1 20415 101 1779 Aug 1 00:35 start-all.cmd
-rwxr-xr-x. 1 20415 101 1471 Aug 1 00:35 start-all.sh
-rwxr-xr-x. 1 20415 101 1128 Aug 1 00:35 start-balancer.sh
-rwxr-xr-x. 1 20415 101 1401 Aug 1 00:35 start-dfs.cmd
-rwxr-xr-x. 1 20415 101 3734 Aug 1 00:35 start-dfs.sh
-rwxr-xr-x. 1 20415 101 1357 Aug 1 00:35 start-secure-dns.sh
-rwxr-xr-x. 1 20415 101 1571 Aug 1 00:35 start-yarn.cmd
-rwxr-xr-x. 1 20415 101 1347 Aug 1 00:35 start-yarn.sh
-rwxr-xr-x. 1 20415 101 1770 Aug 1 00:35 stop-all.cmd
-rwxr-xr-x. 1 20415 101 1462 Aug 1 00:35 stop-all.sh
-rwxr-xr-x. 1 20415 101 1179 Aug 1 00:35 stop-balancer.sh
-rwxr-xr-x. 1 20415 101 1455 Aug 1 00:35 stop-dfs.cmd
-rwxr-xr-x. 1 20415 101 3206 Aug 1 00:35 stop-dfs.sh
-rwxr-xr-x. 1 20415 101 1340 Aug 1 00:35 stop-secure-dns.sh
-rwxr-xr-x. 1 20415 101 1642 Aug 1 00:35 stop-yarn.cmd
-rwxr-xr-x. 1 20415 101 1340 Aug 1 00:35 stop-yarn.sh
-rwxr-xr-x. 1 20415 101 4295 Aug 1 00:35 yarn-daemon.sh
-rwxr-xr-x. 1 20415 101 1353 Aug 1 00:35 yarn-daemons.sh
Only HDFS is needed here, so running start-dfs.sh is enough:
[root@cluster-master sbin]# ./start-dfs.sh
Starting namenodes on [cluster-master]
cluster-master: starting namenode, logging to /opt/hadoop-2.7.4/logs/hadoop-root-namenode-cluster-master.out
cluster-slave1: starting datanode, logging to /opt/hadoop-2.7.4/logs/hadoop-root-datanode-cluster-slave1.out
cluster-slave3: starting datanode, logging to /opt/hadoop-2.7.4/logs/hadoop-root-datanode-cluster-slave3.out
cluster-slave2: starting datanode, logging to /opt/hadoop-2.7.4/logs/hadoop-root-datanode-cluster-slave2.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /opt/hadoop-2.7.4/logs/hadoop-root-secondarynamenode-cluster-master.out
Once startup completes, the following processes are visible on the master and the slaves.
cluster-master
[root@cluster-master sbin]# jps
2484 SecondaryNameNode
2648 Jps
2315 NameNode
cluster-slave
[root@cluster-slave1 logs]# jps
27502 DataNode
27583 Jps
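Reading jps output by eye works for four nodes, but the check is easy to script. The sketch below tests whether a named daemon appears in jps output. The has_daemon helper is hypothetical, and the sample text is copied from the cluster-master output above so the example runs without a live cluster.

```shell
#!/bin/sh
# Check jps output for an expected daemon.
# has_daemon is a hypothetical helper; it reads jps output from stdin.
has_daemon() {
    # jps prints "<pid> <name>" per line; compare the name exactly so that
    # NameNode does not match SecondaryNameNode.
    awk -v want="$1" '$2 == want { found = 1 } END { exit !found }'
}

# Sample taken from the cluster-master output above; on a live node this
# would be: jps_out=$(jps)
jps_out='2484 SecondaryNameNode
2648 Jps
2315 NameNode'

for daemon in NameNode SecondaryNameNode; do
    if printf '%s\n' "$jps_out" | has_daemon "$daemon"; then
        echo "$daemon: running"
    else
        echo "$daemon: MISSING"
    fi
done
```

On a slave the same helper would be used as jps | has_daemon DataNode.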
Trying out hadoop
Try uploading a file from the master:
[root@cluster-master sbin]# hadoop dfs -put start-dfs.sh /
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
[root@cluster-master sbin]# hadoop dfs -ls /
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
Found 1 items
-rw-r--r-- 3 root supergroup 3734 2017-08-26 10:57 /start-dfs.sh
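Summary
Deploying hadoop on docker is admittedly an unusual setup, but it is very convenient for personal study: there is no need to clone several environments in virtual machines, the network configuration is simpler, and ansible makes managing the cluster more efficient.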