Nomad Service Orchestration
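Nomad is a tool for managing a cluster of machines and running applications on that cluster.
Quick Start
Environment Setup
Prepare three virtual machines as described in the earlier post "Consul 搭建集群" (Setting Up a Consul Cluster).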
Node | IP |
---|---|
n1 | 172.20.20.10 |
n2 | 172.20.20.11 |
n3 | 172.20.20.12 |
Single-Machine Installation
Log in to VM n1 and switch to the root user.

    » vagrant ssh n1
    [vagrant@n1 ~]$ su
    Password:
    [root@n1 vagrant]#

Install the tools that the following steps depend on.

    [root@n1 vagrant]# yum install -y epel-release
    [root@n1 vagrant]# yum install -y jq
    [root@n1 vagrant]# yum install -y unzip

Download version 0.8.1 into the /tmp directory.
The latest release, 0.8.3, has a bug that repeatedly re-registers services when it is used together with Consul, so this guide uses 0.8.1.

    [root@n1 vagrant]# cd /tmp/
    [root@n1 vagrant]# curl -s https://releases.hashicorp.com/nomad/0.8.1/nomad_0.8.1_linux_amd64.zip -o nomad.zip

Unzip the archive, make nomad executable, and move it to /usr/bin/.

    [root@n1 vagrant]# unzip nomad.zip
    [root@n1 vagrant]# chmod +x nomad
    [root@n1 vagrant]# mv nomad /usr/bin/nomad
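The download URL above follows a regular pattern, so switching versions only means changing one string. The sketch below rebuilds that URL from a version number; `nomad_release_url` is a hypothetical helper name, and the assumption that other versions and platforms follow the same path layout is taken from the single URL used in this guide.

```shell
#!/bin/sh
# Hypothetical helper: build the releases.hashicorp.com URL for a Nomad zip.
# $1 = version (for example 0.8.1)
# $2 = platform suffix, assumed to default to linux_amd64 as in this guide
nomad_release_url() {
    version="$1"
    platform="${2:-linux_amd64}"
    printf 'https://releases.hashicorp.com/nomad/%s/nomad_%s_%s.zip\n' \
        "$version" "$version" "$platform"
}

# Print the URL used in this guide.
nomad_release_url 0.8.1
```

The result can be passed straight to curl, for example `curl -s "$(nomad_release_url 0.8.1)" -o nomad.zip`.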
Check that nomad was installed correctly.

    [root@n1 vagrant]# nomad
    Usage: nomad [-version] [-help] [-autocomplete-(un)install] <command> [args]

    Common commands:
        run         Run a new job or update an existing job
        stop        Stop a running job
        status      Display the status output for a resource
        alloc       Interact with allocations
        job         Interact with jobs
        node        Interact with nodes
        agent       Runs a Nomad agent

    Other commands:
        acl             Interact with ACL policies and tokens
        agent-info      Display status information about the local agent
        deployment      Interact with deployments
        eval            Interact with evaluations
        namespace       Interact with namespaces
        operator        Provides cluster-level tools for Nomad operators
        quota           Interact with quotas
        sentinel        Interact with Sentinel policies
        server          Interact with servers
        ui              Open the Nomad Web UI
        version         Prints the Nomad version

Output like the above means the installation succeeded.
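When the same check has to run on several machines, a small script is easier than reading the usage text by eye. This is a minimal sketch that only tests whether each tool is on the PATH; `have_cmd` is a hypothetical helper name, and the tool list is simply the set of commands used in the steps above.

```shell
#!/bin/sh
# Return 0 if the named command can be found on PATH, non-zero otherwise.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Report the tools this guide relies on.
for tool in nomad unzip jq curl; do
    if have_cmd "$tool"; then
        echo "$tool: found"
    else
        echo "$tool: missing"
    fi
done
```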
Batch Installation
See the "Batch Installation" section of the earlier post "Consul 搭建集群" (Setting Up a Consul Cluster).
The following provisioning script installs nomad on every VM and installs docker on each of them as well.

    $script = <<SCRIPT

    echo "Installing dependencies ..."
    yum install -y epel-release
    yum install -y net-tools
    yum install -y wget
    yum install -y jq
    yum install -y unzip
    yum install -y bind-utils

    echo "Determining Consul version to install ..."
    CHECKPOINT_URL="https://checkpoint-api.hashicorp.com/v1/check"
    if [ -z "$CONSUL_DEMO_VERSION" ]; then
        CONSUL_DEMO_VERSION=$(curl -s "${CHECKPOINT_URL}"/consul | jq .current_version | tr -d '"')
    fi

    echo "Fetching Consul version ${CONSUL_DEMO_VERSION} ..."
    cd /tmp/
    curl -s https://releases.hashicorp.com/consul/${CONSUL_DEMO_VERSION}/consul_${CONSUL_DEMO_VERSION}_linux_amd64.zip -o consul.zip

    echo "Installing Consul version ${CONSUL_DEMO_VERSION} ..."
    unzip consul.zip
    sudo chmod +x consul
    sudo mv consul /usr/bin/consul

    sudo mkdir /etc/consul.d
    sudo chmod a+w /etc/consul.d

    echo "Determining Nomad 0.8.1 to install ..."
    #CHECKPOINT_URL="https://checkpoint-api.hashicorp.com/v1/check"
    #if [ -z "$NOMAD_DEMO_VERSION" ]; then
    #    NOMAD_DEMO_VERSION=$(curl -s "${CHECKPOINT_URL}"/nomad | jq .current_version | tr -d '"')
    #fi

    # The version lookup above is disabled, so the version is pinned to 0.8.1.
    echo "Fetching Nomad version 0.8.1 ..."
    cd /tmp/
    curl -s https://releases.hashicorp.com/nomad/0.8.1/nomad_0.8.1_linux_amd64.zip -o nomad.zip

    echo "Installing Nomad version 0.8.1 ..."
    unzip nomad.zip
    sudo chmod +x nomad
    sudo mv nomad /usr/bin/nomad

    echo "Installing nginx ..."
    #yum install -y nginx

    echo "Installing docker ..."
    yum install -y docker

    SCRIPT
Starting the Agents
First start Consul and form a cluster; see "Consul 搭建集群" (Setting Up a Consul Cluster) for details. With the default configuration, Nomad detects the local Consul agent on startup and automatically registers the nomad service with it.
n1

    [root@n1 vagrant]# consul agent -server -bootstrap-expect 3 -data-dir /etc/consul.d -node=node1 -bind=172.20.20.10 -ui -client 0.0.0.0

n2

    [root@n2 vagrant]# consul agent -server -bootstrap-expect 3 -data-dir /etc/consul.d -node=node2 -bind=172.20.20.11 -ui -client 0.0.0.0 -join 172.20.20.10

n3

    [root@n3 vagrant]# consul agent -server -bootstrap-expect 3 -data-dir /etc/consul.d -node=node3 -bind=172.20.20.12 -ui -client 0.0.0.0 -join 172.20.20.10

Check the cluster members on n1:

    [root@n1 vagrant]# consul members
    Node   Address            Status  Type    Build  Protocol  DC   Segment
    node1  172.20.20.10:8301  alive   server  1.1.0  2         dc1  <all>
    node2  172.20.20.11:8301  alive   server  1.1.0  2         dc1  <all>
    node3  172.20.20.12:8301  alive   server  1.1.0  2         dc1  <all>
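Before starting Nomad it can help to confirm that all three Consul servers really are alive. The sketch below counts them from the `consul members` table; `count_alive_servers` is a hypothetical helper, and it assumes the column order shown above (Status in column 3, Type in column 4).

```shell
#!/bin/sh
# Count rows whose Status is "alive" and whose Type is "server".
# Reads the `consul members` table on stdin and skips the header row.
count_alive_servers() {
    awk 'NR > 1 && $3 == "alive" && $4 == "server" { n++ } END { print n + 0 }'
}

# Sample input copied from the output in this guide. On a real node one
# would run: consul members | count_alive_servers
count_alive_servers <<'EOF'
Node   Address            Status  Type    Build  Protocol  DC   Segment
node1  172.20.20.10:8301  alive   server  1.1.0  2         dc1  <all>
node2  172.20.20.11:8301  alive   server  1.1.0  2         dc1  <all>
node3  172.20.20.12:8301  alive   server  1.1.0  2         dc1  <all>
EOF
```

With the sample input this prints 3, which matches the `-bootstrap-expect 3` value used when the agents were started.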
Basic Concepts
- server: schedules the jobs that are submitted
- client: runs the tasks of a job
Starting the Servers
Define the server configuration file server.hcl:

    log_level = "DEBUG"
    bind_addr = "0.0.0.0"
    data_dir  = "/home/vagrant/data_server"
    name      = "server1"

    advertise {
      http = "172.20.20.10:4646"
      rpc  = "172.20.20.10:4647"
      serf = "172.20.20.10:4648"
    }

    server {
      enabled = true
      # Self-elect, should be 3 or 5 for production
      bootstrap_expect = 3
    }
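The file above is written for n1. The member list later in this guide shows server2 and server3 advertising 172.20.20.11 and 172.20.20.12, so each node needs its own `name` and `advertise` addresses. The sketch below prints the same configuration with those two values filled in; `render_server_hcl` is a hypothetical helper, and everything else is copied from the file above.

```shell
#!/bin/sh
# Hypothetical helper: print a server.hcl for one node.
# $1 = node name (for example server2), $2 = that node's IP address.
render_server_hcl() {
    cat <<EOF
log_level = "DEBUG"
bind_addr = "0.0.0.0"
data_dir  = "/home/vagrant/data_server"
name      = "$1"

advertise {
  http = "$2:4646"
  rpc  = "$2:4647"
  serf = "$2:4648"
}

server {
  enabled          = true
  bootstrap_expect = 3
}
EOF
}

# Print the configuration that n2 would use.
render_server_hcl server2 172.20.20.11
```

On each node the output would be redirected into place, for example `render_server_hcl server2 172.20.20.11 > server.hcl` on n2.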
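Run it from the command line:

    [root@n1 vagrant]# nomad agent -config=server.hcl

Then log in to n2 and n3 and run the same command there, with name and the advertise addresses in server.hcl changed to match each node (server2 at 172.20.20.11 and server3 at 172.20.20.12):

    nomad agent -config=server.hcl

Open http://172.20.20.10:8500/ui/#/dc1/services in a browser.
Consul shows that the nomad services have all started.
Then open Nomad's built-in UI at http://172.20.20.10:4646/ui/servers
All of the servers are shown as running.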
Starting the Clients
Start docker before starting the client, because the client needs docker to run jobs.

    [root@n1 vagrant]# systemctl start docker

Docker must be started on n2 and n3 as well.
Define the client configuration file client.hcl:

    log_level = "DEBUG"
    data_dir  = "/home/vagrant/data_clinet"
    name      = "client1"

    advertise {
      http = "172.20.20.10:4646"
      rpc  = "172.20.20.10:4647"
      serf = "172.20.20.10:4648"
    }

    client {
      enabled = true
      servers = ["172.20.20.10:4647"]
    }

    ports {
      http = 5656
    }
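On n1, run:

    [root@n1 vagrant]# nomad agent -config=client.hcl

Open http://172.20.20.10:8500/ui/#/dc1/services/nomad-client in a browser.
The nomad-client service has started successfully. Run the client on n2 and n3 in the same way.
In the end all three clients are listed on that page.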
Running a Job
On n2, create a directory named job and run nomad init inside it.

    [root@n2 vagrant]# mkdir job
    [root@n2 vagrant]# cd job/
    [root@n2 job]# nomad init
    Example job file written to example.nomad

The command above creates a job named example.
Submit it from the command line:

    [root@n2 job]# nomad run example.nomad
    ==> Monitoring evaluation "97f8a1fe"
        Evaluation triggered by job "example"
        Evaluation within deployment: "3c89e74a"
        Allocation "47bf1f20" created: node "9df69026", group "cache"
        Evaluation status changed: "pending" -> "complete"
    ==> Evaluation "97f8a1fe" finished with status "complete"

The output shows that the client node 9df69026 was chosen to run the job.
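The allocation ID and node ID are the two values most often needed for follow-up commands. As a rough sketch, they can be pulled out of the `nomad run` output with sed; `created_allocs` is a hypothetical helper, and it assumes the `Allocation "..." created: node "..."` line format shown above.

```shell
#!/bin/sh
# Print "<allocation-id> <node-id>" for every allocation reported as created.
# Reads the `nomad run` output on stdin.
created_allocs() {
    sed -n 's/.*Allocation "\([^"]*\)" created: node "\([^"]*\)".*/\1 \2/p'
}

# Sample input copied from the output in this guide. On a real node one
# would run: nomad run example.nomad | created_allocs
created_allocs <<'EOF'
==> Monitoring evaluation "97f8a1fe"
    Evaluation triggered by job "example"
    Evaluation within deployment: "3c89e74a"
    Allocation "47bf1f20" created: node "9df69026", group "cache"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "97f8a1fe" finished with status "complete"
EOF
```

For the sample input this prints `47bf1f20 9df69026`.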
Advanced Operations
Cluster Members

    [root@n1 vagrant]# nomad server members
    Name            Address       Port  Status  Leader  Protocol  Build  Datacenter  Region
    server1.global  172.20.20.10  4648  alive   false   2         0.8.1  dc1         global
    server2.global  172.20.20.11  4648  alive   false   2         0.8.1  dc1         global
    server3.global  172.20.20.12  4648  alive   true    2         0.8.1  dc1         global
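The Leader column tells which server currently leads the cluster. A minimal sketch for reading it in a script follows; `nomad_leader` is a hypothetical helper, and it assumes the column order shown above (Leader in column 5).

```shell
#!/bin/sh
# Print the name of the server whose Leader column is "true".
# Reads the `nomad server members` table on stdin and skips the header row.
nomad_leader() {
    awk 'NR > 1 && $5 == "true" { print $1 }'
}

# Sample input copied from the output in this guide. On a real node one
# would run: nomad server members | nomad_leader
nomad_leader <<'EOF'
Name            Address       Port  Status  Leader  Protocol  Build  Datacenter  Region
server1.global  172.20.20.10  4648  alive   false   2         0.8.1  dc1         global
server2.global  172.20.20.11  4648  alive   false   2         0.8.1  dc1         global
server3.global  172.20.20.12  4648  alive   true    2         0.8.1  dc1         global
EOF
```

For the sample input this prints `server3.global`.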
Querying Job Status

    [root@n1 vagrant]# nomad status example
    ID            = example
    Name          = example
    Submit Date   = 2018-06-13T08:42:57Z
    Type          = service
    Priority      = 50
    Datacenters   = dc1
    Status        = running
    Periodic      = false
    Parameterized = false

    Summary
    Task Group  Queued  Starting  Running  Failed  Complete  Lost
    cache       0       0         1        0       0         0

    Latest Deployment
    ID          = 3c89e74a
    Status      = successful
    Description = Deployment completed successfully

    Deployed
    Task Group  Desired  Placed  Healthy  Unhealthy
    cache       1        1       1        0

    Allocations
    ID        Node ID   Task Group  Version  Desired  Status   Created    Modified
    47bf1f20  9df69026  cache       0        run      running  8m44s ago  8m26s ago
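The status output contains two `Status` lines: the job's own status first, then the status of the latest deployment. The sketch below returns only the first one; `job_status` is a hypothetical helper, and it assumes the `key = value` layout shown above.

```shell
#!/bin/sh
# Print the value of the first "Status = ..." line, which is the job status.
# Reads the `nomad status <job>` output on stdin.
job_status() {
    awk -F' *= *' '$1 == "Status" { print $2; exit }'
}

# Sample input abridged from the output in this guide. On a real node one
# would run: nomad status example | job_status
job_status <<'EOF'
ID            = example
Name          = example
Type          = service
Status        = running
Periodic      = false

Latest Deployment
ID          = 3c89e74a
Status      = successful
EOF
```

For the sample input this prints `running`, not `successful`, because awk stops at the first match.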
Modifying a Job
Edit example.nomad, find count = 1,
and change it to count = 3.
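The same edit can be scripted instead of done by hand. This is a sketch only: `set_count` is a hypothetical helper that rewrites every line of the form `count = N` in the file, which is safe for the generated example.nomad only on the assumption that it contains a single such line. The demo runs against a throwaway file rather than a real job file.

```shell
#!/bin/sh
# Hypothetical helper: replace the number on every "count = N" line.
# $1 = job file, $2 = new count.
set_count() {
    file="$1"
    new="$2"
    sed "s/^\([[:space:]]*count[[:space:]]*=[[:space:]]*\)[0-9][0-9]*/\1$new/" \
        "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Demo on a temporary file that mimics the relevant part of a job file.
tmp=$(mktemp)
printf 'group "cache" {\n  count = 1\n}\n' > "$tmp"
set_count "$tmp" 3
cat "$tmp"
rm -f "$tmp"
```

Against the real file the call would be `set_count example.nomad 3`.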
Preview the planned changes to the job from the command line:

    [root@n2 job]# nomad plan example.nomad
    +/- Job: "example"
    +/- Task Group: "cache" (2 create, 1 in-place update)
      +/- Count: "1" => "3" (forces create)
          Task: "redis"

    Scheduler dry-run:
    - All tasks successfully allocated.

    Job Modify Index: 70
    To submit the job with version verification run:

    nomad job run -check-index 70 example.nomad

    When running the job with the check-index flag, the job will only be run if the
    server side version matches the job modify index returned. If the index has
    changed, another user has modified the job and the plan's results are
    potentially invalid.
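The Job Modify Index printed by the plan is the value that `-check-index` expects. A small sketch for carrying it from one command to the next follows; `plan_index` is a hypothetical helper, and it assumes the `Job Modify Index: N` line format shown above.

```shell
#!/bin/sh
# Print the number from the "Job Modify Index: N" line.
# Reads the `nomad plan` output on stdin.
plan_index() {
    sed -n 's/^Job Modify Index: \([0-9][0-9]*\).*/\1/p'
}

# Sample input abridged from the output in this guide. On a real node one
# would capture it with: idx=$(nomad plan example.nomad | plan_index)
plan_output='+/- Job: "example"
Job Modify Index: 70
To submit the job with version verification run:'

idx=$(printf '%s\n' "$plan_output" | plan_index)
echo "nomad job run -check-index $idx example.nomad"
```

For the sample input this prints `nomad job run -check-index 70 example.nomad`. One caveat when scripting this: the Nomad documentation describes `nomad plan` as exiting with status 1 when the plan would create or destroy allocations, so a script running under `set -e` should not treat that exit status as a failure.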
Apply the change to the job:

    [root@n2 job]# nomad job run -check-index 70 example.nomad
    ==> Monitoring evaluation "3a0ff5e0"
        Evaluation triggered by job "example"
        Evaluation within deployment: "2b5b803f"
        Allocation "34086acb" created: node "6166e031", group "cache"
        Allocation "4d01cd92" created: node "f97b5095", group "cache"
        Allocation "47bf1f20" modified: node "9df69026", group "cache"
        Evaluation status changed: "pending" -> "complete"
    ==> Evaluation "3a0ff5e0" finished with status "complete"

Two more client nodes are now running the job's tasks.
The browser shows three instances in total.
The job's version history is visible there as well.
Querying the status again shows three running allocations, all at version 1:

    [root@n2 job]# nomad status example
    ID            = example
    Name          = example
    Submit Date   = 2018-06-13T08:56:03Z
    Type          = service
    Priority      = 50
    Datacenters   = dc1
    Status        = running
    Periodic      = false
    Parameterized = false

    Summary
    Task Group  Queued  Starting  Running  Failed  Complete  Lost
    cache       0       0         3        0       0         0

    Latest Deployment
    ID          = 2b5b803f
    Status      = successful
    Description = Deployment completed successfully

    Deployed
    Task Group  Desired  Placed  Healthy  Unhealthy
    cache       3        3       3        0

    Allocations
    ID        Node ID   Task Group  Version  Desired  Status   Created     Modified
    34086acb  6166e031  cache       1        run      running  3m38s ago   3m25s ago
    4d01cd92  f97b5095  cache       1        run      running  3m38s ago   3m26s ago
    47bf1f20  9df69026  cache       1        run      running  16m43s ago  3m27s ago
Leaving the Cluster
First stop the nomad server on n1 with Ctrl-C.
Then query the members on n2:

    [root@n2 job]# nomad server members
    Name            Address       Port  Status  Leader  Protocol  Build  Datacenter  Region
    server1.global  172.20.20.10  4648  failed  false   2         0.8.1  dc1         global
    server2.global  172.20.20.11  4648  alive   true    2         0.8.1  dc1         global
    server3.global  172.20.20.12  4648  alive   false   2         0.8.1  dc1         global

server1 is now in the failed state, so remove it from the cluster:

    [root@n2 job]# nomad server force-leave server1.global
    [root@n2 job]# nomad server members
    Name            Address       Port  Status  Leader  Protocol  Build  Datacenter  Region
    server1.global  172.20.20.10  4648  left    false   2         0.8.1  dc1         global
    server2.global  172.20.20.11  4648  alive   true    2         0.8.1  dc1         global
    server3.global  172.20.20.12  4648  alive   false   2         0.8.1  dc1         global

server1 is now in the left state, which means it was removed from the cluster successfully.
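Finding the failed member can also be scripted. The sketch below lists every server whose Status column reads `failed`; `failed_servers` is a hypothetical helper, and it assumes the `nomad server members` column order shown in this guide (Status in column 4).

```shell
#!/bin/sh
# Print the name of every server whose Status column is "failed".
# Reads the `nomad server members` table on stdin and skips the header row.
failed_servers() {
    awk 'NR > 1 && $4 == "failed" { print $1 }'
}

# Sample input copied from the output in this guide.
failed_servers <<'EOF'
Name            Address       Port  Status  Leader  Protocol  Build  Datacenter  Region
server1.global  172.20.20.10  4648  failed  false   2         0.8.1  dc1         global
server2.global  172.20.20.11  4648  alive   true    2         0.8.1  dc1         global
server3.global  172.20.20.12  4648  alive   false   2         0.8.1  dc1         global
EOF
```

For the sample input this prints `server1.global`. On a live cluster the names could be fed to the command used above, for example `nomad server members | failed_servers | while read -r name; do nomad server force-leave "$name"; done`, though removing members automatically is a choice that deserves care in anything beyond a test setup.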