Installation notes for etcd, a distributed key-value store

ceibake

2019-06-27

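Installing and starting etcd turned out to differ in places from what the official documentation describes, so I am writing the steps down here to avoid hitting the same pitfalls again.

GitHub repository: https://github.com/coreos/etcd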

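These notes cover starting etcd as a cluster. Assume we have two machines:

Machine 1: 192.168.33.10
Machine 2: 192.168.33.11

Note: the two machines do not need SSH trust set up between them.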
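
As a side note on the two-machine assumption: etcd is built on Raft, which needs a majority of members to be available. The sketch below only does that arithmetic; the function names quorum and fault_tolerance are made up for illustration and are not part of etcd.

```shell
# quorum N: print how many members must be up in a cluster of N members.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# fault_tolerance N: print how many members may fail while a majority remains.
fault_tolerance() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

quorum 2            # prints 2
fault_tolerance 2   # prints 0
```

In other words, the two-machine cluster used here is fine for trying things out, but it stops accepting writes as soon as either machine goes down.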

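1. Download

Download the latest version from the GitHub releases page: Releases · coreos/etcd · GitHub

These notes use the latest version at the time of writing (2018-07-01), v3.3.8 for Linux amd64: https://github.com/coreos/etc...

Extract the archive and enter the directory: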

tar -zvxf etcd-v3.3.8-linux-amd64.tar.gz

cd etcd-v3.3.8-linux-amd64

Note: repeat this step on every machine in the cluster.
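
Since the same step is repeated on each machine, a quick check that the extracted directory really contains both binaries can save some confusion later. This is a minimal sketch; check_etcd_dir is a hypothetical helper name, not something shipped with etcd.

```shell
# check_etcd_dir DIR: succeed only if DIR contains executable etcd and etcdctl files.
check_etcd_dir() {
    for bin in etcd etcdctl; do
        if [ ! -x "$1/$bin" ]; then
            echo "missing or not executable: $1/$bin" >&2
            return 1
        fi
    done
    return 0
}
```

From inside the extracted directory it could be used as: check_etcd_dir . && ./etcd --version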

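2. Start

Official documentation: Install etcd | Get Started with etcd | CoreOS

According to the official documentation, etcd can be started either with command-line flags or with a configuration file. Both approaches are described below.

2.1 Starting with command-line flags

Note: the commands below use nohup so that etcd keeps running in the background; the trailing redirect to /dev/null discards its log output.

2.1.1 Start on 192.168.33.10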

nohup ./etcd --name my-etcd-1 \
--listen-client-urls http://192.168.33.10:2379 \
--advertise-client-urls http://192.168.33.10:2379 \
--listen-peer-urls http://192.168.33.10:2380 \
--initial-advertise-peer-urls http://192.168.33.10:2380 \
--initial-cluster my-etcd-1=http://192.168.33.10:2380,my-etcd-2=http://192.168.33.11:2380 \
--initial-cluster-token my-etcd-token \
--initial-cluster-state new \
>/dev/null 2>&1 &

2.1.2 Start on 192.168.33.11

nohup ./etcd --name my-etcd-2 \
--listen-client-urls http://192.168.33.11:2379 \
--advertise-client-urls http://192.168.33.11:2379 \
--listen-peer-urls http://192.168.33.11:2380 \
--initial-advertise-peer-urls http://192.168.33.11:2380 \
--initial-cluster my-etcd-1=http://192.168.33.10:2380,my-etcd-2=http://192.168.33.11:2380 \
--initial-cluster-token my-etcd-token \
--initial-cluster-state new \
>/dev/null 2>&1 &

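2.2 Starting with a configuration file

Configuration file reference: Install etcd | Get Started with etcd | CoreOS and etcd/etcd.conf.yml.sample at master · coreos/etcd · GitHub

Name the configuration file etcd.conf.yml and place it inside the etcd-v3.3.8-linux-amd64 directory, next to the etcd binary, so that the relative path ./etcd.conf.yml used in the commands below resolves.

2.2.1 Add the configuration file and start on 192.168.33.10

vim ./etcd.conf.yml

Contents: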

name:                        my-etcd-1
listen-client-urls:          http://192.168.33.10:2379
advertise-client-urls:       http://192.168.33.10:2379
listen-peer-urls:            http://192.168.33.10:2380
initial-advertise-peer-urls: http://192.168.33.10:2380
initial-cluster:             my-etcd-1=http://192.168.33.10:2380,my-etcd-2=http://192.168.33.11:2380
initial-cluster-token:       my-etcd-token
initial-cluster-state:       new

Then run:

nohup ./etcd --config-file ./etcd.conf.yml >/dev/null 2>&1 &

2.2.2 Add the configuration file and start on 192.168.33.11

vim ./etcd.conf.yml

Contents:

name:                        my-etcd-2
listen-client-urls:          http://192.168.33.11:2379
advertise-client-urls:       http://192.168.33.11:2379
listen-peer-urls:            http://192.168.33.11:2380
initial-advertise-peer-urls: http://192.168.33.11:2380
initial-cluster:             my-etcd-1=http://192.168.33.10:2380,my-etcd-2=http://192.168.33.11:2380
initial-cluster-token:       my-etcd-token
initial-cluster-state:       new

Then run:

nohup ./etcd --config-file ./etcd.conf.yml >/dev/null 2>&1 &
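
The two configuration files differ only in the member name and IP address, while the initial-cluster value has to be the same on both machines (this also applies to the flag-based start in 2.1). Generating the file avoids copy-and-paste slips. The following is only a sketch under the assumptions of these notes (plain http, client port 2379, peer port 2380, token my-etcd-token); build_initial_cluster and write_etcd_config are hypothetical helper names.

```shell
# build_initial_cluster NAME=IP...: join members into an initial-cluster value,
# assuming plain http and peer port 2380 as in these notes.
build_initial_cluster() {
    result=""
    for member in "$@"; do
        # ${member%%=*} is the name before "=", ${member#*=} is the IP after it.
        result="${result:+$result,}${member%%=*}=http://${member#*=}:2380"
    done
    echo "$result"
}

# write_etcd_config NAME IP CLUSTER FILE: write a config file shaped like the ones above.
write_etcd_config() {
    cat > "$4" <<EOF
name:                        $1
listen-client-urls:          http://$2:2379
advertise-client-urls:       http://$2:2379
listen-peer-urls:            http://$2:2380
initial-advertise-peer-urls: http://$2:2380
initial-cluster:             $3
initial-cluster-token:       my-etcd-token
initial-cluster-state:       new
EOF
}
```

On 192.168.33.11, for example, it could be called as: write_etcd_config my-etcd-2 192.168.33.11 "$(build_initial_cluster my-etcd-1=192.168.33.10 my-etcd-2=192.168.33.11)" ./etcd.conf.yml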

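3. Verifying startup

According to the official documentation, we can check the state of the cluster from any node by running the following command:

# still inside the etcd-v3.3.8-linux-amd64 directory

./etcdctl cluster-health

However, running this command produces the following output: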

[vagrant@192-168-33-10 etcd-v3.3.8-linux-amd64]$ ./etcdctl cluster-health
cluster may be unhealthy: failed to list members
Error:  client: etcd cluster is unavailable or misconfigured; error #0: dial tcp 127.0.0.1:2379: getsockopt: connection refused
; error #1: dial tcp 127.0.0.1:4001: getsockopt: connection refused

This error is puzzling: the addresses we specified at startup were not 127.0.0.1.

After some digging, I found the cause in a GitHub issue:

I faced the same situation. After searching, I find etcdctl cmd use default endpoints "http://127.0.0.1:2379,http://127.0.0.1:4001". If etcd start with --listen-client-urls http://HOST_IP:2379, then you can use etcdctl like etcdctl --endpoints 'http://HOST_IP:2379' member list

The issue offers several workarounds. In my view the best one is to pass the --endpoints flag:

./etcdctl --endpoints http://192.168.33.10:2379,http://192.168.33.11:2379 cluster-health

The --endpoints value can also list just one of the machines, for example:

./etcdctl --endpoints http://192.168.33.10:2379 cluster-health

For details, see: Etcd2 cluster working but etcdctl broken · Issue #1028 · coreos/bugs · GitHub
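
Typing --endpoints on every call gets tedious, so one option is a small wrapper function. This is a minimal sketch: ectl and ETCD_ENDPOINTS are names invented here for illustration, and the function assumes it is run from the etcd-v3.3.8-linux-amd64 directory.

```shell
# Assumption: ETCD_ENDPOINTS is a hypothetical shell variable used only by this wrapper.
ETCD_ENDPOINTS="http://192.168.33.10:2379,http://192.168.33.11:2379"

# ectl ARGS...: run ./etcdctl with --endpoints filled in, passing ARGS through unchanged.
ectl() {
    ./etcdctl --endpoints "$ETCD_ENDPOINTS" "$@"
}
```

With this in place, ectl cluster-health is equivalent to the full command with both endpoints shown above.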

With --endpoints specified, the output shows that the etcd cluster has started successfully:

[vagrant@192-168-33-10 etcd-v3.3.8-linux-amd64]$ ./etcdctl --endpoints http://192.168.33.10:2379 cluster-health
member 42ab269b4f75b118 is healthy: got healthy result from http://192.168.33.11:2379
member 7118e8ab00eced36 is healthy: got healthy result from http://192.168.33.10:2379
cluster is healthy
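
For use in a script, the health output above can be reduced to an exit status. The sketch below assumes the output format shown in these notes, where a healthy cluster ends with the line "cluster is healthy"; cluster_is_healthy is a hypothetical helper name.

```shell
# cluster_is_healthy: read `etcdctl cluster-health` output on stdin and
# succeed only if some line is exactly "cluster is healthy".
cluster_is_healthy() {
    grep -qx 'cluster is healthy'
}
```

For example: ./etcdctl --endpoints http://192.168.33.10:2379 cluster-health | cluster_is_healthy && echo ok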

We can also use the member list command to inspect the members:

[vagrant@192-168-33-11 etcd-v3.3.8-linux-amd64]$ ./etcdctl --endpoints http://192.168.33.10:2379 member list
42ab269b4f75b118: name=my-etcd-2 peerURLs=http://192.168.33.11:2380 clientURLs=http://192.168.33.11:2379 isLeader=true
7118e8ab00eced36: name=my-etcd-1 peerURLs=http://192.168.33.10:2380 clientURLs=http://192.168.33.10:2379 isLeader=false
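
If only the current leader is of interest, it can be picked out of the member list output. This sketch assumes the line format shown above (space-separated key=value fields, including name= and isLeader=); leader_name is a hypothetical helper name.

```shell
# leader_name: read `etcdctl member list` output on stdin and print the
# name= value of the member whose line contains isLeader=true.
leader_name() {
    awk '/isLeader=true/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^name=/) { sub(/^name=/, "", $i); print $i }
    }'
}
```

For example: ./etcdctl --endpoints http://192.168.33.10:2379 member list | leader_name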