calico 2.6.1 升级至 3.11

微微一笑

2019-12-26

关注关注

说明

查看官方文档升级的操作需要做如下注意事项。

2.6.x 与 3.x 使用的etcd(这里只是针对 etcd 存储来说) 是不同的，2.6 的使用的是 etcdv2, 而3.x 是 etcdv3.
如果想从 2.6.x 升级到 3.x 至少得是2.6.5+的才行。

所以针对现有的情况，需要先升级至 2.6.5+ ，再升级 3.x。

2.6.1 升级至 2.6.12

2019/12/25

现有环境，使用 etcdv2 进行存储的 calico 数据。

[ kubelet]# which etcdv2
alias etcdv2=‘export ETCDCTL_API=2; /bin/etcdctl --ca-file /etc/etcd/ssl/etcd-root-ca.pem --cert-file /etc/etcd/ssl/etcd.pem --key-file /etc/etcd/ssl/etcd-key.pem --endpoints https://10.111.32.239:2379,https://10.111.32.241:2379,https://10.111.32.242:2379‘

[ kubelet]# etcdv2 ls /calico/ipam/v2/assignment/ipv4
/calico/ipam/v2/assignment/ipv4/block
[ kubelet]# etcdv2 ls /calico/ipam/v2/assignment/ipv4/block
/calico/ipam/v2/assignment/ipv4/block/10.20.134.64-26
/calico/ipam/v2/assignment/ipv4/block/10.20.253.64-26
/calico/ipam/v2/assignment/ipv4/block/10.20.28.192-26
/calico/ipam/v2/assignment/ipv4/block/10.20.51.128-26
/calico/ipam/v2/assignment/ipv4/block/10.20.78.0-26
/calico/ipam/v2/assignment/ipv4/block/10.20.112.64-26
/calico/ipam/v2/assignment/ipv4/block/10.20.15.128-26
/calico/ipam/v2/assignment/ipv4/block/10.20.235.0-26
/calico/ipam/v2/assignment/ipv4/block/10.20.53.64-26
/calico/ipam/v2/assignment/ipv4/block/10.20.72.128-26

根据文档中的说明，升级至 3.0 需要至少 2.6.5+ ,且需要进行一些手动的操作，因为 3.x 的使用 etcdv3, 而 2.6.x 的使用 etcdv2。

现在集群使用的是 2.6.1 的版本，先将其升级至 2.6.5+。

这里选择 2.6 中最新的 2.6.12

下载 calico.yaml 文件

[ v2.6]# wget https://docs.projectcalico.org/v2.6/getting-started/kubernetes/installation/rbac.yaml

[ v2.6]# wget https://docs.projectcalico.org/v2.6/getting-started/kubernetes/installation/hosted/calico.yaml

# 更改 calico.yaml 中的配置
[ v2.6]# sh -x modify_calico_yaml.sh

预先拉取镜像

[ v2.6]# grep image calico.yaml
          image: quay.io/calico/node:v2.6.12
          image: quay.io/calico/cni:v1.11.8
          image: quay.io/calico/kube-controllers:v1.0.5
          image: quay.io/calico/kube-controllers:v1.0.5

文档中说的一些升级步骤，比如先升级 calico-kube-controllers ,再升级 calico-node 的daemonset ,这里就直接 apply 新的资源文件

并不包含 calico 的 rbac 资源。

[ v2.6]# k239 apply -f calico.yaml
configmap "calico-config" unchanged
secret "calico-etcd-secrets" unchanged
daemonset "calico-node" configured
deployment "calico-kube-controllers" configured
deployment "calico-policy-controller" configured
serviceaccount "calico-kube-controllers" unchanged
serviceaccount "calico-node" unchanged

提交更新

提交之后， daemonset 的 calico-node 并没有更新，现在删除 pod ，使其更新

[ v2.6]# kubectl -n kube-system get pod -o wide |grep calico
calico-kube-controllers-6768b96c5f-rdbjp   1/1       Running             0          4m        10.111.32.243   k8s-4.geotmt.com
calico-node-45lnh                          0/1       ContainerCreating   0          4h        10.111.32.241   k8s-2.geotmt.com
calico-node-49mq7                          1/1       Running             1          5h        10.111.32.243   k8s-4.geotmt.com
calico-node-m86hr                          1/1       Running             0          5h        10.111.32.244   k8s-5.geotmt.com
calico-node-mm5fz                          0/1       ContainerCreating   0          4h        10.111.32.239   k8s-1.geotmt.com
calico-node-shrfw                          1/1       Running             0          4h        10.111.32.242   k8s-3.geotmt.com
calico-node-xx8hk                          1/1       Running             0          5h        10.111.32.245   k8s-6.geotmt.com

更新后的测试

其中一个的示例，新的 calico-node 其中有两个容器。

[ v2.6]# kubectl -n kube-system get pod -o wide |grep calico |grep k8s-6
calico-node-fj4t8                          2/2       Running             0          25s       10.111.32.245   k8s-6.geotmt.com

测试 ping 其他节点的 pod 正常

bash-4.4# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: : <NOARP> mtu 1480 qdisc noop state DOWN qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
4: : <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue state UP 
    link/ether 6e:20:a3:45:42:49 brd ff:ff:ff:ff:ff:ff
    inet 10.20.235.12/32 scope global eth0
       valid_lft forever preferred_lft forever
bash-4.4# ping 10.20.15.135
PING 10.20.15.135 (10.20.15.135): 56 data bytes
64 bytes from 10.20.15.135: seq=0 ttl=62 time=1.133 ms
64 bytes from 10.20.15.135: seq=1 ttl=62 time=0.631 ms

这个版本的仍需手动添加 toleration，以便在 master 节点上部署 pod。

升级至 2.6.12 完成。

2.6.12 升级至 3.0

升级前的注意事项
- You must first upgrade to Calico v2.6.5 (or a later v2.6.x release) before you can upgrade to Calico v3.0.12. (Important: Calico v2.6.5 was a special transitional release that included changes to enable upgrade to v3.0.1+; do not skip this step!)
- If you are using the etcd datastore, you should upgrade etcd to the latest stable v3 release.

上述两条都满足。

[ net.d]# etcdctl version
etcdctl version: 3.3.11
API version: 3.3

etcd datastore upgrade steps
- Install and configure calico-upgrade
- Test the data migration and check for errors
- Migrate Calico data
- Upgrade Calico

安装配置 calico-upgrade

[ ansible]# wget https://github.com/projectcalico/calico-upgrade/releases/download/v1.0.5/calico-upgrade

[ k8s_239]# ansible-playbook install_calico-upgrade.yml

使用 dry-run 执行测试

[ calico-upgrade]# calico-upgrade dry-run --output-dir=tmp --apiconfigv1 /etc/calico/apiconfigv1.cfg --apiconfigv3 /etc/calico/apiconfigv3.cfg

执行升级

[ calico-upgrade]# calico-upgrade start --ignore-v3-data --apiconfigv1 /etc/calico/apiconfigv1.cfg --apiconfigv3 /etc/calico/apiconfigv3.cfg
Preparing reports directory
 * creating report directory if it does not exist
 * validating permissions and removing old reports
Checking Calico version is suitable for migration
 * determined Calico version of: v2.6.12
 * the v1 API data can be migrated to the v3 API
Validating conversion of v1 data to v3
 * handling FelixConfiguration (global) resource
 * handling ClusterInformation (global) resource
 * handling FelixConfiguration (per-node) resources
 * handling BGPConfiguration (global) resource
 * handling Node resources
 * handling BGPPeer (global) resources
 * handling BGPPeer (node) resources
 * handling HostEndpoint resources
 * handling IPPool resources
 * handling GlobalNetworkPolicy resources
 * handling Profile resources
 * handling WorkloadEndpoint resources
 * data conversion successful
Data conversion validated successfully
Validating the v3 datastore
 * the v3 datastore is not empty

-------------------------------------------------------------------------------

Successfully validated v1 to v3 conversion.

You are about to start the migration of Calico v1 data format to Calico v3 data
format. During this time and until the upgrade is completed Calico networking
will be paused - which means no new Calico networked endpoints can be created.
No Calico configuration should be modified using calicoctl during this time.

Type "yes" to proceed (any other input cancels): yes
Pausing Calico networking
 * successfully paused Calico networking in the v1 configuration
Calico networking is now paused - waiting for 15s
Querying current v1 snapshot and converting to v3
 * handling FelixConfiguration (global) resource
 * handling ClusterInformation (global) resource
 * handling FelixConfiguration (per-node) resources
 * handling BGPConfiguration (global) resource
 * handling Node resources
 * handling BGPPeer (global) resources
 * handling BGPPeer (node) resources
 * handling HostEndpoint resources
 * handling IPPool resources
 * handling GlobalNetworkPolicy resources
 * handling Profile resources
 * handling WorkloadEndpoint resources
 * data converted successfully
Storing v3 data
 * Storing resources in v3 format
 * success: resources stored in v3 datastore
Migrating IPAM data
 * listing and converting IPAM allocation blocks
 * listing and converting IPAM affinity blocks
 * listing IPAM handles
 * storing IPAM data in v3 format
 * IPAM data migrated successfully
Data migration from v1 to v3 successful
 * check the output for details of the migrated resources
 * continue by upgrading your calico/node versions to Calico v3.x

-------------------------------------------------------------------------------

Successfully migrated Calico v1 data to v3 format.
Follow the detailed upgrade instructions available in the release documentation
to complete the upgrade. This includes:
 * upgrading your calico/node instances and orchestrator plugins (e.g. CNI) to
   the required v3.x release
 * running ‘calico-upgrade complete‘ to complete the upgrade and resume Calico
   networking

See report(s) below for details of the migrated data.
Reports:
- name conversion: /root/calico-upgrade/calico-upgrade-report/convertednames

下载 v3.0 资源文件

[ v3.0]# wget https://docs.projectcalico.org/v3.0/getting-started/kubernetes/installation/rbac.yaml

[ v3.0]# wget https://docs.projectcalico.org/v3.0/getting-started/kubernetes/installation/hosted/calico.yaml

3.0的改变可参考 3.0release note

预先下载所需镜像

[ v3.0]# grep image calico.yaml
          image: quay.io/calico/node:v3.0.12
          image: quay.io/calico/cni:v3.0.12
          image: quay.io/calico/kube-controllers:v3.0.12

执行升级

[ v3.0]# k239 apply -f calico.yaml
configmap "calico-config" configured
secret "calico-etcd-secrets" unchanged
daemonset "calico-node" configured
deployment "calico-kube-controllers" configured
serviceaccount "calico-kube-controllers" unchanged
serviceaccount "calico-node" unchanged

这里的 pod 可以实现滚动重启，待pod 都升级完成后。

执行 calico-upgrade 命令确定升级完成

[ calico-upgrade]# calico-upgrade complete  --apiconfigv1 /etc/calico/apiconfigv1.cfg --apiconfigv3 /etc/calico/apiconfigv3.cfg

You are about to complete the upgrade process to Calico v3. At this point, the
v1 format data should have been successfully converted to v3 format, and all
calico/node instances and orchestrator plugins (e.g. CNI) should be running
Calico v3.x.

Type "yes" to proceed (any other input cancels): yes
Completing upgrade
Enabling Calico networking for v3
 * successfully resumed Calico networking in the v3 configuration (updated
   ClusterInformation)
Upgrade completed successfully

-------------------------------------------------------------------------------

Successfully completed the upgrade process.

如不执行上述命令,会有如下报错

E1225 19:56:04.837028    3281 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "demo-deployment-6f4c6779b-b8zqq_default(1dd28cf0-270d-11ea-bd6c-c6a864ab864a)" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "demo-deployment-6f4c6779b-b8zqq_default" network: Calico is currently not ready to process requests
E1225 19:56:04.837049    3281 kuberuntime_manager.go:647] createPodSandbox for pod "demo-deployment-6f4c6779b-b8zqq_default(1dd28cf0-270d-11ea-bd6c-c6a864ab864a)" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "demo-deployment-6f4c6779b-b8zqq_default" network: Calico is currently not ready to process requests
E1225 19:56:04.837167    3281 pod_workers.go:186] Error syncing pod 1dd28cf0-270d-11ea-bd6c-c6a864ab864a ("demo-deployment-6f4c6779b-b8zqq_default(1dd28cf0-270d-11ea-bd6c-c6a864ab864a)"), skipping: failed to "CreatePodSandbox" for "demo-deployment-6f4c6779b-b8zqq_default(1dd28cf0-270d-11ea-bd6c-c6a864ab864a)" with CreatePodSandboxError: "CreatePodSandbox for pod \"demo-deployment-6f4c6779b-b8zqq_default(1dd28cf0-270d-11ea-bd6c-c6a864ab864a)\" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod \"demo-deployment-6f4c6779b-b8zqq_default\" network: Calico is currently not ready to process requests"

升级至 3.0.12 成功。

3.0.12 升级至 3.11

根据 3.11 的 Upgrading Calico on Kubernetes 说明。升级时，只需要提交新的资源文件即可(本环境不涉及 Application Layer Policy)。

这个版本的 calico 已经可以完整支持 k8s api 的datastore，更新时要注意下载文件时是否与自己的环境契合。

本环境下载 etcd datastore 的版本。

下载资源文件

[ v3.11]# wget https://docs.projectcalico.org/v3.11/manifests/calico-etcd.yaml

# 修改其中关于 etcd 的配置
[ v3.11]# bash -x modify_calico_yaml.sh

预先下载镜像

[ v3.11]# grep image calico-etcd.yaml
          image: calico/cni:v3.11.1
          image: calico/pod2daemon-flexvol:v3.11.1
          image: calico/node:v3.11.1
          image: calico/kube-controllers:v3.11.1

提交新版本

[ v3.11]# k239 apply -f calico-etcd.yaml
secret "calico-etcd-secrets" unchanged
configmap "calico-config" configured
clusterrole "calico-kube-controllers" configured
clusterrolebinding "calico-kube-controllers" configured
clusterrole "calico-node" configured
clusterrolebinding "calico-node" configured
daemonset "calico-node" configured
serviceaccount "calico-node" unchanged
deployment "calico-kube-controllers" configured
serviceaccount "calico-kube-controllers" unchanged

验证新版本

查看新版本的 pod, 每个 pod 内只有一个容器，这个版本的将 install-cni 和 flexvol-driver(旧版本没有) 作为了 initContainers ，所以常驻的就只有一个容器了

[ ~]# k239 -n kube-system get pod -o wide |grep calico
calico-kube-controllers-85dc4fd46b-4wnmt   1/1       Running   0          1m        10.111.32.243   k8s-4.geotmt.com
calico-node-4bgkc                          1/1       Running   0          59s       10.111.32.241   k8s-2.geotmt.com
calico-node-5jg2t                          1/1       Running   0          31s       10.111.32.244   k8s-5.geotmt.com
calico-node-9fn6r                          1/1       Running   0          43s       10.111.32.245   k8s-6.geotmt.com
calico-node-9n7dn                          1/1       Running   0          1m        10.111.32.243   k8s-4.geotmt.com
calico-node-fxr46                          1/1       Running   0          1m        10.111.32.239   k8s-1.geotmt.com
calico-node-pgh5c                          1/1       Running   0          1m        10.111.32.242   k8s-3.geotmt.com

测试 pod 的跨主机通信

[ ~]# kubectl exec -it demo-deployment-6f4c6779b-b8zqq /bin/bash
bash-4.4# ping 10.20.235.12
PING 10.20.235.12 (10.20.235.12): 56 data bytes
64 bytes from 10.20.235.12: seq=0 ttl=62 time=1.232 ms
^C
--- 10.20.235.12 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 1.232/1.232/1.232 ms
bash-4.4# ping 10.20.253.80
PING 10.20.253.80 (10.20.253.80): 56 data bytes
64 bytes from 10.20.253.80: seq=0 ttl=62 time=1.730 ms
64 bytes from 10.20.253.80: seq=1 ttl=62 time=1.385 ms
^C
--- 10.20.253.80 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 1.385/1.557/1.730 ms
bash-4.4# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: : <NOARP> mtu 1480 qdisc noop state DOWN qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
4: : <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue state UP 
    link/ether fa:d1:55:42:ab:6c brd ff:ff:ff:ff:ff:ff
    inet 10.20.15.163/32 scope global eth0
       valid_lft forever preferred_lft forever

测试pod重建分配地址,成功

[ ~]# kubectl delete pod nginx-deployment-7b66d98974-2rh87
pod "nginx-deployment-7b66d98974-2rh87" deleted

[ ~]# kubectl get pod nginx-deployment-7b66d98974-nd8h7 -o wide 
NAME                                READY     STATUS    RESTARTS   AGE       IP             NODE
nginx-deployment-7b66d98974-nd8h7   1/1       Running   0          1m        10.20.253.86   k8s-4.geotmt.com

calico 3.0.12 升级至 3.11.1 成功。