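Kubernetes Upgrade Guide
- Updated continuously; I won't start a separate post for each new version.
- Operating system: CentOS 7
- Adapt the commands in this post to your own environment rather than copying them verbatim. I copied them from the official documentation to make the blog easier to write.
- Versions covered: a step-by-step upgrade from 1.13.12 to 1.17.5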
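1. Preparing for the upgrade
- Make sure the cluster was built with kubeadm
- Make sure the cluster is already HA (multiple master nodes)
- Make sure the cluster has been backed up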
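2. Upgrade notes
- After the upgrade, all cluster component Pods restart (their hash changes)
- The kubeadm version used for the upgrade must be greater than or equal to the target version
- During the upgrade, kube-proxy goes through one rolling update across all nodes
Upgrades can only move one minor version at a time; skipping minor versions is not supported (You only can upgrade from one MINOR version to the next MINOR version, or between PATCH versions of the same MINOR. That is, you cannot skip MINOR versions when you upgrade. For example, you can upgrade from 1.y to 1.y+1, but not from 1.y to 1.y+2.)
In other words, you can upgrade from 1.16.x to 1.16.y, or to 1.17.x, but not directly from 1.16 to 1.18.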
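The minor-version rule above can be checked mechanically before planning an upgrade. The following is a minimal sketch, not part of kubeadm: the function names `can_upgrade` and `upgrade_hops` are made up for illustration, and the check only looks at MAJOR and MINOR (it does not validate the patch number).

```shell
#!/bin/sh
# Hypothetical helpers (not part of kubeadm) illustrating the MINOR-version rule.

# major_of / minor_of VERSION: extract a field from "MAJOR.MINOR[.PATCH]",
# tolerating an optional leading "v" (v1.17.5 -> 1 and 17).
major_of() { echo "$1" | sed 's/^v//' | cut -d. -f1; }
minor_of() { echo "$1" | sed 's/^v//' | cut -d. -f2; }

# can_upgrade FROM TO: succeed if TO is in the same MINOR series as FROM
# or exactly one MINOR ahead; fail otherwise.
can_upgrade() {
    [ "$(major_of "$1")" = "$(major_of "$2")" ] || return 1
    diff=$(( $(minor_of "$2") - $(minor_of "$1") ))
    [ "$diff" -eq 0 ] || [ "$diff" -eq 1 ]
}

# upgrade_hops FROM TO: print every MINOR series that has to be passed
# through, one per line, when going from FROM to TO.
upgrade_hops() {
    major=$(major_of "$1")
    m=$(( $(minor_of "$1") + 1 ))
    last=$(minor_of "$2")
    while [ "$m" -le "$last" ]; do
        echo "$major.$m"
        m=$((m + 1))
    done
}
```

For the path covered in this post, `upgrade_hops 1.13.12 1.17.5` lists 1.14, 1.15, 1.16 and 1.17, which is why the upgrade is done in four separate rounds.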
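3. Upgrading the masters
3.1 Upgrade kubeadm and kubectl
First upgrade kubeadm and kubectl to at least the target version.
yum versionlock delete kubeadm kubectl
yum install -y kubeadm-1.17.5 kubectl-1.17.5 --disableexcludes=kubernetes
yum versionlock add kubeadm kubectl
versionlock is optional. I use it because my servers run update from time to time, and the lock keeps someone from upgrading these packages by accident.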
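Since kubeadm has to be at least the target version, it can be worth checking the installed version before going further. A minimal sketch, assuming plain `MAJOR.MINOR.PATCH` version strings; `version_ge` is a hypothetical helper name, not a kubeadm or yum command.

```shell
#!/bin/sh
# version_ge A B: succeed when version A >= version B.
# Hypothetical helper; accepts an optional leading "v" on either argument.
version_ge() {
    printf '%s %s\n' "$1" "$2" | awk '{
        gsub(/v/, "")                 # drop any leading "v"
        split($1, a, "."); split($2, b, ".")
        for (i = 1; i <= 3; i++) {    # compare MAJOR, MINOR, PATCH numerically
            if (a[i] + 0 > b[i] + 0) exit 0
            if (a[i] + 0 < b[i] + 0) exit 1
        }
        exit 0                        # all three fields equal
    }'
}

# Example use on a master (not executed here):
#   version_ge "$(kubeadm version -o short)" v1.17.5 || echo "kubeadm is too old"
```

The comparison is numeric per field, so 1.17.10 correctly counts as newer than 1.17.9, which a plain string comparison would get wrong.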
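3.2 Before the upgrade
3.2.1 Configuration changes
Most clusters need custom configuration. The defaults are usable, but they are not a good fit for production.
The kubeadm-config needed for each version upgrade is attached at the end of this post.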
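3.2.2 Draining the node
If the master node also runs workload pods like a regular node, run the following command to evict those pods and put the node into maintenance mode (scheduling disabled).
# Replace cp-node-name with the name of the master node
kubectl drain cp-node-name --ignore-daemonsets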
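3.2.3 View the upgrade plan
Use the following command to view the upgrade plan. The plan lists every component that will be upgraded along with any related warnings, so read it carefully. (The sample output below was captured for a 1.17.5 to 1.18.2 upgrade.)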
[]# kubeadm upgrade plan
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[preflight] Running pre-flight checks.
[upgrade] Running cluster health checks
[upgrade] Fetching available versions to upgrade to
[upgrade/versions] Cluster version: v1.17.5
[upgrade/versions] kubeadm version: v1.18.2
[upgrade/versions] Latest stable version: v1.18.2
[upgrade/versions] Latest stable version: v1.18.2
[upgrade/versions] Latest version in the v1.17 series: v1.17.5
[upgrade/versions] Latest version in the v1.17 series: v1.17.5

Components that must be upgraded manually after you have upgraded the control plane with 'kubeadm upgrade apply':
COMPONENT   CURRENT       AVAILABLE
Kubelet     1 x v1.17.5   v1.18.2

Upgrade to the latest stable version:

COMPONENT            CURRENT   AVAILABLE
API Server           v1.17.5   v1.18.2
Controller Manager   v1.17.5   v1.18.2
Scheduler            v1.17.5   v1.18.2
Kube Proxy           v1.17.5   v1.18.2
CoreDNS              1.6.5     1.6.7
Etcd                 3.4.3     3.4.3-0

You can now apply the upgrade by executing the following command:

    kubeadm upgrade apply v1.18.2

_____________________________________________________________________
3.3 Run the upgrade
If your etcd cluster was not created by kubeadm, upgrade etcd manually first, then continue with the steps below.
kubeadm upgrade apply v1.17.5 --config /etc/kubernetes/kubeadm.yaml
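3.4 Upgrade kubelet
After the upgrade finishes on a single master, only that node's master components and kube-proxy on all nodes have been upgraded. Once you have confirmed everything is fine, upgrade kubelet.
Before lifting the drain, upgrade the kubelet package:
# 1.17.5 is the target version here; substitute your own target patch version
yum versionlock delete kubelet
yum install -y kubelet-1.17.5 --disableexcludes=kubernetes
yum versionlock add kubelet
After the update, run the following and wait for kubelet to start successfully:
systemctl daemon-reload
systemctl restart kubelet
Don't forget to take the node out of maintenance mode (uncordon):
# replace <cp-node-name> with the name of your control plane node
kubectl uncordon <cp-node-name>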
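3.5 Upgrade the other masters
The steps are almost the same as for the first master, except that kubeadm upgrade apply is replaced by kubeadm upgrade node, and there is no need to run kubeadm upgrade plan again.
Because the configuration for the apiserver and the other components was uploaded to a ConfigMap in the cluster while upgrading the first master, the other masters simply pull it and restart the relevant components. This step also prints a detailed log, so you can follow the progress closely. As before, put the node into maintenance mode before upgrading, and once the upgrade is done reinstall kubelet and take the node out of maintenance mode.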
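4. Upgrading the nodes
Once the masters are upgraded, upgrading a node needs nothing special; the only component to upgrade is kubelet. First run kubeadm upgrade node on the node, which pulls the kubelet configuration from the cluster, then reinstall kubelet and restart it. Remember to put the node into maintenance mode while upgrading it as well. Upgrade the CNI components manually as needed, and confirm which CNI versions are compatible first.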
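The per-node sequence described above can be written out as a checklist. The sketch below only prints the commands for one worker and runs nothing; `print_node_upgrade_steps` is a hypothetical helper, the kubeadm install line follows the rule that kubeadm must be at least the target version, and the optional versionlock handling is left out for brevity.

```shell
#!/bin/sh
# print_node_upgrade_steps NODE VERSION
# Hypothetical helper: prints (does not run) the commands for one worker node.
print_node_upgrade_steps() {
    node=$1
    version=$2
    cat <<EOF
# on a machine with kubectl access
kubectl drain $node --ignore-daemonsets
# on $node itself
yum install -y kubeadm-$version --disableexcludes=kubernetes
kubeadm upgrade node
yum install -y kubelet-$version --disableexcludes=kubernetes
systemctl daemon-reload
systemctl restart kubelet
# back on the machine with kubectl access
kubectl uncordon $node
EOF
}
```

Calling `print_node_upgrade_steps node-1 1.17.5` gives a list you can review and then run by hand, one node at a time, which keeps the "don't copy blindly" spirit of this post.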
5. Verifying the cluster
Check that every node is Ready and reports the version you upgraded to:
kubectl get nodes
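With many nodes it is easy to miss one that is still on the old version. As a rough sketch, the check can be scripted; `nodes_ok` is a hypothetical helper that assumes the default column layout of `kubectl get nodes --no-headers` (NAME, STATUS, ROLES, AGE, VERSION).

```shell
#!/bin/sh
# nodes_ok VERSION
# Hypothetical helper: reads `kubectl get nodes --no-headers` output on stdin
# and succeeds only if there is at least one node and every node has
# STATUS exactly "Ready" and VERSION equal to the given value.
nodes_ok() {
    awk -v want="$1" '
        { seen = 1 }
        $2 != "Ready" || $5 != want {
            printf "not ok: %s (%s, %s)\n", $1, $2, $5
            bad = 1
        }
        END { if (bad || !seen) exit 1 }
    '
}

# Example use (not executed here):
#   kubectl get nodes --no-headers | nodes_ok v1.17.5 && echo "all nodes upgraded"
```

A node that is still cordoned shows up as "Ready,SchedulingDisabled" and is reported as not ok, which doubles as a reminder to uncordon it.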
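Recovering from a failure state
If kubeadm upgrade fails partway and does not roll back, for example because of an unexpected shutdown during the run, you can run kubeadm upgrade again. The command is idempotent and eventually brings the cluster to the desired upgraded state.
To recover from a bad state, run kubeadm upgrade apply --force, passing the version the cluster is currently running.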
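How it works
On the first master node, kubeadm upgrade apply does the following:
- Checks that the cluster is in an upgradeable state:
  - The API server is reachable
  - All nodes are in the Ready state
  - The master nodes are healthy
- Verifies that upgrading from the current version to the target version is allowed
- Makes sure the images the master node needs can be pulled onto the node
- Upgrades the master node components, rolling back if a problem comes up
- Applies the new kube-dns and kube-proxy manifests and makes sure the required RBAC rules are created
- Creates new certificate files for the API server, backing up the old ones, if the certificates expire within 180 days
On the other master nodes, kubeadm upgrade node does the following:
- Fetches the kubeadm ClusterConfiguration from the cluster
- Backs up the kube-apiserver certificate
- Upgrades the static component manifests on the master node
- Upgrades the kubelet configuration on the master node
On all worker nodes, kubeadm upgrade node does the following:
- Fetches the kubeadm ClusterConfiguration from the cluster
- Upgrades the kubelet configuration on the worker node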
Upgrading 1.16.8 to 1.17.5
The kubeadm-config file:
apiVersion: kubeadm.k8s.io/v1beta2
kind: InitConfiguration
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: "0"
  usages:
  - signing
  - authentication
---
imageRepository: harbor.foxchan.com/google_containers
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: v1.17.5
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controlPlaneEndpoint: k8s.foxchan.com:8443
etcd:
  external:
    endpoints:
    - http://172.16.242.12:2379
    - http://172.16.242.16:2379
    - http://172.16.242.44:2379
networking:
  dnsDomain: cluster.local
  podSubnet: 10.244.0.0/16
  serviceSubnet: 10.96.0.0/12
apiServer:
  extraArgs:
    v: "2"
    logtostderr: "false"
    log-dir: "/var/log/kubernetes"
  extraVolumes:
  - name: "k8s-log"
    hostPath: "/var/log/kubernetes"
    mountPath: "/var/log/kubernetes"
    pathType: "DirectoryOrCreate"
  - name: "timezone"
    hostPath: "/etc/localtime"
    mountPath: "/etc/localtime"
    readOnly: true
    pathType: "File"
  timeoutForControlPlane: 4m0s
  certSANs:
  - k8s.foxchan.com
  - "172.16.242.16"
  - "172.16.242.12"
  - "172.16.242.17"
controllerManager:
  extraArgs:
    address: 0.0.0.0
    experimental-cluster-signing-duration: "87600h"
    v: "2"
    logtostderr: "false"
    log-dir: "/var/log/kubernetes"
  extraVolumes:
  - name: "k8s-log"
    hostPath: "/var/log/kubernetes"
    mountPath: "/var/log/kubernetes"
    pathType: "DirectoryOrCreate"
  - name: "timezone"
    hostPath: "/etc/localtime"
    mountPath: "/etc/localtime"
    readOnly: true
    pathType: "File"
scheduler:
  extraArgs:
    address: 0.0.0.0
    v: "2"
    logtostderr: "false"
    log-dir: "/var/log/kubernetes"
  extraVolumes:
  - name: "k8s-log"
    hostPath: "/var/log/kubernetes"
    mountPath: "/var/log/kubernetes"
    pathType: "DirectoryOrCreate"
  - name: "timezone"
    hostPath: "/etc/localtime"
    mountPath: "/etc/localtime"
    readOnly: true
    pathType: "File"
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
failSwapOn: false
cgroupDriver: systemd
rotateCertificates: true
# Eviction thresholds; check the documentation and adjust them yourself
evictionHard:
  "imagefs.available": "8%"
  "memory.available": "256Mi"
  "nodefs.available": "8%"
  "nodefs.inodesFree": "5%"
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
# kube-proxy specific options here
clusterCIDR: "10.244.0.0/16"
# Enable ipvs mode
mode: "ipvs"
ipvs:
  minSyncPeriod: 5s
  syncPeriod: 5s
  # ipvs load-balancing policy
  scheduler: "wrr"
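Upgrade notes
Since 1.17, changing the cluster configuration through upgrade is discouraged, although it still works.
The kube-proxy and kubelet configuration is no longer set through kubeadm-config.
For kube-proxy and kubelet you need to write a config.yaml and use it to replace the corresponding ConfigMap in the cluster.
1.17 fixes kubeadm alpha certs check-expiration, which previously could not show certificate expiry when etcd is external.
Official issue: https://github.com/kubernetes/kubeadm/issues/1850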
Upgrading 1.15.6 to 1.16.8
The config file is as follows:
apiVersion: kubeadm.k8s.io/v1beta2
kind: InitConfiguration
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: "0"
  usages:
  - signing
  - authentication
---
imageRepository: harbor.foxchan.com/google_containers
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: v1.16.8
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controlPlaneEndpoint: k8s.foxchan.com:8443
etcd:
  external:
    endpoints:
    - http://172.16.242.12:2379
    - http://172.16.242.16:2379
    - http://172.16.242.44:2379
networking:
  dnsDomain: cluster.local
  podSubnet: 10.244.0.0/16
  serviceSubnet: 10.96.0.0/12
apiServer:
  extraArgs:
    v: "2"
    logtostderr: "false"
    log-dir: "/var/log/kubernetes"
    authorization-mode: Node,RBAC
  extraVolumes:
  - name: "k8s-log"
    hostPath: "/var/log/kubernetes"
    mountPath: "/var/log/kubernetes"
    pathType: "DirectoryOrCreate"
  - name: "timezone"
    hostPath: "/etc/localtime"
    mountPath: "/etc/localtime"
    readOnly: true
    pathType: "File"
  timeoutForControlPlane: 4m0s
  certSANs:
  - k8s.foxchan.com
  - "172.16.242.16"
  - "172.16.242.12"
  - "172.16.242.17"
controllerManager:
  extraArgs:
    address: 0.0.0.0
    experimental-cluster-signing-duration: "87600h"
    v: "2"
    logtostderr: "false"
    log-dir: "/var/log/kubernetes"
  extraVolumes:
  - name: "k8s-log"
    hostPath: "/var/log/kubernetes"
    mountPath: "/var/log/kubernetes"
    pathType: "DirectoryOrCreate"
  - name: "timezone"
    hostPath: "/etc/localtime"
    mountPath: "/etc/localtime"
    readOnly: true
    pathType: "File"
scheduler:
  extraArgs:
    address: 0.0.0.0
    v: "2"
    logtostderr: "false"
    log-dir: "/var/log/kubernetes"
  extraVolumes:
  - name: "k8s-log"
    hostPath: "/var/log/kubernetes"
    mountPath: "/var/log/kubernetes"
    pathType: "DirectoryOrCreate"
  - name: "timezone"
    hostPath: "/etc/localtime"
    mountPath: "/etc/localtime"
    readOnly: true
    pathType: "File"
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
failSwapOn: false
cgroupDriver: systemd
rotateCertificates: true
# Eviction thresholds; check the documentation and adjust them yourself
evictionHard:
  "imagefs.available": "8%"
  "memory.available": "256Mi"
  "nodefs.available": "8%"
  "nodefs.inodesFree": "5%"
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
# kube-proxy specific options here
clusterCIDR: "10.244.0.0/16"
# Enable ipvs mode
mode: "ipvs"
ipvs:
  minSyncPeriod: 5s
  syncPeriod: 5s
  # ipvs load-balancing policy
  scheduler: "wrr"
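Upgrade notes
kubeadm upgrade plan reports an error.
Official issue: https://github.com/kubernetes/kubernetes/issues/82889
The main cause is that the CoreDNS proxy plugin has been replaced by forward.
The error can be ignored; the configuration is replaced automatically when the cluster is upgraded.
kubeadm upgrade plan --config kubeadm1.16-config.yaml --ignore-preflight-errors=CoreDNSUnsupportedPlugins
Alternatively, edit the ConfigMap yourself and replace proxy with forward. To inspect the current one: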
kubectl -n kube-system get cm coredns -oyaml
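If you prefer to make the change by hand, the edit itself is a one-word substitution in the Corefile. A minimal sketch: `corefile_proxy_to_forward` is a hypothetical filter that only renames the directive, so any proxy-specific options in your Corefile that forward does not understand would still need manual review.

```shell
#!/bin/sh
# corefile_proxy_to_forward
# Hypothetical filter: reads a Corefile on stdin and writes it to stdout with
# the `proxy` directive renamed to `forward`. Lines that do not start with
# `proxy` (after optional indentation) are passed through untouched.
corefile_proxy_to_forward() {
    sed 's/^\([[:space:]]*\)proxy\([[:space:]]\)/\1forward\2/'
}

# Preview of the rewritten Corefile (not executed here, changes nothing):
#   kubectl -n kube-system get cm coredns -o jsonpath='{.data.Corefile}' \
#       | corefile_proxy_to_forward
```

The filter writes to stdout only, so you can compare the result with the live Corefile before editing the ConfigMap.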
After the cluster upgrade, nodes go NotReady with this error:
plugin flannel does not support config version
Add "cniVersion": "0.3.1" to /etc/cni/net.d/10-flannel.conflist:
{
  "name": "cbr0",
  "cniVersion": "0.3.1",
  "plugins": [
    {
      "type": "flannel",
      "delegate": {
        "hairpinMode": true,
        "isDefaultGateway": true
      }
    },
    {
      "type": "portmap",
      "capabilities": {
        "portMappings": true
      }
    }
  ]
}
Kubernetes 1.16 requires the CNI version to be specified in the config file.
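Before upgrading to 1.16 you can check each node for the missing key ahead of time. A rough sketch: `has_cni_version` is a hypothetical helper that does a plain text match rather than parsing the JSON, which is enough for a file shaped like the one above.

```shell
#!/bin/sh
# has_cni_version FILE
# Hypothetical helper: succeeds if FILE contains a "cniVersion" key.
# This is a text match with grep, not a real JSON parse.
has_cni_version() {
    grep -q '"cniVersion"[[:space:]]*:' "$1"
}

# Example use on a node (not executed here):
#   has_cni_version /etc/cni/net.d/10-flannel.conflist \
#       || echo "cniVersion missing, add it before upgrading to 1.16"
```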
Running kubectl get cs reports <unknown> for every component.
Official issue: https://github.com/kubernetes/kubernetes/issues/83024
This is fixed in 1.17.
[]# kubectl get cs
NAME                 AGE
scheduler            <unknown>
controller-manager   <unknown>
etcd-0               <unknown>
etcd-2               <unknown>
etcd-1               <unknown>