Kubernetes Rolling Upgrade
Background
Kubernetes is an excellent cluster management tool for containerized applications. With objects such as ReplicaSet, which automatically handle an application's lifecycle events, it brings container application management to a fine art. Among its many features, the one that best demonstrates Kubernetes' power as a cluster application manager is the rolling upgrade.
The essence of a rolling upgrade is that the service remains continuously available throughout the upgrade, so the process is invisible to the outside world. The rollout passes through three states: all old instances, a mix of old and new instances, and all new instances. The number of old instances gradually decreases while the number of new instances gradually increases, until the old count reaches 0 and the new count reaches the desired target.
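This shrink/grow transition can be sketched as a toy simulation. `simulateRollout` is an invented name, not part of Kubernetes, and it assumes a surge of one extra Pod during the transition so capacity never drops below the target:

```go
package main

import "fmt"

// simulateRollout steps a rollout one replica at a time: a new Pod is
// started before an old one is removed, so at least `desired` Pods exist
// throughout. It returns the sequence of (old, new) replica counts,
// covering all three states: all old, mixed, all new.
func simulateRollout(desired int) [][2]int {
	oldN, newN := desired, 0
	states := [][2]int{{oldN, newN}}
	for oldN > 0 {
		newN++ // scale the new set up by one
		oldN-- // then scale the old set down by one
		states = append(states, [2]int{oldN, newN})
	}
	return states
}

func main() {
	for _, s := range simulateRollout(3) {
		fmt.Printf("old=%d new=%d\n", s[0], s[1])
	}
}
```

The first state printed is all old instances, the last is all new; every state in between is the mixed phase the article describes.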
Rolling Upgrades in Kubernetes
Kubernetes uses the ReplicaSet (RS for short) to manage Pod instances. If the number of Pods currently in the cluster is below the target, the RS starts new Pods; otherwise, it deletes surplus Pods according to its policy. A Deployment exploits exactly this behavior: by controlling the Pods of two ReplicaSets, it implements the upgrade.
A rolling upgrade is a smooth, gradual transition during which the service stays available. This is a key step for Kubernetes as a platform that manages applications as services. Services are everywhere and consumed on demand; that is the original intent of cloud computing. For a PaaS platform, abstracting applications into services that span the cluster and are available anytime, anywhere is its ultimate mission.
1. ReplicaSet
The concept of a ReplicaSet should be familiar by now, so let's look at the RS in the k8s source code.
type ReplicaSetController struct {
	kubeClient clientset.Interface
	podControl controller.PodControlInterface

	// internalPodInformer is used to hold a personal informer. If we're using
	// a normal shared informer, then the informer will be started for us. If
	// we have a personal informer, we must start it ourselves. If you start
	// the controller using NewReplicationManager(passing SharedInformer), this
	// will be null
	internalPodInformer framework.SharedIndexInformer

	// A ReplicaSet is temporarily suspended after creating/deleting these many replicas.
	// It resumes normal action after observing the watch events for them.
	burstReplicas int
	// To allow injection of syncReplicaSet for testing.
	syncHandler func(rsKey string) error

	// A TTLCache of pod creates/deletes each rc expects to see.
	expectations *controller.UIDTrackingControllerExpectations

	// A store of ReplicaSets, populated by the rsController
	rsStore cache.StoreToReplicaSetLister
	// Watches changes to all ReplicaSets
	rsController *framework.Controller
	// A store of pods, populated by the podController
	podStore cache.StoreToPodLister
	// Watches changes to all pods
	podController framework.ControllerInterface
	// podStoreSynced returns true if the pod store has been synced at least once.
	// Added as a member to the struct to allow injection for testing.
	podStoreSynced func() bool

	lookupCache *controller.MatchingCache

	// Controllers that need to be synced
	queue *workqueue.Type

	// garbageCollectorEnabled denotes if the garbage collector is enabled. RC
	// manager behaves differently if GC is enabled.
	garbageCollectorEnabled bool
}
This struct lives in pkg/controller/replicaset. Here we can see the most important members of the RS controller. One is podControl, the object that operates on Pods; as the name suggests, it controls the lifecycle of the Pods owned by the RS. Let's look at the methods that PodControlInterface provides.
// PodControlInterface is an interface that knows how to add or delete pods
// created as an interface to allow testing.
type PodControlInterface interface {
	// CreatePods creates new pods according to the spec.
	CreatePods(namespace string, template *api.PodTemplateSpec, object runtime.Object) error
	// CreatePodsOnNode creates a new pod according to the spec on the specified node.
	CreatePodsOnNode(nodeName, namespace string, template *api.PodTemplateSpec, object runtime.Object) error
	// CreatePodsWithControllerRef creates new pods according to the spec, and sets object as the pod's controller.
	CreatePodsWithControllerRef(namespace string, template *api.PodTemplateSpec, object runtime.Object, controllerRef *api.OwnerReference) error
	// DeletePod deletes the pod identified by podID.
	DeletePod(namespace string, podID string, object runtime.Object) error
	// PatchPod patches the pod.
	PatchPod(namespace, name string, data []byte) error
}
As we can see, the RS has full control over its Pods. There are two watches here, rsController and podController, which watch etcd for changes to ReplicaSets and Pods respectively. One important member deserves mention: syncHandler, which every controller has. Each controller watches etcd for changes and uses a sync function to reconcile the state of its objects. Note that this handler is only a delegate; the actual handler is assigned when the controller is created. This pattern is not specific to the RS controller; the other controllers follow it as well.
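The delegation pattern can be sketched in a few lines. The names `controller` and `newController` below are invented for illustration; the real constructor assigns rsc.syncHandler = rsc.syncReplicaSet in the same spirit:

```go
package main

import "fmt"

// controller mimics the pattern used by ReplicaSetController: syncHandler
// is a func field rather than a method, so tests (or a constructor) can
// swap in any implementation without touching the rest of the controller.
type controller struct {
	syncHandler func(key string) error
}

// newController wires in the real handler, the way the Kubernetes
// constructors assign their sync functions at creation time.
func newController() *controller {
	c := &controller{}
	c.syncHandler = c.syncReplicaSet
	return c
}

// syncReplicaSet stands in for the real reconcile logic.
func (c *controller) syncReplicaSet(key string) error {
	fmt.Println("syncing", key)
	return nil
}

func main() {
	c := newController()
	c.syncHandler("default/my-rs")

	// In a test, the delegate can be replaced wholesale:
	c.syncHandler = func(key string) error {
		fmt.Println("fake sync for", key)
		return nil
	}
	c.syncHandler("default/my-rs")
}
```

Because the worker loop only ever calls `c.syncHandler`, injecting a fake handler is enough to unit-test the queueing machinery in isolation.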
The following code makes the watch logic clearer.
rsc.rsStore.Store, rsc.rsController = framework.NewInformer(
	&cache.ListWatch{
		ListFunc: func(options api.ListOptions) (runtime.Object, error) {
			return rsc.kubeClient.Extensions().ReplicaSets(api.NamespaceAll).List(options)
		},
		WatchFunc: func(options api.ListOptions) (watch.Interface, error) {
			return rsc.kubeClient.Extensions().ReplicaSets(api.NamespaceAll).Watch(options)
		},
	},
	&extensions.ReplicaSet{},
	// TODO: Can we have much longer period here?
	FullControllerResyncPeriod,
	framework.ResourceEventHandlerFuncs{
		AddFunc:    rsc.enqueueReplicaSet,
		UpdateFunc: rsc.updateRS,
		// This will enter the sync loop and no-op, because the replica set has been deleted from the store.
		// Note that deleting a replica set immediately after scaling it to 0 will not work. The recommended
		// way of achieving this is by performing a `stop` operation on the replica set.
		DeleteFunc: rsc.enqueueReplicaSet,
	},
)
Each time a change to an object is observed in etcd, an appropriate action is taken; concretely, the object's key is enqueued, updated, or dequeued. Pods are handled similarly.
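The watch/enqueue/sync flow can be sketched as follows. `newQueue`, `enqueue`, and `runWorker` are invented names (the real controller uses the workqueue package); the point is the pattern, where event handlers do no real work themselves:

```go
package main

import "fmt"

// newQueue returns a buffered key queue; the real code uses workqueue.
func newQueue() chan string { return make(chan string, 10) }

// enqueue is essentially what AddFunc/UpdateFunc/DeleteFunc boil down to:
// put the object's key on the queue, nothing more.
func enqueue(q chan string, key string) { q <- key }

// runWorker drains n keys from the queue, invoking sync for each one.
// sync stands in for the controller's syncHandler, which reconciles the
// object's full state from the local store.
func runWorker(q chan string, n int, sync func(string)) {
	for i := 0; i < n; i++ {
		sync(<-q)
	}
}

func main() {
	q := newQueue()
	enqueue(q, "default/frontend") // add event
	enqueue(q, "default/frontend") // a status update re-enqueues the same key
	runWorker(q, 2, func(key string) { fmt.Println("syncing", key) })
}
```

Decoupling event delivery from reconciliation this way means a burst of watch events costs only cheap enqueues, and the expensive sync runs at the worker's own pace.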
podInformer.AddEventHandler(framework.ResourceEventHandlerFuncs{
	AddFunc: rsc.addPod,
	// This invokes the ReplicaSet for every pod change, eg: host assignment. Though this might seem like
	// overkill the most frequent pod update is status, and the associated ReplicaSet will only list from
	// local storage, so it should be ok.
	UpdateFunc: rsc.updatePod,
	DeleteFunc: rsc.deletePod,
})
That covers the basics of the ReplicaSet. Above the RS sits the Deployment, which is also a controller.
// DeploymentController is responsible for synchronizing Deployment objects stored
// in the system with actual running replica sets and pods.
type DeploymentController struct {
	client        clientset.Interface
	eventRecorder record.EventRecorder

	// To allow injection of syncDeployment for testing.
	syncHandler func(dKey string) error

	// A store of deployments, populated by the dController
	dStore cache.StoreToDeploymentLister
	// Watches changes to all deployments
	dController *framework.Controller
	// A store of ReplicaSets, populated by the rsController
	rsStore cache.StoreToReplicaSetLister
	// Watches changes to all ReplicaSets
	rsController *framework.Controller
	// A store of pods, populated by the podController
	podStore cache.StoreToPodLister
	// Watches changes to all pods
	podController *framework.Controller

	// dStoreSynced returns true if the Deployment store has been synced at least once.
	// Added as a member to the struct to allow injection for testing.
	dStoreSynced func() bool
	// rsStoreSynced returns true if the ReplicaSet store has been synced at least once.
	// Added as a member to the struct to allow injection for testing.
	rsStoreSynced func() bool
	// podStoreSynced returns true if the pod store has been synced at least once.
	// Added as a member to the struct to allow injection for testing.
	podStoreSynced func() bool

	// Deployments that need to be synced
	queue workqueue.RateLimitingInterface
}
A DeploymentController needs to watch Deployments, ReplicaSets, and Pods, as its construction shows.
dc.dStore.Store, dc.dController = framework.NewInformer(
	&cache.ListWatch{
		ListFunc: func(options api.ListOptions) (runtime.Object, error) {
			return dc.client.Extensions().Deployments(api.NamespaceAll).List(options)
		},
		WatchFunc: func(options api.ListOptions) (watch.Interface, error) {
			return dc.client.Extensions().Deployments(api.NamespaceAll).Watch(options)
		},
	},
	&extensions.Deployment{},
	FullDeploymentResyncPeriod,
	framework.ResourceEventHandlerFuncs{
		AddFunc:    dc.addDeploymentNotification,
		UpdateFunc: dc.updateDeploymentNotification,
		// This will enter the sync loop and no-op, because the deployment has been deleted from the store.
		DeleteFunc: dc.deleteDeploymentNotification,
	},
)
dc.rsStore.Store, dc.rsController = framework.NewInformer(
	&cache.ListWatch{
		ListFunc: func(options api.ListOptions) (runtime.Object, error) {
			return dc.client.Extensions().ReplicaSets(api.NamespaceAll).List(options)
		},
		WatchFunc: func(options api.ListOptions) (watch.Interface, error) {
			return dc.client.Extensions().ReplicaSets(api.NamespaceAll).Watch(options)
		},
	},
	&extensions.ReplicaSet{},
	resyncPeriod(),
	framework.ResourceEventHandlerFuncs{
		AddFunc:    dc.addReplicaSet,
		UpdateFunc: dc.updateReplicaSet,
		DeleteFunc: dc.deleteReplicaSet,
	},
)
dc.podStore.Indexer, dc.podController = framework.NewIndexerInformer(
	&cache.ListWatch{
		ListFunc: func(options api.ListOptions) (runtime.Object, error) {
			return dc.client.Core().Pods(api.NamespaceAll).List(options)
		},
		WatchFunc: func(options api.ListOptions) (watch.Interface, error) {
			return dc.client.Core().Pods(api.NamespaceAll).Watch(options)
		},
	},
	&api.Pod{},
	resyncPeriod(),
	framework.ResourceEventHandlerFuncs{
		AddFunc:    dc.addPod,
		UpdateFunc: dc.updatePod,
		DeleteFunc: dc.deletePod,
	},
	cache.Indexers{cache.NamespaceIndex: cache.MetaNamespaceIndexFunc},
)
dc.syncHandler = dc.syncDeployment
dc.dStoreSynced = dc.dController.HasSynced
dc.rsStoreSynced = dc.rsController.HasSynced
dc.podStoreSynced = dc.podController.HasSynced
The core of it all is syncDeployment, which contains the implementations of rollingUpdate and rollback. If a watched Deployment object has a non-nil RollbackTo.Revision, a rollback is performed. This Revision is a version number; note that even though this is a rollback, the revision number recorded internally by Kubernetes always increases.
You may wonder how rollback is done. The principle is simple: Kubernetes keeps the PodTemplate of every revision, and rolling back just means overwriting the current template with the old one.
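A toy sketch of this idea, with all names invented for illustration (in the real implementation the old templates survive in the old ReplicaSets, each annotated with its revision):

```go
package main

import "fmt"

// deployment keeps a template per revision, mimicking how Kubernetes can
// recover any previous PodTemplate when asked to roll back.
type deployment struct {
	revision int
	template string
	history  map[int]string // revision -> PodTemplate
}

// update records a new template under the next revision number.
func (d *deployment) update(tmpl string) {
	d.revision++
	d.template = tmpl
	d.history[d.revision] = tmpl
}

// rollbackTo overwrites the current template with the one recorded for an
// older revision. Note this is itself an update: the revision number keeps
// growing even though the template content goes backwards.
func (d *deployment) rollbackTo(rev int) {
	d.update(d.history[rev])
}

func main() {
	d := &deployment{history: map[int]string{}}
	d.update("nginx:1.9")  // revision 1
	d.update("nginx:1.10") // revision 2
	d.rollbackTo(1)        // revision 3, template from revision 1
	fmt.Println(d.revision, d.template) // prints: 3 nginx:1.9
}
```

This also explains the observation above: after a rollback the Deployment is at a higher revision than before, just with an older template.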
Kubernetes supports two upgrade strategies: recreate and rolling update.
switch d.Spec.Strategy.Type {
case extensions.RecreateDeploymentStrategyType:
	return dc.rolloutRecreate(d)
case extensions.RollingUpdateDeploymentStrategyType:
	return dc.rolloutRolling(d)
}
rolloutRolling holds all the secrets, as we can see here.
func (dc *DeploymentController) rolloutRolling(deployment *extensions.Deployment) error {
	newRS, oldRSs, err := dc.getAllReplicaSetsAndSyncRevision(deployment, true)
	if err != nil {
		return err
	}
	allRSs := append(oldRSs, newRS)

	// Scale up, if we can.
	scaledUp, err := dc.reconcileNewReplicaSet(allRSs, newRS, deployment)
	if err != nil {
		return err
	}
	if scaledUp {
		// Update DeploymentStatus
		return dc.updateDeploymentStatus(allRSs, newRS, deployment)
	}

	// Scale down, if we can.
	scaledDown, err := dc.reconcileOldReplicaSets(allRSs, controller.FilterActiveReplicaSets(oldRSs), newRS, deployment)
	if err != nil {
		return err
	}
	if scaledDown {
		// Update DeploymentStatus
		return dc.updateDeploymentStatus(allRSs, newRS, deployment)
	}

	dc.cleanupDeployment(oldRSs, deployment)

	// Sync deployment status
	return dc.syncDeploymentStatus(allRSs, newRS, deployment)
}
It does the following:
1. Find the new RS and the old RSs, and compute the new revision (the maximum of the existing revisions);
2. Scale up the new RS, if possible;
3. Scale down the old RSs, if possible;
4. When done, delete the old RSs;
5. Sync the Deployment status to etcd;
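The steps above can be condensed into a toy reconcile function. The name `reconcile` and its simplified semantics are inventions for this sketch: maxSurge is fixed at 1 and maxUnavailable at 0, and each call makes at most one unit of progress, just as each pass of rolloutRolling either scales up or scales down and then returns:

```go
package main

import "fmt"

// reconcile performs one step of the rollout: scale the new RS up if there
// is room under the surge limit, otherwise scale the old RSs down if the
// total exceeds the desired count. It returns the updated counts and
// whether any progress was made; repeated syncs drive it to completion.
func reconcile(oldReplicas, newReplicas, desired int) (int, int, bool) {
	total := oldReplicas + newReplicas
	if newReplicas < desired && total < desired+1 { // room under maxSurge=1
		return oldReplicas, newReplicas + 1, true // scale up the new RS
	}
	if oldReplicas > 0 && total > desired { // honor maxUnavailable=0
		return oldReplicas - 1, newReplicas, true // scale down an old RS
	}
	return oldReplicas, newReplicas, false // rollout complete
}

func main() {
	oldN, newN := 3, 0
	for {
		o, n, progressed := reconcile(oldN, newN, 3)
		if !progressed {
			break
		}
		oldN, newN = o, n
		fmt.Printf("old=%d new=%d\n", oldN, newN)
	}
}
```

Because each sync only nudges the counts, the controller stays correct even if it crashes mid-rollout: the next sync picks up from the current state, which is the essence of the reconcile loop.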
And that is how rolling upgrades work in Kubernetes. Traditional load-balanced applications perform rolling upgrades in a very similar way, but in a container environment we have ReplicaSets, which make the approach much more convenient.