Upgrading the global Cluster
This document describes how to upgrade a global cluster that runs on Immutable Infrastructure. Upgrades replace nodes with new MicroOS images managed by the Cluster API provider; in-place node upgrades are not used.
When to Use This Path
Choose this upgrade path when:
- The global cluster was originally installed on Immutable Infrastructure. See Installing the global Cluster.
- Your infrastructure is one of the documented providers: Huawei DCS or Huawei Cloud Stack. VMware vSphere and bare-metal support for the global cluster are planned.
For traditional-OS global clusters, use the standard upgrade path instead.
Two-Phase Upgrade Overview
Like workload clusters, the global cluster on Immutable Infrastructure follows a two-phase upgrade.
- Phase 1 — Distribution Version: aligned and agnostic extensions are upgraded to the target Distribution Version. The procedure is shared with workload clusters; see Upgrading Clusters for the Phase 1 mechanics.
- Phase 2 — Kubernetes and OS Image: nodes are replaced with new MicroOS images that contain the target Kubernetes version. This document focuses on Phase 2 for the global cluster.
Before starting Phase 2, verify that every workload cluster falls within the Compatible Versions matrix of the target Distribution Version. Workload clusters that are out of range must be upgraded first.
Common Prerequisites
- The global cluster has completed Phase 1 (Distribution Version upgrade).
- An etcd backup of the global cluster has been taken and verified.
- The new MicroOS image and the matching KubeadmControlPlane and MachineDeployment versions are available on the platform's registry.
- A maintenance window plan that accounts for rolling control plane replacement.
Procedure
After installation, the Cluster API controllers that manage the global cluster run on the global cluster itself. Use the global kubeconfig for the kubectl commands in this procedure.
Step 1 — Update the global Cluster Manifest
Update the Cluster API manifests of the global cluster to reference the new MicroOS image and Kubernetes version. The manifest fields to update are provider-specific.
For DCS, create new immutable infrastructure templates instead of editing templates that are already referenced by running machines.
Update the control plane resources:
- Create a new DCSMachineTemplate for the target image and set spec.template.spec.vmTemplateName to the MicroOS template that matches the target Kubernetes version.
- Keep preserved node-local data, including /var/cpaas, in DCSIpHostnamePool.spec.pool[].persistentDisk. Do not move preserved disks back into DCSMachineTemplate.
- Set KubeadmControlPlane.spec.version to the target Kubernetes version.
- Point KubeadmControlPlane.spec.machineTemplate.infrastructureRef.name to the new DCSMachineTemplate.
- Keep KubeadmControlPlane.spec.rolloutStrategy.rollingUpdate.maxSurge: 0 when the cluster uses pool-managed persistent disks.
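The control plane items above map onto the manifest roughly as follows. This is a sketch rather than a complete manifest: every value in angle brackets is a placeholder, the DCS provider apiVersion is an assumption to check against the CRDs installed on the cluster, and fields not named above are omitted.

```yaml
# Sketch only. Values in <angle brackets> are placeholders, not values from this document.
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1   # assumed API group/version for the DCS provider
kind: DCSMachineTemplate
metadata:
  name: <new-control-plane-template>                  # a new object, not an edit of the template in use
  namespace: <cluster-namespace>
spec:
  template:
    spec:
      vmTemplateName: <microos-template-for-target-version>
      # Remaining fields are copied unchanged from the template currently in use.
---
apiVersion: controlplane.cluster.x-k8s.io/v1beta1     # upstream Cluster API v1beta1; adjust if the platform differs
kind: KubeadmControlPlane
metadata:
  name: <control-plane-name>
  namespace: <cluster-namespace>
spec:
  version: <target-kubernetes-version>
  machineTemplate:
    infrastructureRef:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: DCSMachineTemplate
      name: <new-control-plane-template>
  rolloutStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0                                     # required with pool-managed persistent disks
```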
Update worker node resources:
- Create a new worker DCSMachineTemplate with the target vmTemplateName.
- Set each MachineDeployment.spec.template.spec.version to the target Kubernetes version.
- Point each MachineDeployment.spec.template.spec.infrastructureRef.name to the new worker DCSMachineTemplate.
- Keep each MachineDeployment.spec.strategy.rollingUpdate.maxSurge: 0 when the worker pool uses pool-managed persistent disks.
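For a worker pool, the same changes might look like the fragment below. It is a sketch under the same assumptions as any DCS manifest here: angle-bracket values are placeholders and the DCS provider apiVersion is assumed. The maxUnavailable value is an addition that this document does not specify; upstream Cluster API documents that maxUnavailable cannot be 0 when maxSurge is 0, so some non-zero value is needed.

```yaml
# Sketch only. Values in <angle brackets> are placeholders, not values from this document.
apiVersion: cluster.x-k8s.io/v1beta1                  # upstream Cluster API v1beta1; adjust if the platform differs
kind: MachineDeployment
metadata:
  name: <worker-pool-name>
  namespace: <cluster-namespace>
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0                                     # required with pool-managed persistent disks
      maxUnavailable: 1                               # assumption: replace one worker at a time
  template:
    spec:
      version: <target-kubernetes-version>
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1   # assumed API group/version for the DCS provider
        kind: DCSMachineTemplate
        name: <new-worker-template>
```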
Pool-managed persistent disks are declared on the IP pool, in DCSIpHostnamePool.spec.pool[].persistentDisk, not on the machine template.
Use the IP pool status to confirm that preserved disks are detached from old VMs and attached to replacement VMs during the rolling replacement.
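One way to read that status is a small helper around kubectl, sketched below. The resource name dcsiphostnamepool is assumed from the kind name, and the namespace and pool name are placeholders; the status field is the persistentDiskStatus field referenced in the rollback section. The helper prints the field as raw JSON because its exact layout is not described here.

```shell
# Sketch: print the persistent disk status recorded on a DCSIpHostnamePool.
# $1 = namespace of the pool (placeholder)
# $2 = DCSIpHostnamePool name (placeholder)
# Assumption: the resource can be addressed as "dcsiphostnamepool".
pool_disk_status() {
  kubectl -n "$1" get dcsiphostnamepool "$2" \
    -o jsonpath='{.status.persistentDiskStatus}'
}

# Usage, with KUBECONFIG pointing at the global cluster:
#   pool_disk_status <cluster-namespace> <pool-name>
```

Running it before, during, and after the rolling replacement makes it easier to see each preserved disk move from the old VM to its replacement.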
Step 2 — Apply the Updated Manifest
Apply the updated manifest against the global cluster.
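A minimal sketch of the apply step follows. The kubeconfig and manifest paths are placeholders, and the server-side dry run is an optional safety check added for illustration, not a step this procedure requires.

```shell
# Sketch: validate the updated manifest, then apply it to the global cluster.
# $1 = path to the global kubeconfig (placeholder)
# $2 = path to the updated manifest (placeholder)
apply_global_manifest() {
  kubeconfig="$1"
  manifest="$2"
  # Server-side dry run: the API server validates the objects without persisting them.
  kubectl --kubeconfig "$kubeconfig" apply --dry-run=server -f "$manifest" || return 1
  # Real apply, reached only if the dry run succeeded.
  kubectl --kubeconfig "$kubeconfig" apply -f "$manifest"
}

# Usage (hypothetical paths):
#   apply_global_manifest "$HOME/.kube/global.kubeconfig" global-cluster.yaml
```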
The Cluster API provider begins replacing control plane and worker nodes by using the new image. When maxSurge: 0 is set, each old node is drained and deleted before its replacement can reuse the same fixed identity, IP address, or preserved disk.
Step 3 — Monitor the Rolling Replacement
Watch the rolling replacement until all control plane and worker nodes have been replaced.
The upgrade is complete when every Machine reports the new Kubernetes version and Phase: Running, and the KubeadmControlPlane reports Ready: True against the new version.
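The Machine part of that completion check can be scripted. The helper below is a sketch that lists every Machine that is not yet Running at the target version; the namespace and version arguments are placeholders, and the resource names are those of upstream Cluster API.

```shell
# Sketch: list Machines that are not yet Running at the target Kubernetes version.
# $1 = namespace that holds the Cluster API objects (placeholder)
# $2 = target Kubernetes version, written exactly as in the manifest
pending_machines() {
  kubectl -n "$1" get machines.cluster.x-k8s.io \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.phase}{" "}{.spec.version}{"\n"}{end}' |
    awk -v want="$2" '$2 != "Running" || $3 != want { print $1 }'
}

# Usage, with KUBECONFIG pointing at the global cluster:
#   pending_machines <cluster-namespace> <target-kubernetes-version>
#   kubectl -n <cluster-namespace> get kubeadmcontrolplanes.controlplane.cluster.x-k8s.io
```

An empty result from pending_machines means every Machine has reached the target state. The KubeadmControlPlane readiness still has to be checked separately, for example with the second command in the usage comment.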
Verification
After the rolling replacement finishes, verify that the upgraded global cluster is healthy.
All nodes must report the new Kubernetes version, the ClusterVersionShadow must reflect the target Distribution Version, and core platform pods must be Running.
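The node and pod checks above could be scripted along these lines. This is a sketch: the target version and namespace are placeholders, the platform namespace is an assumption (this document does not name it), and the ClusterVersionShadow resource name in the usage comment is assumed from the kind name.

```shell
# Sketch: list nodes whose kubelet does not report the target Kubernetes version.
# $1 = target Kubernetes version (placeholder)
stale_nodes() {
  kubectl get nodes \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.nodeInfo.kubeletVersion}{"\n"}{end}' |
    awk -v want="$1" '$2 != want { print $1 }'
}

# Sketch: list pods in a namespace that are neither Running nor Succeeded.
# $1 = namespace that holds the core platform pods (assumption, e.g. cpaas-system)
not_running_pods() {
  kubectl -n "$1" get pods \
    --field-selector='status.phase!=Running,status.phase!=Succeeded'
}

# Usage, with KUBECONFIG pointing at the global cluster:
#   stale_nodes <target-kubernetes-version>
#   not_running_pods <platform-namespace>
#   kubectl get clusterversionshadow -o yaml   # resource name assumed from the kind
```

Both helpers print nothing when the check passes.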
Rollback Considerations
Rollback after a partial Phase 2 upgrade is provider-specific. In general:
- If the upgrade has not yet replaced any control plane node, revert the manifest to the previous image and reapply.
- If control plane nodes have been replaced, restore from the etcd backup taken before starting the upgrade, then revert the manifest.
For DCS clusters that use pool-managed persistent disks, confirm disk state before rollback:
- Check DCSIpHostnamePool.status.persistentDiskStatus before deleting or recreating machines.
- Do not delete retained DCS volumes that are listed in DCSIpHostnamePool.spec.pool[].persistentDisk.
- Keep maxSurge: 0 while reverting to the previous machine templates so that replacement happens one node at a time.
- If the control plane was already replaced and cluster state is inconsistent, restore from the verified etcd backup before reapplying the previous manifest.
Next Steps
- Upgrade workload clusters to the same Distribution Version: see Upgrading Clusters.
- Review machine configuration changes that ship with the new image: see Machine Configuration.