Upgrading the global Cluster

This document describes how to upgrade a global cluster that runs on Immutable Infrastructure. Upgrades replace nodes with new MicroOS images managed by the Cluster API provider; in-place node upgrades are not used.

When to Use This Path

Choose this upgrade path when:

The global cluster was originally installed on Immutable Infrastructure. See Installing the global Cluster.
Your infrastructure is one of the documented providers: Huawei DCS or Huawei Cloud Stack. VMware vSphere and bare-metal support for the global cluster are planned.

For traditional-OS global clusters, use the standard upgrade path instead.

Two-Phase Upgrade Overview

Like workload clusters, the global cluster on Immutable Infrastructure follows a two-phase upgrade.

Phase 1 — Distribution Version: aligned and agnostic extensions are upgraded to the target Distribution Version. The procedure is shared with workload clusters; see Upgrading Clusters for the Phase 1 mechanics.
Phase 2 — Kubernetes and OS Image: nodes are replaced with new MicroOS images that contain the target Kubernetes version. This document focuses on Phase 2 for the global cluster.

Phase 1 Compatibility

Before starting Phase 2, verify that every workload cluster falls within the Compatible Versions matrix of the target Distribution Version. Workload clusters that are out of range must be upgraded first.

Common Prerequisites

The global cluster has completed Phase 1 (Distribution Version upgrade).
An etcd backup of the global cluster has been taken and verified.
The new MicroOS image and the matching KubeadmControlPlane and MachineDeployment versions are available on the platform's registry.
A maintenance window has been planned that accounts for rolling control plane replacement.

Procedure

After installation, the Cluster API controllers that manage the global cluster run on the global cluster itself. Use the global kubeconfig for the kubectl commands in this procedure.

Step 1 — Update the global Cluster Manifest

Update the Cluster API manifests of the global cluster to reference the new MicroOS image and Kubernetes version. The manifest fields to update are provider-specific.

Huawei DCS

For DCS, create new immutable infrastructure templates instead of editing templates that are already referenced by running machines.

Update the control plane resources:

Create a new DCSMachineTemplate for the target image and set spec.template.spec.vmTemplateName to the MicroOS template that matches the target Kubernetes version.
Keep preserved node-local data, including /var/cpaas, in DCSIpHostnamePool.spec.pool[].persistentDisk. Do not move preserved disks back into DCSMachineTemplate.
Set KubeadmControlPlane.spec.version to the target Kubernetes version.
Point KubeadmControlPlane.spec.machineTemplate.infrastructureRef.name to the new DCSMachineTemplate.
Keep KubeadmControlPlane.spec.rolloutStrategy.rollingUpdate.maxSurge: 0 when the cluster uses pool-managed persistent disks.

Update worker node resources:

Create a new worker DCSMachineTemplate with the target vmTemplateName.
Set each MachineDeployment.spec.template.spec.version to the target Kubernetes version.
Point each MachineDeployment.spec.template.spec.infrastructureRef.name to the new worker DCSMachineTemplate.
Keep each MachineDeployment.spec.strategy.rollingUpdate.maxSurge: 0 when the worker pool uses pool-managed persistent disks.
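
Because the maxSurge: 0 requirement applies to the KubeadmControlPlane and to every MachineDeployment, a quick text scan of the manifest before applying can catch a stray value. The following is a minimal sketch, not part of the platform tooling: check_max_surge is a hypothetical helper name, and it matches lines textually rather than parsing YAML, so it cannot tell which resource a line belongs to.

```shell
# check_max_surge MANIFEST
# Hypothetical helper: prints every maxSurge line whose value is not 0,
# prefixed with its line number, and returns 1 if any such line exists.
# This is a plain text scan, not a YAML parser.
check_max_surge() {
  awk '
    /^[ \t]*maxSurge:/ {
      value = $2
      gsub(/"/, "", value)          # tolerate a quoted value such as "0"
      if (value != "0") { printf "%d: %s\n", NR, $0; bad = 1 }
    }
    END { exit (bad ? 1 : 0) }
  ' "$1"
}

# Example (placeholder file name):
#   check_max_surge <updated-manifest> || echo "review maxSurge before applying"
```

A value such as 1 or 25% is reported, which is the case to review when the pool uses pool-managed persistent disks.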

Pool-managed persistent disks are declared on the IP pool, not on the machine template:

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: DCSIpHostnamePool
metadata:
  name: <global-pool-name>
  namespace: cpaas-system
spec:
  pool:
    - ip: <node-ip>
      hostname: <node-hostname>
      persistentDisk:
        - slot: 0
          quantityGB: 40
          datastoreName: <datastore-name>
          path: /var/cpaas
          format: xfs
          mountOptions:
            - defaults

Use the IP pool status to confirm that preserved disks are detached from old VMs and attached to replacement VMs during the rolling replacement.
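
One way to read the pool status is with kubectl, sketched below under stated assumptions: the resource name dcsiphostnamepool is assumed to be the lowercase form of the kind, the default namespace cpaas-system is taken from the example manifest above, and status.persistentDiskStatus is the field named under Rollback Considerations. The helper name show_pool_disk_status is hypothetical, and the layout of the returned status is not described in this document.

```shell
# show_pool_disk_status KUBECONFIG POOL_NAME [NAMESPACE]
# Hypothetical helper: prints .status.persistentDiskStatus of one
# DCSIpHostnamePool. The resource name "dcsiphostnamepool" is an assumption.
show_pool_disk_status() {
  kubectl --kubeconfig "$1" get dcsiphostnamepool "$2" \
    -n "${3:-cpaas-system}" \
    -o jsonpath='{.status.persistentDiskStatus}'
}

# Example (placeholders):
#   show_pool_disk_status <global-kubeconfig> <global-pool-name>
```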

Huawei Cloud Stack

For HCS, create new immutable infrastructure templates instead of editing templates that are already referenced by running machines.

Update the control plane resources:

Create a new HCSMachineTemplate for the target image and set spec.template.spec.imageName to the MicroOS image that matches the target Kubernetes version.
Keep preserved node-local data, including /var/cpaas, in HCSMachineConfigPool.spec.configs[].persistentDisks[]. Do not move preserved disks back into HCSMachineTemplate.spec.template.spec.dataVolumes[].
Leave runtime identity fields unset in the new template, including spec.template.spec.providerID and spec.template.spec.serverId.
Set KubeadmControlPlane.spec.version to the target Kubernetes version.
Point KubeadmControlPlane.spec.machineTemplate.infrastructureRef.name to the new HCSMachineTemplate.
Keep KubeadmControlPlane.spec.rolloutStrategy.rollingUpdate.maxSurge: 0 when the control plane pool uses pool-managed persistent disks.

Update worker node resources:

Create a new worker HCSMachineTemplate with the target imageName.
Set each MachineDeployment.spec.template.spec.version to the target Kubernetes version.
Point each MachineDeployment.spec.template.spec.infrastructureRef.name to the new worker HCSMachineTemplate.
Keep each MachineDeployment.spec.strategy.rollingUpdate.maxSurge: 0 when the worker pool uses pool-managed persistent disks.

Pool-managed persistent disks are declared on the machine configuration pool, not on the machine template:

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: HCSMachineConfigPool
metadata:
  name: <global-pool-name>
  namespace: cpaas-system
  labels:
    cluster.x-k8s.io/cluster-name: global
spec:
  configs:
    - hostname: <node-hostname>
      networks:
        - subnetName: <subnet-name>
          ipAddress: <node-ip>
      persistentDisks:
        - slot: 0
          size: 100
          type: SSD
          mountPath: /var/cpaas
          format: xfs
          mountOptions:
            - defaults
            - noatime

Do not treat HCS dataVolumes[] as preserved state during node replacement.

This rolling upgrade workflow supports highly available control planes: the HCSMachineTemplate and KubeadmControlPlane references are replaced while HCSMachineConfigPool retains enough fixed identities and persistent disk slots for the rollout. Single-control-plane HCS clusters, including a global cluster with one control plane node, are not supported by this workflow. For those clusters, use an alternative documented procedure, recreate the control plane with immutable templates, or consult the out-of-band migration guide or support before proceeding.

Step 2 — Apply the Updated Manifest

Apply the updated manifest against the global cluster.

kubectl --kubeconfig <global-kubeconfig> apply -f <updated-manifest>

The Cluster API provider begins replacing control plane and worker nodes with machines built from the new image. When maxSurge: 0 is set, each old node is drained and deleted before its replacement is created, so the replacement can reuse the same fixed identity, IP address, or preserved disk.
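
A more cautious variant of this step previews the change before applying it. The sketch below is an optional addition, not part of the documented procedure: preview_manifest is a hypothetical helper that runs kubectl diff and then a server-side dry run, neither of which persists anything on the cluster.

```shell
# preview_manifest KUBECONFIG MANIFEST
# Hypothetical helper: shows what `kubectl apply` would change, then validates
# the manifest against the API server with a server-side dry run.
preview_manifest() {
  rc=0
  kubectl --kubeconfig "$1" diff -f "$2" || rc=$?
  # kubectl diff exits 0 when there are no differences, 1 when differences
  # were found, and a code greater than 1 when kubectl or diff failed.
  if [ "$rc" -gt 1 ]; then
    echo "kubectl diff failed with exit code $rc" >&2
    return "$rc"
  fi
  kubectl --kubeconfig "$1" apply --dry-run=server -f "$2"
}

# Example (placeholders):
#   preview_manifest <global-kubeconfig> <updated-manifest>
```

If the diff shows only the expected image, version, and template reference changes, proceed with the apply command above.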

Step 3 — Monitor the Rolling Replacement

Watch the rolling replacement until all control plane and worker nodes have been replaced.

kubectl --kubeconfig <global-kubeconfig> get machines -A -o wide
kubectl --kubeconfig <global-kubeconfig> get kubeadmcontrolplane -A

The upgrade is complete when every Machine reports the new Kubernetes version and Phase: Running, and the KubeadmControlPlane reports Ready: True against the new version.
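
The Machine half of this completion check can be scripted. The sketch below assumes the standard Cluster API Machine fields .spec.version and .status.phase; pending_machines is a hypothetical helper that filters rows produced by kubectl and does not cover the KubeadmControlPlane condition.

```shell
# pending_machines TARGET_VERSION
# Hypothetical helper: reads "NAME VERSION PHASE" rows on stdin and prints the
# rows that are not yet at TARGET_VERSION in phase Running.
# Returns 0 only when no row is pending.
pending_machines() {
  awk -v target="$1" '
    NF >= 3 && ($2 != target || $3 != "Running") { print; pending = 1 }
    END { exit (pending ? 1 : 0) }
  '
}

# Example (placeholders), written as one pipeline:
#   kubectl --kubeconfig <global-kubeconfig> get machines -A --no-headers
#     -o custom-columns=NAME:.metadata.name,VERSION:.spec.version,PHASE:.status.phase
#     | pending_machines <target-version>
```

An empty result with exit status 0 means every listed Machine reports the target version and is Running.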

Verification

After the rolling replacement finishes, verify that the upgraded global cluster is healthy.

kubectl --kubeconfig <global-kubeconfig> get nodes -o wide
kubectl --kubeconfig <global-kubeconfig> get clusterversionshadow -o yaml
kubectl --kubeconfig <global-kubeconfig> get pods -n cpaas-system

All nodes must report the new Kubernetes version, the ClusterVersionShadow must reflect the target Distribution Version, and core platform pods must be Running.
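
For the pod check, a field selector can narrow the listing to pods that still need attention. This is a sketch under assumptions: list_unsettled_pods is a hypothetical helper, and it filters on the pod phase only, so a pod whose phase is Running but whose containers are not ready is not listed.

```shell
# list_unsettled_pods KUBECONFIG [NAMESPACE]
# Hypothetical helper: lists pods whose phase is neither Running nor
# Succeeded. The namespace defaults to cpaas-system.
list_unsettled_pods() {
  kubectl --kubeconfig "$1" get pods -n "${2:-cpaas-system}" \
    --field-selector=status.phase!=Running,status.phase!=Succeeded
}

# Example (placeholder):
#   list_unsettled_pods <global-kubeconfig>
```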

Rollback Considerations

Rollback after a partial Phase 2 upgrade is provider-specific. In general:

If the upgrade has not yet replaced any control plane node, revert the manifest to the previous image and reapply.
If control plane nodes have been replaced, restore from the etcd backup taken before starting the upgrade, then revert the manifest.

Huawei DCS

For DCS clusters that use pool-managed persistent disks, confirm disk state before rollback:

Check DCSIpHostnamePool.status.persistentDiskStatus before deleting or recreating machines.
Do not delete retained DCS volumes that are listed in DCSIpHostnamePool.spec.pool[].persistentDisk.
Keep maxSurge: 0 while reverting to the previous machine templates so that replacement happens one node at a time.
If the control plane was already replaced and the cluster state is inconsistent, restore from the verified etcd backup before reapplying the previous manifest.

Huawei Cloud Stack

For HCS, rollback depends on platform and etcd backups. Node-local data on HCS dataVolumes[] is not a reliable rollback source because node replacement may delete the old VM and its attached volumes. Data declared in HCSMachineConfigPool.spec.configs[].persistentDisks[] is reattached during replacement. If the control plane was already replaced, restore from the verified etcd backup, then reapply the previous manifest with the previous HCSMachineTemplate references and Kubernetes versions.

Next Steps

Upgrade workload clusters to the same Distribution Version: see Upgrading Clusters.
Review machine configuration changes that ship with the new image: see Machine Configuration.
