Upgrading Kubernetes on Huawei DCS

This guide explains how to complete Phase 2 of the upgrade workflow for clusters on Huawei DCS. Before you upgrade Kubernetes, complete the Distribution Version upgrade described in Upgrading Clusters.

INFO

Where this page fits in the full ACP upgrade flow

This page covers only the Kubernetes step of the upgrade. The full ACP upgrade flow (upgrade artifact synchronization, ACP Core upgrade through CVO, Aligned plugin upgrades, and Agnostic plugin upgrades from Marketplace) is documented in the ACP product documentation. Complete those steps before you start the Kubernetes step on this page.

Use this page when the cluster runs on an immutable operating system, because the Kubernetes step on immutable OS replaces nodes from a new MicroOS-based VM template rather than upgrading binaries in place.

INFO

Version

DCS provider v1.0.16 is the first release that supports pool-managed persistent disks.

INFO

Existing Cluster Migration

If your cluster runs ACP v4.2.1 or later and you are moving to DCS provider v1.0.16 or later, complete the migration procedure in Migrate Existing Huawei DCS Clusters to Pool-Managed Persistent Disks before you rely on upgrade-time disk preservation.

Upgrade Sequence

Upgrade DCS clusters in the following order:

1. (Prerequisite) Upgrade the ACP platform on the management cluster first. This brings the cluster-api-provider-dcs controller and the related CAPI components (core, KubeadmControlPlane provider, bootstrap provider) to versions that understand the new schema. Trigger workload-cluster upgrades only after the management-side controllers have rolled out and become Ready.
2. Upgrade the Distribution Version (Aligned Extensions) on the workload cluster. See Upgrading Distribution Version.
3. Upgrade the control plane Kubernetes version.
4. Upgrade worker nodes to the target Kubernetes version.

Cluster API orchestrates rolling updates with built-in safety mechanisms to reduce service disruption.

WARNING

Skipping step 1 risks two failure modes: the old controller silently ignores new schema fields written to DCSIpHostnamePool / DCSMachineTemplate; or a controller image swap mid-rollout interrupts persistent-disk state-machine progression. Always settle the management-side upgrade before touching workload rollout.

Prerequisites

Before you start, ensure all of the following prerequisites are met:

The Distribution Version upgrade is complete
The control plane is reachable
All nodes are healthy and in Ready state
The IP Pool has sufficient capacity for rolling updates
The VM template supports the target Kubernetes version. See OS Support Matrix for version mapping
The target Kubernetes version is compatible with your workloads and add-ons
DCS VM templates are 4.2.1 or later if you use pool-managed persistent disks, because safe shutdown and disk detach depend on guest tools
If you rely on pool-managed persistent disks, keep KubeadmControlPlane.spec.rolloutStrategy.rollingUpdate.maxSurge = 0 and each MachineDeployment.spec.strategy.rollingUpdate.maxSurge = 0
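
The two maxSurge prerequisites can be checked from the command line before the rollout starts. The following is a minimal sketch rather than a definitive procedure: the helper names `surge_is_zero` and `read_max_surge` are hypothetical, and it assumes both resources live in the cpaas-system namespace, as in the commands used elsewhere on this page. An empty result means the field is unset, which the check treats as "not 0".

```shell
# Hypothetical helper: succeeds only when the reported maxSurge is exactly 0.
# An empty string (field unset) is treated as "not 0".
surge_is_zero() {
  [ "$1" = "0" ]
}

# Hypothetical helper: prints the maxSurge value of one resource.
# $1 = resource kind, $2 = resource name, $3 = jsonpath of the maxSurge field.
read_max_surge() {
  kubectl get "$1" "$2" -n cpaas-system -o jsonpath="$3"
}

# Example usage (needs cluster access; replace the placeholders):
#   v=$(read_max_surge kubeadmcontrolplane <kcp-name> '{.spec.rolloutStrategy.rollingUpdate.maxSurge}')
#   surge_is_zero "$v" || echo "KubeadmControlPlane maxSurge is '$v', expected 0"
#   v=$(read_max_surge machinedeployment <md-name> '{.spec.strategy.rollingUpdate.maxSurge}')
#   surge_is_zero "$v" || echo "MachineDeployment maxSurge is '$v', expected 0"
```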

WARNING

Disk Preservation Model

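Upgrades rely on Cluster API's rolling update mechanism, which deletes and recreates each node VM. Each cluster has four disk classes. Of the disks attached to the node VM, only the pool-managed class survives a delete-recreate; external CSI volumes are managed independently of the node lifecycle.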

Disk class	Declared in	Survives upgrade?	Use for
System disk (root volume)	The VM template image used for `vmTemplateName`	❌ Never	OS + kubelet/kubeadm/containerd. Rebuilt from the new template every replacement.
Template-local disks	`DCSMachineTemplate.spec.template.spec.vmConfig.dcsMachineDiskSpec`	❌ Never	Ephemeral cache. Destroyed with the old VM.
Pool-managed persistent disks	`DCSIpHostnamePool.spec.pool[].persistentDisk`	✅ Detached from old VM and reattached to the new VM at the same IP slot	Platform state such as `/var/cpaas`.
External CSI volumes (cinder, etc.)	Workload PVCs / CSI driver	✅ Unrelated to node lifecycle	Application data.

"Preserved" means the same disk identity is reattached — it does not mean the disk's contents are time-traveled. Anything written to a pool-managed disk during the upgrade window stays after the upgrade and stays after a rollback.

Pool-managed preservation requires one-by-one replacement, so keep maxSurge = 0 on both KubeadmControlPlane.spec.rolloutStrategy.rollingUpdate and MachineDeployment.spec.strategy.rollingUpdate.

If your existing cluster still keeps preserved data in the old template-disk layout, migrate it first by following Migrate Existing Huawei DCS Clusters to Pool-Managed Persistent Disks.

WARNING

Templates Cannot Be Modified In Place

DCSMachineTemplate is a Cluster API infrastructure template. Cluster API only triggers rolling replacement when KubeadmControlPlane.spec.machineTemplate.infrastructureRef.name or MachineDeployment.spec.template.spec.infrastructureRef.name points at a different template name. Editing the existing template in place changes the manifest but does not produce a new rollout — the running VMs continue to use the in-memory snapshot of the previous template.

Every upgrade step on this page therefore creates a new DCSMachineTemplate with a new metadata.name, applies it, and then patches the controlling resource's infrastructureRef.name to the new template. The previous template should be kept until the new rollout is healthy in case rollback is required.

Using YAML

YAML-based upgrades do not depend on Fleet Essentials.

Required Values From the OS Support Matrix

The authoritative mapping between an ACP release, its MicroOS image, the Kubernetes version, the matching CoreDNS, etcd, and Kube-OVN versions lives in OS Support Matrix. Locate the row that corresponds to the target ACP version before you start; the row supplies every value the YAML steps below need.

The cells you read from that row map to the upgrade manifests as follows:

OS Support Matrix column	Used to set	Where it lands
MicroOS Image Version	`DCSMachineTemplate.spec.template.spec.vmTemplateName` (new VM template the cloned nodes are built from)	Control plane and worker `DCSMachineTemplate`
Kubernetes Version	`KubeadmControlPlane.spec.version` and `MachineDeployment.spec.template.spec.version`	Both control plane and worker
coredns	`KubeadmControlPlane.spec.kubeadmConfigSpec.clusterConfiguration.dns.imageTag`	Control plane only
etcd	`KubeadmControlPlane.spec.kubeadmConfigSpec.clusterConfiguration.etcd.local.imageTag`	Control plane only
kube-ovn (chart)	`Cluster.metadata.annotations["cpaas.io/kube-ovn-version"]` and the `cni-kube-ovn` `AppRelease` `spec.source.charts[0].targetRevision` on the workload cluster	The annotation records the intended chart version; on an existing cluster the chart revision must be patched separately (see step 3 below). This is the `acp/chart-cpaas-kube-ovn` chart version (for example `v4.3.3`), not the Kube-OVN component version.

The CoreDNS and etcd image tags are control-plane-only because clusterConfiguration is a KubeadmControlPlane field. Worker nodes inherit container image versions from the new VM template; the MachineDeployment does not carry its own dns/etcd tags. The Kube-OVN annotation lives on the Cluster resource, not on KubeadmControlPlane, because the DCS provider watches it independently of the Kubernetes control plane rollout.

Confirm with the cluster's platform owner that the target MicroOS image has already been uploaded to the DCS platform under the same name as the MicroOS Image Version value in the matrix row. The upgrade fails if that VM template is not present on DCS when the DCSMachineTemplate is applied.

Upgrade Control Plane Infrastructure

Upgrading the control plane machine template lets you roll out updated VM specifications, system patches, and infrastructure settings.

Procedure

1. Create an updated machine template

Copy the existing DCSMachineTemplate referenced by KubeadmControlPlane and save it as a new file:
kubectl get dcsmachinetemplate <current-template-name> -n cpaas-system -o yaml > new-cp-template.yaml
2. Modify the template specifications

Update the new template as needed:
- Set metadata.name to <new-template-name>
- Update spec.template.spec.vmTemplateName
- Update spec.template.spec.vmConfig.dcsMachineCpuSpec.quantity
- Update spec.template.spec.vmConfig.dcsMachineMemorySpec.quantity
- Update spec.template.spec.vmConfig.dcsMachineDiskSpec for system and template-local disks only
- Strip server-generated metadata (resourceVersion, uid, generation, creationTimestamp, managedFields, kubectl.kubernetes.io/last-applied-configuration annotation) and the entire status field from the copied manifest.
- Leave spec.template.spec.providerID unset. The DCS provider sets providerID to dcs://<machine-name> once the VM is created; pre-filling it in the template breaks the controller's identity binding.
Keep pool-managed persistent disks, including /var/cpaas, in DCSIpHostnamePool.spec.pool[].persistentDisk.
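
The metadata cleanup described above can be scripted instead of edited by hand. The sketch below assumes `jq` is installed and exports the template as JSON rather than YAML (`kubectl apply -f` accepts both). `clean_template` is a hypothetical helper name, and the output file name is only an example.

```shell
# Hypothetical helper: reads a DCSMachineTemplate as JSON on stdin, removes
# server-generated metadata and status, and sets a new metadata.name ($1).
clean_template() {
  jq --arg name "$1" '
    del(
      .metadata.resourceVersion,
      .metadata.uid,
      .metadata.generation,
      .metadata.creationTimestamp,
      .metadata.managedFields,
      .metadata.annotations["kubectl.kubernetes.io/last-applied-configuration"],
      .status
    )
    | .metadata.name = $name
  '
}

# Example usage (needs cluster access; replace the placeholders):
#   kubectl get dcsmachinetemplate <current-template-name> -n cpaas-system -o json \
#     | clean_template <new-template-name> > new-cp-template.json
```

The helper only handles the cleanup and the rename. Changes to vmTemplateName, CPU, memory, and disk settings still have to be made in the resulting file before it is applied.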

3. Apply the updated template

kubectl apply -f new-cp-template.yaml -n cpaas-system

4. Update the control plane reference

Modify the KubeadmControlPlane resource to reference the new template:

kubectl patch kubeadmcontrolplane <kcp-name> -n cpaas-system --type='merge' -p='{"spec":{"machineTemplate":{"infrastructureRef":{"name":"<new-template-name>"}}}}'

5. Monitor the rolling update

kubectl get kubeadmcontrolplane <kcp-name> -n cpaas-system -w
kubectl get machines -n cpaas-system -l cluster.x-k8s.io/control-plane

Upgrade Control Plane Kubernetes Version

Upgrading the control plane Kubernetes version on immutable OS is a delete-recreate workflow. The control plane VMs are replaced one by one from a new DCSMachineTemplate that points at the target MicroOS VM template, and the KubeadmControlPlane resource is patched to carry the matching Kubernetes version, CoreDNS image tag, and etcd image tag.

Before you start, collect every required value from the target ACP row in the OS Support Matrix as described in Required Values From the OS Support Matrix.

Procedure

1. Create a new DCSMachineTemplate for the target Kubernetes version

Copy the existing control-plane template, set metadata.name to a new name, and set spec.template.spec.vmTemplateName to the MicroOS Image Version value read from the target row in the OS Support Matrix. Keep pool-managed persistent disks in DCSIpHostnamePool.spec.pool[].persistentDisk rather than reintroducing them as template disks.

kubectl get dcsmachinetemplate <current-cp-template-name> -n cpaas-system -o yaml > new-cp-template.yaml
# edit metadata.name and spec.template.spec.vmTemplateName
kubectl apply -f new-cp-template.yaml -n cpaas-system

2. Patch the KubeadmControlPlane with the target Kubernetes values

Update the KubeadmControlPlane resource in a single edit to keep spec.version, the CoreDNS image tag, the etcd image tag, and the infrastructure template reference consistent with the same MicroOS release:
- spec.version ← Kubernetes Version from the OS Support Matrix row
- spec.kubeadmConfigSpec.clusterConfiguration.dns.imageTag ← coredns column from the same row
- spec.kubeadmConfigSpec.clusterConfiguration.etcd.local.imageTag ← etcd column from the same row
- spec.machineTemplate.infrastructureRef.name ← the new DCSMachineTemplate name created in step 1
  kubectl edit kubeadmcontrolplane <kcp-name> -n cpaas-system
Updating only spec.version is not sufficient. The CoreDNS and etcd image tags must move together with the Kubernetes version because they are built from the same MicroOS release; leaving them at the previous values can result in CoreDNS and etcd pods that do not match the new Kubernetes minor version.
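
As an alternative to an interactive edit, the four fields can be changed in one merge patch so that they always move together. This is a sketch under stated assumptions: `kcp_upgrade_patch` is a hypothetical helper, every argument is a value you read from the OS Support Matrix row or chose yourself, and the values must not contain characters that need JSON escaping.

```shell
# Hypothetical helper: prints a JSON merge patch for KubeadmControlPlane.
# $1 = Kubernetes version, $2 = CoreDNS image tag, $3 = etcd image tag,
# $4 = name of the new DCSMachineTemplate. All values are caller-supplied.
kcp_upgrade_patch() {
  printf '{"spec":{"version":"%s","machineTemplate":{"infrastructureRef":{"name":"%s"}},"kubeadmConfigSpec":{"clusterConfiguration":{"dns":{"imageTag":"%s"},"etcd":{"local":{"imageTag":"%s"}}}}}}\n' \
    "$1" "$4" "$2" "$3"
}

# Example usage (needs cluster access; replace every placeholder):
#   kubectl patch kubeadmcontrolplane <kcp-name> -n cpaas-system --type=merge \
#     -p "$(kcp_upgrade_patch <kubernetes-version> <coredns-tag> <etcd-tag> <new-template-name>)"
```

Printing the patch first and reviewing it before passing it to kubectl keeps the change auditable.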

3. Upgrade the Kube-OVN chart on the workload cluster

Kube-OVN is a Core lifecycle component, but on immutable OS the DCS provider does not pin its chart version to the cluster's Kubernetes version. The chart version is carried by a separate AppRelease named cni-kube-ovn in the cpaas-system namespace of the workload cluster, and you move it forward in two steps: update the annotation on the Cluster resource for bookkeeping and future re-creation, then patch the existing AppRelease directly to bump the chart revision.

WARNING

Why two steps are required on DCS

The DCS provider creates the cni-kube-ovn AppRelease the first time the cluster is built, and from then on it reconciles only the spec.values block (cluster name, CIDRs, registry, control-plane node list). It does not write to spec.source.charts[0].targetRevision on an AppRelease that already exists. As a result, changing cpaas.io/kube-ovn-version on the Cluster resource alone does not move the chart version on the workload cluster. The annotation must still be updated so the recorded target matches the OS Support Matrix row, but the chart upgrade itself is driven by a direct AppRelease patch.

3.1. Update the cpaas.io/kube-ovn-version annotation on the Cluster resource

kubectl annotate cluster <cluster-name> -n cpaas-system \
  cpaas.io/kube-ovn-version=<kube-ovn-version-from-matrix> --overwrite

The annotation does not update automatically when spec.version changes; keep it in step with the kube-ovn (chart) column of the target row.

3.2. Patch the AppRelease chart revision on the workload cluster

Run the patch against the workload cluster's API server (not the bootstrap KIND or the global cluster):

kubectl patch apprelease cni-kube-ovn -n cpaas-system --type='json' \
  -p='[{"op":"replace","path":"/spec/source/charts/0/targetRevision","value":"<kube-ovn-version-from-matrix>"}]'

Use the same value you set in the annotation. The releaseName (cpaas-kube-ovn) and name (acp/chart-cpaas-kube-ovn) are managed by the provider; do not change them.

3.3. Wait for reconciliation to complete

Watch the chart phase and the installed revision:

# Overall AppRelease state — Sync and Health columns must reach a Success-equivalent reason
kubectl get apprelease cni-kube-ovn -n cpaas-system

# Installed revision and chart phase
kubectl get apprelease cni-kube-ovn -n cpaas-system \
  -o jsonpath='Installed: {.status.charts.*.installedRevision}{"\n"}Phase: {.status.charts.*.phase}{"\n"}'

The normal sequence is Upgrading → HealthChecking → Success. On small clusters the full transition typically completes within about one minute. Read the phases as follows:

Phase	Meaning	`installedRevision`
`Upgrading`	Helm release upgrade in progress. `Sync` condition is `Unknown(Syncing)`.	Still the previous version
`HealthChecking`	Helm release applied; controller is verifying Kube-OVN pods. `Sync` condition is `True(Synced)`.	Already the target version
`Success`	All three conditions (`Validate`, `Sync`, `Health`) are `True`.	Target version

WARNING

Do not declare the upgrade complete on installedRevision alone. The field flips to the target value during HealthChecking, before pods have been verified Ready. The chart is only considered upgraded when phase is Success and installedRevision matches the target.

The AppRelease API also defines Downloading, Installing, Syncing, DownloadFailed, DeployFailed, and NotReady. The first three are transient and the upgrade should converge on its own. The last three indicate a failure that needs manual investigation; start with kubectl describe apprelease cni-kube-ovn -n cpaas-system to read the per-condition message field.
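
The completion rule above (phase is Success and installedRevision matches the target) can be expressed as a small polling check. This is a minimal sketch: `chart_upgraded` and `chart_failed` are hypothetical helpers, the 10-second interval is arbitrary, and the loop assumes the AppRelease carries a single chart so that each jsonpath query returns one value.

```shell
# Hypothetical helper: succeeds only when the chart is fully upgraded.
# $1 = phase, $2 = installedRevision, $3 = target chart version.
chart_upgraded() {
  [ "$1" = "Success" ] && [ "$2" = "$3" ]
}

# Hypothetical helper: succeeds when the phase is one of the failure phases
# that need manual investigation.
chart_failed() {
  case "$1" in
    DownloadFailed|DeployFailed|NotReady) return 0 ;;
    *) return 1 ;;
  esac
}

# Example polling loop (needs access to the workload cluster):
#   target=<kube-ovn-version-from-matrix>
#   while :; do
#     phase=$(kubectl get apprelease cni-kube-ovn -n cpaas-system -o jsonpath='{.status.charts.*.phase}')
#     rev=$(kubectl get apprelease cni-kube-ovn -n cpaas-system -o jsonpath='{.status.charts.*.installedRevision}')
#     chart_upgraded "$phase" "$rev" "$target" && break
#     chart_failed "$phase" && { echo "chart phase: $phase" >&2; break; }
#     sleep 10
#   done
```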

4. Monitor the rolling update

kubectl get kubeadmcontrolplane <kcp-name> -n cpaas-system -w
kubectl get machines -n cpaas-system -l cluster.x-k8s.io/control-plane
kubectl get nodes

KubeadmControlPlane.spec.rolloutStrategy.rollingUpdate.maxSurge must remain 0 when the cluster relies on pool-managed persistent disks, so the control plane VMs are replaced one at a time.
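
One way to tell whether every control plane machine has been replaced is to compare each Machine's spec.version with the target. The sketch below is illustrative only: `all_at_version` is a hypothetical helper, and a match shows that the machines were recreated at the target version, not that the nodes are Ready, so it complements `kubectl get nodes` rather than replacing it.

```shell
# Hypothetical helper: reads one version per line on stdin and succeeds only
# when there is at least one line and every line equals the target ($1).
all_at_version() {
  target=$1
  seen=0
  while IFS= read -r v; do
    [ -n "$v" ] || continue          # skip empty lines
    [ "$v" = "$target" ] || return 1 # any mismatch fails the check
    seen=1
  done
  [ "$seen" -eq 1 ]
}

# Example usage (needs cluster access; replace the placeholder):
#   kubectl get machines -n cpaas-system -l cluster.x-k8s.io/control-plane \
#     -o jsonpath='{range .items[*]}{.spec.version}{"\n"}{end}' \
#     | all_at_version <kubernetes-version> && echo "control plane machines match the target"
```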

Upgrade Worker Nodes

Worker node Kubernetes upgrades are managed through MachineDeployment resources. Worker upgrades carry fewer fields than the control plane: the CoreDNS and etcd image tags are part of KubeadmControlPlane.spec.kubeadmConfigSpec.clusterConfiguration, which MachineDeployment does not have. Worker nodes inherit Kubernetes component versions from the new VM template; the MachineDeployment only needs the target Kubernetes version and the new template reference.

Before you start, read the MicroOS Image Version and Kubernetes Version cells from the target ACP row in the OS Support Matrix as described in Required Values From the OS Support Matrix.

Procedure

1. Create a new DCSMachineTemplate for worker nodes
- Create a new DCSMachineTemplate with vmTemplateName set to the MicroOS Image Version value from the target row in the OS Support Matrix
- Keep /var/cpaas and any other upgrade-preserved disks in DCSIpHostnamePool.spec.pool[].persistentDisk rather than reintroducing them as template disks

2. Update the MachineDeployment
- Set spec.template.spec.version to the Kubernetes Version value from the same OS Support Matrix row
- Set spec.template.spec.infrastructureRef.name to the new DCSMachineTemplate name created in step 1
- Optionally update spec.template.spec.bootstrap.configRef.name if bootstrap configuration changes are required for this release

3. Monitor the rolling update
- Verify that the rolling update completes successfully
- Verify that the new worker nodes join the cluster with the target Kubernetes version

MachineDeployment.spec.strategy.rollingUpdate.maxSurge must remain 0 when the cluster relies on pool-managed persistent disks, so the worker nodes are replaced one at a time.
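
The two required MachineDeployment fields can be set together in a single merge patch. This is a sketch, not the only supported method: `md_upgrade_patch` is a hypothetical helper, both arguments are values you supply, and the optional bootstrap reference is left out.

```shell
# Hypothetical helper: prints a JSON merge patch for a MachineDeployment.
# $1 = Kubernetes version, $2 = name of the new worker DCSMachineTemplate.
md_upgrade_patch() {
  printf '{"spec":{"template":{"spec":{"version":"%s","infrastructureRef":{"name":"%s"}}}}}\n' \
    "$1" "$2"
}

# Example usage (needs cluster access; replace every placeholder):
#   kubectl patch machinedeployment <md-name> -n cpaas-system --type=merge \
#     -p "$(md_upgrade_patch <kubernetes-version> <new-worker-template-name>)"
```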

Rolling Back a Failed Upgrade

If the rolling update fails (new VMs fail to boot, nodes do not become Ready, or the new Kubernetes minor version surfaces an incompatibility), revert the template reference and the Kubernetes version fields to their previous values. Cluster API treats the reversion as a new spec change and replaces the machines built from the new template with machines built from the previous template, one at a time.

Three facts to internalize before rolling back:

The old VMs are gone. They were destroyed during the upgrade. Rollback uses the old template to build a fresh set of replacement machines; it does not restore the original VMs.
The old DCSMachineTemplate resource must still exist. Do not delete the previous template until the new rollout is healthy. If you already deleted it, recreate it from version control or backup before rolling back.
Pool-managed disk identity is preserved, but data state is not. Disks declared in DCSIpHostnamePool.spec.pool[].persistentDisk reattach to the rolled-back machines at the same IP slot, but any data written to those disks during the upgrade window (for example, etcd entries in the new Kubernetes minor format) stays. If the new format is unreadable by the older Kubernetes minor version, the rollback may still fail and require manual etcd restoration.

Procedure:

Control plane: patch KubeadmControlPlane to restore the previous spec.machineTemplate.infrastructureRef.name, spec.version, spec.kubeadmConfigSpec.clusterConfiguration.dns.imageTag, and spec.kubeadmConfigSpec.clusterConfiguration.etcd.local.imageTag.
Workers: patch each MachineDeployment to restore the previous spec.template.spec.infrastructureRef.name and spec.template.spec.version.

Kube-OVN: if the Kube-OVN chart was upgraded, revert it the same way the upgrade was applied — first restore the annotation, then patch the AppRelease chart revision back. Verify with the same installedRevision + phase=Success check used in step 3.

kubectl annotate cluster <cluster-name> -n cpaas-system \
  cpaas.io/kube-ovn-version=<previous-kube-ovn-version> --overwrite

kubectl patch apprelease cni-kube-ovn -n cpaas-system --type='json' \
  -p='[{"op":"replace","path":"/spec/source/charts/0/targetRevision","value":"<previous-kube-ovn-version>"}]'

If the new control plane never reached etcd quorum, the KubeadmControlPlane controller may refuse to roll back any machine because its preflight checks block on an unhealthy etcd. Recover etcd quorum first (operator intervention) before retrying the rollback.
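
A rollback needs the previous values of the four KubeadmControlPlane fields listed in the procedure above, so it helps to record them before the upgrade starts. The following sketch assumes `jq` is installed; `kcp_rollback_values` is a hypothetical helper, and the output file name is only an example. A field that is not set in the resource is printed as `null`.

```shell
# Hypothetical helper: reads a KubeadmControlPlane as JSON on stdin and prints
# the four fields a rollback restores, one "key=value" pair per line.
kcp_rollback_values() {
  jq -r '.spec
    | "infrastructureRef.name=\(.machineTemplate.infrastructureRef.name)",
      "version=\(.version)",
      "dns.imageTag=\(.kubeadmConfigSpec.clusterConfiguration.dns.imageTag)",
      "etcd.local.imageTag=\(.kubeadmConfigSpec.clusterConfiguration.etcd.local.imageTag)"'
}

# Example usage, run before the upgrade (needs cluster access):
#   kubectl get kubeadmcontrolplane <kcp-name> -n cpaas-system -o json \
#     | kcp_rollback_values > kcp-before-upgrade.txt
```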

Using the Web UI

WARNING

Fleet Essentials UI does not support ACP 4.3 cluster upgrades

The Fleet Essentials UI workflow has not been adapted to the Cluster Version Operator (CVO) mechanism introduced in ACP 4.3. Do not use the Fleet Essentials UI to upgrade DCS clusters on ACP 4.3.

Two supported alternatives:

YAML path — Follow the YAML-based upgrade procedure documented earlier on this page.
ACP Core cluster management UI — Use the two-step upgrade flow built into the ACP Core platform; see Request the upgrade for the global cluster or Request the upgrade for workload clusters.

Cluster creation and node-pool management through the Fleet Essentials UI are unaffected by this limitation.

Use this workflow to upgrade Kubernetes from the web UI after Phase 1 (the Distribution Version upgrade) is complete.

Version requirement: This workflow requires Fleet Essentials and Alauda Container Platform DCS Infrastructure Provider 1.0.13 or later. If the provider version is earlier than 1.0.13, use YAML manifests. If the upgrade relies on pool-managed persistent disks, use DCS provider v1.0.16 or later. In v1.0.16, the persistentDisk declaration on DCSIpHostnamePool remains YAML-only and is not exposed in the web UI.

Prerequisites

The Distribution Version upgrade is complete. See Upgrading Distribution Version
The Control Plane Node Pool is in Running state
The IP Pool has sufficient capacity for rolling updates
If the upgrade relies on pool-managed persistent disks, ensure the required DCSIpHostnamePool.spec.pool[].persistentDisk entries have already been created or updated through YAML

Upgrade Workflow

Kubernetes upgrades follow this sequence after the Distribution Version upgrade:

Upgrade the Control Plane Node Pool.
Wait for the Control Plane Node Pool upgrade to complete.
Upgrade Worker Node Pools in any order.

Checking Available Upgrades

Navigation: Clusters → Clusters → Select cluster → Node Pools Tab

Node Pools with available upgrades show an "Upgrade available" indicator. Click the Node Pool card to compare the current and target versions.

Upgrade the Control Plane Node Pool

Steps:

In the Node Pools Tab, locate the Control Plane Node Pool
Click Upgrade
Review upgrade information:
- Current Version: Current Kubernetes version
- Target Version: Latest minor version supported (automatically selected)
Click Confirm to start

Monitoring:

Watch the Node Pool status
Nodes are replaced one at a time (maxSurge=0). This one-by-one replacement is also required when the cluster relies on pool-managed persistent disks.
Upgrade time depends on node count and resources

Upgrade Worker Node Pools

WARNING

Worker Node Pools cannot be upgraded until the Control Plane Node Pool upgrade completes.

When Worker Pool Upgrade is Available:

Control Plane Kubernetes Version matches global cluster version
Control Plane is in Running state

Upgrade Steps:

For each Worker Node Pool:
- Click Upgrade
- Review and confirm
Worker Node Pools can be upgraded in parallel after the Control Plane Node Pool upgrade completes

Upgrade Constraints:

Pool State	Upgrade Button
Not Running	❌ Disabled: "Upgrade is unavailable when the Worker Node Pool is not in the Running state"
Control Plane not started	❌ Disabled: "Upgrade the Control Plane Node Pool first"
Control Plane upgrading	❌ Disabled: "Wait for the Control Plane Node Pool upgrade to complete"
Control Plane upgraded	✅ Enabled

Cross-Version Upgrades

When upgrading across multiple minor versions (for example, v1.32 → v1.34):

Upgrade Control Plane to v1.33
Wait for completion
Upgrade Control Plane to v1.34
Repeat for Worker Pools

Why: Kubernetes supports upgrading only one minor version at a time; skipping a minor version is not supported.
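
The intermediate hops for a cross-version upgrade can be listed mechanically. This is a small sketch: `minor_path` is a hypothetical helper that works on major.minor only, so the exact patch version for each hop still has to come from the OS Support Matrix.

```shell
# Hypothetical helper: prints each intermediate target from $1 to $2,
# one minor version at a time. Accepts forms such as v1.32 or v1.32.4;
# patch levels are ignored, and the major versions must match.
minor_path() {
  from=${1#v}; to=${2#v}
  from_major=${from%%.*}; to_major=${to%%.*}
  from_minor=${from#*.}; from_minor=${from_minor%%.*}
  to_minor=${to#*.}; to_minor=${to_minor%%.*}
  [ "$from_major" = "$to_major" ] || return 1
  m=$((from_minor + 1))
  while [ "$m" -le "$to_minor" ]; do
    echo "v${to_major}.${m}"
    m=$((m + 1))
  done
}

minor_path v1.32 v1.34
# prints:
# v1.33
# v1.34
```

Each printed version is one full pass: upgrade the Control Plane Node Pool to it, wait for completion, then move on to the next.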

Troubleshooting

Issue	Solution
Upgrade button disabled	Check pool status and Control Plane version
Upgrade stuck	Check IP Pool availability, DCS platform resources
Nodes not joining	Verify network connectivity, DNS settings
Preserved disk did not reattach	Verify the disk is declared in `DCSIpHostnamePool.spec.pool[].persistentDisk`, check `status.persistentDiskStatus`, and confirm the cluster has already been migrated to the pool-managed layout
