Upgrading Kubernetes on Huawei DCS

This guide explains how to complete Phase 2 of the upgrade workflow for clusters on Huawei DCS. Before you upgrade Kubernetes, complete the Distribution Version upgrade described in Upgrading Clusters.

INFO

Where this page fits in the full ACP upgrade flow

This page covers only the Kubernetes step of the upgrade. The full ACP upgrade flow is documented in the ACP product documentation. That flow includes upgrade artifact synchronization, the ACP Core upgrade through CVO, Aligned plugin upgrades, and Agnostic plugin upgrades from Marketplace. Complete those steps before you start the Kubernetes step on this page.

Use this page when the cluster runs on an immutable operating system. On immutable OS, the Kubernetes step replaces nodes with VMs built from a new MicroOS-based VM template instead of upgrading binaries in place.

INFO

Version

DCS provider v1.0.16 is the first release that supports pool-managed persistent disks.

INFO

Existing Cluster Migration

If your cluster runs ACP v4.2.1 or later and you are moving to DCS provider v1.0.16 or later, complete the migration procedure in Migrate Existing Huawei DCS Clusters to Pool-Managed Persistent Disks before you rely on upgrade-time disk preservation.

Upgrade Sequence

Upgrade DCS clusters in the following order:

  1. (Prerequisite) Upgrade the ACP platform on the management cluster first. This brings the cluster-api-provider-dcs controller and the related CAPI components (core, KubeadmControlPlane provider, bootstrap provider) to versions that understand the new schema. Trigger workload-cluster upgrades only after the management-side controllers have rolled out and become Ready.
  2. Upgrade the Distribution Version (Aligned Extensions) on the workload cluster. See Upgrading Distribution Version.
  3. Upgrade the control plane Kubernetes version.
  4. Upgrade worker nodes to the target Kubernetes version.

Cluster API orchestrates rolling updates with built-in safety mechanisms to reduce service disruption.

WARNING

Skipping step 1 risks two failure modes. First, the old controller can silently ignore new schema fields written to DCSIpHostnamePool or DCSMachineTemplate. Second, a controller image swap in the middle of a rollout can interrupt the persistent-disk state machine. Always let the management-side upgrade settle before you start any workload rollout.

Prerequisites

Before you start, ensure all of the following prerequisites are met:

  • The Distribution Version upgrade is complete
  • The control plane is reachable
  • All nodes are healthy and in Ready state
  • The IP Pool has sufficient capacity for rolling updates
  • The VM template supports the target Kubernetes version. See OS Support Matrix for version mapping
  • The target Kubernetes version is compatible with your workloads and add-ons
  • DCS VM templates are 4.2.1 or later if you use pool-managed persistent disks, because safe shutdown and disk detach depend on guest tools
  • If you rely on pool-managed persistent disks, keep KubeadmControlPlane.spec.rolloutStrategy.rollingUpdate.maxSurge = 0 and each MachineDeployment.spec.strategy.rollingUpdate.maxSurge = 0
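You can check the two maxSurge prerequisites from the management cluster before you begin. The following is a minimal sketch, not ACP tooling. check_max_surge_zero is a hypothetical helper name. It assumes the resources live in cpaas-system, as in the commands on this page, and that each MachineDeployment carries the standard Cluster API cluster.x-k8s.io/cluster-name label. Because the prerequisite asks for an explicit 0, the helper reports an unset maxSurge as a failure.

```shell
# Hypothetical helper (not part of ACP): report any KubeadmControlPlane or
# MachineDeployment whose rolling-update maxSurge is not exactly 0.
# An unset field prints as empty and is reported too.
# Returns non-zero if any resource fails the check.
# Usage: check_max_surge_zero <namespace> <kcp-name> <cluster-name>
check_max_surge_zero() {
  ns="$1" kcp="$2" cluster="$3" rc=0

  surge=$(kubectl get kubeadmcontrolplane "$kcp" -n "$ns" \
    -o jsonpath='{.spec.rolloutStrategy.rollingUpdate.maxSurge}')
  if [ "$surge" != "0" ]; then
    echo "KubeadmControlPlane/$kcp: maxSurge=${surge:-<unset>}, expected 0" >&2
    rc=1
  fi

  # One line per MachineDeployment of this cluster: "<name> <maxSurge>".
  mds=$(kubectl get machinedeployments -n "$ns" \
    -l "cluster.x-k8s.io/cluster-name=$cluster" \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.strategy.rollingUpdate.maxSurge}{"\n"}{end}')
  while read -r md md_surge; do
    if [ -z "$md" ]; then continue; fi
    if [ "$md_surge" != "0" ]; then
      echo "MachineDeployment/$md: maxSurge=${md_surge:-<unset>}, expected 0" >&2
      rc=1
    fi
  done <<EOF
$mds
EOF
  return "$rc"
}
```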
WARNING

Disk Preservation Model

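Upgrades rely on Cluster API's rolling update mechanism, which deletes each old VM and creates a replacement. Disks used by a DCS cluster fall into four classes. Three of them are tied to the VM, and of those only pool-managed persistent disks survive the delete-recreate. External CSI volumes are independent of the node lifecycle.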

Disk class | Declared in | Survives upgrade? | Use for
System disk (root volume) | The VM template image referenced by vmTemplateName | ❌ Never | OS, kubelet, kubeadm, and containerd. Rebuilt from the new template on every replacement.
Template-local disks | DCSMachineTemplate.spec.template.spec.vmConfig.dcsMachineDiskSpec | ❌ Never | Ephemeral cache. Destroyed with the old VM.
Pool-managed persistent disks | DCSIpHostnamePool.spec.pool[].persistentDisk | ✅ Detached from the old VM and reattached to the new VM at the same IP slot | Platform state such as /var/cpaas.
External CSI volumes (cinder, etc.) | Workload PVCs / CSI driver | ✅ Unrelated to node lifecycle | Application data.

"Preserved" means the same disk is reattached. It does not mean the disk's contents are rolled back to an earlier state. Anything written to a pool-managed disk during the upgrade window remains after the upgrade and after any rollback.

Pool-managed preservation requires one-by-one replacement, so keep maxSurge = 0 on both KubeadmControlPlane.spec.rolloutStrategy.rollingUpdate and MachineDeployment.spec.strategy.rollingUpdate.

If your existing cluster still keeps preserved data in the old template-disk layout, migrate it first by following Migrate Existing Huawei DCS Clusters to Pool-Managed Persistent Disks.

WARNING

Templates Cannot Be Modified In Place

DCSMachineTemplate is a Cluster API infrastructure template. Cluster API detects template changes only through the template name. A rollout for new infrastructure settings starts when KubeadmControlPlane.spec.machineTemplate.infrastructureRef.name or MachineDeployment.spec.template.spec.infrastructureRef.name points at a different template name. Editing the existing template in place changes the manifest but does not start a rollout. Each existing DCSMachine was cloned from the template when its machine was created, so the running VMs keep the previous specification.

Every upgrade step on this page therefore creates a new DCSMachineTemplate with a new metadata.name, applies it, and then patches the controlling resource's infrastructureRef.name to the new template. The previous template should be kept until the new rollout is healthy in case rollback is required.
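Rollback needs the previous template name and version fields, so it helps to capture them before you patch anything. The following is a minimal sketch. record_rollback_values is a hypothetical helper that writes KEY=value lines. Keep the file next to the old template and load it later with the shell's dot (.) command. The helper covers only the KubeadmControlPlane fields that this page patches. Record each MachineDeployment's spec.template.spec.infrastructureRef.name and spec.template.spec.version the same way.

```shell
# Hypothetical helper (not part of ACP): print the KubeadmControlPlane fields
# that the control plane upgrade changes, as KEY=value lines.
# Usage: record_rollback_values <namespace> <kcp-name> > kcp-rollback.env
#        (later: . ./kcp-rollback.env)
record_rollback_values() {
  ns="$1" kcp="$2"
  for pair in \
    "PREV_TEMPLATE={.spec.machineTemplate.infrastructureRef.name}" \
    "PREV_VERSION={.spec.version}" \
    "PREV_COREDNS_TAG={.spec.kubeadmConfigSpec.clusterConfiguration.dns.imageTag}" \
    "PREV_ETCD_TAG={.spec.kubeadmConfigSpec.clusterConfiguration.etcd.local.imageTag}"
  do
    key=${pair%%=*}   # variable name: text before the first "="
    path=${pair#*=}   # jsonpath expression: text after the first "="
    value=$(kubectl get kubeadmcontrolplane "$kcp" -n "$ns" -o jsonpath="$path") || return 1
    printf '%s=%s\n' "$key" "$value"
  done
}
```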

Using YAML

YAML-based upgrades do not depend on Fleet Essentials.

Required Values From the OS Support Matrix

OS Support Matrix is the authoritative source for which MicroOS image, Kubernetes version, CoreDNS, etcd, and Kube-OVN versions belong to each ACP release. Before you start, locate the row for the target ACP version. That row supplies every value the YAML steps below need.

The cells you read from that row map to the upgrade manifests as follows:

OS Support Matrix column | Used to set | Where it lands
MicroOS Image Version | DCSMachineTemplate.spec.template.spec.vmTemplateName (the new VM template that the cloned nodes are built from) | Control plane and worker DCSMachineTemplate
Kubernetes Version | KubeadmControlPlane.spec.version and MachineDeployment.spec.template.spec.version | Both control plane and worker
coredns | KubeadmControlPlane.spec.kubeadmConfigSpec.clusterConfiguration.dns.imageTag | Control plane only
etcd | KubeadmControlPlane.spec.kubeadmConfigSpec.clusterConfiguration.etcd.local.imageTag | Control plane only
kube-ovn (chart) | Cluster.metadata.annotations["cpaas.io/kube-ovn-version"] and the cni-kube-ovn AppRelease spec.source.charts[0].targetRevision on the workload cluster | The annotation records the intended chart version. On an existing cluster, you must patch the chart revision separately (see step 3 of Upgrade Control Plane Kubernetes Version). The value is the acp/chart-cpaas-kube-ovn chart version (for example v4.3.3), not the Kube-OVN component version.

The CoreDNS and etcd image tags are control-plane-only because clusterConfiguration is a KubeadmControlPlane field. Worker nodes inherit container image versions from the new VM template; the MachineDeployment does not carry its own dns/etcd tags. The Kube-OVN annotation lives on the Cluster resource, not on KubeadmControlPlane, because the DCS provider watches it independently of the Kubernetes control plane rollout.
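You can combine the four control-plane values into one JSON merge patch. That way, spec.version, both image tags, and the template reference change in a single update. This sketch is a non-interactive alternative to the kubectl edit in step 2 of Upgrade Control Plane Kubernetes Version. build_kcp_patch is a hypothetical helper. It assumes the values need no JSON escaping, which holds for Kubernetes versions, image tags, and object names. A merge patch leaves every field it does not name unchanged.

```shell
# Hypothetical helper (not part of ACP): print a JSON merge patch for
# KubeadmControlPlane that sets the Kubernetes version, the CoreDNS and etcd
# image tags, and the new infrastructure template name together.
# Usage: build_kcp_patch <k8s-version> <coredns-tag> <etcd-tag> <template-name>
# Example with placeholders from the OS Support Matrix row:
#   kubectl patch kubeadmcontrolplane <kcp-name> -n cpaas-system --type=merge \
#     -p "$(build_kcp_patch <k8s-version> <coredns-tag> <etcd-tag> <new-template-name>)"
build_kcp_patch() {
  printf '{"spec":{"version":"%s","kubeadmConfigSpec":{"clusterConfiguration":{"dns":{"imageTag":"%s"},"etcd":{"local":{"imageTag":"%s"}}}},"machineTemplate":{"infrastructureRef":{"name":"%s"}}}}\n' \
    "$1" "$2" "$3" "$4"
}
```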

Confirm with the cluster's platform owner that the target MicroOS image has already been uploaded to the DCS platform under the same name as the MicroOS Image Version value in the matrix row. The upgrade fails if that VM template is not present on DCS when the DCSMachineTemplate is applied.

Upgrade Control Plane Infrastructure

Upgrading the control plane machine template lets you roll out updated VM specifications, system patches, and infrastructure settings.

Procedure

  1. Create an updated machine template

    Copy the existing DCSMachineTemplate referenced by KubeadmControlPlane and save it as a new file:

    kubectl get dcsmachinetemplate <current-template-name> -n cpaas-system -o yaml > new-cp-template.yaml
  2. Modify the template specifications

    Update the new template as needed:

    • Set metadata.name to <new-template-name>
    • Update spec.template.spec.vmTemplateName
    • Update spec.template.spec.vmConfig.dcsMachineCpuSpec.quantity
    • Update spec.template.spec.vmConfig.dcsMachineMemorySpec.quantity
    • Update spec.template.spec.vmConfig.dcsMachineDiskSpec for system and template-local disks only
    • Strip server-generated metadata (resourceVersion, uid, generation, creationTimestamp, managedFields, kubectl.kubernetes.io/last-applied-configuration annotation) and the entire status field from the copied manifest.
    • Leave spec.template.spec.providerID unset. The DCS provider sets providerID to dcs://<machine-name> once the VM is created; pre-filling it in the template breaks the controller's identity binding.

    Keep pool-managed persistent disks, including /var/cpaas, in DCSIpHostnamePool.spec.pool[].persistentDisk.

  3. Apply the updated template

    kubectl apply -f new-cp-template.yaml -n cpaas-system
  4. Update the control plane reference

    Modify the KubeadmControlPlane resource to reference the new template:

    kubectl patch kubeadmcontrolplane <kcp-name> -n cpaas-system --type='merge' -p='{"spec":{"machineTemplate":{"infrastructureRef":{"name":"<new-template-name>"}}}}'
  5. Monitor the rolling update

    kubectl get kubeadmcontrolplane <kcp-name> -n cpaas-system -w
    kubectl get machines -n cpaas-system -l cluster.x-k8s.io/control-plane
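You can partly script the cleanup in step 2. The following is a sketch under stated assumptions. strip_server_fields is a hypothetical awk-based shell function. It expects the layout that kubectl get -o yaml produces: two-space indentation, with list items printed at the same indentation as their parent key. The function renames metadata.name and removes only the server-generated fields and status block listed in step 2. You still need to make the vmTemplateName, CPU, memory, and disk edits by hand. The function may also leave an empty annotations: key behind, so review the file before you apply it.

```shell
# Hypothetical helper (not part of ACP): read a DCSMachineTemplate from
# `kubectl get -o yaml` on stdin and write a copy with a new metadata.name,
# without resourceVersion, uid, generation, creationTimestamp, managedFields,
# the last-applied-configuration annotation, or the status block.
# Assumes kubectl's two-space YAML layout.
# Usage:
#   kubectl get dcsmachinetemplate <current-template-name> -n cpaas-system -o yaml \
#     | strip_server_fields <new-template-name> > new-cp-template.yaml
strip_server_fields() {
  awk -v new_name="$1" '
    function indent(line) { match(line, /^ */); return RLENGTH }
    BEGIN { skip = -1 }
    {
      ind = indent($0)
      # Inside a dropped field: skip deeper lines, and sequence items that
      # kubectl prints at the same indentation as their parent key.
      if (skip >= 0) {
        if (ind > skip || (ind == skip && $0 ~ /^ *- /)) next
        skip = -1
      }
      if (ind == 0) section = $1
      if (ind == 0 && $1 == "status:") { skip = 0; next }
      if (section == "metadata:" && ind == 2 &&
          $1 ~ /^(resourceVersion|uid|generation|creationTimestamp|managedFields):$/) {
        skip = 2; next
      }
      if (section == "metadata:" && ind == 4 &&
          $1 == "kubectl.kubernetes.io/last-applied-configuration:") {
        skip = 4; next
      }
      if (section == "metadata:" && ind == 2 && $1 == "name:") {
        print "  name: " new_name; next
      }
      print
    }
  '
}
```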

Upgrade Control Plane Kubernetes Version

Upgrading the control plane Kubernetes version on immutable OS is a delete-recreate workflow. The control plane VMs are replaced one by one from a new DCSMachineTemplate that points at the target MicroOS VM template, and the KubeadmControlPlane resource is patched to carry the matching Kubernetes version, CoreDNS image tag, and etcd image tag.

Before you start, collect every required value from the target ACP row in the OS Support Matrix as described in Required Values From the OS Support Matrix.

Procedure

  1. Create a new DCSMachineTemplate for the target Kubernetes version

    Copy the existing control-plane template and update metadata.name to a new name and spec.template.spec.vmTemplateName to the MicroOS Image Version value read from the target row in the OS Support Matrix. Keep pool-managed persistent disks in DCSIpHostnamePool.spec.pool[].persistentDisk rather than reintroducing them as template disks.

    kubectl get dcsmachinetemplate <current-cp-template-name> -n cpaas-system -o yaml > new-cp-template.yaml
    # edit metadata.name and spec.template.spec.vmTemplateName
    kubectl apply -f new-cp-template.yaml -n cpaas-system
  2. Patch the KubeadmControlPlane with the target Kubernetes values

    Update the KubeadmControlPlane resource in a single edit to keep spec.version, the CoreDNS image tag, the etcd image tag, and the infrastructure template reference consistent with the same MicroOS release:

    • spec.version ← Kubernetes Version from the OS Support Matrix row
    • spec.kubeadmConfigSpec.clusterConfiguration.dns.imageTag ← coredns column from the same row
    • spec.kubeadmConfigSpec.clusterConfiguration.etcd.local.imageTag ← etcd column from the same row
    • spec.machineTemplate.infrastructureRef.name ← the new DCSMachineTemplate name created in step 1

      kubectl edit kubeadmcontrolplane <kcp-name> -n cpaas-system

    Updating only spec.version is not sufficient. The CoreDNS and etcd image tags must move together with the Kubernetes version because they are built from the same MicroOS release; leaving them at the previous values can result in CoreDNS and etcd pods that do not match the new Kubernetes minor version.

  3. Upgrade the Kube-OVN chart on the workload cluster

    Kube-OVN is a Core lifecycle component, but on immutable OS the DCS provider does not pin its chart version to the cluster's Kubernetes version. The chart version is carried by a separate AppRelease named cni-kube-ovn in the cpaas-system namespace of the workload cluster, and you move it forward in two steps: update the annotation on the Cluster resource for bookkeeping and future re-creation, then patch the existing AppRelease directly to bump the chart revision.

    WARNING

    Why two steps are required on DCS

    The DCS provider creates the cni-kube-ovn AppRelease the first time the cluster is built, and from then on it reconciles only the spec.values block (cluster name, CIDRs, registry, control-plane node list). It does not write to spec.source.charts[0].targetRevision on an AppRelease that already exists. As a result, changing cpaas.io/kube-ovn-version on the Cluster resource alone does not move the chart version on the workload cluster. The annotation must still be updated so the recorded target matches the OS Support Matrix row, but the chart upgrade itself is driven by a direct AppRelease patch.

    3.1. Update the cpaas.io/kube-ovn-version annotation on the Cluster resource

    kubectl annotate cluster <cluster-name> -n cpaas-system \
      cpaas.io/kube-ovn-version=<kube-ovn-version-from-matrix> --overwrite

    The annotation does not update automatically when spec.version changes; keep it in step with the kube-ovn (chart) column of the target row.

    3.2. Patch the AppRelease chart revision on the workload cluster

    Run the patch against the workload cluster's API server (not the bootstrap KIND or the global cluster):

    kubectl patch apprelease cni-kube-ovn -n cpaas-system --type='json' \
      -p='[{"op":"replace","path":"/spec/source/charts/0/targetRevision","value":"<kube-ovn-version-from-matrix>"}]'

    Use the same value you set in the annotation. The releaseName (cpaas-kube-ovn) and name (acp/chart-cpaas-kube-ovn) are managed by the provider; do not change them.

    3.3. Wait for reconciliation to complete

    Watch the chart phase and the installed revision:

    # Overall AppRelease state — Sync and Health columns must reach a Success-equivalent reason
    kubectl get apprelease cni-kube-ovn -n cpaas-system
    
    # Installed revision and chart phase
    kubectl get apprelease cni-kube-ovn -n cpaas-system \
      -o jsonpath='Installed: {.status.charts.*.installedRevision}{"\n"}Phase: {.status.charts.*.phase}{"\n"}'

    The normal sequence is Upgrading → HealthChecking → Success. On small clusters the full transition typically completes within about one minute. Read the phases as follows:

    Phase | Meaning | installedRevision
    Upgrading | Helm release upgrade in progress. Sync condition is Unknown (Syncing). | Still the previous version
    HealthChecking | Helm release applied; the controller is verifying Kube-OVN pods. Sync condition is True (Synced). | Already the target version
    Success | All three conditions (Validate, Sync, Health) are True. | Target version
    WARNING

    Do not declare the upgrade complete on installedRevision alone. The field flips to the target value during HealthChecking, before pods have been verified Ready. The chart is only considered upgraded when phase is Success and installedRevision matches the target.

    The AppRelease API also defines Downloading, Installing, Syncing, DownloadFailed, DeployFailed, and NotReady. The first three are transient and the upgrade should converge on its own. The last three indicate a failure that needs manual investigation; start with kubectl describe apprelease cni-kube-ovn -n cpaas-system to read the per-condition message field.

  4. Monitor the rolling update

    kubectl get kubeadmcontrolplane <kcp-name> -n cpaas-system -w
    kubectl get machines -n cpaas-system -l cluster.x-k8s.io/control-plane
    kubectl get nodes

    KubeadmControlPlane.spec.rolloutStrategy.rollingUpdate.maxSurge must remain 0 when the cluster relies on pool-managed persistent disks, so the control plane VMs are replaced one at a time.
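You can express the completion rule from step 3.3 as a polling loop: phase must be Success and installedRevision must equal the target, not installedRevision alone. The following is a minimal sketch. wait_for_kube_ovn is a hypothetical helper. It assumes a kubectl context that points at the workload cluster and reads the same status fields as the step 3.3 jsonpath command. The loop stops early on DownloadFailed, DeployFailed, or NotReady, which step 3.3 lists as failures that need manual investigation. The 600-second default timeout is an arbitrary choice.

```shell
# Hypothetical helper (not part of ACP): poll the cni-kube-ovn AppRelease on
# the workload cluster until its chart phase is Success AND installedRevision
# equals the target. Returns 1 on a failure phase, 2 on timeout.
# Usage: wait_for_kube_ovn <target-revision> [timeout-seconds] [interval-seconds]
wait_for_kube_ovn() {
  target="$1" timeout="${2:-600}" interval="${3:-10}" waited=0
  while :; do
    phase=$(kubectl get apprelease cni-kube-ovn -n cpaas-system \
      -o jsonpath='{.status.charts.*.phase}')
    installed=$(kubectl get apprelease cni-kube-ovn -n cpaas-system \
      -o jsonpath='{.status.charts.*.installedRevision}')
    echo "phase=${phase:-<none>} installedRevision=${installed:-<none>}"
    case "$phase" in
      Success)
        # installedRevision alone is not enough: it flips during HealthChecking.
        if [ "$installed" = "$target" ]; then return 0; fi ;;
      DownloadFailed|DeployFailed|NotReady)
        echo "cni-kube-ovn reported $phase; run: kubectl describe apprelease cni-kube-ovn -n cpaas-system" >&2
        return 1 ;;
    esac
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${waited}s waiting for $target" >&2
      return 2
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}
```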

Upgrade Worker Nodes

Worker node Kubernetes upgrades are managed through MachineDeployment resources. Worker upgrades carry fewer fields than the control plane: the CoreDNS and etcd image tags are part of KubeadmControlPlane.spec.kubeadmConfigSpec.clusterConfiguration, which MachineDeployment does not have. Worker nodes inherit Kubernetes component versions from the new VM template; the MachineDeployment only needs the target Kubernetes version and the new template reference.

Before you start, read the MicroOS Image Version and Kubernetes Version cells from the target ACP row in the OS Support Matrix as described in Required Values From the OS Support Matrix.

Procedure

  1. Create a new DCSMachineTemplate for worker nodes

    • Create a new DCSMachineTemplate with a vmTemplateName set to the MicroOS Image Version value from the target row in the OS Support Matrix
    • Keep /var/cpaas and any other upgrade-preserved disks in DCSIpHostnamePool.spec.pool[].persistentDisk rather than reintroducing them as template disks
  2. Update the MachineDeployment

    • Set spec.template.spec.version to the Kubernetes Version value from the same OS Support Matrix row
    • Set spec.template.spec.infrastructureRef.name to the new DCSMachineTemplate name created in step 1
    • Optionally update spec.template.spec.bootstrap.configRef.name if bootstrap configuration changes are required for this release
  3. Monitor the rolling update

    • Verify that the rolling update completes successfully
    • Verify that the new worker nodes join the cluster with the target Kubernetes version

    MachineDeployment.spec.strategy.rollingUpdate.maxSurge must remain 0 when the cluster relies on pool-managed persistent disks, so the worker nodes are replaced one at a time.
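You can script the final check in step 3, that each node joins with the target Kubernetes version, against the workload cluster. The following is a sketch under stated assumptions. check_node_versions is a hypothetical helper that compares status.nodeInfo.kubeletVersion with the target as an exact string. Pass the value in the same form the nodes report it, for example with a leading v. The helper also requires the Ready condition to be True, and it assumes the current kubectl context points at the workload cluster.

```shell
# Hypothetical helper (not part of ACP): report nodes whose kubelet version
# differs from the target or whose Ready condition is not True.
# Returns 1 if any node fails the check.
# Usage: check_node_versions <k8s-version-from-matrix>
check_node_versions() {
  target="$1" rc=0
  nodes=$(kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.nodeInfo.kubeletVersion}{" "}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}')
  while read -r name version ready; do
    if [ -z "$name" ]; then continue; fi
    if [ "$version" != "$target" ] || [ "$ready" != "True" ]; then
      echo "node $name: kubeletVersion=$version Ready=${ready:-<unknown>}" >&2
      rc=1
    fi
  done <<EOF
$nodes
EOF
  return "$rc"
}
```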

Rolling Back a Failed Upgrade

The rolling update can fail in several ways: new VMs fail to boot, nodes do not become Ready, or the new Kubernetes minor version surfaces an incompatibility. In that case, revert the template reference and the Kubernetes version fields to their previous values. Cluster API treats the reversion as a new spec change and replaces the upgraded machines with machines built from the previous template, one at a time.

Three facts to internalize before rolling back:

  • The old VMs are gone. They were destroyed during the upgrade. Rollback uses the old template to build a fresh set of replacement machines; it does not restore the original VMs.
  • The old DCSMachineTemplate resource must still exist. Do not delete the previous template until the new rollout is healthy. If you already deleted it, recreate it from version control or backup before rolling back.
  • Pool-managed disk identity is preserved, but data state is not. Disks declared in DCSIpHostnamePool.spec.pool[].persistentDisk reattach to the rolled-back machines at the same IP slot, but any data written to those disks during the upgrade window (for example, etcd entries in the new Kubernetes minor format) stays. If the new format is unreadable by the older Kubernetes minor version, the rollback may still fail and require manual etcd restoration.

Procedure:

  • Control plane: patch KubeadmControlPlane to restore the previous spec.machineTemplate.infrastructureRef.name, spec.version, spec.kubeadmConfigSpec.clusterConfiguration.dns.imageTag, and spec.kubeadmConfigSpec.clusterConfiguration.etcd.local.imageTag.

  • Workers: patch each MachineDeployment to restore the previous spec.template.spec.infrastructureRef.name and spec.template.spec.version.

  • Kube-OVN: if the Kube-OVN chart was upgraded, revert it the same way the upgrade was applied — first restore the annotation, then patch the AppRelease chart revision back. Verify with the same installedRevision + phase=Success check used in step 3.

    kubectl annotate cluster <cluster-name> -n cpaas-system \
      cpaas.io/kube-ovn-version=<previous-kube-ovn-version> --overwrite
    
    kubectl patch apprelease cni-kube-ovn -n cpaas-system --type='json' \
      -p='[{"op":"replace","path":"/spec/source/charts/0/targetRevision","value":"<previous-kube-ovn-version>"}]'
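For the Workers item above, you can restore both fields in one merge patch so that a single rollout carries them together. The following is a minimal sketch. rollback_machinedeployment is a hypothetical helper. It assumes you take the previous values from your own records. Run it once per MachineDeployment against the management cluster.

```shell
# Hypothetical helper (not part of ACP): point one MachineDeployment back at
# its previous DCSMachineTemplate and Kubernetes version in a single merge
# patch, so both fields change in the same rollout.
# Usage: rollback_machinedeployment <namespace> <md-name> <previous-template> <previous-version>
rollback_machinedeployment() {
  ns="$1" md="$2" prev_template="$3" prev_version="$4"
  patch=$(printf '{"spec":{"template":{"spec":{"version":"%s","infrastructureRef":{"name":"%s"}}}}}' \
    "$prev_version" "$prev_template")
  kubectl patch machinedeployment "$md" -n "$ns" --type=merge -p "$patch"
}
```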

If the new control plane never reached etcd quorum, the KubeadmControlPlane controller may refuse to roll back any machine, because its preflight checks block while etcd is unhealthy. Restore etcd quorum manually before you retry the rollback.


Using the Web UI

WARNING

Fleet Essentials UI does not support ACP 4.3 cluster upgrades

The Fleet Essentials UI workflow has not been adapted to the Cluster Version Operator (CVO) mechanism introduced in ACP 4.3. Do not use the Fleet Essentials UI to upgrade DCS clusters on ACP 4.3.

Two supported alternatives:

Cluster creation and node-pool management through the Fleet Essentials UI are unaffected by this limitation.

Use this workflow to upgrade Kubernetes from the web UI after Phase 1 is complete.

Version requirement: This workflow requires Fleet Essentials and Alauda Container Platform DCS Infrastructure Provider 1.0.13 or later. If the provider version is earlier than 1.0.13, use YAML manifests. If the upgrade relies on pool-managed persistent disks, use DCS provider v1.0.16 or later. In v1.0.16, the persistentDisk declaration on DCSIpHostnamePool remains YAML-only and is not exposed in the web UI.

Prerequisites

  • The Distribution Version upgrade is complete. See Upgrading Distribution Version
  • The Control Plane Node Pool is in Running state
  • The IP Pool has sufficient capacity for rolling updates
  • If the upgrade relies on pool-managed persistent disks, ensure the required DCSIpHostnamePool.spec.pool[].persistentDisk entries have already been created or updated through YAML

Upgrade Workflow

Kubernetes upgrades follow this sequence after the Distribution Version upgrade:

  1. Upgrade the Control Plane Node Pool.
  2. Wait for the Control Plane Node Pool upgrade to complete.
  3. Upgrade Worker Node Pools in any order.

Checking Available Upgrades

Navigation: Clusters → Clusters → Select cluster → Node Pools Tab

Node Pools with available upgrades show an Upgrade available indicator. Click the Node Pool card to view the current and target versions.

Upgrade the Control Plane Node Pool

Steps:

  1. In the Node Pools Tab, locate the Control Plane Node Pool
  2. Click Upgrade
  3. Review upgrade information:
    • Current Version: Current Kubernetes version
    • Target Version: Latest minor version supported (automatically selected)
  4. Click Confirm to start

Monitoring:

  • Watch the Node Pool status
  • Nodes are replaced one at a time (maxSurge=0). Pool-managed persistent disks also require this one-at-a-time replacement.
  • Upgrade time depends on node count and resources

Upgrade Worker Node Pools

WARNING

Worker Node Pools cannot be upgraded until the Control Plane Node Pool upgrade completes.

When Worker Pool Upgrade is Available:

  • Control Plane Kubernetes Version matches global cluster version
  • Control Plane is in Running state

Upgrade Steps:

  1. For each Worker Node Pool:
    • Click Upgrade
    • Review and confirm
  2. Pools can be upgraded in parallel after Control Plane completes

Upgrade Constraints:

Pool State | Upgrade Button
Not Running | ❌ Disabled: "Upgrade is unavailable when the Worker Node Pool is not in the Running state"
Control Plane not started | ❌ Disabled: "Upgrade the Control Plane Node Pool first"
Control Plane upgrading | ❌ Disabled: "Wait for the Control Plane Node Pool upgrade to complete"
Control Plane upgraded | ✅ Enabled

Cross-Version Upgrades

When upgrading across multiple minor versions (for example, v1.32 → v1.34):

  1. Upgrade Control Plane to v1.33
  2. Wait for completion
  3. Upgrade Control Plane to v1.34
  4. Repeat for Worker Pools

Why: Kubernetes supports upgrading only one minor version at a time.
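You can turn the one-minor-at-a-time rule into a list of intermediate targets. minor_upgrade_path is a hypothetical helper name. The sketch assumes versions written as vMAJOR.MINOR or vMAJOR.MINOR.PATCH, ignores the patch part, and does not handle a major version change. The control plane, and then the worker pools, step through each printed version in order. The helper prints nothing when the target is not newer than the current version.

```shell
# Hypothetical helper (not part of ACP): print each minor version to pass
# through, one per line, from the minor after <current> up to <target>.
# Patch components are ignored.
# Usage: minor_upgrade_path v1.32 v1.34   # prints v1.33, then v1.34
minor_upgrade_path() {
  cur=${1#v}; tgt=${2#v}
  cur_major=${cur%%.*}; tgt_major=${tgt%%.*}
  if [ "$cur_major" != "$tgt_major" ]; then
    echo "major version change is not handled by this sketch" >&2
    return 1
  fi
  cur_minor=${cur#*.}; cur_minor=${cur_minor%%.*}
  tgt_minor=${tgt#*.}; tgt_minor=${tgt_minor%%.*}
  m=$((cur_minor + 1))
  while [ "$m" -le "$tgt_minor" ]; do
    echo "v$cur_major.$m"
    m=$((m + 1))
  done
}
```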

Troubleshooting

Issue | Solution
Upgrade button disabled | Check pool status and Control Plane version
Upgrade stuck | Check IP Pool availability, DCS platform resources
Nodes not joining | Verify network connectivity, DNS settings
Preserved disk did not reattach | Verify the disk is declared in DCSIpHostnamePool.spec.pool[].persistentDisk, check status.persistentDiskStatus, and confirm the cluster has already been migrated to the pool-managed layout

Additional Resources