Upgrading Kubernetes on Bare Metal
This guide explains how to complete Phase 2 of the upgrade workflow for clusters on bare metal. Before you upgrade Kubernetes, complete the Distribution Version upgrade described in Upgrading Clusters.
Where this page fits in the full ACP upgrade flow
This page covers only the Kubernetes step of the upgrade. The full ACP upgrade flow — including upgrade artifact synchronization, ACP Core upgrade through CVO, Aligned plugin upgrades, and Agnostic plugin upgrades from Marketplace — is documented in the ACP product documentation. Complete those steps before you start the Kubernetes step on this page:
- Upgrade Overview (scope and sequencing)
- Pre-Upgrade Preparation
- Upgrade the global cluster (Core, Aligned, Agnostic)
- Upgrade workload clusters (Core, Aligned, Agnostic)
Use this page when the cluster runs on physical hosts with an immutable operating system. On bare metal, the Kubernetes step replaces every node from a new elemental upgrade image rather than upgrading binaries in place.
Key Considerations
Bare-metal upgrades replace every node — they do not run kubeadm upgrade on the existing OS. The mechanism is:
- CAPI deletes one Machine according to the rollout strategy.
- The provider writes a clean plan to the inventory's plan secret.
- The host stops kubelet + CRI workload + containerd, then returns to Available in the same pool.
- CAPI creates a replacement Machine.
- The provider picks an Available inventory from the same pool, resolves the new Kubernetes version against elemental-image-catalog, and writes a reprovision plan.
- The host runs cloud-init clean → elemental upgrade --system <new-image> → reboot. initramfs clears Kubernetes persistent state; cloud-init re-executes and runs kubeadm init / kubeadm join against the new control plane.
Two structural consequences operators must internalize before starting:
- Bare-metal does not preserve Kubernetes-managed disk state. /var/lib/kubelet, /var/lib/containerd, /var/lib/etcd, and /etc/kubernetes are cleared by the initramfs cleanup step of every reprovision. There is no equivalent of the DCS provider's pool-managed persistent disks today.
- The same MachineInventory may not be re-picked. The provider does not guarantee that the inventory released by a clean plan is the same one re-allocated to the replacement Machine. When maxSurge=0, replacements are performed as delete-then-add, so old and new nodes never coexist during the rollout and the pool only needs enough capacity for the desired replica count.
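The capacity consequence can be checked with a small model. This is a simplified sketch, not CAPI's actual scheduling logic: the function name peak_hosts_in_use is hypothetical, and the model assumes a rollout that creates a replacement whenever the surge budget allows and otherwise deletes an old node.

```python
def peak_hosts_in_use(replicas: int, max_surge: int = 0) -> int:
    """Simplified model (an assumption, not CAPI's real logic) of a rolling
    replacement. Returns the largest number of hosts occupied at any moment."""
    old, new = replicas, 0
    peak = old
    while old > 0:
        if new < replicas and old + new < replicas + max_surge:
            new += 1  # capacity allows it: create a replacement node
        else:
            old -= 1  # no headroom: delete an old node before adding a new one
        peak = max(peak, old + new)
    return peak


if __name__ == "__main__":
    # With maxSurge=0 the peak never exceeds the desired replica count.
    print(peak_hosts_in_use(3, max_surge=0))
    # A surge of 1 would need one spare physical host in the pool.
    print(peak_hosts_in_use(3, max_surge=1))
```

In this model, a pool sized for exactly the desired replica count is enough only when max_surge is 0.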
Upgrade Sequence
Upgrade bare-metal clusters in the following order:
- (Prerequisite) Upgrade the ACP platform on the global cluster. This brings the bare-metal provider, elemental-operator, and the related CAPI components to versions that understand the new schema. Trigger workload-cluster upgrades only after the management-side controllers have rolled out and become Ready.
- Upgrade the Distribution Version (Aligned Extensions) on the workload cluster. See Upgrading Distribution Version.
- Ensure the target Kubernetes version is present in elemental-image-catalog (and the matching -iso repository is available for any future host re-registration).
- Upgrade the control plane Kubernetes version (replaces all control-plane nodes one at a time).
- Upgrade worker nodes to the target Kubernetes version (replaces all worker nodes within the maxUnavailable budget).
Cluster API orchestrates the rolling replacement.
Skipping step 1 risks two failure modes: the old provider silently ignores new schema fields; or a controller image swap mid-rollout interrupts the plan secret state machine. Always settle the management-side upgrade before touching workload rollout.
Prerequisites
Before you start, ensure all of the following prerequisites are met:
- The Distribution Version upgrade is complete.
- The control plane is reachable through the existing <control-plane-vip>:<control-plane-port>.
- All current nodes are healthy and Ready.
- The target Kubernetes version is a key in elemental-image-catalog. If it is not, add it before starting (see Update the Image Catalog).
- The platform registry is reachable from every host in the cluster.
- Both KubeadmControlPlane.spec.rolloutStrategy.rollingUpdate.maxSurge = 0 and MachineDeployment.spec.strategy.rollingUpdate.maxSurge = 0 are set, because bare-metal does not over-provision physical hosts.
- The relevant pools have enough capacity to replace one node at a time without falling below the desired replica count.
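Some of these prerequisites can be checked mechanically. The sketch below is illustrative only: the helper names dig and preflight_failures are hypothetical, the inputs are assumed to be plain dictionaries (for example, parsed from JSON output of the cluster API objects), and it covers only the catalog-key and maxSurge prerequisites, not node health or registry reachability.

```python
def dig(obj, *path, default=None):
    """Walk nested dicts, returning `default` when a key is missing."""
    for key in path:
        if not isinstance(obj, dict) or key not in obj:
            return default
        obj = obj[key]
    return obj


def preflight_failures(kcp, machine_deployments, catalog_keys, target_version):
    """Return human-readable failures; an empty list means these checks passed."""
    failures = []
    if target_version not in catalog_keys:
        failures.append(f"{target_version} is not a key in elemental-image-catalog")
    # maxSurge is accepted as the integer 0 or the string "0" (an assumption).
    if dig(kcp, "spec", "rolloutStrategy", "rollingUpdate", "maxSurge") not in (0, "0"):
        failures.append("KubeadmControlPlane maxSurge is not 0")
    for md in machine_deployments:
        if dig(md, "spec", "strategy", "rollingUpdate", "maxSurge") not in (0, "0"):
            name = dig(md, "metadata", "name", default="<unnamed>")
            failures.append(f"MachineDeployment {name} maxSurge is not 0")
    return failures


if __name__ == "__main__":
    kcp = {"spec": {"rolloutStrategy": {"rollingUpdate": {"maxSurge": 0}}}}
    md = {"metadata": {"name": "workers"},
          "spec": {"strategy": {"rollingUpdate": {"maxSurge": 1}}}}
    for failure in preflight_failures(kcp, [md], {"v1.33.0"}, "v1.34.5"):
        print(failure)
```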
Update the Image Catalog
elemental-image-catalog is the resource that introduces a Kubernetes version to the bare-metal provider.
In most upgrades you do not edit this ConfigMap by hand. The bare-metal provider plugin re-renders elemental-image-catalog with the Kubernetes versions shipped by the new distribution when it is reapplied during the Distribution Version upgrade (Phase 1). Your task is normally just to verify that the target version is present (see the verification step below). The two options that follow are only needed when you must add an out-of-band version — for example a digest-pinned or test build that the plugin does not ship.
Option A — Add a chart override. Append the new version under provider.imageCatalog.images (uses global.registry.address as the registry):
Reapply the bare-metal provider plugin. The chart re-renders the ConfigMap; the provider's in-process watch hot-reloads the cache without restarting.
Option B — Patch the ConfigMap directly. Useful for digest-pinned images or out-of-band test versions:
In either case, verify before continuing:
The key must include the leading v (for example v1.34.5). If the key is missing when CAPI creates the replacement Machine, the resulting BaremetalMachine enters Failed with ImageResolved=False / Reason=ImageCatalogMiss and no reprovision plan is written — adding the key later restarts the reconciliation automatically.
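The lookup rule above can be modeled in a few lines. This is a sketch under stated assumptions, not the provider's implementation: the function name resolve_image is hypothetical, and the catalog is assumed to be a plain mapping from version key to image reference.

```python
def resolve_image(catalog: dict, version: str):
    """Look up the upgrade image for `version`.

    Returns None when the key is absent, which models the ImageCatalogMiss
    case described above. The catalog shape is an assumption.
    """
    if not version.startswith("v"):
        raise ValueError(f"catalog keys carry a leading 'v'; got {version!r}")
    return catalog.get(version)


if __name__ == "__main__":
    # Hypothetical catalog content, for illustration only.
    catalog = {"v1.34.5": "registry.example.com/os/base:v1.34.5"}
    print(resolve_image(catalog, "v1.34.5"))
    print(resolve_image(catalog, "v1.35.0"))  # missing key
```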
When the new version corresponds to a new MicroOS / base-image release, the ISO variant for that release should also be available before the upgrade. The bare-metal provider does not rebuild SeedImages automatically, but you may want to refresh MachineRegistration / SeedImage for future host onboarding so the new hosts come up on the new release. The ISO repository is the same as the base-image repository with -iso appended.
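The naming rule for the ISO repository is simple enough to express directly. A minimal sketch, assuming the input is a repository path without a tag or digest; the function name iso_repository and the example registry are hypothetical.

```python
def iso_repository(base_repository: str) -> str:
    """Return the ISO repository for a base-image repository (no tag or digest)."""
    last_segment = base_repository.rsplit("/", 1)[-1]
    if "@" in base_repository or ":" in last_segment:
        raise ValueError("pass the repository without a tag or digest")
    return base_repository + "-iso"


if __name__ == "__main__":
    # A registry port in the host part is fine; only the last path segment is checked.
    print(iso_repository("registry.example.com:5000/os/microos"))
```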
Upgrade the Control Plane
Patch KubeadmControlPlane.spec.version to the new Kubernetes version. Where component image tags are pinned (DNS, etcd), update them in the same edit:
- spec.version ← target Kubernetes version (must match an elemental-image-catalog key).
- spec.kubeadmConfigSpec.clusterConfiguration.dns.imageTag ← matching CoreDNS image tag for the new release.
- spec.kubeadmConfigSpec.clusterConfiguration.etcd.local.imageTag ← matching etcd image tag for the new release.
The bare-metal provider does not require a new BaremetalMachineTemplate for a Kubernetes-only upgrade: the template only carries the pool reference, and the upgrade image is resolved from Machine.spec.version through the catalog. A new template is only required when you want to move the control plane to a different pool.
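One way to build that edit is as a JSON merge patch. The sketch below only assembles the patch document from the three field paths named above; the function name kcp_version_patch and the tag values shown are placeholders, not values from this guide.

```python
import json


def kcp_version_patch(version, dns_tag=None, etcd_tag=None):
    """Build a merge patch for KubeadmControlPlane covering the version and,
    where pinned, the DNS and etcd image tags."""
    patch = {"spec": {"version": version}}
    cluster_cfg = {}
    if dns_tag:
        cluster_cfg["dns"] = {"imageTag": dns_tag}
    if etcd_tag:
        cluster_cfg["etcd"] = {"local": {"imageTag": etcd_tag}}
    if cluster_cfg:
        patch["spec"]["kubeadmConfigSpec"] = {"clusterConfiguration": cluster_cfg}
    return patch


if __name__ == "__main__":
    # Placeholder tags: substitute the tags that match the target release.
    print(json.dumps(kcp_version_patch("v1.34.5", "<coredns-tag>", "<etcd-tag>"), indent=2))
```

The printed JSON could then be applied with kubectl patch kubeadmcontrolplane <name> -n <namespace> --type merge -p '<json>', where the name and namespace are placeholders for your cluster's values.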
Cluster API rolls control-plane nodes one at a time (because maxSurge=0):
The expected sequence on each node:
- CAPI marks one old Machine for deletion; the corresponding BaremetalMachine moves to Preparing.
- The provider writes a clean plan; the inventory ends up Available with cleared owner annotations.
- CAPI creates a new Machine (with the target version); a new BaremetalMachine is created.
- The provider allocates an Available inventory from the control-plane pool, resolves the new image, and writes a reprovision plan.
- The host runs elemental upgrade --system <new-image>, reboots, and kubeadm joins the surviving control plane. BaremetalMachine.status.phase becomes Running.
Repeat steps 1–5 until every control-plane node has been replaced.
Throughout the rollout, alive continues to manage the VIP. As control-plane membership changes, the bare-metal provider re-renders the alive chart values (the peer list and ipvs.ips) and rolls out the static-pod manifests.
Upgrade Workers
Patch each MachineDeployment.spec.template.spec.version to the new Kubernetes version. CAPI replaces workers within the maxUnavailable budget — the bare-metal provider re-uses the same worker pool and the same image-catalog resolution logic as the control plane.
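The worker patch has a single field, so it can be generated for every MachineDeployment in one pass. A minimal sketch; the function names and the deployment names in the example are hypothetical.

```python
import json


def worker_version_patch(version):
    """Merge patch that sets MachineDeployment.spec.template.spec.version."""
    return {"spec": {"template": {"spec": {"version": version}}}}


def patches_for(deployment_names, version):
    """Map each MachineDeployment name to its JSON merge patch string."""
    body = json.dumps(worker_version_patch(version))
    return {name: body for name in deployment_names}


if __name__ == "__main__":
    # Hypothetical deployment names, for illustration only.
    for name, body in patches_for(["workers-a", "workers-b"], "v1.34.5").items():
        print(name, body)
```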
Watch:
Worker upgrades carry fewer knobs than the control plane: there is no clusterConfiguration block on MachineDeployment, so no DNS / etcd tags to update. Kubernetes component versions on worker nodes follow the new base image.
If the new release also requires a bootstrap-template change (different cloud-init, different kubeletExtraArgs), follow Updating Bootstrap Templates — that is a separate template swap, independent of the version bump.
Cross-Version Upgrades
Kubernetes only supports single-minor upgrades for the control plane (skew policy). To move from v1.32 → v1.34:
- Add the v1.33 entry to elemental-image-catalog and upgrade the control plane to v1.33; wait for the rollout to complete.
- Add the v1.34 entry; upgrade the control plane to v1.34.
- Upgrade the worker MachineDeployment to v1.34.
The bare-metal rollout strategy already serializes node replacement, so cross-version upgrades do not require additional pool capacity beyond what was already needed for a same-minor upgrade.
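The intermediate control-plane hops follow directly from the single-minor rule. The sketch below lists the minor versions to step through; the function names are hypothetical, and it models only forward upgrades within one major version. Choosing the patch release for each hop is left to the operator.

```python
def parse_minor(version):
    """Split 'v1.32.4' (or 'v1.32') into (major, minor) integers."""
    major, minor = version.removeprefix("v").split(".")[:2]
    return int(major), int(minor)


def control_plane_hops(current, target):
    """List the minor versions the control plane must pass through, in order."""
    cur_major, cur_minor = parse_minor(current)
    tgt_major, tgt_minor = parse_minor(target)
    if cur_major != tgt_major or tgt_minor < cur_minor:
        raise ValueError("only forward upgrades within one major version are modeled")
    return [f"v{cur_major}.{m}" for m in range(cur_minor + 1, tgt_minor + 1)]


if __name__ == "__main__":
    print(control_plane_hops("v1.32.4", "v1.34.5"))
```

Each returned minor version needs its own elemental-image-catalog entry and a completed control-plane rollout before the next hop starts.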
Verification
After the rollout completes:
All BaremetalMachine objects should be Running. Every CAPI Machine should report the new spec.version. Every workload Node should be Ready with the new kubelet version, and every MachineInventory plan secret should carry baremetal.alauda.io/plan.type=reprovision (the most recent plan applied).
MachineInventoryPool.status should satisfy available + allocated + preparing + reprovisioning + unavailable = total with preparing = reprovisioning = 0.
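The pool accounting rule can be verified with a short check. A minimal sketch, assuming the status has been read into a dictionary whose keys match the counter names in the equation above; the exact field names in MachineInventoryPool.status and the function name pool_status_problems are assumptions.

```python
COUNTERS = ("available", "allocated", "preparing", "reprovisioning", "unavailable")


def pool_status_problems(status):
    """Return a list of violations of the post-rollout pool invariant."""
    problems = []
    counted = sum(status.get(name, 0) for name in COUNTERS)
    if counted != status.get("total"):
        problems.append(f"counters sum to {counted}, total is {status.get('total')}")
    for name in ("preparing", "reprovisioning"):
        if status.get(name, 0) != 0:
            problems.append(f"{name} should be 0 after the rollout, got {status[name]}")
    return problems


if __name__ == "__main__":
    status = {"available": 1, "allocated": 5, "preparing": 0,
              "reprovisioning": 0, "unavailable": 0, "total": 6}
    print(pool_status_problems(status))
```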
Troubleshooting
For the full operator-side state machine reference (every condition reason and recovery action), see Provider Overview.