How to scale down a specific node in Tanzu Kubernetes Grid
This post describes how to scale down a specific node in TKG. It is useful when you want to remove a particular node while reducing the number of control plane or worker nodes.
At a high level, the process is:
- Identify the node(s) to scale down
- Identify the corresponding machine object
- Add the `"cluster.x-k8s.io/delete-machine"="yes"` annotation to the machine object
- Perform the scale-down operation using the `tanzu` CLI
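For reference, once the target machine is identified, the whole flow condenses to the commands below (all taken from the walkthrough that follows, using this post's example names):
```
kubectl config use-context oom-mgmt-rp02-admin@oom-mgmt-rp02
kubectl annotate machine oom-wld-rp02-md-0-7559b5578d-pxr54 "cluster.x-k8s.io/delete-machine"="yes"
tanzu cluster scale oom-wld-rp02 -w 2
```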
Identify the node to scale down
Switch the context to the cluster where you need to perform the scale-down operation. In the example below, the cluster has one control plane node and three worker nodes.
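If your kubeconfig is currently pointed at the management cluster, switch to the workload cluster first. The context name below is an assumption: TKG admin contexts typically follow the same `<cluster>-admin@<cluster>` pattern as the management cluster context shown later in this post.
```
# Hypothetical context name, following the <cluster>-admin@<cluster> convention
kubectl config use-context oom-wld-rp02-admin@oom-wld-rp02
```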
```
kubectl get nodes

NAME                                 STATUS   ROLES                  AGE   VERSION
oom-wld-rp02-control-plane-qrnrb     Ready    control-plane,master   11h   v1.20.5+vmware.1
oom-wld-rp02-md-0-7559b5578d-62zfl   Ready    <none>                 26m   v1.20.5+vmware.1
oom-wld-rp02-md-0-7559b5578d-7tw9k   Ready    <none>                 39m   v1.20.5+vmware.1
oom-wld-rp02-md-0-7559b5578d-pxr54   Ready    <none>                 11h   v1.20.5+vmware.1
```
We will scale the cluster down to two worker nodes by removing `oom-wld-rp02-md-0-7559b5578d-pxr54`.
Identify the corresponding machine object
Switch context to the management cluster
```
kubectl config use-context oom-mgmt-rp02-admin@oom-mgmt-rp02

Switched to context "oom-mgmt-rp02-admin@oom-mgmt-rp02".
```
Get the corresponding machine object
As we can see from the output below, node `oom-wld-rp02-md-0-7559b5578d-pxr54` corresponds to the machine object `oom-wld-rp02-md-0-7559b5578d-pxr54`.
```
kubectl get machines

NAME                                 PROVIDERID                                       PHASE     VERSION
oom-wld-rp02-control-plane-qrnrb     vsphere://423823f9-4a51-ad85-e352-9b0c91767d92   Running   v1.20.5+vmware.1
oom-wld-rp02-md-0-7559b5578d-62zfl   vsphere://423873bf-f87d-1e35-02f0-71e3f5973d2b   Running   v1.20.5+vmware.1
oom-wld-rp02-md-0-7559b5578d-7tw9k   vsphere://4238131a-f866-d033-766f-56192a093a80   Running   v1.20.5+vmware.1
oom-wld-rp02-md-0-7559b5578d-pxr54   vsphere://42382d27-aeb2-7c59-674a-06fc06e70fa2   Running   v1.20.5+vmware.1
```
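Here the machine names happen to match the node names. If they ever differ, you can map machines to nodes explicitly via each machine's `nodeRef`, which the Cluster API populates in the Machine status (a sketch using kubectl custom-columns):
```
# Print each machine alongside the node it backs
kubectl get machines -o custom-columns=MACHINE:.metadata.name,NODE:.status.nodeRef.name
```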
Add annotation to the machine object
As detailed in the cluster-api source code, we will annotate the machine with `DeleteMachineAnnotation`:
```go
// DeleteMachineAnnotation marks control plane and worker nodes that will be given
// priority for deletion when KCP or a machineset scales down. This annotation is
// given top priority on all delete policies.
DeleteMachineAnnotation = "cluster.x-k8s.io/delete-machine"
```
```
# Annotate object
kubectl annotate machine oom-wld-rp02-md-0-7559b5578d-pxr54 "cluster.x-k8s.io/delete-machine"="yes"
machine.cluster.x-k8s.io/oom-wld-rp02-md-0-7559b5578d-pxr54 annotated

# Confirm annotation
kubectl get machine oom-wld-rp02-md-0-7559b5578d-pxr54 -o yaml | grep delete-mach
    cluster.x-k8s.io/delete-machine: "yes"
```
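If you prefer a targeted query over grep, a jsonpath expression returns just the annotation value (a sketch against the same machine name; note the `\.`-escaped dots in the annotation key):
```
# Print only the value of the delete-machine annotation
kubectl get machine oom-wld-rp02-md-0-7559b5578d-pxr54 \
  -o jsonpath='{.metadata.annotations.cluster\.x-k8s\.io/delete-machine}'
```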
Perform scale-down operation
```
# Scale down
tanzu cluster scale oom-wld-rp02 -w 2

Successfully updated worker node machine deployment replica count for cluster oom-wld-rp02
Workload cluster 'oom-wld-rp02' is being scaled
```
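While the scale-down runs, you can watch the machine objects from the management cluster context; `-w` streams updates as each machine changes phase:
```
# Watch machines until the annotated one reaches the Deleting phase
kubectl get machines -w
```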
As we can see from the output below, the annotated machine is picked up for deletion during the scale-down operation.
```
kubectl get machines

NAME                                 PROVIDERID                                       PHASE      VERSION
oom-wld-rp02-control-plane-qrnrb     vsphere://423823f9-4a51-ad85-e352-9b0c91767d92   Running    v1.20.5+vmware.1
oom-wld-rp02-md-0-7559b5578d-62zfl   vsphere://423873bf-f87d-1e35-02f0-71e3f5973d2b   Running    v1.20.5+vmware.1
oom-wld-rp02-md-0-7559b5578d-7tw9k   vsphere://4238131a-f866-d033-766f-56192a093a80   Running    v1.20.5+vmware.1
oom-wld-rp02-md-0-7559b5578d-pxr54   vsphere://42382d27-aeb2-7c59-674a-06fc06e70fa2   Deleting   v1.20.5+vmware.1
```
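One last note: if you annotate a machine and then decide against removing it, drop the annotation before scaling. kubectl's trailing-dash syntax deletes an annotation (shown here with this post's example machine name):
```
# Remove the delete-machine annotation so the machine is no longer prioritized for deletion
kubectl annotate machine oom-wld-rp02-md-0-7559b5578d-pxr54 cluster.x-k8s.io/delete-machine-
```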