How to safely remove a worker node from TKGM clusters
Environment Details
Tanzu Cluster Details
tanzu cluster list
NAME           NAMESPACE   STATUS    CONTROLPLANE   WORKERS   KUBERNETES         ROLES    PLAN
wph-wld-rp01   default     running   1/1            3/3       v1.21.2+vmware.1   <none>   dev
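Cluster details, including the health of individual nodes, can also be inspected from the Tanzu CLI. A quick, optional check using the cluster name from the listing above:
# Show detailed status for the workload cluster and its nodes
tanzu cluster get wph-wld-rp01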
kubectl config use-context wph-wld-rp01-admin@wph-wld-rp01
Switched to context "wph-wld-rp01-admin@wph-wld-rp01".
kubectl get nodes
NAME                                 STATUS   ROLES                  AGE     VERSION
wph-wld-rp01-control-plane-c7mxm     Ready    control-plane,master   17d     v1.21.2+vmware.1
wph-wld-rp01-md-0-64fc56fb95-7zhpw   Ready    <none>                 3m20s   v1.21.2+vmware.1
wph-wld-rp01-md-0-64fc56fb95-hvqsb   Ready    <none>                 17d     v1.21.2+vmware.1
wph-wld-rp01-md-0-64fc56fb95-mvbj5   Ready    <none>                 3m40s   v1.21.2+vmware.1
# From Management cluster context
kubectl config use-context ph-mgmt-rp01-admin@ph-mgmt-rp01
Switched to context "ph-mgmt-rp01-admin@ph-mgmt-rp01".
kubectl get machines
NAME                                 PROVIDERID                                       PHASE     VERSION
wph-wld-rp01-control-plane-c7mxm     vsphere://423c40ed-a5fe-669d-0bb7-92432f23b36b   Running   v1.21.2+vmware.1
wph-wld-rp01-md-0-64fc56fb95-7zhpw   vsphere://423cb203-15e6-e024-fb5c-fa62e555defa   Running   v1.21.2+vmware.1
wph-wld-rp01-md-0-64fc56fb95-hvqsb   vsphere://423ce2cf-ce49-fca2-f10c-4d7996b5fc74   Running   v1.21.2+vmware.1
wph-wld-rp01-md-0-64fc56fb95-mvbj5   vsphere://423c4715-64ac-decc-7634-522888597e45   Running   v1.21.2+vmware.1
Safely Removing a Worker Node
Select the worker node for deletion. In this example, it is wph-wld-rp01-md-0-64fc56fb95-mvbj5.
Switch to the workload cluster context
kubectl config use-context wph-wld-rp01-admin@wph-wld-rp01
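Before draining, it can help to review what is running on the node so you know which drain flags will be needed. This step is optional; the field selector used below is standard kubectl:
# List every pod currently scheduled on the node selected for removal
kubectl get pods --all-namespaces --field-selector spec.nodeName=wph-wld-rp01-md-0-64fc56fb95-mvbj5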
Drain the node
Drain the node using kubectl drain.
Depending on the workloads running on this node, additional options may be needed. The two other frequently used options are:
--delete-emptydir-data (formerly --delete-local-data)
- Continue even if there are pods using emptyDir volumes (local data that will be deleted when the node is drained).
--force
- Continue even if there are pods that are not managed by a controller.
See kubectl drain --help for the full list of drain options.
kubectl drain wph-wld-rp01-md-0-64fc56fb95-mvbj5 --ignore-daemonsets
node/wph-wld-rp01-md-0-64fc56fb95-mvbj5 already cordoned
Warning: ignoring DaemonSet-managed Pods: kube-system/calico-node-r5sfv, kube-system/kube-proxy-52pdn, kube-system/vsphere-csi-node-nq4xd
node/wph-wld-rp01-md-0-64fc56fb95-mvbj5 drained
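If the node had been running pods with emptyDir volumes or pods without a controller, the additional flags described above would be required. A sketch of such an invocation (not needed in this walkthrough, where --ignore-daemonsets was sufficient):
# Hypothetical drain using the extra flags discussed above
kubectl drain wph-wld-rp01-md-0-64fc56fb95-mvbj5 --ignore-daemonsets --delete-emptydir-data --force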
Make sure scheduling is disabled
kubectl get nodes
NAME                                 STATUS                     ROLES                  AGE   VERSION
wph-wld-rp01-control-plane-c7mxm     Ready                      control-plane,master   17d   v1.21.2+vmware.1
wph-wld-rp01-md-0-64fc56fb95-7zhpw   Ready                      <none>                 12m   v1.21.2+vmware.1
wph-wld-rp01-md-0-64fc56fb95-hvqsb   Ready                      <none>                 17d   v1.21.2+vmware.1
wph-wld-rp01-md-0-64fc56fb95-mvbj5   Ready,SchedulingDisabled   <none>                 12m   v1.21.2+vmware.1
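The cordon status can also be read directly from the node spec, since kubectl drain/cordon sets spec.unschedulable:
# Should print "true" for a cordoned node
kubectl get node wph-wld-rp01-md-0-64fc56fb95-mvbj5 -o jsonpath='{.spec.unschedulable}'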
Delete the node
kubectl delete node wph-wld-rp01-md-0-64fc56fb95-mvbj5
node "wph-wld-rp01-md-0-64fc56fb95-mvbj5" deleted
Observing changes in the management cluster
Running kubectl delete node against the workload cluster triggers deletion of the corresponding Machine object as well.
As seen in the output below, the old Machine object has been deleted and a new machine, wph-wld-rp01-md-0-64fc56fb95-wqt7d, is now being provisioned.
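This replacement behavior relies on the MachineHealthCheck that TKG typically enables for workload clusters: once the Node object disappears, the affected Machine is remediated and the MachineDeployment provisions a replacement. It can be inspected from the management cluster context (switched to below):
# List the MachineHealthCheck objects across namespaces
kubectl get machinehealthcheck -A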
kubectl config use-context ph-mgmt-rp01-admin@ph-mgmt-rp01
Switched to context "ph-mgmt-rp01-admin@ph-mgmt-rp01".
kubectl get machines
NAME                                 PROVIDERID                                       PHASE          VERSION
wph-wld-rp01-control-plane-c7mxm     vsphere://423c40ed-a5fe-669d-0bb7-92432f23b36b   Running        v1.21.2+vmware.1
wph-wld-rp01-md-0-64fc56fb95-7zhpw   vsphere://423cb203-15e6-e024-fb5c-fa62e555defa   Running        v1.21.2+vmware.1
wph-wld-rp01-md-0-64fc56fb95-hvqsb   vsphere://423ce2cf-ce49-fca2-f10c-4d7996b5fc74   Running        v1.21.2+vmware.1
wph-wld-rp01-md-0-64fc56fb95-wqt7d                                                    Provisioning   v1.21.2+vmware.1
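Instead of polling, the rollout can be followed continuously with the standard watch flag:
# Watch Machine objects until the new worker reaches the Running phase (Ctrl+C to stop)
kubectl get machines -w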
# Provisioning Complete
kubectl get machines
NAME                                 PROVIDERID                                       PHASE     VERSION
wph-wld-rp01-control-plane-c7mxm     vsphere://423c40ed-a5fe-669d-0bb7-92432f23b36b   Running   v1.21.2+vmware.1
wph-wld-rp01-md-0-64fc56fb95-7zhpw   vsphere://423cb203-15e6-e024-fb5c-fa62e555defa   Running   v1.21.2+vmware.1
wph-wld-rp01-md-0-64fc56fb95-hvqsb   vsphere://423ce2cf-ce49-fca2-f10c-4d7996b5fc74   Running   v1.21.2+vmware.1
wph-wld-rp01-md-0-64fc56fb95-wqt7d   vsphere://423cf20b-d028-993d-f452-f9c303e98ce5   Running   v1.21.2+vmware.1
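As a further check from the management cluster, the MachineDeployment that owns the md-0 worker pool should report all replicas ready. The object name below is inferred from the machine name prefix and may differ in other environments:
# Assumed MachineDeployment name, derived from the worker machine names
kubectl get machinedeployment wph-wld-rp01-md-0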
Verify new worker node is added
The new node wph-wld-rp01-md-0-64fc56fb95-wqt7d has joined the cluster with the same name as the newly provisioned machine shown in the output above.
kubectl config use-context wph-wld-rp01-admin@wph-wld-rp01
kubectl get nodes
NAME                                 STATUS   ROLES                  AGE     VERSION
wph-wld-rp01-control-plane-c7mxm     Ready    control-plane,master   17d     v1.21.2+vmware.1
wph-wld-rp01-md-0-64fc56fb95-7zhpw   Ready    <none>                 19m     v1.21.2+vmware.1
wph-wld-rp01-md-0-64fc56fb95-hvqsb   Ready    <none>                 17d     v1.21.2+vmware.1
wph-wld-rp01-md-0-64fc56fb95-wqt7d   Ready    <none>                 4m53s   v1.21.2+vmware.1
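To correlate the new node with the Machine object seen from the management cluster, compare the node's providerID against the PROVIDERID column in the earlier output:
# Prints the vSphere provider ID recorded on the new node; it should match the Machine's PROVIDERID
kubectl get node wph-wld-rp01-md-0-64fc56fb95-wqt7d -o jsonpath='{.spec.providerID}'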
Alternative Approach