Scaling Tanzu Kubernetes Grid control plane and worker nodes vertically
Horizontal scaling of a TKG cluster is a straightforward task. Vertical scaling, at present, is a more manual and involved process. The TKG documentation outlines the high-level steps, which rely on instructions from the Cluster API documentation. This post walks through the procedure with examples.
At a high level, the process is:
- Save a copy of an existing infrastructure template, a VSphereMachineTemplate in this example
- Create and deploy a new VSphereMachineTemplate object with the updated configuration and a new name
- Update the MachineDeployment to vertically scale the worker nodes, and update the KubeadmControlPlane to vertically scale the control plane nodes
Save the existing VSphereMachineTemplate for the control plane and worker nodes
Switch the context to the management cluster. The commands below assume that the workload cluster to be scaled vertically is in the default namespace. If your cluster is in a different namespace, add -n <namespace> to the commands to target it.
kubectl config use-context oom-mgmt-rp01-admin@oom-mgmt-rp01
Switched to context "oom-mgmt-rp01-admin@oom-mgmt-rp01".
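If you are unsure which namespace your workload cluster lives in, listing the Cluster objects across all namespaces on the management cluster will show it:
kubectl get clusters -A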
Check the available templates in your environment and export them to files.
kubectl get vspheremachinetemplates.infrastructure.cluster.x-k8s.io
NAME                    AGE
oom-wld-control-plane   16h
oom-wld-worker          16h
kubectl get vspheremachinetemplates.infrastructure.cluster.x-k8s.io oom-wld-control-plane -o yaml > oom-wld-control-plane-new.yaml
kubectl get vspheremachinetemplates.infrastructure.cluster.x-k8s.io oom-wld-worker -o yaml > oom-wld-worker-new.yaml
The control plane and worker node VSphereMachineTemplates should look similar to the output below.
# Control Plane
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: VSphereMachineTemplate
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"infrastructure.cluster.x-k8s.io/v1alpha3","kind":"VSphereMachineTemplate","metadata":{"annotations":{},"name":"oom-wld-control-plane","namespace":"default"},"spec":{"template":{"spec":{"cloneMode":"fullClone","datacenter":"/Datacenter","datastore":"/Datacenter/datastore/vsanDatastore","diskGiB":20,"folder":"/Datacenter/vm/env06","memoryMiB":4096,"network":{"devices":[{"dhcp4":true,"networkName":"/Datacenter/network/VM Network"}]},"numCPUs":4,"resourcePool":"/Datacenter/host/Cluster/Resources/RP06","server":"10.186.198.51","storagePolicyName":"","template":"/Datacenter/vm/tkg/ubuntu-2004-kube-v1-21-2+vmware-1-tkg-1-7832907791984498322"}}}}
  creationTimestamp: "2021-09-14T15:35:25Z"
  generation: 1
  name: oom-wld-control-plane
  namespace: default
  ownerReferences:
  - apiVersion: cluster.x-k8s.io/v1alpha3
    kind: Cluster
    name: oom-wld
    uid: 80ec55e2-d809-4f80-864b-4ca02315bd23
  resourceVersion: "731343"
  uid: f7b9b894-be5b-4922-a7b1-51738e626709
spec:
  template:
    spec:
      cloneMode: fullClone
      datacenter: /Datacenter
      datastore: /Datacenter/datastore/vsanDatastore
      diskGiB: 20
      folder: /Datacenter/vm/env06
      memoryMiB: 4096
      network:
        devices:
        - dhcp4: true
          networkName: /Datacenter/network/VM Network
      numCPUs: 4
      resourcePool: /Datacenter/host/Cluster/Resources/RP06
      server: 10.186.198.51
      storagePolicyName: ""
      template: /Datacenter/vm/tkg/ubuntu-2004-kube-v1-21-2+vmware-1-tkg-1-7832907791984498322
# Worker Nodes
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: VSphereMachineTemplate
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"infrastructure.cluster.x-k8s.io/v1alpha3","kind":"VSphereMachineTemplate","metadata":{"annotations":{},"name":"oom-wld-worker","namespace":"default"},"spec":{"template":{"spec":{"cloneMode":"fullClone","datacenter":"/Datacenter","datastore":"/Datacenter/datastore/vsanDatastore","diskGiB":20,"folder":"/Datacenter/vm/env06","memoryMiB":8192,"network":{"devices":[{"dhcp4":true,"networkName":"/Datacenter/network/VM Network"}]},"numCPUs":4,"resourcePool":"/Datacenter/host/Cluster/Resources/RP06","server":"10.186.198.51","storagePolicyName":"","template":"/Datacenter/vm/tkg/ubuntu-2004-kube-v1-21-2+vmware-1-tkg-1-7832907791984498322"}}}}
  creationTimestamp: "2021-09-14T15:35:25Z"
  generation: 1
  name: oom-wld-worker
  namespace: default
  ownerReferences:
  - apiVersion: cluster.x-k8s.io/v1alpha3
    kind: Cluster
    name: oom-wld
    uid: 80ec55e2-d809-4f80-864b-4ca02315bd23
  resourceVersion: "731305"
  uid: 9e23416a-9412-4fc1-b192-0db0893f14d8
spec:
  template:
    spec:
      cloneMode: fullClone
      datacenter: /Datacenter
      datastore: /Datacenter/datastore/vsanDatastore
      diskGiB: 20
      folder: /Datacenter/vm/env06
      memoryMiB: 8192
      network:
        devices:
        - dhcp4: true
          networkName: /Datacenter/network/VM Network
      numCPUs: 4
      resourcePool: /Datacenter/host/Cluster/Resources/RP06
      server: 10.186.198.51
      storagePolicyName: ""
      template: /Datacenter/vm/tkg/ubuntu-2004-kube-v1-21-2+vmware-1-tkg-1-7832907791984498322
Remove the server-generated object metadata fields (annotations, creationTimestamp, generation, ownerReferences, resourceVersion, and uid) from the YAML files above. In this scenario we want to bump the CPU count from 4 to 8 for both the control plane and worker nodes. Two fields need to be updated to achieve this: metadata.name and spec.template.spec.numCPUs. These fields have been updated in the YAML files (oom-wld-control-plane-new.yaml and oom-wld-worker-new.yaml) as shown below.
# Control Plane
# cat oom-wld-control-plane-new.yaml
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: VSphereMachineTemplate
metadata:
  name: oom-wld-control-plane-8cpu
  namespace: default
spec:
  template:
    spec:
      cloneMode: fullClone
      datacenter: /Datacenter
      datastore: /Datacenter/datastore/vsanDatastore
      diskGiB: 20
      folder: /Datacenter/vm/env06
      memoryMiB: 4096
      network:
        devices:
        - dhcp4: true
          networkName: /Datacenter/network/VM Network
      numCPUs: 8
      resourcePool: /Datacenter/host/Cluster/Resources/RP06
      server: 10.186.198.51
      storagePolicyName: ""
      template: /Datacenter/vm/tkg/ubuntu-2004-kube-v1-21-2+vmware-1-tkg-1-7832907791984498322
# Worker node
# cat oom-wld-worker-new.yaml
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: VSphereMachineTemplate
metadata:
  name: oom-wld-worker-8cpu
  namespace: default
spec:
  template:
    spec:
      cloneMode: fullClone
      datacenter: /Datacenter
      datastore: /Datacenter/datastore/vsanDatastore
      diskGiB: 20
      folder: /Datacenter/vm/env06
      memoryMiB: 8192
      network:
        devices:
        - dhcp4: true
          networkName: /Datacenter/network/VM Network
      numCPUs: 8
      resourcePool: /Datacenter/host/Cluster/Resources/RP06
      server: 10.186.198.51
      storagePolicyName: ""
      template: /Datacenter/vm/tkg/ubuntu-2004-kube-v1-21-2+vmware-1-tkg-1-7832907791984498322
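If you prefer to script these edits rather than make them by hand, the following is a minimal sketch, assuming the mikefarah yq v4 CLI is available, that strips the server-generated metadata, renames the template, and sets the CPU count in place; run the equivalent against oom-wld-worker-new.yaml with the worker name:
# Strip server-generated metadata, rename the template, and set 8 CPUs (assumes yq v4)
yq -i '
  del(.metadata.annotations) |
  del(.metadata.creationTimestamp) |
  del(.metadata.generation) |
  del(.metadata.ownerReferences) |
  del(.metadata.resourceVersion) |
  del(.metadata.uid) |
  .metadata.name = "oom-wld-control-plane-8cpu" |
  .spec.template.spec.numCPUs = 8
' oom-wld-control-plane-new.yaml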
Deploy the new VSphereMachineTemplate objects
The next step is to create these templates in the management cluster.
kubectl apply -f oom-wld-control-plane-new.yaml
vspheremachinetemplate.infrastructure.cluster.x-k8s.io/oom-wld-control-plane-8cpu created
kubectl apply -f oom-wld-worker-new.yaml
vspheremachinetemplate.infrastructure.cluster.x-k8s.io/oom-wld-worker-8cpu created
View the newly created templates.
kubectl get vspheremachinetemplates.infrastructure.cluster.x-k8s.io
NAME                         AGE
oom-wld-control-plane        16h
oom-wld-control-plane-8cpu   12s
oom-wld-worker               16h
oom-wld-worker-8cpu          13s
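As a quick sanity check, you can confirm the new template carries the updated CPU count before pointing anything at it:
kubectl get vspheremachinetemplates.infrastructure.cluster.x-k8s.io oom-wld-control-plane-8cpu -o jsonpath='{.spec.template.spec.numCPUs}{"\n"}'
8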
At this point the new templates are ready to be used, but the actual vertical scaling has not kicked in yet.
Scaling TKG control plane and worker nodes vertically
Scaling control plane nodes
To scale the control plane nodes, edit the KubeadmControlPlane object and point its spec.infrastructureTemplate.name field at the new template.
kubectl get kubeadmcontrolplanes.controlplane.cluster.x-k8s.io oom-wld-control-plane -o jsonpath='{.spec.infrastructureTemplate.name}{"\n"}'
oom-wld-control-plane
kubectl edit kubeadmcontrolplanes.controlplane.cluster.x-k8s.io oom-wld-control-plane
kubeadmcontrolplane.controlplane.cluster.x-k8s.io/oom-wld-control-plane edited
kubectl get kubeadmcontrolplanes.controlplane.cluster.x-k8s.io oom-wld-control-plane -o jsonpath='{.spec.infrastructureTemplate.name}{"\n"}'
oom-wld-control-plane-8cpu
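kubectl edit opens an interactive editor; if you want a non-interactive alternative, a merge patch along these lines should achieve the same change (a sketch; adjust the names to your environment):
kubectl patch kubeadmcontrolplanes.controlplane.cluster.x-k8s.io oom-wld-control-plane \
  --type merge -p '{"spec":{"infrastructureTemplate":{"name":"oom-wld-control-plane-8cpu"}}}'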
The change above triggers a rolling update of the control plane: Cluster API provisions a replacement node from the new template before removing the old one.
kubectl get machine
NAME                            PROVIDERID                                       PHASE          VERSION
oom-wld-control-plane-gm4rw     vsphere://42292fa2-3c30-bee4-467f-e509e320f76b   Running        v1.21.2+vmware.1
oom-wld-control-plane-kpxxm                                                      Provisioning   v1.21.2+vmware.1
oom-wld-md-0-57f7b8f8d8-fn487   vsphere://42293388-c4db-af6b-db38-2f31b6eb3381   Running        v1.21.2+vmware.1
oom-wld-md-0-57f7b8f8d8-lpdnj   vsphere://4229269c-11f0-06b5-9ba1-5fc569fa94b2   Running        v1.21.2+vmware.1
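To follow the rollout to completion, you can watch the Machine objects until the replacement node reports Running and the old control plane node is removed:
kubectl get machines -w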
Scaling worker nodes
To scale the worker nodes, edit the MachineDeployment object and point its spec.template.spec.infrastructureRef.name field at the new template.
kubectl get md oom-wld-md-0 -o jsonpath='{.spec.template.spec.infrastructureRef.name}{"\n"}'
oom-wld-worker
kubectl edit md oom-wld-md-0
machinedeployment.cluster.x-k8s.io/oom-wld-md-0 edited
kubectl get md oom-wld-md-0 -o jsonpath='{.spec.template.spec.infrastructureRef.name}{"\n"}'
oom-wld-worker-8cpu
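As with the control plane, a non-interactive merge patch should work here too (a sketch; adjust the names to your environment):
kubectl patch machinedeployments.cluster.x-k8s.io oom-wld-md-0 \
  --type merge -p '{"spec":{"template":{"spec":{"infrastructureRef":{"name":"oom-wld-worker-8cpu"}}}}}'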
Similar to the behavior we observed during control plane scaling, the change above triggers a rollout of new worker nodes under a new MachineSet.
kubectl get machines
NAME                            PROVIDERID                                       PHASE          VERSION
oom-wld-control-plane-gm4rw     vsphere://42292fa2-3c30-bee4-467f-e509e320f76b   Running        v1.21.2+vmware.1
oom-wld-md-0-57f7b8f8d8-fn487                                                    Provisioning   v1.21.2+vmware.1
oom-wld-md-0-64cf7d8b7-pxb8d    vsphere://4229e551-60fd-c514-083a-d62b9c703253   Running        v1.21.2+vmware.1
oom-wld-md-0-64cf7d8b7-zvszp    vsphere://422925b0-a046-f35c-dbc2-0f148f9c2d04   Running        v1.21.2+vmware.1
Once provisioning finishes, your control plane and worker nodes will be running with the new desired configuration.
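To confirm the new sizing from inside the workload cluster, you can switch to its context and check each node's CPU capacity. The context name below follows the TKG admin-context naming convention and is an assumption; substitute your own:
# Context name assumed from the TKG <cluster>-admin@<cluster> convention
kubectl config use-context oom-wld-admin@oom-wld
kubectl get nodes -o custom-columns=NAME:.metadata.name,CPU:.status.capacity.cpu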