Identifying component leaders of TKGI components
+++
author = "Shubham Sharma"
title = "Identifying component leaders of TKGI components"
menuTitle = "Identifying component leaders of TKGI components"
date = "2022-08-18"
description = "How to identify component leaders of TKGI components"
series = ["TKGI"]
+++
Several TKGI components operate in leader/follower mode. In this high-availability pattern, the leader is the entry point for requests and is responsible for coordinating tasks with the followers. The components that fall into this category are:
- Etcd
- NCP
- Kubernetes controller manager
- Kubernetes Scheduler
- CSI Components
In an environment with multiple control plane and worker nodes, tracking down the leader can be tricky when you want to monitor the activity or logs of these components. The steps in this post explain how to find the leader easily.
For the components below, leader election uses the `Lease` API from the `coordination.k8s.io` API group: the leading replica holds a lease and must keep renewing it within the interval set by `leaseDurationSeconds`.
- Kubernetes controller manager
- Kubernetes Scheduler
- CSI Components
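To make the renewal rule concrete: a Lease object records its holder in `spec.holderIdentity`, the time of the last renewal in `spec.renewTime`, and the validity window in `spec.leaseDurationSeconds`. The sketch below is a simplified illustration of the expiry check, not the actual leader-election logic used by these components. The function name `lease_expired` is hypothetical, and it assumes the timestamps have already been converted to epoch seconds.

```shell
# lease_expired RENEW_EPOCH DURATION_SECONDS NOW_EPOCH
# Exits 0 when the lease was not renewed within its duration, i.e. when
# NOW_EPOCH is later than RENEW_EPOCH + DURATION_SECONDS. Exits 1 otherwise.
lease_expired() {
  [ "$3" -gt $(( $1 + $2 )) ]
}
```

The individual fields can be read with, for example, `kubectl -n kube-system get lease kube-scheduler -o jsonpath='{.spec.holderIdentity}'`.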
The leader can be identified using the steps below.
Identify the leaseholder
kubectl get leases.coordination.k8s.io -A | grep -v node
NAMESPACE NAME HOLDER AGE
kube-system kube-controller-manager ad975454-1101-4a24-b2fa-25705d3b9dc0_faf633cc-0d5a-4b8a-ba45-c85bbbd50024 127m
kube-system kube-scheduler ad975454-1101-4a24-b2fa-25705d3b9dc0_8109191c-1eb4-4d13-967b-1735e19086fb 127m
vmware-system-csi csi-vsphere-vmware-com ad975454-1101-4a24-b2fa-25705d3b9dc0 127m
vmware-system-csi external-attacher-leader-csi-vsphere-vmware-com ad975454-1101-4a24-b2fa-25705d3b9dc0 127m
vmware-system-csi external-resizer-csi-vsphere-vmware-com ad975454-1101-4a24-b2fa-25705d3b9dc0 127m
vmware-system-csi vsphere-syncer ad975454-1101-4a24-b2fa-25705d3b9dc0 127m
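- The names in the `HOLDER` column identify the nodes that hold each lease. These names do not correspond to the Kubernetes node names, though. They are the hostnames of the BOSH-deployed VMs (for the controller manager and scheduler, followed by an underscore and a unique ID). You can list those hostnames as follows: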
bosh -d service-instance_aeec33f2-0c07-444f-a20e-3648d3ac18ed ssh master hostname | egrep -v 'subject|to|use'
master/a2cb06fc-c6d2-477c-bdfb-6212591b38c6: stdout | 6e2aa260-2ec5-4537-9133-46192d858a3b
master/31c0f1f6-2104-4479-a4e3-39ed63aadc5c: stdout | f8ad35c5-198d-46c8-bdb7-bbf610b81329
master/9ddb3dfe-a988-4249-a2e7-0ba1ec0ac47b: stdout | ad975454-1101-4a24-b2fa-25705d3b9dc0
- As the output above shows, all the leases in this environment are held by the node with hostname `ad975454-1101-4a24-b2fa-25705d3b9dc0`, which is `master/9ddb3dfe-a988-4249-a2e7-0ba1ec0ac47b`.
- This means the replicas running on this node are the leaders for these components. You can `bosh ssh` to this node to monitor them and check their logs.
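The manual matching above can be scripted. The following is a minimal sketch, assuming the column layout of the `kubectl` and `bosh` outputs shown in this post and hostnames that contain no underscore. The function names `lease_holders` and `instance_for_host` are hypothetical helpers, not part of either CLI.

```shell
# lease_holders: read `kubectl get leases.coordination.k8s.io -A` output on
# stdin and print "<namespace>/<lease> <hostname>" for each lease.
# The header row and per-node leases (kube-node-lease namespace) are skipped,
# and the "_<uuid>" suffix some components append to the holder is dropped.
lease_holders() {
  awk 'NF >= 3 && $1 != "NAMESPACE" && $1 != "kube-node-lease" {
    sub(/_.*/, "", $3)
    print $1 "/" $2, $3
  }'
}

# instance_for_host HOSTNAME: read `bosh ... ssh master hostname` output on
# stdin and print the BOSH instance whose reported hostname equals HOSTNAME.
instance_for_host() {
  awk -v host="$1" '{
    gsub(/\r/, "")   # tolerate carriage returns in the captured output
    if ($2 == "stdout" && $NF == host) {
      sub(/:$/, "", $1)
      print $1
    }
  }'
}
```

For example, `kubectl get leases.coordination.k8s.io -A | lease_holders` lists each lease with its bare hostname, and `bosh -d <deployment> ssh master hostname | instance_for_host <hostname>` resolves that hostname to a BOSH instance.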
Identifying the etcd leader
- The command below shows the `etcd` leader, which is also `master/9ddb3dfe-a988-4249-a2e7-0ba1ec0ac47b`:
bosh -d service-instance_aeec33f2-0c07-444f-a20e-3648d3ac18ed ssh master/0 "ETCDCTL_API=3 /var/vcap/jobs/etcd/bin/etcdctl endpoint status" | egrep -v 'subject|to|use' | grep true
master/9ddb3dfe-a988-4249-a2e7-0ba1ec0ac47b: stdout | https://master-0.etcd.cfcr.internal:2379, 17f206fd866fdab2, 3.5.4, 5.5 MB, true, false, 4, 28536, 28536,
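Filtering with `grep true` matches the word anywhere on the line, so it would also match a member whose learner flag is `true`. The sketch below checks the leader column specifically. It assumes the comma-separated field order seen in the output above, where the fifth field is the leader flag and the sixth the learner flag, and the function name `etcd_leader` is hypothetical.

```shell
# etcd_leader: read `bosh ... ssh ... "etcdctl endpoint status"` output on
# stdin and print "<bosh instance> <endpoint>" for the member whose fifth
# comma-separated field (the leader flag) is "true".
etcd_leader() {
  awk -F', *' '{
    gsub(/\r/, "")                  # tolerate carriage returns
    if ($5 == "true") {
      n = split($1, parts, " ")     # "<instance>:", "stdout", "|", "<endpoint>"
      sub(/:$/, "", parts[1])
      print parts[1], parts[n]
    }
  }'
}
```

Piping the output of the `bosh ... ssh` command above into `etcd_leader`, in place of the `grep true` filter, prints only the leading member.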