How to rotate certificates of a TKGM cluster without upgrading
When TKG management and workload clusters are created, certificates are generated during the kubeadm initialization phase. These certificates expire after a year and are rotated automatically when you upgrade your clusters. But what if you cannot upgrade your clusters within that time frame?
This post shares the steps for rotating certificates without upgrading your clusters. At a high level, the process triggers a rollout of the control plane machines so that they go through the kubeadm initialization phase again and generate new certificates. For a similar walkthrough in a TKGS environment, check out this post.
Info
The upstream community is working on a PR to automatically renew control plane machine certificates. This should remove the need for any manual intervention in the future.
This PR achieves certificate rotation on control plane machines by repaving them. It does so as follows (see the sketch after this list):
- Add an annotation (machine.cluster.x-k8s.io/certificates-expiry-date) on KubeadmBootstrapConfig objects that captures the certificate expiry date (1 year from the creation time).
- Update the machine status with the certificate expiry date by reading that annotation on either the machine object or the bootstrap config object.
- Add a field to KCP called kcp.spec.rolloutBefore.certificatesExpiryDays that can be used to trigger a rollout if a control plane machine's certificates will expire within the specified number of days.
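Once this lands, opting in might look something like the sketch below. This is an assumption based on the proposed API, and the 60-day threshold is an arbitrary illustrative value:
# Hypothetical usage of the proposed field: repave control plane machines
# whenever their certificates are within 60 days of expiry
kubectl patch kcp workload-slot35rp10-control-plane --type merge \
  -p '{"spec":{"rolloutBefore":{"certificatesExpiryDays":60}}}'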
Environment Details Before Rotation
Get the list of control plane nodes and check their certificate expiration:
# Collect the external IPs of the control plane nodes into a file named "nodes"
kubectl get nodes \
-o jsonpath='{.items[*].status.addresses[?(@.type=="ExternalIP")].address}' \
-l node-role.kubernetes.io/master= > nodes
# SSH into each node and check certificate expiration with kubeadm
for i in `cat nodes`; do
printf "\n######\n"
ssh -o "StrictHostKeyChecking=no" -q capv@$i hostname
ssh -o "StrictHostKeyChecking=no" -q capv@$i sudo kubeadm certs check-expiration
done;
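Note that on clusters running newer Kubernetes versions, where the node-role.kubernetes.io/master label has been replaced by node-role.kubernetes.io/control-plane, the equivalent selector would be:
kubectl get nodes \
-o jsonpath='{.items[*].status.addresses[?(@.type=="ExternalIP")].address}' \
-l node-role.kubernetes.io/control-plane= > nodes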
Sample output
######
workload-slot35rp10-control-plane-ggsmj
[check-expiration] Reading configuration from the cluster...
[check-expiration] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
W0923 17:51:03.686273 4172778 utils.go:69] The recommended value for "resolvConf" in "KubeletConfiguration" is: /run/systemd/resolve/resolv.conf; the provided value is: /run/systemd/resolve/resolv.conf
CERTIFICATE EXPIRES RESIDUAL TIME CERTIFICATE AUTHORITY EXTERNALLY MANAGED
admin.conf Sep 21, 2023 23:13 UTC 363d ca no
apiserver Sep 21, 2023 23:13 UTC 363d ca no
apiserver-etcd-client Sep 21, 2023 23:13 UTC 363d etcd-ca no
apiserver-kubelet-client Sep 21, 2023 23:13 UTC 363d ca no
controller-manager.conf Sep 21, 2023 23:13 UTC 363d ca no
etcd-healthcheck-client Sep 21, 2023 23:13 UTC 363d etcd-ca no
etcd-peer Sep 21, 2023 23:13 UTC 363d etcd-ca no
etcd-server Sep 21, 2023 23:13 UTC 363d etcd-ca no
front-proxy-client Sep 21, 2023 23:13 UTC 363d front-proxy-ca no
scheduler.conf Sep 21, 2023 23:13 UTC 363d ca no
CERTIFICATE AUTHORITY EXPIRES RESIDUAL TIME EXTERNALLY MANAGED
ca Sep 18, 2032 23:09 UTC 9y no
etcd-ca Sep 18, 2032 23:09 UTC 9y no
front-proxy-ca Sep 18, 2032 23:09 UTC 9y no
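If SSH access to the nodes is unavailable, a quick external spot check of the API server's serving certificate can be done with openssl; <control-plane-vip> below is a placeholder for your cluster's API server endpoint:
# Print the notAfter date of the API server's serving certificate
echo | openssl s_client -connect <control-plane-vip>:6443 2>/dev/null \
| openssl x509 -noout -enddate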
Certificate rotation using KubeadmControlPlane (KCP) rollout
Switch to management cluster context
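For example, assuming an admin context named mgmt-cluster-admin@mgmt-cluster (list available contexts with kubectl config get-contexts):
kubectl config use-context mgmt-cluster-admin@mgmt-cluster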
Get KCP
k get kcp # "k" is an alias for kubectl
NAME CLUSTER INITIALIZED API SERVER AVAILABLE REPLICAS READY UPDATED UNAVAILABLE AGE VERSION
workload-slot35rp10-control-plane workload-slot35rp10 true true 3 3 3 0 42h v1.23.8+vmware.2
Trigger certificate rotation
# Preferred: set rolloutAfter to the current time to trigger an immediate rollout
kubectl patch kcp workload-slot35rp10-control-plane --type merge -p "{\"spec\":{\"rolloutAfter\":\"`date +'%Y-%m-%dT%TZ'`\"}}"
# Alternative: force a rollout by mutating the KubeadmConfigSpec
# (shown here against a management cluster KCP in the tkg-system namespace)
k patch kcp prz-mgmt-rp03-control-plane \
-n tkg-system \
--type "json" \
-p '[{"op":"add","path":"/spec/kubeadmConfigSpec/preKubeadmCommands/-","value":"echo \"$(date)\" >> /tmp/kcp_recreate_date.log"}]'
Machine rollout begins
k get machines
NAME CLUSTER NODENAME PROVIDERID PHASE AGE VERSION
workload-slot35rp10-control-plane-7z95k workload-slot35rp10 Provisioning 20s v1.23.8+vmware.2
workload-slot35rp10-control-plane-ggsmj workload-slot35rp10 workload-slot35rp10-control-plane-ggsmj vsphere://4201a86e-3c15-879a-1b85-78f76a16c27f Running 42h v1.23.8+vmware.2
workload-slot35rp10-control-plane-hxbxb workload-slot35rp10 workload-slot35rp10-control-plane-hxbxb vsphere://42014b2e-07e4-216a-24ef-86e2d52d7bbd Running 42h v1.23.8+vmware.2
workload-slot35rp10-control-plane-sm4nw workload-slot35rp10 workload-slot35rp10-control-plane-sm4nw vsphere://4201cff3-2715-ffe1-c4a6-35fc795995ce Running 42h v1.23.8+vmware.2
workload-slot35rp10-md-0-667bcd6b57-79br9 workload-slot35rp10 workload-slot35rp10-md-0-667bcd6b57-79br9 vsphere://420142a2-d141-7d6b-b322-9c2afcc47da5 Running 42h v1.23.8+vmware.2
workload-slot35rp10-md-1-7bdfdcf7f-rhc8j workload-slot35rp10 workload-slot35rp10-md-1-7bdfdcf7f-rhc8j vsphere://420115c0-3672-4da7-dd16-77ef4e0c557f Running 42h v1.23.8+vmware.2
workload-slot35rp10-md-2-5bb8468b59-z4jdf workload-slot35rp10 workload-slot35rp10-md-2-5bb8468b59-z4jdf vsphere://42019a7e-4900-84ed-2a34-135ee837952f Running 42h v1.23.8+vmware.2
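KCP replaces the control plane machines one at a time, so you can follow the rollout as it progresses:
kubectl get machines -w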
Machine status post rollout
k get machines
NAME CLUSTER NODENAME PROVIDERID PHASE AGE VERSION
workload-slot35rp10-control-plane-4xgw8 workload-slot35rp10 workload-slot35rp10-control-plane-4xgw8 vsphere://42011ef0-2abb-b934-a03b-ce995d5e2b8e Running 13m v1.23.8+vmware.2
workload-slot35rp10-control-plane-7z95k workload-slot35rp10 workload-slot35rp10-control-plane-7z95k vsphere://42018773-23ab-cb58-89b7-0d5e6656aca1 Running 20m v1.23.8+vmware.2
workload-slot35rp10-control-plane-xwhgj workload-slot35rp10 workload-slot35rp10-control-plane-xwhgj vsphere://4201b550-9388-52ad-6848-8f05d885bb9c Running 17m v1.23.8+vmware.2
workload-slot35rp10-md-0-667bcd6b57-79br9 workload-slot35rp10 workload-slot35rp10-md-0-667bcd6b57-79br9 vsphere://420142a2-d141-7d6b-b322-9c2afcc47da5 Running 43h v1.23.8+vmware.2
workload-slot35rp10-md-1-7bdfdcf7f-rhc8j workload-slot35rp10 workload-slot35rp10-md-1-7bdfdcf7f-rhc8j vsphere://420115c0-3672-4da7-dd16-77ef4e0c557f Running 43h v1.23.8+vmware.2
workload-slot35rp10-md-2-5bb8468b59-z4jdf workload-slot35rp10 workload-slot35rp10-md-2-5bb8468b59-z4jdf vsphere://42019a7e-4900-84ed-2a34-135ee837952f Running 43h v1.23.8+vmware.2
Verify certificate rotation
Switch to workload cluster context
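For example, assuming the Tanzu CLI is available, the admin kubeconfig can be retrieved and the context switched like this (the context name follows the usual <cluster>-admin@<cluster> pattern):
tanzu cluster kubeconfig get workload-slot35rp10 --admin
kubectl config use-context workload-slot35rp10-admin@workload-slot35rp10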
Get certificate details post rollout
kubectl get nodes \
-o jsonpath='{.items[*].status.addresses[?(@.type=="ExternalIP")].address}' \
-l node-role.kubernetes.io/master= > nodes
for i in `cat nodes`; do
printf "\n######\n"
ssh -o "StrictHostKeyChecking=no" -q capv@$i hostname
ssh -o "StrictHostKeyChecking=no" -q capv@$i sudo kubeadm certs check-expiration
done;
Sample output
######
workload-slot35rp10-control-plane-4xgw8
[check-expiration] Reading configuration from the cluster...
[check-expiration] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
W0923 18:10:02.660438 13427 utils.go:69] The recommended value for "resolvConf" in "KubeletConfiguration" is: /run/systemd/resolve/resolv.conf; the provided value is: /run/systemd/resolve/resolv.conf
CERTIFICATE EXPIRES RESIDUAL TIME CERTIFICATE AUTHORITY EXTERNALLY MANAGED
admin.conf Sep 23, 2023 18:05 UTC 364d ca no
apiserver Sep 23, 2023 18:05 UTC 364d ca no
apiserver-etcd-client Sep 23, 2023 18:05 UTC 364d etcd-ca no
apiserver-kubelet-client Sep 23, 2023 18:05 UTC 364d ca no
controller-manager.conf Sep 23, 2023 18:05 UTC 364d ca no
etcd-healthcheck-client Sep 23, 2023 18:05 UTC 364d etcd-ca no
etcd-peer Sep 23, 2023 18:05 UTC 364d etcd-ca no
etcd-server Sep 23, 2023 18:05 UTC 364d etcd-ca no
front-proxy-client Sep 23, 2023 18:05 UTC 364d front-proxy-ca no
scheduler.conf Sep 23, 2023 18:05 UTC 364d ca no
CERTIFICATE AUTHORITY EXPIRES RESIDUAL TIME EXTERNALLY MANAGED
ca Sep 18, 2032 23:09 UTC 9y no
etcd-ca Sep 18, 2032 23:09 UTC 9y no
front-proxy-ca Sep 18, 2032 23:09 UTC 9y no
- Certificates have been refreshed/rotated and their residual time is back to 364 days.