Repairing BOSH-Created Persistent Disks of TKGI Worker Nodes
TKGI worker nodes are BOSH-deployed and have at least three disks attached to them (assuming you are not using Kubernetes persistent volumes). These three disks are:
- Default stemcell disk - mounted as the root partition, usually 3 GB in size
- Ephemeral disk - this is where all the logs and BOSH job data get pushed on VM creation. It is mounted at /var/vcap/data
- Persistent disks - these are attached to the VM to store data that needs to be available across VM recreates. They are mounted at /var/vcap/store (a quick way to inspect this layout is sketched below)
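A quick way to inspect this disk layout on a worker VM (a sketch; device names such as /dev/sdb and /dev/sdc vary by IaaS):
lsblk
df -h /var/vcap/data /var/vcap/store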
There are times when this persistent disk can get corrupted due to underlying IaaS or filesystem issues. You can follow the steps below to recover it.
Info
If an ephemeral disk is corrupted, recovery is faster and simpler using bosh recreate or bosh cck.
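For reference, a minimal sketch of that faster path (the deployment and instance names are the ones used in the examples later in this doc):
bosh -d service-instance_77e44aad-1a76-4980-8d4e-43d7c273d167 cck
bosh -d service-instance_77e44aad-1a76-4980-8d4e-43d7c273d167 recreate worker/fcd09dc3-9e7a-4528-8015-22620b553f27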
Identify the node to run fsck
- In this example, we will repair the node with IP 10.20.0.5
kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
011704a1-5f0f-4cb9-bd91-f9ad7aec17e5 Ready <none> 20h v1.23.7+vmware.1 10.20.0.5 10.20.0.5 Ubuntu 16.04.7 LTS 4.15.0-191-generic containerd://1.6.4
8334e164-8e9b-4ffb-9c89-bfe015e094a8 Ready <none> 20h v1.23.7+vmware.1 10.20.0.4 10.20.0.4 Ubuntu 16.04.7 LTS 4.15.0-191-generic containerd://1.6.4
c649ec99-bb3a-4049-9c57-1751f6de271e Ready <none> 21h v1.23.7+vmware.1 10.20.0.3 10.20.0.3 Ubuntu 16.04.7 LTS 4.15.0-191-generic containerd://1.6.4
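If it is not obvious which node is affected, filesystem problems may surface as node conditions or events; a sketch of checking, using the node name from the listing above:
kubectl describe node 011704a1-5f0f-4cb9-bd91-f9ad7aec17e5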
Identify the bosh VM corresponding to that node
bosh vms -d service-instance_77e44aad-1a76-4980-8d4e-43d7c273d167 | grep 10.20.0.5
worker/fcd09dc3-9e7a-4528-8015-22620b553f27 running az 10.20.0.5 vm-c2b8073f-949d-4891-b420-36769ecdee60 medium.disk true bosh-vsphere-esxi-ubuntu-xenial-go_agent/621.265
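If you also want the persistent disk CID that BOSH has attached to this worker (useful when investigating the disk on the IaaS side), a sketch using the same deployment:
bosh -d service-instance_77e44aad-1a76-4980-8d4e-43d7c273d167 instances --details | grep fcd09dc3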
Drain the node
# Other drain options may be needed if the drain fails (see the sketch after this step)
kubectl drain 011704a1-5f0f-4cb9-bd91-f9ad7aec17e5 --ignore-daemonsets
node/011704a1-5f0f-4cb9-bd91-f9ad7aec17e5 cordoned
WARNING: ignoring DaemonSet-managed Pods: pks-system/fluent-bit-7rg24, pks-system/telegraf-xjsx4
evicting pod kube-system/coredns-67bd78c556-9vwfd
pod/coredns-67bd78c556-9vwfd evicted
node/011704a1-5f0f-4cb9-bd91-f9ad7aec17e5 drained
# Make sure scheduling is disabled
kubectl get nodes
NAME STATUS ROLES AGE VERSION
011704a1-5f0f-4cb9-bd91-f9ad7aec17e5 Ready,SchedulingDisabled <none> 20h v1.23.7+vmware.1
8334e164-8e9b-4ffb-9c89-bfe015e094a8 Ready <none> 20h v1.23.7+vmware.1
c649ec99-bb3a-4049-9c57-1751f6de271e Ready <none> 21h v1.23.7+vmware.1
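If the default drain fails because of pods using emptyDir volumes or pods without a controller, a more forceful variant can be used (a sketch; these flags can discard local pod data, so use them deliberately):
kubectl drain 011704a1-5f0f-4cb9-bd91-f9ad7aec17e5 --ignore-daemonsets --delete-emptydir-data --force --grace-period=60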
monit stop all
# Turn off cck/resurrection
bosh update-resurrection off -d service-instance_77e44aad-1a76-4980-8d4e-43d7c273d167
bosh -d service-instance_77e44aad-1a76-4980-8d4e-43d7c273d167 ssh worker/fcd09dc3-9e7a-4528-8015-22620b553f27
# Steps on worker node
sudo su -
monit stop all
# To confirm everything stopped
monit summary
Identify the mount point
- BOSH persistent disks are mounted at /var/vcap/store. Before repairing, we must identify the underlying filesystem device
- From the output below, /var/vcap/store is backed by /dev/sdc1
df -h
Filesystem Size Used Avail Use% Mounted on
<------ Truncated Output ------>
/dev/sda1 2.9G 1.4G 1.4G 52% /
/dev/sdb1 32G 3.5G 27G 12% /var/vcap/data
tmpfs 16M 4.0K 16M 1% /var/vcap/data/sys/run
/dev/sdc1 50G 2.1G 45G 5% /var/vcap/store
<------ Truncated Output ------>
Unmount
- Before running the repair, the directory needs to be unmounted: umount /var/vcap/store
- If umount fails because the device is busy, identify which processes are blocking the operation using fuser -m -u -v /dev/sdc1 or fuser -m -u -v /var/vcap/store
- These services will need to be stopped, and any processes still accessing the mount will need to be terminated (see the sketch below)
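A minimal sketch of that cleanup, assuming the monit-managed jobs are already stopped and a stray process (for example, a shell sitting in /var/vcap/store) is still holding the mount:
# List the blocking processes and their owners
fuser -m -u -v /dev/sdc1
# Terminate them using the PIDs from the fuser output, then retry the unmount
kill <pid>
umount /var/vcap/store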
Run fsck
fsck /dev/sdc1
fsck from util-linux 2.27.1
e2fsck 1.42.13 (17-May-2015)
/dev/sdc1: clean, 12599/3276800 files, 794069/13106688 blocks
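The clean result above means e2fsck skipped a full pass. If you still suspect corruption, you can force a complete check (a sketch; -y answers repair prompts automatically, so make sure that is acceptable before running it):
fsck -f -y /dev/sdc1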
Remount Disk
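Remount the device identified earlier (a sketch, assuming the same /dev/sdc1 device and mount point from the df output above):
mount /dev/sdc1 /var/vcap/store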
- You can confirm the mount is successful using df -h
Start all the processes
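A sketch of bringing the jobs back and re-enabling resurrection, mirroring the earlier stop steps:
# On the worker node
monit start all
# Confirm everything is running again
monit summary
# From where you run the BOSH CLI, turn resurrection back on
bosh update-resurrection on -d service-instance_77e44aad-1a76-4980-8d4e-43d7c273d167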
- As part of stopping and starting the processes, kubelet is also restarted, which should bring the node out of the SchedulingDisabled state
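If the node remains in SchedulingDisabled after the restart, it can be uncordoned manually (a sketch, using the node name from the earlier listing):
kubectl uncordon 011704a1-5f0f-4cb9-bd91-f9ad7aec17e5
kubectl get nodes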