Kubernetes Cluster Doomsday

10 January 2020 • 3 min read

Kubernetes has a ticking time bomb: the cluster’s certificates. Kubernetes clusters communicate over TLS and rely on PKI certificates for authentication.

The Kubernetes cluster certificates have a lifespan of one year. If the cluster certificate expires on the Kubernetes master, the kubelet service will fail. Issuing a kubectl command, such as kubectl get pods or kubectl exec -it container_name bash, will then result in a message similar to: Unable to connect to the server: x509: certificate has expired or is not yet valid.
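The expiry check can be scripted so the warning arrives before the outage does. The following is a minimal sketch, assuming the certificates live in kubeadm's default directory /etc/kubernetes/pki and that openssl is installed. The function name cert_expires_within and the 30-day threshold are illustrative choices, not part of any Kubernetes tool.

```shell
#!/bin/sh
# Assumption: kubeadm's default PKI directory; override with PKI_DIR=...
PKI_DIR=${PKI_DIR:-/etc/kubernetes/pki}

# Returns 0 if the certificate expires within the given number of days,
# 1 if it stays valid beyond that window, 2 if the file cannot be read.
cert_expires_within() {
    cert_file=$1
    days=$2
    [ -r "$cert_file" ] || return 2
    # openssl -checkend exits 0 when the cert is still valid after N seconds.
    if openssl x509 -checkend "$((days * 86400))" -noout -in "$cert_file" >/dev/null; then
        return 1
    fi
    return 0
}

# Warn about every certificate in the PKI directory that is close to expiry.
for cert in "$PKI_DIR"/*.crt; do
    [ -e "$cert" ] || continue
    if cert_expires_within "$cert" 30; then
        echo "WARNING: $cert expires within 30 days"
    fi
done
```

Run from cron, something like this could send a reminder well ahead of the one-year mark. Note that it only looks at the files on disk, not at what the running components have actually loaded.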

I already knew about that, and renewed our cluster certificates on Mar 16, 2020 04:20 UTC.

A. Nightmare is coming

But at 2020-08-09 18:26 (+0700) our cluster went down, and every service went down with it. I tried issuing a kubectl command to check the cluster:

$ kubectl get node -o wide
Unable to connect to the server: x509: certificate has expired or is not yet valid

Then I checked the kubelet log:

$ sudo journalctl -fu kubelet
août 09 13:46:38 KubeMaster kubelet[16214]: E0809 13:46:38.476674   16214 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://192.168.140.66:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dkubemaster&limit=500&resourceVersion=0: x509: certificate has expired or is not yet valid
août 09 13:46:38 KubeMaster kubelet[16214]: E0809 13:46:38.532150   16214 kubelet.go:2248] node "kubemaster" not found

Hey, what happened? Why are our cluster certificates expired? I already renewed them in March 2020. Let’s check our cluster certificates:

sumar@KubeMaster:~$ sudo kubeadm alpha certs check-expiration
[sudo] Mot de passe de sumar : 
CERTIFICATE                EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
admin.conf                 Mar 04, 2021 04:20 UTC   218d            no      
apiserver                  Mar 04, 2021 04:20 UTC   218d            no      
apiserver-etcd-client      Mar 04, 2021 04:20 UTC   218d            no      
apiserver-kubelet-client   Mar 04, 2021 04:20 UTC   218d            no      
controller-manager.conf    Mar 04, 2021 04:20 UTC   218d            no      
etcd-healthcheck-client    Mar 04, 2021 04:20 UTC   218d            no      
etcd-peer                  Mar 04, 2021 04:20 UTC   218d            no      
etcd-server                Mar 04, 2021 04:20 UTC   218d            no      
front-proxy-client         Mar 04, 2021 04:20 UTC   218d            no      
scheduler.conf             Mar 04, 2021 04:20 UTC   218d            no      

They are not expired yet. Shit, what happened?

B. Root cause

Debugging Kubernetes clusters is a pain in the ass. But at least I found the real issue here. The root of the whole problem is that the Kubernetes API server container did not pick up our new certificate. Hell yeah!
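One way to confirm this kind of problem is to compare the certificate the API server actually presents on the wire with the one on disk. The following is a minimal sketch, assuming openssl is installed and that the on-disk certificate is at kubeadm's default path /etc/kubernetes/pki/apiserver.crt. The function names are illustrative, and the script only does anything when an endpoint is passed as its first argument.

```shell
#!/bin/sh
# Print the SHA-256 fingerprint of a PEM certificate read from stdin.
fingerprint() {
    openssl x509 -noout -fingerprint -sha256
}

# Fetch whatever certificate the server at host:port presents during the TLS handshake.
served_cert() {
    openssl s_client -connect "$1" </dev/null 2>/dev/null
}

# Compare the served certificate with the file on disk.
# Returns 0 if they match, 1 if the server serves a different one, 2 on errors.
check_stale() {
    endpoint=$1
    cert_file=$2
    on_disk=$(fingerprint <"$cert_file") || return 2
    served=$(served_cert "$endpoint" | fingerprint) || return 2
    if [ "$on_disk" = "$served" ]; then
        echo "OK: the API server is serving the certificate on disk"
    else
        echo "STALE: the API server is serving a different certificate than the one on disk"
        return 1
    fi
}

# Only run when an endpoint such as 192.168.140.66:6443 is given.
if [ "$#" -ge 1 ]; then
    # Assumption: kubeadm's default certificate path unless a second argument is given.
    check_stale "$1" "${2:-/etc/kubernetes/pki/apiserver.crt}"
fi
```

Saved under a name such as check-stale.sh (hypothetical), it would be run on the master as: sh check-stale.sh 192.168.140.66:6443. A STALE result while kubeadm reports fresh certificates points at a process that never reloaded them.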

C. Wake up from a nightmare

The solution that saved my ass:

  1. Regenerate all cluster certificates

    $ kubeadm alpha certs renew all
    
  2. Stop kubelet service on all nodes

    $ systemctl stop kubelet
    
  3. Stop docker service on all nodes

    $ systemctl stop docker
    
  4. Start docker service on master node

    $ systemctl start docker
    
  5. Start kubelet service on master node

    $ systemctl start kubelet
    
  6. Start docker service on every worker node

    $ systemctl start docker
    
  7. Start kubelet service on every worker node

    $ systemctl start kubelet
    
  8. Monitor kubelet log on all nodes

    $ sudo journalctl -fu kubelet
    

Why stop and then start instead of a plain restart? I tried the restart command but it did not work; kubelet still did not pick up the new certificate.
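The stop-everything-then-start ordering from the steps above can be sketched as a small script. This is only a sketch under stated assumptions: the worker hostnames worker1 and worker2 are hypothetical, kubemaster is the node name seen in the kubelet log, the nodes are assumed reachable over ssh with sudo rights, and the script defaults to a dry run that only prints what it would do.

```shell
#!/bin/sh
MASTER=${MASTER:-kubemaster}            # node name from the kubelet log
WORKERS=${WORKERS:-"worker1 worker2"}   # hypothetical worker hostnames
DRY_RUN=${DRY_RUN:-1}                   # set DRY_RUN=0 to really run the commands

# Run a command on a node over ssh, or just print it in dry-run mode.
run_on() {
    node=$1
    shift
    if [ "$DRY_RUN" = 1 ]; then
        echo "[$node] $*"
    else
        ssh "$node" sudo "$@"
    fi
}

restart_cluster() {
    # Steps 2 and 3: stop kubelet everywhere, then stop docker everywhere.
    for node in $MASTER $WORKERS; do run_on "$node" systemctl stop kubelet; done
    for node in $MASTER $WORKERS; do run_on "$node" systemctl stop docker; done

    # Steps 4 and 5: bring the master back first.
    run_on "$MASTER" systemctl start docker
    run_on "$MASTER" systemctl start kubelet

    # Steps 6 and 7: then the workers.
    for node in $WORKERS; do run_on "$node" systemctl start docker; done
    for node in $WORKERS; do run_on "$node" systemctl start kubelet; done
}

restart_cluster
```

The certificate renewal itself (step 1) is left out on purpose, since it only needs to run once on the master before any of this.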

And wow, the logs went back to a normal state. All the certificate-related errors were gone. But wait, not so fast. A new error came up: our calico pod was unable to start, so the pods could not communicate with each other. The solution was to delete the pod, so that Kubernetes would recreate it.
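Deleting the calico pod can be done with a label selector instead of looking up pod names. The following is a minimal sketch, assuming Calico was installed from its standard manifest, where the per-node pods run in the kube-system namespace with the label k8s-app=calico-node. Both values are assumptions about the setup, so check them with kubectl get pods --all-namespaces first. The function name and the --apply switch belong to this sketch only.

```shell
#!/bin/sh
KUBECTL=${KUBECTL:-kubectl}
NAMESPACE=${NAMESPACE:-kube-system}          # assumption: namespace used by the Calico manifest
SELECTOR=${SELECTOR:-k8s-app=calico-node}    # assumption: label on the calico node pods

recreate_calico_pods() {
    # Show the current state first, so the broken pods are visible in the output.
    $KUBECTL get pods -n "$NAMESPACE" -l "$SELECTOR" -o wide
    # Delete them; the controller that owns the pods creates replacements.
    $KUBECTL delete pod -n "$NAMESPACE" -l "$SELECTOR"
}

# Only touch the cluster when explicitly asked to.
if [ "${1:-}" = "--apply" ]; then
    recreate_calico_pods
fi
```

Afterwards, kubectl get pods -n kube-system -w shows whether the replacement pods reach the Running state.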

The total downtime caused by this issue was about 1 hour 10 minutes.


Sumarsono

System Administrator