How to solve 'node unavailable, kubelet stopped posting node status' when using Rancher
Problem
When using Rancher, a worker node sometimes stops working and the cluster shows a warning like this:
Unavailable
kubelet stopped posting node status
Environment
- Docker: Server Version: 19.03.13
- Rancher 2.x
Debug
You can debug the node status by running this command:
kubectl describe nodes
Then check the kubelet logs on the node:
journalctl -u kubelet
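For example, you can narrow the output down to the parts that matter for this warning. A small sketch (the node name `worker-1` and the grep keywords are placeholders, not from the original post):

```shell
# Placeholder node name -- take the real one from `kubectl get nodes`.
NODE=worker-1

# Print only the Conditions section of the node description; the
# "Kubelet stopped posting node status" message shows up there.
kubectl describe node "$NODE" | sed -n '/^Conditions:/,/^Addresses:/p'

# On the node itself: show only recent kubelet log lines that look like errors.
journalctl -u kubelet --since "30 min ago" --no-pager | grep -iE 'error|fail|timeout'
```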
Solution
Solution #1: Restart docker/kubelet service
You can try restarting the Docker service on the unhealthy node.
On older SysV-init systems (e.g. CentOS 6):
service docker restart
On systemd-based systems (CentOS 7+, Ubuntu):
systemctl restart docker
systemctl restart kubelet
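A sketch that restarts both services and then verifies they actually came back up (assumes systemd and the standard docker/kubelet unit names):

```shell
# Restart the container runtime first, then the kubelet (systemd assumed).
sudo systemctl restart docker
sudo systemctl restart kubelet

# Verify both units are active again before re-checking the node in Rancher.
for unit in docker kubelet; do
    if systemctl is-active --quiet "$unit"; then
        echo "$unit is running"
    else
        echo "$unit failed to start -- check: journalctl -u $unit" >&2
    fi
done
```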
Solution #2: Reboot the node
If you have root permission and the server can be rebooted safely, you can simply run:
reboot
Solution #3: Recreate the cluster
You can follow this guide to create the cluster again.
Solution #4: Remove and then re-add the node
- First, remove the node from the cluster.
- Then add the node back to the cluster, or do an etcd snapshot restore by following this guide.
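Before removing the node it is safer to drain it first, so its workloads get rescheduled elsewhere. A sketch (the node name is a placeholder; re-add the node afterwards from the Rancher UI or by re-running the registration command on it):

```shell
# Placeholder -- use the name shown by `kubectl get nodes`.
NODE=worker-1

# Evict the pods so they are rescheduled on other nodes.
# On kubectl < 1.20 the last flag is spelled --delete-local-data instead.
kubectl drain "$NODE" --ignore-daemonsets --delete-emptydir-data

# Remove the node object from the cluster.
kubectl delete node "$NODE"
```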
Solution #5: Disable swap on the node
The kubelet does not work well with swap enabled. You can follow this guide or just run:
swapoff -a
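Note that swapoff -a only disables swap until the next reboot. To keep it off permanently you also need to comment the swap entry out of /etc/fstab; a common sketch (GNU sed assumed):

```shell
# Disable swap immediately (lost on reboot).
sudo swapoff -a

# Keep it disabled across reboots by commenting out swap lines in /etc/fstab.
# The expression is idempotent (already-commented lines are left with one '#'),
# and a backup is written to /etc/fstab.bak first.
sudo sed -i.bak '/\sswap\s/ s/^#*/#/' /etc/fstab
```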
Solution #6: Re-enable IP forwarding for Docker
dockerd enables IP forwarding (sysctl net.ipv4.ip_forward) when it starts, but if you later run service network restart, IP forwarding gets disabled while the network service stops. You then need to re-enable it manually.
You can verify the ip_forward status by running:
docker info | grep WARNING
If you got this:
WARNING: IPv4 forwarding is disabled
Then you should re-enable IP forwarding temporarily:
sudo sysctl -w net.ipv4.ip_forward=1
Or permanently:
echo "net.ipv4.ip_forward = 1" | sudo tee -a /etc/sysctl.conf
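After appending the line, reload the sysctl settings and confirm the kernel actually picked the value up. A minimal check:

```shell
# Apply everything in /etc/sysctl.conf without rebooting.
sudo sysctl -p

# Confirm the live kernel value is 1.
if [ "$(cat /proc/sys/net/ipv4/ip_forward)" = "1" ]; then
    echo "IPv4 forwarding is enabled"
else
    echo "IPv4 forwarding is still disabled" >&2
fi
```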