When a node becomes NotReady or you can’t access logs from pods, it usually means something is wrong with the kubelet, network, or system resources.
This guide shows how to inspect and recover a node without SSH access, using kubectl debug.
Use kubectl debug to run a privileged pod directly on the node:
kubectl debug node/<node-name> -it --image=nicolaka/netshoot -- /bin/sh
chroot /host sh   # the debug pod mounts the node's root filesystem at /host
You’re now inside the host OS filesystem and can inspect it as root.
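If `chroot` fails (some minimal host images don't ship a shell at the expected path), the node's files are still reachable under the /host mount prefix without chroot:

ls /host/etc   # host files are visible under /host even without chroot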
Check the core system metrics first:
hostname
df -h # Disk usage
free -m # Memory
uptime # CPU load
If disk or memory pressure is high, the kubelet reports a DiskPressure or MemoryPressure condition and taints the node automatically; severe pressure can also make the kubelet itself unresponsive, flipping the node to NotReady.
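You can verify these conditions and taints from your workstation before digging further (standard kubectl; replace <node-name> with your node):

kubectl get node <node-name> -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'
kubectl describe node <node-name> | grep -i taints

Back inside the chroot, check that the K3s process itself is alive: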
ps aux | grep -v grep | grep k3s   # the K3s server or agent process should be running
On K3s the kubelet runs embedded inside the k3s binary rather than as a separate process, so if the k3s process is missing, the kubelet is missing too and the node cannot register with the control plane.
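If the process is running but the node still misbehaves, probe the kubelet's health endpoint directly. The debug pod shares the host's network namespace, so localhost reaches the node; 10248 is the kubelet's default healthz port, though a customized install may differ:

curl -s http://localhost:10248/healthz   # a healthy kubelet prints "ok"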
K3s may log differently depending on installation method and OS. Try the following paths in order:
tail -n 50 /var/lib/rancher/k3s/server/logs/k3s.log
tail -n 50 /var/lib/rancher/k3s/agent/logs/k3s.log
journalctl -u k3s --no-pager | tail -n 50
cat /proc/$(pgrep -of "k3s server|k3s agent")/fd/1 | tail -n 50   # -o picks the oldest matching PID if several match
If you get logs from /proc/.../fd/1, you’re tailing the live K3s process output — this works even without traditional log files.
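Once you know which log source exists, filter it for recent failures. A minimal sketch using the file paths from above (adjust to whichever one your install actually uses):

for f in /var/lib/rancher/k3s/server/logs/k3s.log /var/lib/rancher/k3s/agent/logs/k3s.log; do
  [ -f "$f" ] && tail -n 200 "$f" | grep -iE "error|fatal|failed"
done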
Common symptoms map to a few usual suspects:

| Symptom | Likely Cause | What to Check |
| --- | --- | --- |
| Node NotReady | Kubelet not running | K3s logs or stdout |
| DiskPressure | Disk full | `df -h` output |
| MemoryPressure | Out-of-memory events | `dmesg` for OOM kills |
| NetworkUnavailable | CNI problem | CNI config and logs |
| Logs inaccessible | API server can't reach kubelet | Node network and kubelet port 10250 |
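Each row has a quick corresponding check you can run inside the chroot. The paths below assume a default K3s install on a systemd-based distro:

df -h /var/lib/rancher                          # DiskPressure: K3s data lives here
dmesg | grep -iE "oom|out of memory"            # MemoryPressure: look for OOM kills
ls /var/lib/rancher/k3s/agent/etc/cni/net.d/    # NetworkUnavailable: K3s keeps CNI config here
ss -tlnp | grep 10250                           # Logs inaccessible: kubelet should listen on 10250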
If the system itself looks healthy after inspection, restart K3s from inside the chroot:
systemctl restart k3s || service k3s restart   # on agent (worker) nodes the unit is k3s-agent
Then confirm:
kubectl get nodes -o wide
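To wait for the node instead of polling by hand (standard kubectl; the 120s timeout is arbitrary):

kubectl wait --for=condition=Ready node/<node-name> --timeout=120s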
If it still fails, re-provision the node using your cluster manager (e.g., hetzner-k3s upgrade --config cluster.yaml).
After finishing the investigation:
exit   # once to leave the chroot, and again to leave the debug pod
kubectl get pods -n default | grep node-debugger   # kubectl debug appends a random suffix to the pod name
kubectl delete pod node-debugger-<node-name>-<suffix> -n default
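If you've run several debug sessions, a one-liner sketch to clean up every leftover debugger pod (assuming they were created in default):

kubectl get pods -n default -o name | grep node-debugger | xargs -r kubectl delete -n default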
- `kubectl debug` is the safest way to inspect nodes without SSH.
- Always `chroot /host sh` to access real logs and system files.
- Focus on kubelet, networking, and system pressure — these are responsible for 90% of node-level issues.