When a node becomes NotReady or you can’t access logs from pods, it usually means something is wrong with the kubelet, network, or system resources.
This guide shows how to inspect and recover a node without SSH access, using kubectl debug.
Use kubectl debug to run a privileged pod directly on the node:
kubectl debug node/<node-name> -it --image=nicolaka/netshoot -- /bin/sh
chroot /host sh   # the debug pod mounts the node's root filesystem at /host
You’re now inside the host OS filesystem and can inspect it as root.
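If `chroot` fails (some minimal host images don't ship a shell at the expected path), the node's files are still reachable under the /host mount prefix without chroot:

ls /host/etc   # host files are visible under /host even without chroot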
Check the core system metrics first:
hostname
df -h # Disk usage
free -m # Memory
uptime # CPU load
If disk or memory pressure is high, the kubelet reports a DiskPressure or MemoryPressure condition and taints the node automatically; severe pressure can also make the kubelet itself unresponsive, flipping the node to NotReady.
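You can verify these conditions and taints from your workstation before digging further (standard kubectl; replace <node-name> with your node):

kubectl get node <node-name> -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'
kubectl describe node <node-name> | grep -i taints

Back inside the chroot, check that the K3s process itself is alive: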
ps aux | grep -v grep | grep k3s   # the K3s server or agent process should be running
On K3s the kubelet runs embedded inside the k3s binary rather than as a separate process, so if the k3s process is missing, the kubelet is missing too and the node cannot register with the control plane.
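If the process is running but the node still misbehaves, probe the kubelet's health endpoint directly. The debug pod shares the host's network namespace, so localhost reaches the node; 10248 is the kubelet's default healthz port, though a customized install may differ:

curl -s http://localhost:10248/healthz   # a healthy kubelet prints "ok"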
K3s may log differently depending on installation method and OS. Try the following paths in order:
tail -n 50 /var/lib/rancher/k3s/server/logs/k3s.log
tail -n 50 /var/lib/rancher/k3s/agent/logs/k3s.log
journalctl -u k3s --no-pager | tail -n 50
cat /proc/$(pgrep -of "k3s server|k3s agent")/fd/1 | tail -n 50   # -o picks the oldest matching PID if several match
If you get logs from /proc/.../fd/1, you’re tailing the live K3s process output — this works even without traditional log files.
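Once you know which log source exists, filter it for recent failures. A minimal sketch using the file paths from above (adjust to whichever one your install actually uses):

for f in /var/lib/rancher/k3s/server/logs/k3s.log /var/lib/rancher/k3s/agent/logs/k3s.log; do
  [ -f "$f" ] && tail -n 200 "$f" | grep -iE "error|fatal|failed"
done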
Common symptoms map to a few usual suspects:

| Symptom | Likely Cause | What to Check |
| --- | --- | --- |
| Node NotReady | Kubelet not running | K3s logs or stdout |
| DiskPressure | Disk full | `df -h` output |
| MemoryPressure | Out-of-memory events | `dmesg` for OOM kills |
| NetworkUnavailable | CNI problem | CNI config and logs |
| Logs inaccessible | API server can't reach kubelet | Node network and kubelet port 10250 |
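Each row has a quick corresponding check you can run inside the chroot. The paths below assume a default K3s install on a systemd-based distro:

df -h /var/lib/rancher                          # DiskPressure: K3s data lives here
dmesg | grep -iE "oom|out of memory"            # MemoryPressure: look for OOM kills
ls /var/lib/rancher/k3s/agent/etc/cni/net.d/    # NetworkUnavailable: K3s keeps CNI config here
ss -tlnp | grep 10250                           # Logs inaccessible: kubelet should listen on 10250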
If the system itself looks healthy after inspection, restart K3s from inside the chroot:
systemctl restart k3s || service k3s restart   # on agent (worker) nodes the unit is k3s-agent
Then confirm:
kubectl get nodes -o wide
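To wait for the node instead of polling by hand (standard kubectl; the 120s timeout is arbitrary):

kubectl wait --for=condition=Ready node/<node-name> --timeout=120s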
If it still fails, re-provision the node using your cluster manager (e.g., hetzner-k3s upgrade --config cluster.yaml).
After finishing the investigation:
exit   # once to leave the chroot, and again to leave the debug pod
kubectl get pods -n default | grep node-debugger   # kubectl debug appends a random suffix to the pod name
kubectl delete pod node-debugger-<node-name>-<suffix> -n default
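If you've run several debug sessions, a one-liner sketch to clean up every leftover debugger pod (assuming they were created in default):

kubectl get pods -n default -o name | grep node-debugger | xargs -r kubectl delete -n default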
- `kubectl debug` is the safest way to inspect nodes without SSH.
- Always `chroot /host sh` to access real logs and system files.
- Focus on kubelet, networking, and system pressure — these are responsible for 90% of node-level issues.