How to Debug Kubernetes Node Issues (K3s and General Clusters)

When a node reports NotReady, or kubectl can no longer fetch logs from its pods, it usually means something is wrong with the kubelet, the network, or system resources.
This guide shows how to inspect and recover a node without SSH access, using kubectl debug.


🧭 1. Accessing the Node

Use kubectl debug to run a privileged pod directly on the node:

kubectl debug node/<node-name> -it --image=nicolaka/netshoot -- /bin/sh
chroot /host sh        # run inside the debug pod; the node's root filesystem is mounted at /host

You’re now inside the host OS filesystem and can inspect it as root.


🩺 2. Basic System Health

Check the core system metrics first:

hostname              # Confirm you're on the expected node
df -h                 # Disk usage
free -m               # Memory
uptime                # CPU load

If disk or memory usage is too high, the kubelet reports DiskPressure or MemoryPressure conditions and taints the node so that new pods are not scheduled onto it.
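You can confirm this from outside the node by looking at the conditions Kubernetes reports for it; these are standard kubectl commands, with only the node name to fill in:

kubectl describe node <node-name> | grep -A 8 "Conditions:"
kubectl get node <node-name> -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'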


⚙️ 3. Check Critical Processes
ps aux | grep k3s      # K3s server or agent should be running
ps aux | grep kubelet  # Kubelet should be part of the K3s process

If kubelet is missing, the node cannot register with the control plane.
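As a cross-check, assuming a systemd-based install, you can also confirm the service state and that the kubelet API port is listening (if ss is not available on the host, netstat -tlnp works the same way):

systemctl status k3s --no-pager        # on agent (worker) nodes the unit is k3s-agent
ss -tlnp | grep 10250                  # the kubelet API should be listening on port 10250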


📜 4. Retrieve Logs

K3s logs to different locations depending on the installation method and OS. Try the following sources in order:

tail -n 50 /var/lib/rancher/k3s/server/logs/k3s.log        # server nodes
tail -n 50 /var/lib/rancher/k3s/agent/logs/k3s.log         # agent nodes
journalctl -u k3s --no-pager | tail -n 50                  # systemd-based installs
tail -n 50 /proc/$(pgrep -f "k3s server|k3s agent")/fd/1   # live process stdout

If you get logs from /proc/.../fd/1, you’re tailing the live K3s process output — this works even without traditional log files.
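Whichever source has logs, filtering for recent errors usually narrows things down quickly; the time window and pattern below are just examples:

journalctl -u k3s --since "30 min ago" --no-pager | grep -iE "error|fail|refused" | tail -n 20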


🔍 5. Identify Common Root Causes

Symptom               Likely Cause                      What to Check
Node NotReady         Kubelet not running               K3s logs or stdout
DiskPressure          Disk full                         df -h
MemoryPressure        Out-of-memory events              dmesg
NetworkUnavailable    CNI problem                       kubectl -n kube-system get pods -o wide
Logs inaccessible     API server can’t reach kubelet    curl -k https://127.0.0.1:10250/healthz
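For the MemoryPressure row, kernel OOM events end up in the kernel ring buffer; a minimal check looks like this:

dmesg | grep -iE "out of memory|oom-kill" | tail -n 20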


🧰 6. Restart or Recover

If the node seems healthy after inspection:

systemctl restart k3s || service k3s restart      # use k3s-agent instead of k3s on agent (worker) nodes

Then confirm:

kubectl get nodes -o wide
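If the node takes a moment to rejoin, you can watch it until it reports Ready; the timeout below is just an example:

kubectl get nodes -w
kubectl wait --for=condition=Ready node/<node-name> --timeout=120s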

If it still fails, re-provision the node using your cluster manager (e.g., hetzner-k3s upgrade --config cluster.yaml).


🧹 7. Clean Up

After finishing the investigation:

exit
kubectl delete pod node-debugger-<node-name>-<suffix> -n default   # the debug pod name ends in a random suffix
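If the exact pod name is unclear, you can list the debugger pods and delete them by pattern (assuming they were created in the default namespace):

kubectl get pods -n default | grep node-debugger
kubectl delete -n default $(kubectl get pods -n default -o name | grep node-debugger)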

✅ Summary
  • kubectl debug is the safest way to inspect nodes without SSH.

  • Always chroot /host sh to access real logs and system files.

  • Focus on the kubelet, networking, and system pressure — these account for the vast majority of node-level issues.