Skill Detail

Kubernetes Pod Crash Diagnostician

Diagnoses Kubernetes pod crash loops by analyzing events, logs, and resource quotas via the Kubernetes API and kubectl debug. Correlates OOMKill signals with container memory profiles from Prometheus queries.

Runbooks & DiagnosticsClaude Code
Runbooks & Diagnostics Claude Code Security Reviewed
Tool match: kubernetes โญ 121.7k GitHub stars Apache-2.0 license
INSTALL WITH ANY AGENT
npx skills add agentskillexchange/skills --skill kubernetes-pod-crash-diagnostician Copy
Works best when you want a reusable capability, not another fragile one-off prompt.
At a glance
Last updated
Mar 24, 2026
Quick brief

The Kubernetes Pod Crash Diagnostician skill provides automated root cause analysis for pod crash loops in Kubernetes clusters. It connects to the Kubernetes API server to fetch pod events, container statuses, and recent restart history. For OOMKilled containers, it queries Prometheus via PromQL to retrieve historical memory usage patterns and container_memory_working_set_bytes metrics, identifying whether the OOM was caused by a memory leak, insufficient limits, or spike load. For CrashLoopBackOff pods, it analyzes container logs from the previous terminated instance, checks init container completion status, and validates volume mount accessibility. The skill inspects resource quotas, limit ranges, and pod disruption budgets that may constrain scheduling. Generates diagnostic reports with recommended resource limit adjustments based on P95 memory and CPU usage. Supports ephemeral debug containers via kubectl debug for live investigation. Integrates with Grafana dashboards for visual correlation.