Skill Detail

Investigate production incidents across Kubernetes and cloud signals with HolmesGPT

Use HolmesGPT when an on-call agent needs one investigation loop that pulls alerts, logs, metrics, and infrastructure context from multiple systems and returns a root-cause path instead of forcing a human to hop across separate observability products.


Runbooks & Diagnostics | Custom Agents | Security Reviewed

⭐ 2.3k GitHub stars

INSTALL WITH ANY AGENT

npx skills add agentskillexchange/skills --skill investigate-production-incidents-across-kubernetes-and-cloud-signals-with-holmesgpt


Works best when you want a reusable capability, not another fragile one-off prompt.

View source Documentation

At a glance

Tools required

HolmesGPT CLI or operator deployment, one supported LLM provider, and connected observability/toolset integrations

Install & setup

Follow the HolmesGPT installation docs to deploy the CLI or Kubernetes operator, configure an LLM provider, and connect the relevant observability toolsets before running incident investigations.

Author

HolmesGPT

Publisher

Organization

Last updated

Apr 21, 2026

Quick brief

HolmesGPT is publishable because its user-facing job is specific and operational: investigate a live production incident by querying connected observability and infrastructure systems, correlate the evidence, and return a likely root-cause path with remediation direction. The upstream project describes itself as an open-source AI agent for production incident investigation, with built-in integrations for Kubernetes, Prometheus, Grafana, Datadog, cloud services, databases, ticketing systems, and more.

How it works

What this skill actually does

Invoke this skill, rather than working in the underlying products directly, when the real need is cross-source incident investigation and not dashboard-by-dashboard inspection. Reach for HolmesGPT when an agent should gather the relevant signals, follow the evidence across systems, and explain what is probably broken, so that nobody has to pivot manually through kubectl, alerting tools, logs, traces, and cloud consoles.
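As a rough illustration of handing one question to the investigation loop, the sketch below wraps a HolmesGPT CLI call from Python. It assumes the CLI is installed as `holmes` and accepts a free-text question through an `ask` subcommand; check the HolmesGPT documentation for the exact invocation in your version. The helper names `build_holmes_command` and `investigate` are hypothetical and exist only for this example.

```python
import shlex
import shutil
import subprocess


def build_holmes_command(question: str) -> list[str]:
    """Build the argv list for a one-off question.

    Assumption: the CLI binary is `holmes` and takes `ask "<question>"`.
    """
    cleaned = question.strip()
    if not cleaned:
        raise ValueError("question must not be empty")
    return ["holmes", "ask", cleaned]


def investigate(question: str) -> str:
    """Run the question through the CLI if present, else describe the command."""
    argv = build_holmes_command(question)
    if shutil.which(argv[0]) is None:
        # CLI not installed on this machine: report what would have been run.
        return "holmes CLI not found; would run: " + shlex.join(argv)
    # check=False so a failed investigation still returns its error output.
    result = subprocess.run(argv, capture_output=True, text=True, check=False)
    return result.stdout or result.stderr
```

Calling `investigate("Why is the checkout deployment restarting?")` either returns the CLI output or, when the binary is missing, a message showing the command that would have been run. Keeping command construction separate from execution makes the wrapper easy to test without a live cluster.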

The scope is bounded tightly enough to keep this from collapsing into a plain product card: it is neither a generic observability platform listing nor just a connector bundle. The workflow is incident triage and root-cause investigation across existing telemetry sources, a concrete agent job with a narrower operator outcome than the platforms HolmesGPT connects to.

Best fit

When to reach for it

Best when the job fits Runbooks & Diagnostics.
Works naturally with Custom Agents setups.
Requires HolmesGPT CLI or operator deployment, one supported LLM provider, and….
Installation is straightforward: Follow the HolmesGPT installation docs to deploy the CLI or Kubernetes operator, configure an LLM provider,…

Trust & provenance

Why this listing is credible

Trust status: Security Reviewed.
2.3k GitHub stars on the linked upstream source.
Last updated Apr 21, 2026.
