The 9 Categories of Agent Skills: A Complete Taxonomy

When Anthropic engineer Thariq published his thread on how the Claude Code team uses skills internally, one section stood out: a clean, nine-category taxonomy for organizing skills. It’s the same framework Anthropic uses to build and evaluate their own tools.

Understanding this taxonomy matters whether you’re building skills or installing them. It tells you what a skill should and shouldn’t do, how to scope new skills appropriately, and where the real gaps in the ecosystem are. This post walks through all nine categories with examples and links to live skills in each.

The Complete Taxonomy

Anthropic’s nine categories cover distinct types of work agents do. Each solves a different problem, and a well-designed skill stays firmly within one of them rather than straddling several.

Category 1: Library & API Reference

What it is: Skills that teach Claude how to correctly use specific libraries, CLIs, and SDKs.

This is the most underrated category. LLM training data goes stale fast — an API that existed two years ago may have completely different method signatures, deprecated endpoints, or new authentication flows today. A Library & API Reference skill bridges that gap without waiting for the next training run.

The key insight from Thariq: Claude doesn’t need you to explain what a library does — it can read the README. What it needs is the non-obvious stuff: which version deprecations break existing code, what the SDK does that the docs don’t mention, which method works differently from its neighbor in ways that only surface in production.

## Stripe SDK — Key Differences from Docs

## Gotchas
- `PaymentIntent.capture_method` defaults to `automatic` since v8.
  The docs still show `manual` as an example — don't copy-paste.
- `stripe.customers.list()` is paginated, which isn't obvious: each call returns one page
  (10 objects by default, at most 100). Page through with
  `stripe.customers.list({ limit: 100, starting_after: lastId })`
- Metadata keys are case-sensitive. "Email" and "email" are different keys.
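
The `starting_after` cursor is easier to get right as a loop than as a one-off call. Below is a minimal TypeScript sketch that assumes only Stripe's list shape (a `data` array plus a `has_more` flag, with the last object's `id` serving as the next cursor). `listAll` and the `fetchPage` parameter are hypothetical; in real code `fetchPage` would wrap a call such as `stripe.customers.list(params)`.

```typescript
// Shape of one page in a Stripe-style list response.
interface ListPage<T extends { id: string }> {
  data: T[];
  has_more: boolean;
}

interface ListParams {
  limit: number;
  starting_after?: string;
}

// Hypothetical fetcher: in real code this would wrap e.g. stripe.customers.list(params).
type FetchPage<T extends { id: string }> = (params: ListParams) => Promise<ListPage<T>>;

// Follow the cursor until the API reports there are no more pages.
async function listAll<T extends { id: string }>(
  fetchPage: FetchPage<T>,
  limit = 100, // Stripe caps page size at 100
): Promise<T[]> {
  const all: T[] = [];
  let startingAfter: string | undefined;
  for (;;) {
    const params: ListParams = { limit };
    if (startingAfter !== undefined) params.starting_after = startingAfter;
    const page = await fetchPage(params);
    all.push(...page.data);
    // Stop on has_more === false; an empty page is treated as the end, defensively.
    if (!page.has_more || page.data.length === 0) break;
    // The cursor for the next call is the ID of the last object on this page.
    startingAfter = page.data[page.data.length - 1].id;
  }
  return all;
}
```

It's worth checking the SDK before hand-rolling this: recent versions of stripe-node also ship auto-pagination helpers that may make the loop unnecessary.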

When to build one: Your team uses a specific SDK version or internal library that Claude gets wrong. Or you’re working with a newer API that post-dates Claude’s training.

Browse Library & API Reference skills on AgentSkillExchange (ASE) →

Category 2: Product Verification

What it is: Skills that teach Claude how to test and verify things using external tools — typically Playwright, tmux, or similar environments.

Verification skills are about closing the loop. Claude writes code or makes a change, and then it needs to check that the change worked. Without a verification skill, agents tend to either skip testing or write naive assertions that miss the point.

A Playwright skill isn’t just “here’s how to use Playwright.” It’s “here’s our specific test setup, our selectors, the anti-patterns that cause flaky tests in our codebase, and how to handle our authentication flow.” That specificity is what makes it worth creating.

## Product Verification — Browser Testing

## Test Execution
Always run tests in headed mode first: `npx playwright test --headed`
Watching the browser surfaces visual problems that a passing headless run won't flag.

## Gotchas
- Our auth redirects to SSO. Tests must use `storageState` from
  `playwright/.auth/user.json` — don't re-authenticate in each test.
- The date picker is a custom component (not a native input).
  Use `page.click('.date-picker-trigger')` then `page.fill('.date-input', ...)`
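
Wrapping the date-picker gotcha in one helper means no individual test has to rediscover it. A minimal sketch, assuming the selectors from the skill above; `setDate` and `ClickAndFill` are hypothetical names. The interface declares only the two Playwright `Page` methods the helper calls, so a real `Page` satisfies it and a fake can stand in when testing the helper itself. The date string format is left to the caller because the skill doesn't specify it.

```typescript
// Minimal slice of Playwright's Page that this helper needs.
interface ClickAndFill {
  click(selector: string): Promise<void>;
  fill(selector: string, value: string): Promise<void>;
}

// The date picker is a custom component, not a native input,
// so open it first and then type into its text field.
async function setDate(page: ClickAndFill, value: string): Promise<void> {
  await page.click(".date-picker-trigger");
  await page.fill(".date-input", value);
}
```

Playwright's own docs now favor locators (`page.locator(...)`) over selector-string calls like `page.click()`, so a real skill might phrase this gotcha in those terms instead.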

When to build one: Your team has a specific testing stack that Claude gets confused by. Particularly useful for non-standard UI frameworks and complex auth flows.

Category 3: Data Fetching & Analysis

What it is: Skills that connect Claude to your data and monitoring stacks.

This category covers everything from pulling metrics out of Datadog to querying internal analytics systems. The value isn’t in teaching Claude that Datadog exists — it’s in encoding your specific dashboards, alert thresholds, and metric naming conventions so Claude doesn’t have to guess.

## Data Fetching — Internal Analytics

## Available Data Sources
- Production DB: read-only replica at `$DB_READ_HOST:5432` (use this, never primary)
- Metrics API: `https://metrics.internal/v2/query` — docs at `/docs/metrics-api`
- Event stream: use `analyze_events.py` in scripts/ — do not query raw Kafka

## Gotchas
- All timestamps in the metrics API are UTC. Our dashboards display in PT.
  Always convert before comparing to user-reported times.
- The `session_duration` column includes server-side timeouts as sessions.
  Filter `WHERE session_duration < 7200` for real user sessions.
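
The UTC-versus-PT gotcha is easy to encode once and reuse. A minimal sketch using the standard `Intl.DateTimeFormat` API, which handles the PST/PDT switch; `utcToPacific` is a hypothetical helper, and it assumes "PT" means the IANA zone `America/Los_Angeles`.

```typescript
// Render a UTC timestamp from the metrics API in Pacific time so it can be
// compared with what a user saw on a PT dashboard. Intl applies DST rules.
function utcToPacific(isoUtc: string): string {
  const date = new Date(isoUtc);
  if (Number.isNaN(date.getTime())) throw new Error(`invalid timestamp: ${isoUtc}`);
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone: "America/Los_Angeles",
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
    hour: "2-digit",
    minute: "2-digit",
    hour12: false,
  }).formatToParts(date);
  const get = (type: string): string => parts.find((p) => p.type === type)?.value ?? "";
  // Some Intl implementations render midnight as "24" when hour12 is false.
  const hour = get("hour") === "24" ? "00" : get("hour");
  return `${get("year")}-${get("month")}-${get("day")} ${hour}:${get("minute")}`;
}
```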

When to build one: Whenever Claude needs to pull data from a specific internal source and you want consistent, correct queries instead of hallucinated ones.

Category 4: Business Process & Team Automation

What it is: Skills that encode your team's repetitive workflows — the things that happen the same way every time but require just enough judgment that a script alone won't cut it.

This is where institutional knowledge lives. How does your team handle sprint planning? What's the standard process for a vendor contract review? What needs to happen before a feature gets marked ready for QA? These processes exist in wikis and people's heads. A well-written skill puts them where Claude can actually use them.

The distinction between a good Category 4 skill and a bad one: good ones encode judgment criteria, not just steps. "If the estimate is over 10 points and touches the payments module, flag for senior review" is more valuable than "create a Jira ticket."
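
Writing a judgment rule as a predicate, even if the skill itself states it in prose, is a quick way to find its ambiguities, such as whether exactly 10 points counts as "over 10". A minimal sketch; the `Ticket` shape, its field names, and `needsSeniorReview` are hypothetical.

```typescript
// Hypothetical ticket shape; field names are illustrative, not from any real tracker.
interface Ticket {
  estimatePoints: number;
  modules: string[];
}

// "If the estimate is over 10 points and touches the payments module,
// flag for senior review." Strictly greater than the threshold.
function needsSeniorReview(ticket: Ticket, thresholdPoints = 10): boolean {
  return ticket.estimatePoints > thresholdPoints && ticket.modules.some((m) => m === "payments");
}
```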

When to build one: Any recurring workflow where the steps are well-understood but require contextual judgment to execute correctly.

Category 5: Code Scaffolding & Templates

What it is: Skills that generate framework boilerplate for your specific setup.

Generic scaffolding tools are everywhere (`create-react-app`, `rails new`, and so on). Category 5 skills are for your scaffolding: your team's folder structure, your naming conventions, your standard packages pre-installed, your linting config included from day one.

## Code Scaffolding — Internal React Component

## Component Structure
All new components should follow:
src/components/[ComponentName]/
├── index.tsx        # Public export only
├── [ComponentName].tsx   # Main component
├── [ComponentName].test.tsx
├── [ComponentName].stories.tsx
└── styles.module.css

## Gotchas
- Always use named exports from index.tsx (not default exports)
- Props interface goes in the component file, not a separate types/ file
  (we had a bad experience with circular imports from separate type files)
- Don't add barrel exports — our build tool handles tree-shaking fine without them
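
To make the structure concrete, here is a minimal sketch of what a scaffolding skill (or a script it calls) might generate. It only builds an in-memory map of paths to file contents. `scaffoldComponent`, the PascalCase check, the `styles.root` class, and the placeholder contents are assumptions for illustration.

```typescript
// Files every new component gets, following the folder structure above.
function scaffoldComponent(name: string): Map<string, string> {
  // Assumption: component names are PascalCase.
  if (!/^[A-Z][A-Za-z0-9]*$/.test(name)) {
    throw new Error(`component names must be PascalCase: ${name}`);
  }
  const dir = `src/components/${name}`;
  const files = new Map<string, string>();
  // index.tsx: public export only, as a named export (no default export).
  files.set(`${dir}/index.tsx`, `export { ${name} } from "./${name}";\n`);
  // The props interface lives in the component file, not in a separate types/ file.
  files.set(
    `${dir}/${name}.tsx`,
    `import styles from "./styles.module.css";\n\n` +
      `export interface ${name}Props {}\n\n` +
      `export function ${name}(_props: ${name}Props) {\n` +
      `  return <div className={styles.root} />;\n` +
      `}\n`,
  );
  files.set(`${dir}/${name}.test.tsx`, "");
  files.set(`${dir}/${name}.stories.tsx`, "");
  files.set(`${dir}/styles.module.css`, ".root {}\n");
  return files;
}
```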

When to build one: When your team has settled on patterns that are specific enough to encode but general enough to reuse.

Category 6: Code Quality & Review

What it is: Skills for enforcing code quality standards, running static analysis, and performing code review.

This category is about standards enforcement, not just feedback. A generic code review asks "is this code good?" A Category 6 skill asks "does this code follow our specific conventions, avoid our known anti-patterns, and meet our documented quality bar?"

The best skills in this category go beyond linting. They know which patterns have caused production issues before, which shortcuts are acceptable in tests but not in production code, and how to evaluate readability against your team's specific style.
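
As a concrete instance of a shortcut that is acceptable in tests but not in production code, suppose a team allows `as any` casts in test files only (a hypothetical rule chosen for illustration). A minimal sketch of a check a Category 6 skill could describe or call; `checkAsAny` is a hypothetical name, and the line-based regex is deliberately naive: it also matches text inside strings and comments, which is why a real setup would more likely configure an existing lint rule with a per-file override.

```typescript
interface Violation {
  file: string;
  line: number;
  message: string;
}

// Files treated as tests: *.test.ts(x) and *.spec.ts(x).
const TEST_FILE = /\.(test|spec)\.tsx?$/;

// Hypothetical team rule: `as any` is tolerated in tests, flagged everywhere else.
function checkAsAny(file: string, source: string): Violation[] {
  if (TEST_FILE.test(file)) return [];
  const violations: Violation[] = [];
  source.split("\n").forEach((text, i) => {
    if (/\bas any\b/.test(text)) {
      violations.push({ file, line: i + 1, message: "`as any` is only allowed in test files" });
    }
  });
  return violations;
}
```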

Browse Code Quality skills on ASE →, or see our earlier post on gotchas sections, which applies directly to this category.

Category 7: CI/CD & Deployment

What it is: Skills for fetching pipeline status, triggering builds, and executing deployments.

Deployment skills are inherently about guardrails. Claude can push code, but it needs to know when not to. What checks must pass before a production deploy is safe? What's the rollback procedure if something fails? Who needs to be notified?

The gotchas section is critical here — this is the category where mistakes are most expensive. A good deployment skill encodes the institutional knowledge about what has gone wrong before and why:

## Deployment — Production Release

## Pre-deploy Checklist
Before any production deploy:
1. Confirm all tests green on main (check CI status, not just local)
2. Database migrations must run BEFORE the new code is deployed
3. Check #deployments in Slack for any ongoing incidents

## Gotchas
- NEVER deploy on Fridays after 3pm PT. Oncall coverage has gaps.
- Our CDN has a 15-minute propagation delay, so users may see old JS for up to 15 minutes.
  Warn users in #deploys if the change is user-facing.
- The `deploy:production` command does NOT run migrations automatically.
  Run `npm run db:migrate:production` first, then deploy.
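
The checklist and the Friday rule are mechanical enough to express as a gate that runs before `deploy:production`. A minimal sketch under stated assumptions: the `DeployState` fields are hypothetical inputs that something else (the CI API, a migration log, Slack) would have to supply, `deployBlockers` is a hypothetical name, and "PT" is taken to mean the `America/Los_Angeles` zone.

```typescript
// Hypothetical inputs gathered elsewhere before the gate runs.
interface DeployState {
  ciGreenOnMain: boolean; // from CI status, not local runs
  migrationsApplied: boolean; // npm run db:migrate:production has completed
  ongoingIncident: boolean; // anything open in #deployments
}

// Weekday and hour in Pacific time, via the standard Intl API.
function pacificWeekdayHour(now: Date): { weekday: string; hour: number } {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone: "America/Los_Angeles",
    weekday: "short",
    hour: "numeric",
    hour12: false,
  }).formatToParts(now);
  const weekday = parts.find((p) => p.type === "weekday")?.value ?? "";
  // % 24 guards against implementations that render midnight as "24".
  const hour = Number(parts.find((p) => p.type === "hour")?.value ?? "0") % 24;
  return { weekday, hour };
}

// Returns the reasons a production deploy should be blocked; empty means go.
function deployBlockers(state: DeployState, now: Date): string[] {
  const blockers: string[] = [];
  if (!state.ciGreenOnMain) blockers.push("CI is not green on main");
  if (!state.migrationsApplied) blockers.push("run db:migrate:production before deploying");
  if (state.ongoingIncident) blockers.push("ongoing incident in #deployments");
  const { weekday, hour } = pacificWeekdayHour(now);
  if (weekday === "Fri" && hour >= 15) blockers.push("no deploys on Fridays after 3pm PT");
  return blockers;
}
```

Returning a list of reasons rather than a boolean gives Claude something concrete to report when it declines to deploy.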

When to build one: When your deployment process has enough specific steps or gotchas that Claude makes mistakes without explicit guidance.

Category 8: Runbooks

What it is: Skills that encode symptom → investigation → structured report workflows for operational issues.

A runbook skill is what happens when you turn your incident response docs into something an agent can actually execute. The pattern is: describe the symptom, the skill guides the investigation, and the output is a structured report with findings and recommended actions.

Thariq's taxonomy puts runbooks in a distinct category for good reason. Unlike other skills, runbooks are explicitly designed for high-stress, time-sensitive situations. The quality bar is different: precision over comprehensiveness, clear escalation paths, explicit "stop here and call a human" conditions.

## Runbook — High Latency Investigation

## Symptom
API p99 latency > 2000ms for more than 5 minutes

## Investigation Steps
1. Check database connection pool: `scripts/check_db_pool.sh`
2. Review recent deployments (last 2 hours): `scripts/recent_deploys.sh`
3. Check for upstream dependency issues: `scripts/check_dependencies.sh`
4. Review error rate alongside latency (latency rising together with errors points to a different cause than latency rising on its own)

## Report Format
Produce a structured report with:
- Root cause hypothesis (ranked by likelihood)
- Evidence supporting each hypothesis
- Recommended immediate action
- Escalation path if immediate action doesn't resolve within 15 minutes

## STOP CONDITION
If root cause is unclear after 10 minutes of investigation, escalate to oncall immediately.
Do NOT continue investigating at the expense of time-to-resolution.
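
The runbook's trigger and stop condition are precise enough to express as code, which also forces you to pin down what "for more than 5 minutes" means. A minimal sketch under stated assumptions: latency samples arrive already grouped into one-minute buckets, an empty bucket breaks the streak, "more than 5 minutes" means at least six consecutive breaching minutes, and p99 uses the nearest-rank method. `percentile`, `latencySymptomActive`, and `shouldEscalate` are hypothetical helpers, not part of the scripts listed in the runbook.

```typescript
// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  // Integer arithmetic first avoids floating-point drift in p/100.
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// minuteBuckets: latency samples in ms, one array per minute, oldest first.
// True when p99 has exceeded the threshold for more than `minutes` consecutive minutes.
function latencySymptomActive(minuteBuckets: number[][], thresholdMs = 2000, minutes = 5): boolean {
  let streak = 0;
  for (const bucket of minuteBuckets) {
    const breached = bucket.length > 0 && percentile(bucket, 99) > thresholdMs;
    streak = breached ? streak + 1 : 0;
  }
  return streak > minutes;
}

// STOP CONDITION: escalate once 10 minutes pass without a clear root cause.
function shouldEscalate(startedAt: Date, now: Date, rootCauseClear: boolean, limitMinutes = 10): boolean {
  const elapsedMs = now.getTime() - startedAt.getTime();
  return !rootCauseClear && elapsedMs >= limitMinutes * 60 * 1000;
}
```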

When to build one: For any recurring operational scenario where the investigation path is known but complex enough that following it correctly under pressure is hard.

Category 9: Infrastructure Operations

What it is: Skills for routine maintenance and operational procedures — the things that need to happen regularly but don't require deep incident response.

This category covers the operational work that doesn't rise to the level of a runbook but still benefits from explicit guidance: certificate rotation procedures, database backup verification, cleanup jobs, capacity planning checks. These are tasks where "close enough" isn't good enough, and the exact steps matter.

## Infrastructure Operations — Certificate Renewal

## Gotchas
- Our wildcard cert covers *.example.com but NOT example.com (bare domain).
  The bare domain cert is managed separately — check both when approaching expiry.
- Renewal requires 48 hours to propagate through our CDN. Plan accordingly.
- The cert renewal script writes to `/etc/ssl/certs/`, so it needs sudo.
  If running as the deploy user, use the wrapper: `scripts/renew_cert.sh`
- DNS validation tokens expire after 24 hours. Complete the validation the same day.
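
The wildcard gotcha follows from how TLS wildcard names are matched: `*` stands for exactly one leftmost label (the convention described in RFC 6125), so `*.example.com` covers `www.example.com` but neither `example.com` nor `a.b.example.com`. A minimal sketch of a coverage check a renewal skill could run; `certCovers` and `uncoveredHosts` are hypothetical helpers, and they treat any name that doesn't start with `*.` as a literal hostname.

```typescript
// Does a certificate name (e.g. "*.example.com") cover a hostname?
// "*" matches exactly one leftmost label: not the bare domain, not deeper subdomains.
function certCovers(certName: string, hostname: string): boolean {
  const cert = certName.toLowerCase();
  const host = hostname.toLowerCase();
  if (!cert.startsWith("*.")) return cert === host;
  const suffix = cert.slice(1); // e.g. ".example.com"
  if (!host.endsWith(suffix)) return false;
  const label = host.slice(0, host.length - suffix.length);
  return label.length > 0 && !label.includes(".");
}

// Hostnames not covered by any certificate name; each needs its own renewal check.
function uncoveredHosts(hostnames: string[], certNames: string[]): string[] {
  return hostnames.filter((h) => !certNames.some((c) => certCovers(c, h)));
}
```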

When to build one: For any infrastructure procedure that Claude gets wrong without explicit guidance, or where mistakes have significant operational impact.

Where Skills Get Scoped Wrong

Looking at this taxonomy, one pattern stands out: the most common skill quality problem is poor scoping. Skills that try to do too much usually span several categories, and that dilutes their value.

A skill that covers "everything related to our AWS setup" is trying to be Category 3 (data fetching from CloudWatch), Category 7 (deployments), and Category 9 (infrastructure operations) simultaneously. That's three distinct skills compressed into one, and each of them will be weaker for it.

The taxonomy is useful as a scoping tool before you start writing: Which one of these nine categories is this skill primarily about? If the answer is "two or three of them," consider splitting.

The Current State of the Ecosystem

Not all nine categories are equally well-represented in the current skills marketplace. Based on the 150+ skills currently live on AgentSkillExchange, some patterns are clear:

Well-populated: Code Quality & Review, Data Fetching, CI/CD & Deployment (MCP-based)
Growing: Library & API Reference, Business Process Automation
Underrepresented: Product Verification, Runbooks

Runbooks in particular are an opportunity. They're one of the most valuable skill types for operational teams, but they require more domain knowledge than generic skills — which means the community tends to build fewer of them. If you've been through incident response at a company with mature operational practices, your team's runbooks converted into skills would be genuinely useful.

Building with the Taxonomy in Mind

Practical application: before writing any skill, pick its category. Then write the description field with that category in mind — not just "what does this skill do" but "what specific signals should Claude recognize to activate it?"

A Library & API Reference skill for the Stripe SDK should trigger on phrases like "implement Stripe checkout," "webhook signature verification," or "stripe 401 error" — not generic phrases like "write payment code."

A Runbook skill for database connection issues should trigger on "database connections exhausted," "pg: FATAL: remaining connection slots are reserved," or "DB pool timeout" — specific error messages that reliably indicate the symptom the skill addresses.

Claude decides whether to use a skill largely from its description, so specificity in the description field is what gets a skill activated. The taxonomy tells you what kind of specificity to optimize for.

What's Next

Over the next few days, we'll be publishing deeper dives into specific categories. Up next: a step-by-step tutorial for building your first skill from scratch, covering the full SKILL.md structure and publishing to AgentSkillExchange.

The full reference documentation for skill creation is at Anthropic's official docs. If you want a structured introduction, the Skilljar course covers the fundamentals. And if you've built a skill worth sharing — submit it to ASE.