If your skill only works when its SKILL.md is short, you have a writing problem. If it only works when its SKILL.md is huge, you have an architecture problem.
Progressive disclosure fixes both. Instead of stuffing every instruction, edge case, API note, and example into one file, you keep the core playbook lean and move deep detail into supporting files that the agent reads only when needed. Anthropic’s Claude Code documentation explicitly describes skills as directories with supporting files, and Thariq’s March 2026 breakdown of how the Claude Code team uses skills called out the file-system pattern directly: split content across references/, scripts/, and examples/ so the model loads what matters now, not everything at once.
This article is for skill authors, tool builders, and teams maintaining internal agent workflows. You’ll see when progressive disclosure helps, how to structure a skill directory, what belongs in each file, and which mistakes make multi-file skills harder to use instead of easier.
Key takeaways
- Keep `SKILL.md` focused on triggers, workflow, constraints, and gotchas.
- Move long references, command docs, and data tables into `references/`.
- Put repeatable, deterministic actions into `scripts/`.
- Store task-shaped examples in `examples/`, not inside the main prompt.
- Use progressive disclosure when your skill is accurate but bloated, reused by a team, or covers multiple branches of work.
Table of Contents
- What progressive disclosure means in agent skills
- Why single-file skills break at scale
- A practical skill directory structure
- What belongs in SKILL.md, references, scripts, and examples
- A complete example
- Common mistakes
- A pre-publish checklist
- FAQ
What progressive disclosure means in agent skills
Progressive disclosure means the agent sees information in layers. The first layer is the smallest useful set of instructions: when to use the skill, what outcome to aim for, what constraints matter, and which failure modes to avoid. The second layer is optional detail: API docs, long command references, migration notes, schemas, sample payloads, and reusable scripts. The skill points to those files, but it does not dump them into context on every activation.
That sounds simple, but it changes how a skill behaves. A lean skill is easier for the model to activate correctly, easier to maintain, and less likely to bury the important bit under 200 lines of reference material. This matters because context is not free. Even on large-context models, every extra block of irrelevant text competes with the user’s actual task, the code under review, tool outputs, and the model’s own working memory.
Claude Code’s skills documentation describes skills as directories, not just prompt files, and the Agent Skills standard makes that portability useful across tools. In practice, the best marketplace skills on AgentSkillExchange already follow this pattern because it scales better than the one-file approach.
Why single-file skills break at scale
Single-file skills are fine for narrow jobs. They break when one file tries to do four things at once: explain when to trigger, define a workflow, hold a full technical reference, and include worked examples. Once that happens, three problems show up.
- Activation gets worse. Long, messy skills hide the actual trigger description. The model has more text to sort through before it can decide whether the skill fits.
- Maintenance gets worse. Updating one API endpoint means editing a giant prompt instead of swapping one reference file.
- Execution gets worse. The most important instructions compete with background noise. The agent may miss the one gotcha that mattered because it was buried below five examples and a giant error code table.
Think about a deployment skill. The core logic might be only 120 lines: inspect the repo, confirm environment, run validated deployment steps, verify health, and report outcome. But the supporting material can easily reach 2,000 lines: Helm notes, known staging caveats, rollback rules, environment mappings, and examples. Putting all of that inside SKILL.md does not make the skill smarter. It makes the signal-to-noise ratio worse.
That is why Thariq emphasized the file-system approach in his widely shared skill guidance, and it is why Anthropic’s own examples treat supporting files as normal, not exceptional.
A practical skill directory structure
Here is a structure that works well for most production-ready skills:
```
my-skill/
├── SKILL.md
├── config.json
├── references/
│   ├── api-contract.md
│   ├── known-failures.md
│   ├── deployment-matrix.md
│   └── troubleshooting.md
├── scripts/
│   ├── validate.sh
│   ├── deploy.sh
│   └── rollback.sh
└── examples/
    ├── happy-path.md
    ├── failed-deploy.md
    └── emergency-rollback.md
```
This layout is not about aesthetics. It separates judgment from reference from execution.
`SKILL.md` holds the playbook. `references/` holds material the agent may need to read on demand. `scripts/` holds actions you want performed the same way every time. `examples/` holds concrete task patterns the agent can borrow from when the request matches.
If your skill interacts with a framework-specific workflow, a good companion piece is our guide to building your first agent skill. If you are still tightening trigger quality, also read why your SKILL.md description field is more important than you think.
What belongs in SKILL.md, references, scripts, and examples
1. SKILL.md: the minimum viable playbook
SKILL.md should answer five questions fast:
- When should this skill activate?
- What outcome is it trying to produce?
- What sequence or workflow usually works?
- What gotchas repeatedly cause failures?
- Which supporting files should the agent read for deeper detail?
That means SKILL.md should usually include frontmatter, a concise workflow, a clear scope boundary, and a high-signal gotchas section. It should not include 150 error codes, giant API payload catalogs, or repeated examples for every edge case.
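As a rough sketch, a SKILL.md built along those lines might look like the following. The frontmatter fields, file names, and gotchas here are illustrative assumptions, not a fixed schema; check your tool’s skill format for the exact fields it expects.

```markdown
---
name: deploy-service
description: Deploy a service to staging or production, verify health, and roll back on failure. Use when the user asks to deploy, release, or roll back a service.
---

## Workflow
1. Confirm the target environment; read `references/deployment-matrix.md` if unclear.
2. Run `scripts/validate.sh <env>` before any deploy.
3. Deploy with `scripts/deploy.sh <env>` and watch health checks.

## Gotchas
- Staging shares a database with QA; never run destructive migrations there.
- Health checks can flap briefly after deploy; wait before declaring failure.

## Deeper detail
- Error codes: `references/known-failures.md`
- Rollback rules: `references/troubleshooting.md`
```

Everything below the workflow is a pointer, not a payload: the agent only opens a reference file when the current task actually needs it.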
2. references/: high-detail material with low default relevance
Use references/ for content that matters sometimes, not always. Typical examples include:
- API endpoint details and request schemas
- CLI command flags and environment-specific mappings
- Error code catalogs
- Migration notes between versions
- Compliance rules, escalation ladders, or incident severity definitions
The best reference files have names the agent can reason about. `payment-webhook-gotchas.md` is better than `notes2.md`, and `deployment-matrix.md` is better than `misc.md`.
3. scripts/: deterministic operations
If a step must be executed precisely, write a script. Good candidates include environment validation, artifact packaging, deploy commands, rollbacks, report generation, and data normalization. Scripts reduce prompt ambiguity and make critical paths testable.
```markdown
## Deployment workflow
1. Read `references/deployment-matrix.md` if the environment is unclear.
2. Run `scripts/validate.sh <env>` before deployment.
3. Deploy with `scripts/deploy.sh <env>`.
4. If health checks fail for more than 120 seconds, run `scripts/rollback.sh <env>`.
```
That short block gives the agent structure without forcing you to describe every shell flag in the main skill file.
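A validation script of the kind referenced above might look like this minimal Python sketch. The environment names and required variables are hypothetical placeholders, not part of any standard; the point is that the script returns a deterministic pass/fail instead of leaving validation to the model’s judgment.

```python
#!/usr/bin/env python3
"""Hypothetical environment validator (a scripts/validate.py-style helper).

Exits non-zero when the target environment is unknown or required
configuration is missing, so the agent gets an unambiguous result.
"""
import os
import sys

# Illustrative values; a real skill would keep these in references/.
KNOWN_ENVS = {"staging", "production"}
REQUIRED_VARS = ["DEPLOY_TOKEN", "CLUSTER_NAME"]


def validate(env: str, environ=os.environ) -> list:
    """Return a list of human-readable problems; empty means valid."""
    problems = []
    if env not in KNOWN_ENVS:
        problems.append(f"unknown environment: {env!r}")
    for var in REQUIRED_VARS:
        if not environ.get(var):
            problems.append(f"missing required variable: {var}")
    return problems


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else ""
    issues = validate(target)
    for issue in issues:
        print(f"validate: {issue}", file=sys.stderr)
    sys.exit(1 if issues else 0)
```

Because the agent runs this as a command rather than reasoning through a checklist, the critical path stays testable in CI like any other script.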
4. examples/: shaped experience
Examples help when the task is nuanced, especially for reviews, analysis, or content generation. Put one example per file and make the filename specific. A skill author debugging a failed OAuth flow should not have to scan six unrelated examples in one document.
Examples work best when they show decision points, not just final output. Show what signals caused the agent to choose path A over path B. That transfers much better than a polished final answer with no reasoning trace.
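A decision-point example might read like the following sketch. The scenario, thresholds, and file names are invented for illustration; what matters is that the file records the signals and the branch taken, not just the final answer.

```markdown
<!-- examples/failed-deploy.md (illustrative) -->
**Request:** "Deploy payments to production."

**Signals observed:** `validate.sh` passed, but health checks returned
503 for 140 seconds after deploy.

**Decision:** 140s exceeds the 120s threshold, so run
`scripts/rollback.sh production` instead of retrying the deploy.

**Outcome reported:** rollback completed; attached the failing
health-check output and pointed to `references/known-failures.md`.
```

An agent that matches a new request against this shape inherits the decision rule, which is exactly the transfer the section above describes.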
A complete example: refactoring a blog-writing skill
Suppose you maintain a content skill that writes SEO articles for your team. Version one often turns into a monster file: tone rules, keyword placement rules, title formulas, CMS publishing notes, category mappings, internal links, HTML templates, and examples. It works until the file becomes too bulky to be pleasant or reliable.
A better split looks like this:
```
seo-blog-writer/
├── SKILL.md
├── references/
│   ├── title-formulas.md
│   ├── internal-linking-rules.md
│   ├── wordpress-publishing-checklist.md
│   └── category-mapping.md
├── scripts/
│   └── sanitize-html.py
└── examples/
    ├── tutorial-post.md
    └── comparison-post.md
```
Now SKILL.md stays focused: identify search intent, produce the direct answer early, include a table of contents, use links naturally, and read the category map only if publication is requested. The WordPress checklist stays out of the prompt until needed. The HTML sanitizer becomes a script. The comparison-post example is only loaded when the article format matches that shape.
That is progressive disclosure doing real work. The skill becomes easier to activate, easier to test, and easier to adapt when your editorial process changes.
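To make the script part concrete, here is a minimal sketch of what a `sanitize-html.py`-style helper could do, built on Python’s standard-library `html.parser`. The blocked tags and attribute rule are illustrative assumptions; a production skill would likely lean on a vetted sanitizer library instead.

```python
"""Illustrative sketch of a scripts/sanitize-html.py helper.

Strips <script>/<style> blocks and inline on* event-handler attributes
before content is handed to a CMS.
"""
from html.parser import HTMLParser

BLOCKED_TAGS = {"script", "style"}


class Sanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []
        self._skip_depth = 0  # >0 while inside a blocked tag

    def handle_starttag(self, tag, attrs):
        if tag in BLOCKED_TAGS:
            self._skip_depth += 1
            return
        if self._skip_depth:
            return
        # Drop event-handler attributes (onclick, onload, ...).
        kept = [(k, v) for k, v in attrs if not k.lower().startswith("on")]
        attr_text = "".join(f' {k}="{v}"' for k, v in kept)
        self.out.append(f"<{tag}{attr_text}>")

    def handle_endtag(self, tag):
        if tag in BLOCKED_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
            return
        if not self._skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._skip_depth:
            self.out.append(data)


def sanitize(html_text: str) -> str:
    s = Sanitizer()
    s.feed(html_text)
    s.close()
    return "".join(s.out)
```

Encoding this once as a script means every article gets identical sanitization, instead of the model re-deciding the rules on each run.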
Common mistakes that ruin progressive disclosure
- Creating too many tiny files. If one concept is split across eight fragments, the agent has to reassemble your documentation every time.
- Moving critical gotchas out of SKILL.md. Known failure modes belong in the core playbook, not buried in a side file.
- Using vague filenames. Agents reason from names. So do humans.
- Duplicating rules across files. Drift appears fast when the same instruction exists in three places.
- Treating scripts as optional decoration. If a task must be consistent, encode it.
One practical rule: if a detail is required in more than 80% of invocations, keep it in SKILL.md. If it is needed only for certain branches, move it out. That is not a law, but it is a useful threshold.
The progressive disclosure checklist before you publish
- Does `SKILL.md` explain triggers, outcomes, workflow, and gotchas clearly?
- Are low-frequency references moved into `references/`?
- Are repeatable operations implemented in `scripts/`?
- Do example files match real user task shapes?
- Are filenames descriptive enough that an agent can choose them correctly?
- Can a teammate update one reference file without rewriting the whole skill?
- Would a new user understand which file is the source of truth for each rule?
If you can answer yes to those seven questions, your skill is probably in good shape. If not, the fix is usually structural, not literary.
Frequently Asked Questions
When should I split a skill into multiple files?
Split a skill when the core instructions are being crowded out by reference material, examples, or edge-case documentation. If the main file is getting harder to scan than the task itself, it is time.
What should stay in SKILL.md no matter what?
Keep triggers, workflow, scope boundaries, and high-signal gotchas in SKILL.md. Those are the instructions the agent needs on almost every activation.
Are examples better than scripts?
No. They solve different problems. Use examples to show task shape and decision patterns. Use scripts when execution must be exact, repeatable, and testable.
Does progressive disclosure only matter for Claude Code?
No. It is useful anywhere skills follow the Agent Skills pattern. The idea is tool-agnostic: keep the core playbook lean and load detail only when necessary.
Conclusion
Progressive disclosure is not prompt polish. It is skill architecture. The best skills split content across multiple files because they respect how agents actually work: they need a sharp playbook first, then targeted detail when the task calls for it.
If you are publishing on AgentSkillExchange, this is one of the easiest ways to make a skill more reliable without making it more rigid. Start by cutting the main file down to its true core. Move the heavy reference material out. Turn repeated shell steps into scripts. Keep examples separate and task-shaped.
That gives you a skill that is easier to trigger, easier to maintain, and more likely to succeed in real use.
Next step: audit one of your existing skills today. If the best instruction is buried halfway down the file, you already know what to do.
Sources: Claude Code skills documentation, Thariq’s March 2026 thread on how the Claude Code team uses skills, Agent Skills standard.