Publishing 100 skills taught us this: the hard part is not getting a skill live. The hard part is making it useful, trustworthy, and easy for an agent to invoke at the right moment.
This retrospective is for skill authors, marketplace curators, and teams building internal skill libraries. We are sharing the patterns that held up, the mistakes we kept seeing, and the review rules we wish we had written on day one.
Agent Skill Exchange now lists 1,978 published skills across 17 categories and 10 frameworks, according to the public ASE marketplace stats API. Bigger catalogs only help when the average skill is easier to trust, easier to discover, and harder to misuse.
Key takeaways
- Most weak skills fail because of scope, trigger language, or missing gotchas, not because the idea was bad.
- The best skills act like compact operating manuals, not long essays.
- Curation matters more after the first 100 skills than it does at the first 10.
- Useful review checklists beat taste-based editorial debates.
Table of Contents
- 1. The best skill ideas were narrower than we expected
- 2. Description fields decided discoverability
- 3. Gotchas separated marketplace-ready skills from drafts
- 4. Progressive disclosure kept skills usable
- 5. Curation rules became product features
- 6. Trust signals mattered as much as content
- The checklist we now use before publishing
- FAQ
1. The best skill ideas were narrower than we expected
Early on, we approved too many broad skills. They sounded useful because they promised a lot: deploy apps, review code, summarize incidents, manage content, handle infrastructure. In practice, many of them were trying to do three jobs at once.
That was a mistake. A skill becomes easier to invoke and easier to trust when its scope is narrow enough to be obvious. “Review JavaScript pull requests for ESLint, unsafe fetch usage, and missing tests” is a better marketplace object than “help with code quality.” The narrow version tells the model when to load it and tells the user what to expect.
This lines up with Anthropic’s public skills guidance, which treats skills as targeted capability packages rather than giant universal prompts. It also matches what we saw in strong marketplace entries such as ESLint Code Review and MSW API Mocking Skill for Frontend and Integration Tests.
What changed for us: after roughly the first 30 published skills, we stopped asking “is this useful?” and started asking “is this one job?” That one editorial change reduced overlap and made approval decisions faster.
2. Description fields decided discoverability
We wrote about this earlier in our description field guide, but publishing at scale made the lesson sharper. A surprising number of otherwise solid skills stayed invisible because the description field read like marketing copy instead of trigger logic.
A skill description is not there to impress a human reviewer. It is there to help the model decide when the skill should load. If the description says “powerful workflow for modern teams,” the skill will lose to any competing skill with plain, direct trigger language. If the description mentions real requests, real error strings, and clear exclusions, invocation gets much better.
---
name: review-react-fetch
description: >
  Use when reviewing React or Next.js code that fetches remote data.
  Triggers on: "loading state bug", "fetch in useEffect", "stale query",
  "race condition", "SWR vs React Query", "missing abort controller".
  NOT for: backend API design or database schema review.
---
That kind of description does three jobs in under 60 words: it names the surface area, includes likely request language, and blocks bad activations.
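This kind of review is easy to support with tooling. As a rough illustration, here is a reviewer-side check in Python; the filler phrases, the 60-word limit, and the `NOT for` convention are our own heuristics from the example above, not an ASE rule:

```python
import re

def review_description(description: str) -> list[str]:
    """Flag common weaknesses in a skill description field (heuristic sketch)."""
    warnings = []
    if len(description.split()) > 60:
        warnings.append("over 60 words; trim to decision-critical trigger logic")
    # Marketing filler tells the model nothing about when to load the skill.
    for phrase in ("powerful", "seamless", "modern teams", "best-in-class"):
        if phrase in description.lower():
            warnings.append(f"marketing filler: {phrase!r}")
    # Quoted strings usually mean real request or error language, a good sign.
    if not re.search(r'"[^"]+"', description):
        warnings.append("no quoted trigger phrases (real requests or error strings)")
    if "NOT for" not in description:
        warnings.append("no exclusions; add a 'NOT for:' line to block bad activations")
    return warnings
```

Running it on the marketing-style description from above (`"Powerful workflow for modern teams."`) trips every check, while the trigger-language version passes clean.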
Thariq’s March 17 thread about how the Claude Code team uses skills reached 2.7 million views for a reason. One of the most important points in that thread was that the description field is for the model, not for humans. After reviewing 100 marketplace entries, we think that advice should be printed on the wall.
3. Gotchas separated marketplace-ready skills from drafts
If we had to pick one review heuristic that predicts whether a skill will hold up in real usage, it would be the gotchas section. Weak skills tell an agent what to do. Strong skills tell an agent where the obvious path breaks.
We kept seeing the same failure mode: a skill would include a nice workflow and clean formatting, but it would skip the three or four ugly edge cases that actually matter. That almost always meant the skill was written from theory rather than from repeated use.
The strongest authors gave us details like these:
- Which API field looks mandatory in docs but returns null in production.
- Which CLI exit code is non-zero when the operation succeeded.
- Which browser automation step fails unless the page reaches a specific load state.
- Which deployment command must run from a repo root instead of a package directory.
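Concretely, a gotchas section in a SKILL.md might read like this. The specific failures below are invented placeholders to show the shape, not entries from a real skill:

```markdown
## Gotchas

- The `status` field is documented as required but returns null for
  archived records; treat it as optional.
- The CLI exits non-zero on success when output is piped; check stderr
  for "done" instead of trusting the exit code.
- The deploy command must run from the repo root, not a package
  directory, or it silently targets the wrong workspace.
```

Each line is unglamorous, specific, and only learnable from actually hitting the failure, which is exactly the point.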
That is why posts like The Gotchas Section: The Single Most Valuable Part of Any Skill and Progressive Disclosure: Why the Best Skills Split Content Across Multiple Files resonated with readers. They describe the same thing we learned from marketplace review: the most valuable skill content is usually the least glamorous line in the file.
4. Progressive disclosure kept skills usable
Another pattern showed up once entries got more ambitious. Authors would keep piling detail into one massive SKILL.md file. The result was comprehensive, but not effective. Long skills are not automatically better skills. They often bury the decision-critical parts under pages of background material.
We now prefer a smaller main file with support files for reference material, examples, or scripts. Anthropic’s skills docs make the same point: skills work well because long reference material loads only when it is needed. That is a real advantage over dumping everything into a single always-on instruction block.
my-skill/
├── SKILL.md
├── references/
│   ├── edge-cases.md
│   └── api-examples.md
├── scripts/
│   └── validate.sh
└── examples/
    └── sample-task.md
Our internal rule of thumb is simple: if a skill starts reading like a mini-book, split it. Keep the main file focused on when to use the skill, how to approach the job, and what to avoid. Put deep references where the model can pull them on demand.
This was especially clear in skills tied to tools with large option surfaces, like Playwright Cross-Browser Testing and Automation Framework and Cheerio HTML and XML Parsing Library for Node.js Extraction Workflows. Those skills are only manageable when the critical playbook stays separate from the long tail of supporting detail.
5. Curation rules became product features
At the start, our review process felt editorial. By the time we had published 100 skills, it felt more like product design. Review rules changed what users could trust in the catalog.
Three curation rules mattered the most:
- No redundant near-duplicates. If two skills solve the same job with the same tooling and audience, one of them should usually be rejected or merged.
- No generic claims without operational detail. “Automates your workflow” is not useful evidence.
- No publish without a failure-aware review pass. We now look for missing assumptions, not just typos.
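The first rule is also the easiest to back with tooling. A crude near-duplicate pass over catalog descriptions can be sketched with token-set overlap; the 0.6 threshold is an assumption for illustration, and real curation would compare tools and categories too:

```python
def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two skill descriptions."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not (ta or tb):
        return 0.0
    return len(ta & tb) / len(ta | tb)

def near_duplicates(descriptions: dict[str, str], threshold: float = 0.6):
    """Yield pairs of skills whose descriptions overlap suspiciously."""
    names = sorted(descriptions)
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            if jaccard(descriptions[x], descriptions[y]) >= threshold:
                yield (x, y)
```

Token overlap misses paraphrased duplicates, so a flagged pair goes to a human reviewer for a reject-or-merge decision rather than being rejected automatically.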
Those rules improved browsing as much as they improved content quality. Once a marketplace has dozens of entries, users stop judging individual skills in isolation. They judge whether the directory wastes their time. Good curation lowers that tax.
| Review question | Weak answer | Strong answer |
|---|---|---|
| What job does this skill own? | “General automation” | “Normalize phone numbers to E.164 before CRM import” |
| When should it trigger? | Vague capability statement | Concrete requests, error strings, exclusions |
| Why trust it? | No sources or tests | Real tool match, examples, edge cases, references |
You can see that stronger pattern in recent entries like Normalize international phone numbers into E.164 before CRM imports or messaging workflows. The title is specific, the use case is narrow, and the operational value is obvious before you open the page.
6. Trust signals mattered as much as content
We went into this thinking that better writing would do most of the work. It did not. Marketplace trust also comes from visible signals around the skill.
On ASE, users care about whether a skill maps cleanly to a real tool, whether the source is visible, whether the framework compatibility is clear, and whether the entry looks maintained. Those signals help a user decide in under 10 seconds whether a skill is worth opening.
That is one reason the open Agent Skills standard matters. It gives authors a common shape for packaging capabilities across ecosystems. It is also why Anthropic’s public skills documentation and the Skill Creator testing workflow are worth reading, even if you never submit to ASE. Better standards make better marketplaces possible.
The checklist we now use before publishing
If you are building your own internal library or contributing to ASE, this is the copy-paste review checklist we wish we had at the beginning:
Skill review checklist
[ ] The skill owns one clear job
[ ] The description includes trigger phrases and exclusions
[ ] The body includes real gotchas from actual failures
[ ] Long references live outside SKILL.md when possible
[ ] Hardcoded team paths, secrets, and personal assumptions are removed
[ ] The title and summary make the tool match obvious
[ ] At least one example task is included
[ ] The skill links to docs or authoritative references
[ ] The scope does not duplicate an existing skill already in the catalog
[ ] A reviewer can explain when NOT to use the skill in one sentence
If a reviewer cannot explain when not to use a skill, the scope is usually still too fuzzy.
Frequently Asked Questions
How many skills should a new marketplace publish before tightening curation?
Tighten immediately; you do not need to wait for scale. We felt the pain more sharply after 100 skills, but the duplicate-content problem starts much earlier, usually within the first 20 to 30 entries.
What is the most common reason a skill fails review?
The most common reason is unclear scope. Missing gotchas and weak descriptions are close behind, but scope problems tend to create the other two issues.
Should every skill include code or command examples?
Not every skill needs code, but most technical skills need at least one concrete example. A short snippet often communicates constraints faster than a paragraph of explanation.
Closing thought
Publishing 100 skills did not make us more impressed by volume. It made us more suspicious of it. The skills that last are the ones that save time without adding ambiguity.
If you are publishing to Agent Skill Exchange, start smaller than feels comfortable, write the gotchas before the conclusion, and treat curation as part of the product. That is the real lesson from publishing 100 skills, and it is the standard we are holding ourselves to as the catalog keeps growing.