Skill Detail

OpenAI Whisper Transcription

Local speech-to-text transcription without relying on an API.


Media & Transcription | OpenClaw | Security Reviewed | Security: Low

Tool match: whisper | 101.1k GitHub stars | MIT license

INSTALL WITH ANY AGENT

npx skills add agentskillexchange/skills --skill openai-whisper-transcription

Works best when you want a reusable capability, not another fragile one-off prompt.

View source Documentation

At a glance

Tools required

Python, pip, FFmpeg

Install & setup

pip install -U openai-whisper

Author

OpenAI

Publisher

Open Source Collective

Last updated

Apr 6, 2026

Quick brief

OpenAI Whisper Transcription is built around the OpenAI Whisper speech recognition model, maintained upstream as openai/whisper (96,530+ GitHub stars). It gives an agent a more reliable way to work with the tool than a thin one-line wrapper: it exposes stable interfaces for local transcription, language detection, timestamps, subtitle formats, and model-size selection, and it preserves the operational context that matters for real tasks.

How it works

What this skill actually does

In practice, the skill gives an agent a stable interface to whisper so it can inspect state, run the right operation, and produce a result that fits into a larger engineering or operations pipeline. The use case is local speech-to-text transcription without relying on an API. The implementation typically relies on local transcription, language detection, timestamps, subtitle formats, and model-size selection, with any configuration passed through environment variables, connection strings, service tokens, or workspace config, depending on the upstream platform.

Uses Whisper's local transcription, language detection, timestamp, subtitle-format, and model-size options instead of scraping a UI, which makes runs easier to audit and retry.
Supports structured inputs and outputs so another tool, agent, or CI step can consume the result.
Can be wired into cron jobs, webhook handlers, MCP transports, or local CLI workflows depending on the skill format.
Fits into broader integration points such as audio transcription, subtitle generation, and speech pipelines.
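
As a rough illustration of the "structured inputs and outputs" point above, the following Python sketch wraps a local Whisper run and returns a JSON-serializable record. It assumes the openai-whisper package's load_model and transcribe calls and the "text", "language", and "segments" keys of the result; the helper names transcribe_to_record and record_to_json, the record layout, and the transcribe_fn test hook are hypothetical and not part of the skill itself.

```python
import json
from typing import Any, Callable, Dict, Optional


def transcribe_to_record(
    audio_path: str,
    model_name: str = "base",
    transcribe_fn: Optional[Callable[[str], Dict[str, Any]]] = None,
) -> Dict[str, Any]:
    """Transcribe one audio file locally and return a plain dict.

    transcribe_fn is a hypothetical hook for tests; when omitted, the
    openai-whisper package is loaded and used.
    """
    if transcribe_fn is None:
        # Imported lazily so this module loads even where whisper is absent.
        import whisper  # installed via: pip install -U openai-whisper

        model = whisper.load_model(model_name)
        transcribe_fn = model.transcribe

    result = transcribe_fn(audio_path)
    return {
        "source": audio_path,
        "model": model_name,
        "language": result.get("language"),
        "text": result.get("text", "").strip(),
        "segments": [
            {
                "start": float(seg["start"]),
                "end": float(seg["end"]),
                "text": seg["text"].strip(),
            }
            for seg in result.get("segments", [])
        ],
    }


def record_to_json(record: Dict[str, Any]) -> str:
    """Serialize the record so another tool or CI step can consume it."""
    return json.dumps(record, ensure_ascii=False, indent=2)
```

Calling record_to_json(transcribe_to_record("meeting.mp3")) would load the "base" model on first use; the file name here is only a placeholder.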

In a real environment these integration points usually mean passing any required credentials through env vars or app config, respecting rate limits and permission scopes where an upstream service imposes them, and returning structured artifacts that can be attached to tickets, pull requests, dashboards, or follow-up automations.
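
For the subtitle-generation integration point, one possible approach is to turn timestamped segments into SubRip (SRT) text. The sketch below is a minimal illustration that assumes each segment is a dict with "start" and "end" in seconds plus a "text" field, which matches the segments Whisper returns; the function names are hypothetical. The whisper command-line tool can also write SRT directly through its --output_format option, so a helper like this is only needed when post-processing segments in Python.

```python
from typing import Any, Dict, Iterable


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable[Dict[str, Any]]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        cues.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(cues)
```

The resulting string can be written to a .srt file and attached to a ticket or pull request like any other artifact.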

Best fit

When to reach for it

Best when the job fits Media & Transcription.
Works naturally with OpenClaw setups.
Requires Python, pip, FFmpeg.
Installation is straightforward: pip install -U openai-whisper
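
Before an agent runs the skill, it may help to confirm that the listed requirements are actually present. The following sketch is one way to do that with the Python standard library only; it assumes FFmpeg is exposed as an ffmpeg executable on PATH and that the openai-whisper package installs an importable module named whisper. The function names are hypothetical.

```python
import importlib.util
import shutil
from typing import Dict, List, Optional


def check_prerequisites() -> Dict[str, bool]:
    """Report whether the ffmpeg binary and the whisper module can be found."""
    return {
        "ffmpeg": shutil.which("ffmpeg") is not None,
        "whisper": importlib.util.find_spec("whisper") is not None,
    }


def missing_prerequisites(status: Optional[Dict[str, bool]] = None) -> List[str]:
    """Return the names of missing requirements, sorted alphabetically."""
    if status is None:
        status = check_prerequisites()
    return sorted(name for name, ok in status.items() if not ok)
```

An agent could call missing_prerequisites() first and stop with a clear message, rather than failing partway through a transcription run.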

Trust & provenance

Why this listing is credible

Built around the whisper toolchain.
Trust status: Security Reviewed.
101.1k GitHub stars on the linked upstream source.
License: MIT.
Last updated Apr 6, 2026.
