Whisper.cpp Real-Time Transcription Pipeline

Streams audio from PulseAudio or ALSA devices into whisper.cpp for real-time speech-to-text with word-level timestamps. Outputs SRT/VTT subtitles and JSON transcripts simultaneously.

Media & Transcription · Claude Code · Security Reviewed

Tool match: whisper · ⭐ 97.8k GitHub stars · MIT license

INSTALL WITH ANY AGENT

npx skills add agentskillexchange/skills --skill whisper-cpp-realtime-transcription-pipeline

Works best when you want a reusable capability, not another fragile one-off prompt.

At a glance

Last updated: Mar 24, 2026

Quick brief

This skill configures whisper.cpp for continuous real-time transcription from live audio sources. It captures audio with PulseAudio’s parec or ALSA’s arecord and pipes the PCM data into whisper.cpp’s streaming mode, with a configurable model size (tiny through large-v3). The pipeline produces word-level timestamps with confidence scores and writes them simultaneously to SRT subtitle files, WebVTT for web players, and structured JSON for downstream processing.

It supports voice activity detection (VAD) via Silero VAD to skip silence and reduce compute. Language auto-detection uses whisper.cpp’s built-in language identification, with a manual override available. Microphone hot-plugging is handled through udev rules and systemd socket activation.

For multi-speaker scenarios, it integrates pyannote-audio for speaker diarization as a post-processing step. GPU acceleration is supported via the CUDA, Metal, and Vulkan backends. Configuration covers buffer size tuning, beam search parameters, and custom vocabulary supplied through the initial prompt.
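The output stage turns one stream of word-level timestamps into SRT, WebVTT, and JSON at the same time. A minimal Python sketch of that conversion follows; it is an illustration, not the skill’s actual code. The `Word` record, its field names, the helper functions, and the fixed words-per-cue grouping are all hypothetical, and the JSON layout does not reproduce whisper.cpp’s own JSON schema.

```python
import json
from dataclasses import asdict, dataclass


# Hypothetical word record; the field names are illustrative and do not
# mirror whisper.cpp's own JSON output schema.
@dataclass
class Word:
    text: str
    start: float       # seconds from stream start
    end: float         # seconds from stream start
    confidence: float  # 0.0 to 1.0


def format_timestamp(seconds: float, ms_sep: str) -> str:
    """Render seconds as HH:MM:SS<ms_sep>mmm (',' for SRT, '.' for WebVTT)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{ms_sep}{ms:03d}"


def group_cues(words: list[Word], max_words: int = 7) -> list[list[Word]]:
    """Split the word stream into cues of at most max_words words (a simplification)."""
    return [words[i:i + max_words] for i in range(0, len(words), max_words)]


def to_srt(words: list[Word], max_words: int = 7) -> str:
    blocks = []
    for index, cue in enumerate(group_cues(words, max_words), start=1):
        start = format_timestamp(cue[0].start, ",")
        end = format_timestamp(cue[-1].end, ",")
        text = " ".join(w.text for w in cue)
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    # A blank line separates SRT cue blocks.
    return "\n".join(blocks)


def to_vtt(words: list[Word], max_words: int = 7) -> str:
    # WebVTT files start with a "WEBVTT" header followed by a blank line.
    blocks = ["WEBVTT\n"]
    for cue in group_cues(words, max_words):
        start = format_timestamp(cue[0].start, ".")
        end = format_timestamp(cue[-1].end, ".")
        text = " ".join(w.text for w in cue)
        blocks.append(f"{start} --> {end}\n{text}\n")
    return "\n".join(blocks)


def to_json(words: list[Word]) -> str:
    # Keep per-word timing and confidence for downstream processing.
    return json.dumps({"words": [asdict(w) for w in words]}, indent=2)


if __name__ == "__main__":
    words = [
        Word("Hello", 0.0, 0.4, 0.98),
        Word("world", 0.45, 0.9, 0.95),
        Word("again", 61.5, 62.25, 0.90),
    ]
    print(to_srt(words, max_words=2))
    print(to_vtt(words, max_words=2))
    print(to_json(words))
```

The SRT and WebVTT writers differ mainly in the header and in the millisecond separator (a comma for SRT, a period for WebVTT). A real pipeline would more likely split cues on pauses or line length than on a fixed word count.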

Best fit

When to reach for it

Best when the job fits Media & Transcription, such as live captioning from a microphone or producing subtitles for web players.
Works naturally with Claude Code setups.

Trust & provenance

Why this listing is credible

Built around the whisper toolchain.
Trust status: Security Reviewed.
97.8k GitHub stars on the linked upstream source.
License: MIT.
Last updated Mar 24, 2026.