Deepgram Nova STT Pipeline

Real-time speech-to-text using Deepgram Nova-2 API with streaming WebSocket connections. Supports diarization, punctuation, and language detection via the Deepgram Python SDK for podcast and meeting transcription workflows.

Media & Transcription Claude Code Published

Tool match: ffmpeg

INSTALL WITH ANY AGENT

npx skills add agentskillexchange/skills --skill deepgram-nova-stt-pipeline

Works best when you want a reusable capability, not another fragile one-off prompt.

View source

At a glance

Last updated

Mar 24, 2026

Quick brief

Automate speech-to-text transcription with the Deepgram Nova-2 model over Deepgram's streaming WebSocket API. The skill uses the Deepgram Python SDK (deepgram-sdk) to turn audio files and live audio streams into text transcripts.

How it works

What this skill actually does

Key capabilities include multi-speaker diarization for meeting recordings, automatic punctuation restoration, and real-time interim results for live captioning. The skill supports over 30 languages with automatic language detection.
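As a rough illustration of what diarized output allows, the sketch below collapses word-level results into speaker turns. It assumes each word is a dict carrying `word`, `punctuated_word`, and `speaker` keys, which is how diarized Deepgram responses are commonly shaped; the helper name `speaker_turns` and the sample data are hypothetical and not taken from the skill's source.

```python
from itertools import groupby


def speaker_turns(words):
    """Collapse consecutive words from the same speaker into (speaker, text) turns."""
    turns = []
    # groupby only merges adjacent items, which is what a turn-by-turn transcript needs.
    for speaker, group in groupby(words, key=lambda w: w.get("speaker", 0)):
        # Prefer the punctuated form when present, fall back to the raw word.
        text = " ".join(w.get("punctuated_word", w["word"]) for w in group)
        turns.append((speaker, text))
    return turns


# Hypothetical sample shaped like word-level diarization output.
sample_words = [
    {"word": "hello", "punctuated_word": "Hello.", "start": 0.0, "end": 0.4, "speaker": 0},
    {"word": "hi", "punctuated_word": "Hi", "start": 0.6, "end": 0.8, "speaker": 1},
    {"word": "there", "punctuated_word": "there.", "start": 0.8, "end": 1.1, "speaker": 1},
]

for speaker, text in speaker_turns(sample_words):
    print(f"Speaker {speaker}: {text}")
```

Grouping only adjacent words keeps the original turn order, so a speaker who talks twice appears as two separate turns rather than one merged block.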

Configuration options include model selection (nova-2, nova, enhanced, base), sample rate, audio encoding (linear16, flac, opus), and callback URLs for asynchronous processing. For large files, the pipeline uploads audio in chunks to the Deepgram REST API, with automatic retries and rate limit management.
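One way to picture these options is as query parameters on the streaming endpoint. The sketch below is a minimal illustration, not the skill's actual code: the helper `build_listen_url` is hypothetical, and the endpoint URL and parameter names (`model`, `encoding`, `sample_rate`, `diarize`, `punctuate`, `interim_results`) reflect Deepgram's public API as commonly documented, so they should be checked against the current API reference.

```python
from urllib.parse import urlencode

# Assumed streaming endpoint; verify against the current Deepgram API reference.
LISTEN_WS_URL = "wss://api.deepgram.com/v1/listen"


def build_listen_url(model="nova-2", encoding="linear16", sample_rate=16000,
                     diarize=True, punctuate=True, interim_results=True):
    """Build a streaming URL from the configuration options (hypothetical helper)."""
    params = {
        "model": model,
        "encoding": encoding,
        "sample_rate": sample_rate,
        # Boolean flags are sent as lowercase strings in the query string.
        "diarize": str(diarize).lower(),
        "punctuate": str(punctuate).lower(),
        "interim_results": str(interim_results).lower(),
    }
    return f"{LISTEN_WS_URL}?{urlencode(params)}"


print(build_listen_url())
print(build_listen_url(model="nova", encoding="opus", sample_rate=48000, diarize=False))
```

Keeping the options in one function makes it easy to switch models or encodings per job without touching the connection code.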

Output formats include plain text, SRT subtitles, VTT captions, and structured JSON with word-level timestamps. Integrates with FFmpeg for audio preprocessing and format conversion before submission to the Deepgram endpoint.
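To make the subtitle output concrete, here is a minimal sketch of turning word-level timestamps into SRT cues. It assumes words are dicts with `word`, optional `punctuated_word`, and `start`/`end` times in seconds; the function names and the fixed `max_words` cue size are illustrative choices, not the skill's documented behavior.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, max_words=8):
    """Group words into fixed-size cues and render them as SRT text."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        text = " ".join(w.get("punctuated_word", w["word"]) for w in chunk)
        start = srt_timestamp(chunk[0]["start"])
        end = srt_timestamp(chunk[-1]["end"])
        # Each cue: index, time range, text, then a blank line separator.
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)


# Hypothetical sample shaped like word-level timestamp output.
sample_words = [
    {"word": "hello", "punctuated_word": "Hello.", "start": 0.0, "end": 0.4},
    {"word": "hi", "punctuated_word": "Hi", "start": 0.6, "end": 0.8},
    {"word": "there", "punctuated_word": "there.", "start": 0.8, "end": 1.1},
]

print(words_to_srt(sample_words, max_words=2))
```

A production pipeline would more likely break cues on sentence boundaries or pauses rather than a fixed word count; the fixed size just keeps the sketch short.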

Best fit

When to reach for it

Best when the job fits Media & Transcription.
Works naturally with Claude Code setups.

Trust & provenance

Why this listing is credible

Built around the ffmpeg toolchain.
Trust status: Published.
Last updated Mar 24, 2026.

View source ↗