Skill Detail

Deepgram Nova STT Pipeline

Real-time speech-to-text using Deepgram Nova-2 API with streaming WebSocket connections. Supports diarization, punctuation, and language detection via the Deepgram Python SDK for podcast and meeting transcription workflows.

Media & TranscriptionClaude Code
Media & Transcription Claude Code Published
Tool match: ffmpeg
INSTALL WITH ANY AGENT
npx skills add agentskillexchange/skills --skill deepgram-nova-stt-pipeline Copy
Works best when you want a reusable capability, not another fragile one-off prompt.
At a glance
Last updated
Mar 24, 2026
Quick brief

Automate speech-to-text transcription using the Deepgram Nova-2 model via their streaming WebSocket API. This skill connects to the Deepgram Python SDK (deepgram-sdk) to process audio files and live audio streams into accurate text transcripts.

How it works

What this skill actually does

Key capabilities include multi-speaker diarization for meeting recordings, automatic punctuation restoration, and real-time interim results for live captioning. The skill supports over 30 languages with automatic language detection.

Configuration options include model selection (nova-2, nova, enhanced, base), sample rate settings, encoding formats (linear16, flac, opus), and callback URLs for async processing. The pipeline handles chunked audio uploads for large files, with automatic retry logic and rate limit management against the Deepgram REST API.

Output formats include plain text, SRT subtitles, VTT captions, and structured JSON with word-level timestamps. Integrates with FFmpeg for audio preprocessing and format conversion before submission to the Deepgram endpoint.