Skill Detail

Deepgram Real-Time Transcription Connector

Streams live audio to Deepgram's WebSocket API at wss://api.deepgram.com/v1/listen for real-time speech-to-text. Handles interim results, utterance detection, and speaker diarization via the Deepgram Node SDK.

Media & Transcription MCP Security Reviewed
โญ 260 GitHub stars โฌ‡ 1.6M/wk npm
INSTALL WITH ANY AGENT
npx skills add agentskillexchange/skills --skill deepgram-realtime-transcription-connector
Works best when you want a reusable capability, not another fragile one-off prompt.
At a glance
Tools required
Node.js
Install & setup
npm install @deepgram/sdk
Author
Deepgram
Last updated
Mar 24, 2026
Quick brief

This skill connects to Deepgram’s real-time transcription API through the WebSocket endpoint at wss://api.deepgram.com/v1/listen. It creates a client with the Deepgram Node SDK via createClient(apiKey) and opens a live transcription session with client.listen.live({ model: "nova-2", language: "en", smart_format: true, diarize: true }). Audio is streamed in chunks with connection.send(audioBuffer); raw PCM (linear16), WAV, MP3, and Opus formats are supported.

The skill handles three main event types: "Results" for transcription data (both interim and final), "Metadata" for stream information, and "Error" for connection issues. Speaker diarization assigns a speaker label (Speaker 0, Speaker 1, etc.) to each word, along with confidence scores. Utterance end detection uses the utterance_end_ms parameter together with "UtteranceEnd" events to find natural sentence boundaries. Reconnection logic recovers from network interruptions using exponential backoff.

Output formats include plain text, SRT subtitles with timestamps, and structured JSON with word-level timing. The skill also supports Deepgram’s keyword boosting via keywords=["custom_term:2.0"] for domain-specific vocabulary.
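
Two of the pieces described above, the exponential backoff schedule and the SRT output built from word-level timing, can be sketched without touching the SDK. This is a minimal illustration, not the skill's actual implementation: the function names (backoffDelayMs, srtTimestamp, wordsToSrt), the default delays, and the assumed word shape { word, start, end, speaker } with times in seconds are all hypothetical choices made for the example.

```javascript
// Hypothetical helpers for illustration; none of these names come from
// the Deepgram SDK or from the skill's source.

// Exponential backoff: the delay doubles with each attempt and is capped
// at maxMs. The 500 ms base and 30 s cap are assumed defaults.
function backoffDelayMs(attempt, baseMs = 500, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm.
function srtTimestamp(seconds) {
  const totalMs = Math.round(seconds * 1000);
  const pad = (n, width = 2) => String(n).padStart(width, "0");
  const h = Math.floor(totalMs / 3600000);
  const m = Math.floor((totalMs % 3600000) / 60000);
  const s = Math.floor((totalMs % 60000) / 1000);
  const ms = totalMs % 1000;
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms, 3)}`;
}

// Merge consecutive words from the same speaker into one cue, then
// render the cues as SRT text.
// Assumed word shape: { word, start, end, speaker }, times in seconds.
function wordsToSrt(words) {
  const cues = [];
  for (const w of words) {
    const last = cues[cues.length - 1];
    if (last && last.speaker === w.speaker) {
      // Same speaker keeps talking: extend the current cue.
      last.end = w.end;
      last.text.push(w.word);
    } else {
      // Speaker changed (or first word): start a new cue.
      cues.push({ speaker: w.speaker, start: w.start, end: w.end, text: [w.word] });
    }
  }
  return cues
    .map((c, i) =>
      `${i + 1}\n${srtTimestamp(c.start)} --> ${srtTimestamp(c.end)}\n` +
      `Speaker ${c.speaker}: ${c.text.join(" ")}`)
    .join("\n\n") + "\n";
}
```

In a reconnect loop, the delay for the nth failed attempt would typically be passed to setTimeout before reopening the connection, with the attempt counter reset after a successful open. One cue per speaker turn keeps the sketch short; a real subtitle exporter would probably also split long turns by duration or character count.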