Skill Detail

Whishper Self-Hosted Speech-to-Text and Audio Workflow Skill

Whishper is an open-source, self-hosted web app for speech-to-text, translation, and subtitle workflows built around Whisper models. This skill covers running Whishper with Docker, handling uploads and transcripts, and wiring the output into broader automation flows.

Media & Transcription, Multi-Framework, Security Reviewed
โญ 3k GitHub stars
INSTALL WITH ANY AGENT
npx skills add agentskillexchange/skills --skill whishper-self-hosted-speech-to-text-audio-workflow-skill
Works best when you want a reusable capability, not another fragile one-off prompt.
At a glance
Tools required: Docker
Install & setup: curl -fsSL -o get-whishper.sh https://raw.githubusercontent.com/pluja/whishper/main/get-whishper.sh && bash get-whishper.sh
Author: pluja
Publisher: Individual Developer
Last updated: Apr 8, 2026
Quick brief

Whishper is an open-source, self-hosted transcription interface maintained by pluja. It packages Whisper-based speech-to-text, translation, subtitle generation, and media processing into a browser-accessible application that teams can run on their own infrastructure. The upstream project ships with Docker-based deployment, a dedicated installation guide, and a simple operating model: upload audio or video, process it as a job, and export the resulting text artifacts.

How it works

What this skill actually does

This skill is useful when an agent or operator needs speech-to-text without relying on a hosted API. A typical workflow is to deploy Whishper with Docker, feed it recordings, interviews, meetings, or other media files, and then use the generated transcript, subtitles, or translated text in downstream summarization, search, or archival workflows. The integration points are simple: the stack is self-hosted, the web UI is suitable for manual review, and the output can be copied into note systems, content pipelines, or further AI analysis steps.
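As one illustration of the downstream step described above, the following minimal sketch turns timed transcript segments into SubRip (SRT) subtitle text. The `Segment` type and its fields are hypothetical and are not Whishper's export format; the sketch assumes the transcript has already been reduced to start time, end time, and text.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Segment:
    """Hypothetical transcript segment; NOT Whishper's actual export schema."""
    start: float  # seconds from the beginning of the media
    end: float    # seconds from the beginning of the media
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    return "\n".join(blocks)
```

Calling `to_srt([Segment(0.0, 2.5, "Hello"), Segment(2.5, 4.0, "World")])` yields two numbered cues that most subtitle players accept. If the tool already exports subtitles directly, a conversion like this is only needed for transcripts that were edited or merged afterwards.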

From an implementation perspective, the important pieces are the Docker-based runtime, the installation script provided by the project, and the storage layout used by the service containers. This skill maps well to agent workflows that need private media transcription, transcript review, or repeatable subtitle generation on controlled infrastructure.
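Because the runtime depends on Docker and the listed install command relies on `curl` and `bash`, an agent may want to confirm those tools are on the PATH before attempting setup. The sketch below is one hedged way to do that; the `REQUIRED_TOOLS` tuple and the `missing_tools` helper are illustrative names, not part of Whishper, and the check does not verify versions or that the Docker daemon is running.

```python
import shutil
from typing import Iterable, List

# Assumption: tools inferred from the listing's "Tools required" entry and
# its install command; the upstream project may need more than these.
REQUIRED_TOOLS = ("docker", "curl", "bash")


def missing_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the tools from `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing before install: " + ", ".join(missing))
    else:
        print("All listed prerequisites were found on PATH.")
```

Running the check first keeps a failed install from leaving a half-configured deployment behind.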