Insanely Fast Whisper GPU-Accelerated Speech Transcription CLI

Insanely Fast Whisper is a CLI tool that transcribes audio on a GPU using OpenAI Whisper models with Hugging Face Transformers, Flash Attention 2, and batched inference. On an NVIDIA A100 with the large-v3 model, it can transcribe 150 minutes of audio in under 98 seconds.

Media & Transcription Multi-Framework Security Reviewed

Tool match: insanely-fast-whisper (⭐ 12.2k GitHub stars, Apache-2.0 license)

INSTALL WITH ANY AGENT

npx skills add agentskillexchange/skills --skill insanely-fast-whisper-gpu-transcription

Works best when you want a reusable capability, not another fragile one-off prompt.

View source

At a glance

Last updated

Mar 29, 2026

Quick brief

Insanely Fast Whisper is an opinionated command-line interface for fast, on-device audio transcription powered by OpenAI’s Whisper models. It is built on Hugging Face Transformers, Optimum, and Flash Attention 2, and it runs substantially faster than an unoptimized Whisper pipeline.

How it works

What this skill actually does

The tool leverages several optimization techniques to maximize transcription throughput: batched inference processes multiple audio chunks simultaneously, BetterTransformer and Flash Attention 2 accelerate the model’s attention computation, and fp16 precision reduces memory usage while maintaining accuracy. On an NVIDIA A100 GPU, it can transcribe 150 minutes of audio in under 98 seconds with the large-v3 model.
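To make the batching claim concrete, here is a small back-of-the-envelope sketch. It assumes the audio is cut into 30-second chunks (the length of Whisper's input window) with no overlap between chunks, and it uses a batch size of 24 as in the usage example on this page. Real pipelines typically overlap neighbouring chunks, so actual counts will be somewhat higher. The function names are illustrative and not part of the tool.

```python
import math


def count_batches(duration_s: float, chunk_s: float = 30.0, batch_size: int = 24) -> tuple[int, int]:
    """Return (num_chunks, num_batches) for a recording of duration_s seconds.

    Assumption: fixed-length chunks with no overlap between them.
    """
    num_chunks = math.ceil(duration_s / chunk_s)
    num_batches = math.ceil(num_chunks / batch_size)
    return num_chunks, num_batches


def real_time_factor(audio_s: float, wall_s: float) -> float:
    """How many seconds of audio are processed per second of wall-clock time."""
    return audio_s / wall_s


# The headline figure: 150 minutes of audio in 98 seconds.
chunks, batches = count_batches(150 * 60)
print(chunks, batches)                            # 300 13
print(round(real_time_factor(150 * 60, 98), 1))   # 91.8
```

Under these assumptions, 150 minutes of audio becomes 300 chunks that fit into 13 batched forward passes over the encoder, and the quoted benchmark corresponds to roughly 92 times real time.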

Key Capabilities

Multiple model support: Works with all Whisper model sizes (tiny through large-v3); select a checkpoint with --model-name.
Flash Attention 2: Enable with --flash True for maximum GPU throughput on supported hardware.
Configurable batching: Adjust batch size with --batch-size to optimize for available GPU memory.
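Timestamped output: Writes a JSON transcript (to --transcript-path, output.json by default) with chunk-level or word-level timestamps (--timestamp), which can be converted to subtitle and caption formats or fed into text processing.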
URL input support: Accepts both local files and URLs as input, downloading audio automatically.
Mac MPS support: Works on Apple Silicon Macs via the --device-id mps flag.
Distil-Whisper integration: Use distilled models for even faster transcription with minimal quality loss.

Integration Points

Install via pipx install insanely-fast-whisper or run without installing using pipx run insanely-fast-whisper. The tool fits into audio processing pipelines, podcast transcription workflows, subtitle generation systems, and any scenario requiring fast batch transcription. It supports NVIDIA GPUs via CUDA and Apple Silicon via MPS. Combine with downstream tools for speaker diarization, translation, or text analysis.
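As one example of a downstream step in a subtitle pipeline, the sketch below turns timestamped transcript chunks into SRT text. It assumes the transcript JSON has a top-level "chunks" list whose items carry a "timestamp" pair (start and end, in seconds) and a "text" string, mirroring the shape returned by the Transformers speech-recognition pipeline; check the file your version of the tool writes before relying on this layout. The helper names are hypothetical.

```python
import json


def to_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def chunks_to_srt(chunks: list[dict]) -> str:
    """Convert chunks of the assumed shape into numbered SRT entries."""
    entries = []
    for index, chunk in enumerate(chunks, start=1):
        start, end = chunk["timestamp"]
        if end is None:  # defensive: a final chunk may lack an end time
            end = start
        text = chunk["text"].strip()
        entries.append(f"{index}\n{to_srt_time(start)} --> {to_srt_time(end)}\n{text}\n")
    return "\n".join(entries)


# Inline sample data in the assumed layout (not real tool output).
sample = json.loads(
    '{"chunks": ['
    '{"timestamp": [0.0, 2.5], "text": " Hello there."},'
    '{"timestamp": [2.5, 3661.042], "text": " Goodbye."}]}'
)
print(chunks_to_srt(sample["chunks"]))
```

In a real workflow, the sample would be replaced by json.load on the transcript file, and the returned string written to a .srt file.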

Example Usage

insanely-fast-whisper --file-name recording.mp3 --flash True --batch-size 24

This transcribes recording.mp3 using Flash Attention 2 with a batch size of 24, which is also the CLI's default. Lower --batch-size if the GPU runs out of memory.
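When the CLI is called from a script or agent rather than typed by hand, it can help to assemble the argument list programmatically. The following is a minimal sketch: build_command is a hypothetical helper, and it only emits flags mentioned on this page (--file-name, --flash, --batch-size, --device-id) plus --model-name, which is assumed here to be the CLI's flag for choosing a checkpoint. It builds the command but does not run it.

```python
import shlex
from typing import Optional


def build_command(
    file_name: str,
    *,
    flash: bool = False,
    batch_size: int = 24,
    device_id: Optional[str] = None,
    model_name: Optional[str] = None,
) -> list[str]:
    """Assemble an insanely-fast-whisper invocation as an argument list."""
    cmd = ["insanely-fast-whisper", "--file-name", file_name]
    if flash:
        cmd += ["--flash", "True"]
    cmd += ["--batch-size", str(batch_size)]
    if device_id is not None:
        cmd += ["--device-id", device_id]  # e.g. "mps" on Apple Silicon
    if model_name is not None:
        cmd += ["--model-name", model_name]  # assumed flag name
    return cmd


cmd = build_command("recording.mp3", flash=True)
print(shlex.join(cmd))
# To execute it: subprocess.run(cmd, check=True)
```

Passing the list to subprocess.run avoids shell quoting problems with file names that contain spaces, and shlex.join gives a copy-pasteable form for logs.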

Best fit

When to reach for it

Best when the job fits Media & Transcription.
Works naturally with Multi-Framework setups.

Trust & provenance

Why this listing is credible

Built around the insanely-fast-whisper toolchain.
Trust status: Security Reviewed.
12.2k GitHub stars on the linked upstream source.
License: Apache-2.0.
Last updated Mar 29, 2026.