Piper Local Neural Text-to-Speech Engine
Run fast, high-quality neural text-to-speech locally with Piper. Supports 20+ languages with compact ONNX voice models, no cloud API required, and produces natural-sounding speech on CPUs including Raspberry Pi.
npx skills add agentskillexchange/skills --skill piper-local-neural-tts-engine
Piper is a fast, local neural text-to-speech (TTS) system developed by the Rhasspy project and now maintained by the Open Home Foundation. It converts text to natural-sounding speech using lightweight ONNX neural network models that run entirely on-device: no cloud API calls, no internet connection required. Piper is designed for voice assistants, accessibility tools, and any application that needs offline speech synthesis.
What this skill actually does
What is Piper?
How This Skill Works
This skill enables agents to synthesize speech from text using the Piper TTS engine. Agents can select from a library of pre-trained voice models covering 20+ languages, including English, German, French, Spanish, Chinese, and Russian. The engine first converts text to phonemes with the espeak-ng phonemizer, then feeds the phoneme sequence through a VITS-based neural network to produce raw audio. The entire pipeline runs locally on CPU and synthesizes speech faster than real time on modern hardware.
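The text-in, audio-out flow described above can be driven from Python by shelling out to the CLI. What follows is a minimal sketch rather than an official integration. It assumes a piper executable on PATH and an already downloaded voice model, and it uses the --model and --output_file flags shown in the rhasspy/piper README. The helper names build_piper_command and synthesize_to_wav are hypothetical.

```python
import subprocess
from pathlib import Path


def build_piper_command(model_path: str, output_path: str,
                        executable: str = "piper") -> list[str]:
    """Return the argv list for a one-shot synthesis to a WAV file."""
    return [executable, "--model", str(model_path),
            "--output_file", str(output_path)]


def synthesize_to_wav(text: str, model_path: str, output_path: str) -> Path:
    """Pipe text to Piper on stdin; Piper writes the WAV file itself.

    Assumes the piper binary and the model file exist locally.
    """
    cmd = build_piper_command(model_path, output_path)
    # check=True raises CalledProcessError if piper exits with an error.
    subprocess.run(cmd, input=(text + "\n").encode("utf-8"), check=True)
    return Path(output_path)
```

With a model such as en_US-lessac-medium.onnx in the working directory (an example filename, not something this skill ships), calling synthesize_to_wav("Hello there", "en_US-lessac-medium.onnx", "hello.wav") should leave a playable hello.wav next to it. Building the argument list in a separate function keeps the command easy to inspect without running the engine.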
Key Capabilities
- Offline operation: All processing happens locally. Models are compact ONNX files (15-100 MB) that load quickly and run without GPU acceleration.
- Multi-language support: Pre-trained voices available for English (US/UK), German, French, Spanish, Portuguese, Italian, Dutch, Russian, Chinese, Arabic, Hindi, Japanese, and 10+ more languages.
- Voice quality tiers: Choose between low, medium, and high quality models, trading file size and CPU usage for naturalness. All tiers are VITS-based, and some voices are also published as multi-speaker models.
- Streaming output: Piper supports streaming audio generation, outputting PCM audio chunks as they are synthesized for low-latency real-time playback.
- CLI and library usage: Use the piper command-line tool to pipe text in and get WAV audio out, or link against the C++ library for embedding in applications. Python bindings are available via piper-tts on PyPI.
Integration Points
Piper integrates with Home Assistant for voice assistant pipelines, with the Wyoming protocol for modular voice satellite setups, and with any system that can consume WAV or raw PCM audio. It outputs 16-bit mono audio at a sample rate fixed by the voice model (16 kHz or 22.05 kHz). The CLI accepts text via stdin and can write audio to stdout, making it composable with ffmpeg, sox, aplay, and other audio tools via shell pipes. Docker images are available for containerized deployments.
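Because the raw output is plain 16-bit mono PCM, it can be wrapped in a WAV container with Python's standard wave module before being passed to tools that expect a header. The sketch below assumes a 22050 Hz voice; the sample rate has to match the model actually used, and the function names are hypothetical.

```python
import io
import wave


def pcm_to_wav_bytes(pcm: bytes, sample_rate: int = 22050) -> bytes:
    """Wrap headerless 16-bit mono PCM in a WAV container."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 16-bit samples are 2 bytes each
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    return buffer.getvalue()


def pcm_duration_seconds(pcm: bytes, sample_rate: int = 22050) -> float:
    """Duration of 16-bit mono PCM: byte count / (2 bytes * sample rate)."""
    return len(pcm) / (2 * sample_rate)
```

The duration helper shows the arithmetic behind the format: one second of 22.05 kHz audio is 22050 samples of 2 bytes each, or 44100 bytes.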
Source
GitHub: rhasspy/piper (10.7K+ stars, MIT license). New development: OHF-Voice/piper1-gpl