Build voice and multimodal agents with Pipecat
Use Pipecat to define realtime voice and multimodal agent pipelines with transports, model providers, tools, and turn-taking tests.
npx skills add agentskillexchange/skills --skill build-voice-and-multimodal-agents-with-pipecat
Pipecat is an open-source framework for realtime voice and multimodal conversational AI. This skill is for developers who need to define a conversation pipeline, connect an audio or video transport, add model providers and tools, and test latency, interruption, and turn-taking behavior. Invoke it when the target workflow is a realtime voice or multimodal agent rather than a text-only assistant. Its scope is the conversational pipeline and its runtime behavior, not a general catalog of media SDKs.
What this skill actually does
Inputs and prerequisites: Pipecat, audio or video transport, model provider credentials.
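A preflight check can catch missing credentials before any realtime session starts. The following is a minimal sketch using only the Python standard library; the environment variable names are hypothetical placeholders, not names Pipecat itself defines, so substitute whatever your chosen transport and model providers actually require.

```python
import os
from typing import List, Mapping, Optional, Sequence

# Hypothetical variable names for illustration only. Replace them with the
# names your selected transport and model providers document.
REQUIRED_ENV_VARS = ("TRANSPORT_API_KEY", "STT_API_KEY", "LLM_API_KEY", "TTS_API_KEY")


def missing_credentials(
    required: Sequence[str] = REQUIRED_ENV_VARS,
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        raise SystemExit("Missing credentials: " + ", ".join(missing))
    print("All required credentials are set.")
```

Passing `env` explicitly keeps the check testable without touching the real process environment.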
Setup notes: Follow the Pipecat repository setup instructions, configure the selected transport and model providers, define the conversational pipeline and tools, then test realtime latency and turn-taking.
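The latency and turn-taking test in the last step can be approximated offline from a log of timestamped speech events. The sketch below is an assumption about how such a check might look: the `Event` type and the four event kinds are hypothetical names chosen for illustration and do not come from Pipecat's API. You would populate the log from whatever events your pipeline actually emits.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Event:
    t: float   # seconds since session start
    kind: str  # "user_started", "user_stopped", "bot_started", or "bot_stopped"


def response_latencies(events: Iterable[Event]) -> List[float]:
    """Time from the user finishing a turn to the bot starting to speak."""
    latencies: List[float] = []
    pending: Optional[float] = None
    for ev in sorted(events, key=lambda e: e.t):
        if ev.kind == "user_stopped":
            pending = ev.t
        elif ev.kind == "user_started":
            pending = None  # user resumed speaking, so the turn is not over
        elif ev.kind == "bot_started" and pending is not None:
            latencies.append(ev.t - pending)
            pending = None
    return latencies


def interruption_stop_times(events: Iterable[Event]) -> List[float]:
    """Time from the user barging in to the bot going silent."""
    stop_times: List[float] = []
    bot_speaking = False
    interrupted_at: Optional[float] = None
    for ev in sorted(events, key=lambda e: e.t):
        if ev.kind == "bot_started":
            bot_speaking = True
        elif ev.kind == "bot_stopped":
            if interrupted_at is not None:
                stop_times.append(ev.t - interrupted_at)
            bot_speaking = False
            interrupted_at = None
        elif ev.kind == "user_started" and bot_speaking and interrupted_at is None:
            interrupted_at = ev.t
    return stop_times


# Hand-written sample log: one normal turn, one barge-in, one follow-up turn.
SAMPLE = [
    Event(0.0, "user_started"), Event(1.5, "user_stopped"),
    Event(2.0, "bot_started"), Event(3.0, "user_started"),
    Event(3.25, "bot_stopped"), Event(4.0, "user_stopped"),
    Event(4.75, "bot_started"), Event(6.0, "bot_stopped"),
]

if __name__ == "__main__":
    print(response_latencies(SAMPLE))       # [0.5, 0.75]
    print(interruption_stop_times(SAMPLE))  # [0.25]
```

Keeping the analysis separate from the live pipeline lets the same thresholds be asserted in automated tests, for example failing a run when any response latency exceeds a budget you choose.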
Source and verification boundary: Use https://github.com/pipecat-ai/pipecat as the canonical reference before running the workflow, and keep commands, API calls, CLI usage, and generated outputs reviewable against that upstream source.
Framework fit: Publish this as a Custom Agents workflow only when the operator can invoke the documented toolchain directly; do not treat the upstream project as a generic product listing.