Analyze videos with frame extraction and audio context in Claude Code
Give Claude Code a video perception layer that extracts frames, transcribes audio, and lets Claude answer questions about local videos or YouTube URLs.
npx skills add agentskillexchange/skills --skill analyze-videos-with-frame-extraction-and-audio-context-in-claude-code
Use Claude Code Video Vision when a Claude Code session needs to inspect a screen recording, demo, lecture, bug report, or YouTube video directly. The plugin extracts frames with ffmpeg, processes audio through Gemini, local Whisper, or OpenAI Whisper, and returns timestamped visual and transcript context through slash commands and MCP tools. The boundary is video perception for Claude Code, not a general media editor, video hosting platform, or standalone transcription service.
What this skill actually does
Inputs and prerequisites: Claude Code, Node.js 20+, ffmpeg, optional yt-dlp, Gemini API or Whisper/OpenAI audio backend.
Setup notes: In Claude Code, run /plugin marketplace add https://github.com/jordanrendric/claude-video-vision, then /plugin install claude-video-vision. Run /setup-video-vision to configure ffmpeg, audio backend, and optional YouTube support.
Source and verification boundary: use https://github.com/jordanrendric/claude-video-vision as the canonical reference before running the workflow; keep commands, API calls, CLI usage, and generated outputs reviewable against that upstream source.
Framework fit: publish this as a Claude Code workflow only when the operator can invoke the documented toolchain directly, rather than treating the upstream project as a generic product listing.