Browsertrix Crawler: High-Fidelity Web Archiving and QA Crawler

Browsertrix Crawler is a browser-based crawling system from Webrecorder for high-fidelity site capture, QA replay analysis, and configurable crawl behavior. It runs complex crawls in a single Docker container, using Puppeteer and the Chrome DevTools Protocol under the hood.

Research & Scraping | Multi-Framework | Published

Tool match: browsertrix-crawler | ⭐ 1k GitHub stars | AGPL-3.0 license

INSTALL WITH ANY AGENT

npx skills add agentskillexchange/skills --skill browsertrix-crawler-high-fidelity-web-archiving-qa-crawler

Works best when you want a reusable capability, not another fragile one-off prompt.

View source Documentation

At a glance

Tools required: Docker
Author: Webrecorder
Publisher: Open Source Project
Last updated: Mar 31, 2026

Quick brief

Browsertrix Crawler is the open-source crawling engine behind Webrecorder’s Browsertrix platform. The upstream webrecorder/browsertrix-crawler repository describes it as a high-fidelity browser-based crawling system designed to run complex, customizable crawls in a single Docker container. Rather than fetching only raw HTML, Browsertrix Crawler uses Puppeteer to drive one or more Brave Browser windows and captures data through the Chrome DevTools Protocol (CDP), which makes it well suited to modern JavaScript-heavy sites and archival-quality capture.

How it works

What this skill actually does

The official documentation highlights features that matter for agent workflows: YAML-based crawl configuration, seed lists and scope rules, blocking rules, screenshots and thumbnails, browser profile reuse, custom Puppeteer driver scripts, real-time screencasting, and quality-assurance crawling that compares replay results against captured content. This makes the tool useful not only for scraping, but also for preservation, auditing, and repeatable site QA tasks where a lightweight HTTP crawler would miss rendered behavior.
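A crawl that uses the seed, scope, and screenshot features listed above can be sketched as a small Python helper that assembles the Docker command line. This is a minimal sketch, not the project's own tooling: the helper name `build_crawl_command` is hypothetical, and the image name, the `/crawls/` mount point, and the flags (`--url`, `--scopeType`, `--workers`, `--collection`, `--generateWACZ`, `--screenshot`) are assumed to match the upstream documentation and should be checked against the current release before use.

```python
import shlex
from pathlib import Path


def build_crawl_command(seed_url, collection, out_dir="crawls", workers=2,
                        scope_type="prefix", screenshots=("view", "thumbnail")):
    """Return the argv list for a single-seed crawl run in Docker.

    Flag names are assumptions based on the upstream docs; verify them
    against the Browsertrix Crawler version you actually run.
    """
    # Docker bind mounts are safest with an absolute host path.
    host_dir = Path(out_dir).resolve()
    argv = [
        "docker", "run", "--rm",
        "-v", f"{host_dir}:/crawls/",          # crawl output lands here
        "webrecorder/browsertrix-crawler", "crawl",
        "--url", seed_url,                     # seed URL
        "--scopeType", scope_type,             # scope rule for the seed
        "--workers", str(workers),             # parallel browser windows
        "--collection", collection,            # name of the output collection
        "--generateWACZ",                      # package the result as WACZ
    ]
    if screenshots:
        # Comma-separated screenshot types, e.g. view,thumbnail
        argv += ["--screenshot", ",".join(screenshots)]
    return argv


if __name__ == "__main__":
    cmd = build_crawl_command("https://example.com/", "example-crawl")
    # Print a shell-safe version of the command for inspection.
    print(shlex.join(cmd))
```

Building the command as a list rather than a string keeps it safe to pass to `subprocess.run(cmd, check=True)` without shell quoting issues, and makes each crawl parameter easy to review or log.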

As an ASE skill, Browsertrix Crawler fits jobs such as capturing a public site into archival artifacts, validating replay completeness, running scoped multi-page crawls with browser behavior enabled, or producing screenshots and crawl reports from reproducible YAML configurations. Its outputs can include crawl packages, screenshots, QA statistics, and browser-derived capture results. Integration points include Docker-based automation, archiving pipelines, custom Puppeteer behavior scripts, and research or compliance workflows that need browser-accurate capture rather than simple HTML extraction.
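The replay-validation job mentioned above could be wired up in a similar way. The sketch below assumes the crawler exposes a `qa` entry point that takes a `--qaSource` WACZ path, and that a finished crawl writes its package to `/crawls/collections/<name>/<name>.wacz` inside the container; both are assumptions drawn from the upstream documentation rather than from this listing, and the helper name `build_qa_command` is hypothetical.

```python
import shlex
from pathlib import Path


def build_qa_command(collection, qa_collection, out_dir="crawls"):
    """Return the argv list for a QA run against an existing crawl's WACZ.

    The `qa` subcommand, `--qaSource` flag, and WACZ path layout are
    assumptions; confirm them in the current upstream documentation.
    """
    host_dir = Path(out_dir).resolve()
    # Assumed location of the WACZ produced by an earlier crawl.
    wacz_path = f"/crawls/collections/{collection}/{collection}.wacz"
    return [
        "docker", "run", "--rm",
        "-v", f"{host_dir}:/crawls/",
        "webrecorder/browsertrix-crawler", "qa",
        "--qaSource", wacz_path,          # crawl to replay and compare
        "--collection", qa_collection,    # where QA results are written
    ]


if __name__ == "__main__":
    print(shlex.join(build_qa_command("example-crawl", "example-qa")))
```

Keeping the QA results in a separate collection from the original crawl means the captured package is never modified by the comparison run.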

The project has an active GitHub repository with about 1k stars, official documentation, an AGPL-3.0 open-source license, and recent maintenance activity. For teams that need browser-realistic crawling with reproducible configuration, it fills a distinct role in the Research & Scraping category.

Best fit

When to reach for it

Best when the job fits Research & Scraping.
Works naturally with Multi-Framework setups.
Requires Docker.

Trust & provenance

Why this listing is credible

Built around the browsertrix-crawler toolchain.
Trust status: Published.
1k GitHub stars on the linked upstream source.
License: AGPL-3.0.
Last updated Mar 31, 2026.
