Crawlee Web Crawling and Browser Automation Library

Builds scalable web collection pipelines with Crawlee, Apify’s open-source crawling and browser automation library. Useful for request queue management, Playwright or Puppeteer crawling, structured dataset export, and resilient scraping across large sets of pages.

Research & Scraping | Multi-Framework | Security Reviewed

Tool match: crawlee | ⭐ 22.6k GitHub stars | ⬇ 72.5k/wk npm | Apache-2.0 license

INSTALL WITH ANY AGENT

npx skills add agentskillexchange/skills --skill crawlee-web-crawling-browser-automation-library

Works best when you want a reusable capability, not another fragile one-off prompt.

View source

At a glance

Last updated

Mar 27, 2026

Quick brief

Crawlee Web Crawling and Browser Automation Library is designed for large-scale collection jobs where a single-page scraper is not enough. It is grounded in the real Crawlee project from Apify and its crawler stack, including CheerioCrawler, PlaywrightCrawler, PuppeteerCrawler, request queues, autoscaled concurrency, session pools, router handlers, proxy configuration, and dataset exports. That gives agents a real toolkit for moving from “grab this page” to “crawl this site reliably and keep the data organized.”

How it works

What this skill actually does

The skill helps choose the right crawler type for the target site, define routing rules for list pages versus detail pages, manage retries, and avoid wasting headless-browser resources on pages that only need HTTP parsing. It also helps structure output so downstream systems receive normalized records instead of a pile of ad hoc HTML fragments. For research and scraping workloads, that distinction matters because the expensive part is rarely the first request; it is maintaining quality and throughput across thousands of URLs.
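The two decisions described above, which crawler type to use and which handler a URL belongs to, can be sketched as plain functions. This is a minimal illustration, not Crawlee's own API: `TargetProfile`, `chooseCrawler`, `routeLabel`, and the `/products/<slug>` URL scheme are all hypothetical names and assumptions made for the example.

```typescript
// Hypothetical planning helpers; every name and URL pattern here is an
// illustrative assumption, not part of the Crawlee library.
type CrawlerKind = 'cheerio' | 'playwright';
type RouteLabel = 'LIST' | 'DETAIL';

interface TargetProfile {
  // True when the data only appears after client-side JavaScript runs.
  needsJavaScript: boolean;
  // True when the page requires clicks, scrolling, or form input.
  needsInteraction: boolean;
}

// Prefer plain HTTP parsing; use a headless browser only when the page needs one.
function chooseCrawler(profile: TargetProfile): CrawlerKind {
  return profile.needsJavaScript || profile.needsInteraction ? 'playwright' : 'cheerio';
}

// Classify a URL as a list page or a detail page.
function routeLabel(url: string): RouteLabel {
  const { pathname } = new URL(url);
  // Assumption: detail pages look like /products/<slug>; anything else is a listing.
  return /^\/products\/[^/]+\/?$/.test(pathname) ? 'DETAIL' : 'LIST';
}

console.log(chooseCrawler({ needsJavaScript: false, needsInteraction: false }));
console.log(routeLabel('https://example.com/products/blue-mug'));
```

In a real crawl, a label like the one returned by `routeLabel` would typically be attached to each enqueued request so that a router can dispatch list pages and detail pages to separate handlers.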

Outputs can include JSON datasets, cleaned page records, crawl logs, request queue state, extracted entities, and evidence about failures such as blocked requests or selector drift. Integration points include Node.js data pipelines, Apify Actors, Playwright-based extraction steps, proxy layers, and downstream analytics or retrieval systems that ingest crawl results. Use this skill when the task requires coordinated crawling, browser-aware extraction, and repeatable collection architecture rather than a one-off scrape script.
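As a rough sketch of what "normalized records" and "evidence about failures" could look like, the snippet below turns a raw extraction result into either a clean record or a tagged failure. The record shapes, the field names, and the rule that HTTP 403 or 429 means "blocked" are assumptions chosen for illustration; they do not come from Crawlee or from this skill's source.

```typescript
// Hypothetical record shapes; field names are illustrative assumptions.
interface RawPage {
  url: string;
  title?: string;
  price?: string;
  statusCode: number;
}

interface CleanRecord {
  url: string;
  title: string;
  priceCents: number | null;
}

interface FailureEvidence {
  url: string;
  reason: 'blocked' | 'selector_drift';
}

type Outcome =
  | { ok: true; record: CleanRecord }
  | { ok: false; failure: FailureEvidence };

// Parse a price string such as "$1,299.5" into integer cents, or null if absent.
function parsePriceCents(price?: string): number | null {
  if (!price) return null;
  const match = price.replace(/,/g, '').match(/(\d+)(?:\.(\d{1,2}))?/);
  if (!match) return null;
  const frac = match[2] ?? '0';
  // A single fractional digit means tenths, so ".5" is 50 cents.
  const cents = frac.length === 1 ? Number(frac) * 10 : Number(frac);
  return Number(match[1]) * 100 + cents;
}

function normalize(page: RawPage): Outcome {
  // Assumption: 403 and 429 responses indicate the request was blocked.
  if (page.statusCode === 403 || page.statusCode === 429) {
    return { ok: false, failure: { url: page.url, reason: 'blocked' } };
  }
  const title = (page.title ?? '').replace(/\s+/g, ' ').trim();
  // An empty title on a successful response suggests the selector no longer matches.
  if (title === '') {
    return { ok: false, failure: { url: page.url, reason: 'selector_drift' } };
  }
  return {
    ok: true,
    record: { url: page.url, title, priceCents: parsePriceCents(page.price) },
  };
}
```

Keeping failures as structured values rather than thrown errors makes it straightforward to write them to a separate dataset or log alongside the clean records.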

Best fit

When to reach for it

Best for research and scraping jobs that span many pages rather than a single URL.
Works naturally with Multi-Framework setups.

Trust & provenance

Why this listing is credible

Built around the crawlee toolchain.
Trust status: Security Reviewed.
22.6k GitHub stars on the linked upstream source.
72.5k/week npm downloads recorded.
License: Apache-2.0.
Last updated Mar 27, 2026.
