Skill Detail

Crawlee Web Crawling and Scraping SDK

Crawlee is Apify's open source web crawling and scraping library for Node.js. It combines request queueing, browser automation, proxy support, and storage primitives so agents can build reliable Playwright, Puppeteer, Cheerio, or HTTP crawlers from one toolkit.


Research & Scraping Multi-Framework Security Reviewed

Tool match: crawlee ⭐ 22.8k GitHub stars ⬇ 80.8k/wk npm Apache-2.0 license

INSTALL WITH ANY AGENT

npx skills add agentskillexchange/skills --skill crawlee-web-crawling-and-scraping-sdk

Works best when you want a reusable capability, not another fragile one-off prompt.

View source Documentation

At a glance

Tools required

node.js, npm, bun, python, docker, java

Install & setup

npx crawlee create my-crawler

Author

apify

Publisher

Company

Last updated

Apr 8, 2026

Quick brief

Crawlee is an open source web crawling and scraping toolkit maintained by Apify for Node.js and TypeScript workloads. The project bundles the core pieces most agent builders need for production crawling: a request queue, autoscaled concurrency, retry handling, proxy support, persistent datasets, and integrations with browser automation libraries and HTML parsers. From the same package, an agent can switch between PlaywrightCrawler, PuppeteerCrawler, CheerioCrawler, JSDOMCrawler, or raw HTTP crawling, depending on how dynamic the target site is.
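To make the "request queue" and "retry handling" ideas above concrete, here is a dependency-free conceptual sketch. It is not Crawlee's implementation or API; every name in it (SimpleRequestQueue, drain, maxRetries) is hypothetical and chosen only for illustration.

```typescript
// Conceptual sketch only: hypothetical names, not Crawlee's internals or API.
type QueuedRequest = { url: string; retryCount: number };

class SimpleRequestQueue {
  private pending: QueuedRequest[] = [];
  private seen = new Set<string>();

  // Add a URL once; duplicates are ignored so a page is not crawled twice.
  add(url: string): boolean {
    if (this.seen.has(url)) return false;
    this.seen.add(url);
    this.pending.push({ url, retryCount: 0 });
    return true;
  }

  // Take the next request in first-in, first-out order.
  next(): QueuedRequest | undefined {
    return this.pending.shift();
  }

  // Put a failed request back until it has used up its retries.
  reclaim(request: QueuedRequest, maxRetries: number): boolean {
    if (request.retryCount >= maxRetries) return false;
    this.pending.push({ url: request.url, retryCount: request.retryCount + 1 });
    return true;
  }
}

// Process the queue sequentially, retrying failures up to maxRetries times.
async function drain(
  queue: SimpleRequestQueue,
  handler: (url: string) => Promise<void>,
  maxRetries = 2,
): Promise<{ handled: string[]; failed: string[] }> {
  const handled: string[] = [];
  const failed: string[] = [];
  for (let req = queue.next(); req !== undefined; req = queue.next()) {
    try {
      await handler(req.url);
      handled.push(req.url);
    } catch {
      if (!queue.reclaim(req, maxRetries)) failed.push(req.url);
    }
  }
  return { handled, failed };
}
```

A real crawler adds concurrency, persistence, and proxy rotation on top of this loop, which is the work a library such as Crawlee is meant to take off the agent's hands.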

How it works

What this skill actually does

Crawlee is a strong fit for research, monitoring, and extraction workflows where the job is not just fetching a single page but traversing a site safely and storing structured output. The upstream documentation covers both quick-start project generation through the Crawlee CLI and manual installation with Playwright for browser-backed scraping. In practice, an agent can use Crawlee to collect listings, crawl documentation portals, monitor content changes, or feed downstream RAG pipelines with normalized page data. Because the library exposes queueing, link-enqueueing, datasets, and browser hooks directly, it also works well as an integration layer between AI agents and standard automation stacks.
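The queueing, link-enqueueing, and dataset pieces mentioned above can be sketched with CheerioCrawler. This is a minimal sketch, assuming the crawlee package is installed (npm install crawlee) and the file runs as an ES module; the PageRecord shape, the toPageRecord and crawlSite helpers, and the 20-page limit are illustrative assumptions, not part of the listing or of Crawlee itself.

```typescript
// Assumes `npm install crawlee`; helper names and record fields are hypothetical.
import { CheerioCrawler, Dataset } from 'crawlee';

// Illustrative record shape for downstream consumers (e.g. a RAG pipeline).
type PageRecord = { url: string; title: string };

// Pure helper: collapse whitespace so stored titles are consistent.
function toPageRecord(url: string, rawTitle: string): PageRecord {
  return { url, title: rawTitle.replace(/\s+/g, ' ').trim() };
}

// Crawl from the start URLs, store one record per page, and return the records.
async function crawlSite(startUrls: string[], maxPages = 20): Promise<PageRecord[]> {
  const crawler = new CheerioCrawler({
    // Safety cap so the sketch cannot crawl an entire site by accident.
    maxRequestsPerCrawl: maxPages,
    async requestHandler({ request, $, enqueueLinks, pushData }) {
      // Save structured output to the default dataset.
      await pushData(toPageRecord(request.loadedUrl ?? request.url, $('title').text()));
      // Add links found on the page to the request queue.
      await enqueueLinks();
    },
  });

  await crawler.run(startUrls);

  // Read back what the handler stored in the default dataset.
  const dataset = await Dataset.open();
  const { items } = await dataset.getData();
  return items as PageRecord[];
}
```

To try it, call crawlSite(['https://crawlee.dev']) from an async context. If the target pages render their content with client-side JavaScript, swapping CheerioCrawler for PlaywrightCrawler is the usual next step, at the cost of installing a browser.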

Upstream signals are strong: the official GitHub repository is active, the package is published on npm, the docs site is live, and the project publishes releases for current versions. Together these make it a solid candidate for a marketplace listing with verified metadata.

Best fit

When to reach for it

Best when the job fits Research & Scraping.
Works naturally with Multi-Framework setups.
Requires node.js, npm, bun, python, docker, java.
Installation is straightforward: npx crawlee create my-crawler

Trust & provenance

Why this listing is credible

Built around the crawlee toolchain.
Trust status: Security Reviewed.
22.8k GitHub stars on the linked upstream source.
80.8k/week npm downloads recorded.
License: Apache-2.0.
Last updated Apr 8, 2026.
