
Crawlee Web Crawling and Scraping SDK

Crawlee is Apify's open source web crawling and scraping library for Node.js. It combines request queueing, browser automation, proxy support, and storage primitives so agents can build reliable Playwright, Puppeteer, Cheerio, or HTTP crawlers from one toolkit.

Research & Scraping Multi-Framework Security Reviewed
Tool match: crawlee ⭐ 22.8k GitHub stars ⬇ 80.8k/wk npm Apache-2.0 license
INSTALL WITH ANY AGENT
npx skills add agentskillexchange/skills --skill crawlee-web-crawling-and-scraping-sdk
Works best when you want a reusable capability, not another fragile one-off prompt.
At a glance
Tools required
node.js, npm, bun, python, docker, java
Install & setup
npx crawlee create my-crawler
Author
apify
Publisher
Company
Last updated
Apr 8, 2026
Quick brief

Crawlee is an open source web crawling and scraping toolkit maintained by Apify for Node.js and TypeScript workloads. The project bundles the core pieces most agent builders need for production crawling: a request queue, autoscaled concurrency, retry handling, proxy support, persistent datasets, and integrations with browser and HTML parsers. From the same package, an agent can switch between PlaywrightCrawler, PuppeteerCrawler, CheerioCrawler, JSDOMCrawler, or raw HTTP crawling depending on how dynamic the target site is.
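The choice between crawler classes can be sketched as a small decision function. This is only an illustrative heuristic, not something Crawlee ships: the `SiteProfile` fields and the `pickCrawler` helper are hypothetical names invented here, and the returned strings simply name the Crawlee classes mentioned above (with `HttpCrawler` standing in for raw HTTP crawling).

```typescript
// Hypothetical description of a target site; the fields are assumptions for this sketch.
interface SiteProfile {
    needsJavaScript: boolean; // content only appears after client-side rendering
    needsDomApis: boolean;    // extraction relies on DOM APIs, but not a real browser
    returnsHtml: boolean;     // false for JSON or plain-text endpoints
}

type CrawlerKind = 'PlaywrightCrawler' | 'JSDOMCrawler' | 'CheerioCrawler' | 'HttpCrawler';

// Pick the lightest crawler that can still handle the site.
function pickCrawler(site: SiteProfile): CrawlerKind {
    // A real browser is the heaviest option, so use it only when rendering is required.
    // PuppeteerCrawler would be an equivalent choice here.
    if (site.needsJavaScript) return 'PlaywrightCrawler';
    // Non-HTML responses need no parser at all.
    if (!site.returnsHtml) return 'HttpCrawler';
    // JSDOM emulates the DOM without launching a browser.
    if (site.needsDomApis) return 'JSDOMCrawler';
    // Static HTML is cheapest to parse with Cheerio.
    return 'CheerioCrawler';
}
```

Ordering the checks from most to least demanding keeps browser-backed crawling as the exception, which is usually the cheaper default for large crawls.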

How it works

What this skill actually does

Crawlee is a strong fit for research, monitoring, and extraction workflows where the job is not just fetching a single page but traversing a whole site safely and storing structured output. The upstream documentation covers both quick-start project generation through the Crawlee CLI and manual installation with Playwright for browser-backed scraping. In practice, an agent can use Crawlee to collect listings, crawl documentation portals, monitor content changes, or feed downstream RAG pipelines with normalized page data. Because the library exposes queueing, link-enqueueing, datasets, and browser hooks directly, it also works well as an integration layer between AI agents and standard automation stacks.
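A minimal sketch of such a workflow, assuming the `crawlee` package is installed, might look like the following. The `PageRecord` shape and the `toRecord` and `crawlSite` helpers are hypothetical names for illustration; `CheerioCrawler`, `enqueueLinks`, `pushData`, and `maxRequestsPerCrawl` come from Crawlee itself.

```typescript
// Shape of one stored record; the field names are an assumption for this sketch.
interface PageRecord {
    url: string;
    title: string;
}

// Pure helper: collapse runs of whitespace so stored titles are normalized.
function toRecord(url: string, rawTitle: string): PageRecord {
    return { url, title: rawTitle.replace(/\s+/g, ' ').trim() };
}

// Crawl a site starting from startUrl, storing one record per page.
async function crawlSite(startUrl: string, maxPages = 20): Promise<void> {
    // Loaded lazily so toRecord stays usable without Crawlee installed.
    const { CheerioCrawler } = await import('crawlee');

    const crawler = new CheerioCrawler({
        // Safety cap so an exploratory crawl cannot run away.
        maxRequestsPerCrawl: maxPages,
        async requestHandler({ request, $, enqueueLinks, pushData }) {
            // Save a structured row to the default dataset.
            await pushData(toRecord(request.loadedUrl ?? request.url, $('title').text()));
            // Add links found on this page to the request queue.
            await enqueueLinks();
        },
    });

    await crawler.run([startUrl]);
}
```

Calling `await crawlSite('https://crawlee.dev')` would visit up to 20 pages. With default settings, Crawlee persists the dataset under a local `storage` directory, and `enqueueLinks` should only follow links on the same hostname; both behaviors are configurable, so check the upstream docs before relying on them.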

Upstream signals are strong: the official GitHub repository is active, the npm package exists, the docs site is live, and the project publishes releases for current versions. That makes it a solid verified-metadata intake candidate for the marketplace.