Skill Detail

Puppeteer Stealth Web Scraper

Uses puppeteer-extra with the stealth plugin to bypass bot detection when scraping the web. Integrates Cheerio for HTML parsing, rotating residential proxies through the Bright Data API, and p-queue for concurrency control.

Research & Scraping · Cursor · Security Reviewed
Tool match: puppeteer · ⭐ 94.1k GitHub stars · ⬇ 40.2M/wk npm · Apache-2.0 license
INSTALL WITH ANY AGENT
npx skills add agentskillexchange/skills --skill puppeteer-stealth-web-scraper
Works best when you want a reusable capability, not another fragile one-off prompt.
At a glance
Author: Puppeteer
Last updated: Mar 24, 2026
Quick brief

The Puppeteer Stealth Web Scraper uses puppeteer-extra with the puppeteer-extra-plugin-stealth module to launch headless Chrome instances that evade common bot detection systems such as Cloudflare, PerimeterX, and DataDome. It manages the browser fingerprint through realistic viewport sizes, WebGL parameters, and navigator property overrides. Proxy rotation is handled through the Bright Data API, which cycles to a new IP automatically when a detection event occurs. The HTML returned by the browser is then parsed with Cheerio, which offers jQuery-style DOM traversal and data extraction without the overhead of further rendering in the browser. Concurrency is managed with p-queue to respect rate limits while maximizing throughput across multiple target domains. The agent supports structured data extraction with CSS selector templates, automatic pagination handling, and CAPTCHA solving through the 2Captcha API. Scraped data is validated against JSON Schema definitions before being written out as CSV or JSON, or inserted directly into a database through the Knex.js query builder.
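The rotate-on-detection behavior described above can be illustrated with a minimal, dependency-free sketch. It is not the skill's actual implementation and does not call the Bright Data API. The names `ProxyRotator`, `scrapeWithRotation`, and `BLOCK_STATUSES` are hypothetical, the page loader is injected as a plain function, and treating HTTP 403 and 429 as "detection events" is an assumption made for illustration.

```javascript
// Assumption: these status codes signal that the target site blocked the request.
const BLOCK_STATUSES = new Set([403, 429]);

// Hypothetical helper that cycles through a fixed list of proxy URLs.
class ProxyRotator {
  constructor(proxies) {
    if (!Array.isArray(proxies) || proxies.length === 0) {
      throw new Error("at least one proxy is required");
    }
    this.proxies = proxies;
    this.index = 0;
  }

  // Proxy that the next request should use.
  current() {
    return this.proxies[this.index];
  }

  // Advance to the next proxy, wrapping around at the end of the list.
  rotate() {
    this.index = (this.index + 1) % this.proxies.length;
    return this.current();
  }
}

// fetchPage is injected: (url, proxy) => Promise<{ status, html }>.
// In a real scraper it would presumably wrap a browser page load.
async function scrapeWithRotation(url, rotator, fetchPage, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const proxy = rotator.current();
    const { status, html } = await fetchPage(url, proxy);
    if (!BLOCK_STATUSES.has(status)) {
      return { html, proxy, attempts: attempt };
    }
    // Detection event: switch to a different IP before retrying.
    rotator.rotate();
  }
  throw new Error(`blocked on all ${maxAttempts} attempts for ${url}`);
}
```

Injecting `fetchPage` keeps the retry logic independent of the browser, so it can be exercised with a stub. In practice the function would load the URL through the given proxy and return the response status together with the page HTML for Cheerio to parse.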