Skill Detail

Puppeteer Web Scraping Pipeline

Builds web scraping pipelines with Puppeteer using page.evaluate, page.waitForSelector, and browser.newPage. Handles infinite scroll, cookie consent banners, and anti-bot detection with stealth plugin.

Browser AutomationGemini
Browser Automation Gemini Security Reviewed
Tool match: puppeteer โญ 94.1k GitHub stars โฌ‡ 40.2M/wk npm Apache-2.0 license
INSTALL WITH ANY AGENT
npx skills add agentskillexchange/skills --skill puppeteer-web-scraping-pipeline Copy
Works best when you want a reusable capability, not another fragile one-off prompt.
At a glance
Author
Puppeteer
Last updated
Mar 24, 2026
Quick brief

This skill creates robust web scraping pipelines using Puppeteer with the puppeteer-extra-plugin-stealth module for anti-detection. It launches headless Chrome via puppeteer.launch() with custom args including –no-sandbox and –disable-setuid-sandbox for container environments. Page interactions use page.evaluate() for DOM extraction, page.waitForSelector() with timeout configurations, and page.click() for pagination. Infinite scroll handling uses page.evaluate to detect scroll height changes and window.scrollTo() calls in a loop. Cookie consent banners are automatically dismissed through pattern matching on common consent management platforms like OneTrust and Cookiebot. The agent manages request interception via page.setRequestInterception(true) to block unnecessary resources like images and fonts for faster scraping. Data extraction uses querySelectorAll with structured mapping to JSON. Rate limiting is implemented with configurable delays between requests. The skill includes proxy rotation support via page.authenticate() and handles CAPTCHAs through 2captcha API integration when needed. Results are streamed to NDJSON files for efficient processing.