Skill Detail

Puppeteer Web Scraping Pipeline

Builds web scraping pipelines with Puppeteer using page.evaluate, page.waitForSelector, and browser.newPage. Handles infinite scroll, cookie consent banners, and anti-bot detection with stealth plugin.

Browser Automation · Gemini · Security Reviewed

Tool match: puppeteer · ⭐ 94.1k GitHub stars · ⬇ 40.2M/wk npm downloads · Apache-2.0 license

INSTALL WITH ANY AGENT

npx skills add agentskillexchange/skills --skill puppeteer-web-scraping-pipeline

Works best when you want a reusable capability, not another fragile one-off prompt.

View source Documentation

At a glance

Author: Puppeteer

Last updated: Mar 24, 2026

Quick brief

This skill builds web scraping pipelines with Puppeteer and the puppeteer-extra-plugin-stealth module, which reduces the chance of bot detection. It launches headless Chrome via puppeteer.launch() with custom args, including --no-sandbox and --disable-setuid-sandbox for container environments.

Page interactions use page.waitForSelector() with configurable timeouts, page.click() for pagination, and page.evaluate() for DOM extraction, where querySelectorAll results are mapped to structured JSON. Infinite scroll is handled by calling window.scrollTo() in a loop inside page.evaluate() and watching for changes in scroll height. Cookie consent banners are dismissed automatically by pattern matching against common consent management platforms such as OneTrust and Cookiebot.

To speed up scraping, the agent enables request interception via page.setRequestInterception(true) and blocks unnecessary resources such as images and fonts. Requests are rate limited with configurable delays. Proxy rotation is supported, with proxy credentials supplied via page.authenticate(), and CAPTCHAs are handled through 2captcha API integration when needed. Results are streamed to NDJSON files (one JSON object per line) for efficient processing.
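The resource blocking, infinite scroll, and extraction steps in the brief can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the skill's actual source. The helper names (blockHeavyResources, scrollToBottom, extractItems, toNdjsonLine), the default timeouts, the extracted fields (text, href), and the "media" resource type are all hypothetical. Stopping when the scroll height stops growing is one plausible reading of "detect scroll height changes". Each helper takes a Puppeteer page object as its first argument.

```javascript
// Sketch only: every helper name and default value here is an assumption,
// not something taken from the skill's source.

// Images and fonts come from the brief; "media" is an added assumption.
const BLOCKED_TYPES = new Set(['image', 'font', 'media']);

// Turn on request interception and abort requests for heavy resource types.
async function blockHeavyResources(page, blocked = BLOCKED_TYPES) {
  await page.setRequestInterception(true);
  page.on('request', (request) => {
    if (blocked.has(request.resourceType())) {
      request.abort();
    } else {
      request.continue();
    }
  });
}

// Scroll to the bottom repeatedly until the page height stops growing.
// Returns the number of scrolls that loaded new content.
async function scrollToBottom(page, { maxRounds = 20, pauseMs = 1000 } = {}) {
  let previousHeight = await page.evaluate(() => document.body.scrollHeight);
  for (let round = 0; round < maxRounds; round += 1) {
    await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
    // Give lazily loaded content time to arrive before measuring again.
    await new Promise((resolve) => setTimeout(resolve, pauseMs));
    const currentHeight = await page.evaluate(() => document.body.scrollHeight);
    if (currentHeight === previousHeight) return round;
    previousHeight = currentHeight;
  }
  return maxRounds;
}

// Wait for the selector, then map matching nodes to plain objects.
// The text/href fields are illustrative; real pipelines pick their own.
async function extractItems(page, selector, timeoutMs = 10000) {
  await page.waitForSelector(selector, { timeout: timeoutMs });
  return page.evaluate(
    (sel) =>
      Array.from(document.querySelectorAll(sel), (node) => ({
        text: node.textContent.trim(),
        href: node.getAttribute('href'),
      })),
    selector,
  );
}

// Serialize one record as a single NDJSON line.
function toNdjsonLine(record) {
  return `${JSON.stringify(record)}\n`;
}
```

In a real run, page would come from browser.newPage() after puppeteer.launch({ args: ['--no-sandbox', '--disable-setuid-sandbox'] }). blockHeavyResources(page) would be called before page.goto(url), and each toNdjsonLine(item) would be written to a stream such as fs.createWriteStream('results.ndjson'), where the file name is a placeholder. Passing the page in as an argument keeps the helpers free of imports and easy to exercise with a stub.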

Best fit

When to reach for it

Best when the job fits Browser Automation.
Works naturally with Gemini setups.

Trust & provenance

Why this listing is credible

Built around the puppeteer toolchain.
Trust status: Security Reviewed.
94.1k GitHub stars on the linked upstream source.
40.2M/week npm downloads recorded.
License: Apache-2.0.
Last updated Mar 24, 2026.
