PuppeteerSharp Web Scraping Pipeline

Builds headless Chrome scraping pipelines using PuppeteerSharp for .NET environments. Handles JavaScript-rendered SPAs, Cloudflare challenge bypass via stealth plugins, and exports structured data through Newtonsoft.Json serialization.

Research & Scraping | Claude Code | Security Reviewed
Tool match: puppeteer ⭐ 3.9k GitHub stars
INSTALL WITH ANY AGENT
npx skills add agentskillexchange/skills --skill puppeteersharp-web-scraping-pipeline
Works best when you want a reusable capability, not another fragile one-off prompt.
At a glance
Tools required: .NET, Chromium
Install & setup: dotnet add package PuppeteerSharp
Author: PuppeteerSharp
Last updated: Mar 24, 2026
Quick brief

The PuppeteerSharp Web Scraping Pipeline extracts data from modern JavaScript-heavy websites within .NET ecosystems. It is built on PuppeteerSharp, a community-maintained .NET port of Google's Puppeteer API (not an official Google project), and launches headless Chromium instances with configurable viewport sizes, user agent rotation, and proxy chain support.
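
As a rough illustration of the launch step described above, the following sketch starts a headless browser behind a proxy, sets a viewport, and picks a user agent at random. It assumes a recent PuppeteerSharp release and .NET 6 or later. The proxy address, target URL, and user agent strings are placeholders invented for this example, not values taken from the skill.

```csharp
using System;
using System.Threading.Tasks;
using PuppeteerSharp;

// Placeholder values for illustration only; none of these come from the skill itself.
string[] userAgents =
{
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
};
const string proxy = "http://proxy.example.com:8080"; // hypothetical proxy endpoint
const string targetUrl = "https://example.com";       // hypothetical target page

// Download a compatible browser build on first run (cached afterwards).
await new BrowserFetcher().DownloadAsync();

await using var browser = await Puppeteer.LaunchAsync(new LaunchOptions
{
    Headless = true,
    Args = new[] { $"--proxy-server={proxy}" }, // standard Chromium command-line flag
});

await using var page = await browser.NewPageAsync();
await page.SetViewportAsync(new ViewPortOptions { Width = 1366, Height = 768 });

// Simple rotation: pick one user agent per page at random.
await page.SetUserAgentAsync(userAgents[Random.Shared.Next(userAgents.Length)]);

// Wait until the network is mostly idle so client-side rendering has a chance to finish.
await page.GoToAsync(targetUrl, WaitUntilNavigation.Networkidle2);
string html = await page.GetContentAsync();
Console.WriteLine($"Fetched {html.Length} characters from {targetUrl}");
```

Authenticated proxies and true proxy chaining need more than the single `--proxy-server` flag shown here; this sketch covers only the simplest case.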

How it works

What this skill actually does

The pipeline handles complex scraping scenarios including infinite scroll pagination, shadow DOM traversal, and iframe content extraction. Anti-detection measures include puppeteer-extra-plugin-stealth patterns adapted for PuppeteerSharp, randomized timing intervals, and residential proxy integration via Bright Data or Oxylabs APIs.
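
The infinite scroll handling and randomized timing mentioned above could look roughly like the loop below. This is a minimal sketch, not the skill's actual implementation: `InfiniteScroll`, `ScrollToEndAsync`, and all parameter names are hypothetical. The two browser operations are passed in as delegates, so the loop itself depends only on the .NET base library (.NET 6 or later for `Random.Shared`).

```csharp
using System;
using System.Threading.Tasks;

public static class InfiniteScroll
{
    // Scrolls until the page height stops growing or maxRounds is reached.
    // Returns the number of scroll operations performed.
    public static async Task<int> ScrollToEndAsync(
        Func<Task<int>> getHeight,
        Func<Task> scrollToBottom,
        int maxRounds = 50,
        int minDelayMs = 400,
        int maxDelayMs = 1200)
    {
        int rounds = 0;
        int lastHeight = await getHeight();

        while (rounds < maxRounds)
        {
            await scrollToBottom();
            rounds++;

            // Randomized pause: gives lazy-loaded content time to arrive
            // and avoids a fixed, machine-like cadence.
            await Task.Delay(Random.Shared.Next(minDelayMs, maxDelayMs + 1));

            int newHeight = await getHeight();
            if (newHeight <= lastHeight)
            {
                break; // nothing new was appended; assume the end of the feed
            }
            lastHeight = newHeight;
        }

        return rounds;
    }
}

// Possible wiring to PuppeteerSharp, assuming an open IPage named `page`:
// int rounds = await InfiniteScroll.ScrollToEndAsync(
//     () => page.EvaluateExpressionAsync<int>("document.body.scrollHeight"),
//     () => page.EvaluateExpressionAsync("window.scrollTo(0, document.body.scrollHeight)"));
```

Stopping when the height stops growing is a heuristic. Feeds that load slowly may need a longer delay range or a second confirmation round before the loop gives up.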

Data extraction uses CSS selectors and XPath queries, with automatic retries when an element handle goes stale (for example, after the page re-renders and detaches the node). Extracted data flows through a configurable transformation pipeline that uses Newtonsoft.Json for serialization and AutoMapper for DTO projection. Output destinations include SQL Server via Entity Framework Core, Elasticsearch bulk indexing, and Azure Blob Storage for large datasets. The pipeline supports distributed execution across multiple Chrome instances with work-stealing task scheduling.
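
The retry behaviour for stale elements might be approximated with a small helper like the one below. It is a sketch under assumptions: `Retry` and `WithBackoffAsync` are hypothetical names, and the exponential backoff policy is one reasonable choice rather than something the skill documents. The helper uses only the .NET base library.

```csharp
using System;
using System.Threading.Tasks;

public static class Retry
{
    // Runs `action` up to maxAttempts times, doubling the delay after each failure.
    // The final failure is not caught, so callers still see the underlying exception.
    public static async Task<T> WithBackoffAsync<T>(
        Func<Task<T>> action,
        int maxAttempts = 3,
        int baseDelayMs = 250)
    {
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return await action();
            }
            catch (Exception) when (attempt < maxAttempts)
            {
                // Exponential backoff: baseDelayMs, then 2x, 4x, ...
                await Task.Delay(baseDelayMs * (1 << (attempt - 1)));
            }
        }
    }
}

// Possible use with PuppeteerSharp, assuming an open IPage named `page`.
// The selector is re-queried inside the action so every attempt gets a fresh handle:
// string title = await Retry.WithBackoffAsync(async () =>
// {
//     var handle = await page.QuerySelectorAsync("h1");
//     return await handle.EvaluateFunctionAsync<string>("el => el.textContent");
// });
```

Re-running the query inside the retried action is the important part. Retrying an operation on a handle that is already detached would fail again for the same reason.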