Cheerio DOM Extraction Pipeline

Builds configurable data extraction pipelines using Cheerio for server-side DOM parsing with CSS selector chains. Supports JSONPath output mapping, pagination following, and schema validation via Ajv.

Data Extraction & Transformation | Codex | Published

Tool match: cheerio (30.3k GitHub stars, 19.6M npm downloads per week, MIT license)

INSTALL WITH ANY AGENT

npx skills add agentskillexchange/skills --skill cheerio-dom-extraction-pipeline

Works best when you want a reusable capability, not another fragile one-off prompt.

At a glance

Last updated: Mar 24, 2026

Quick brief

The Cheerio DOM Extraction Pipeline extracts structured data from HTML without browser overhead. It uses Cheerio’s jQuery-like API for server-side DOM parsing and traversal, so no page rendering or script execution is involved. It is described as processing thousands of pages per second with a small memory footprint.

How it works

What this skill actually does

Extraction rules are defined as CSS selector chains with optional attribute extraction, text normalization, and regex post-processing. The pipeline supports nested data structures through recursive selector evaluation, handling complex layouts like product listings with variant tables, review threads, and paginated comment sections.
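As a rough illustration of the rule structure described above, the sketch below models one field rule and the text normalization and regex post-processing steps. The `FieldRule` shape and the `normalizeText` and `postProcess` helpers are hypothetical names chosen for this example; the skill's actual configuration format is not shown in this listing. Selector evaluation itself would be done by Cheerio and is left out so the sketch stays dependency-free.

```typescript
// Hypothetical rule shape (an assumption, not the skill's documented format).
interface FieldRule {
  selector: string;    // CSS selector chain, evaluated elsewhere (for example by Cheerio)
  attr?: string;       // attribute to read instead of the element text
  normalize?: boolean; // collapse whitespace; treated as true when omitted
  pattern?: RegExp;    // regex post-processing; avoid the "g" flag, which makes exec() stateful
}

// Collapse runs of whitespace into single spaces and trim both ends.
function normalizeText(raw: string): string {
  return raw.replace(/\s+/g, " ").trim();
}

// Apply normalization and the optional regex to a raw extracted string.
// Returns the first capture group if the pattern has one, the whole match
// otherwise, and null when the pattern does not match.
function postProcess(raw: string, rule: FieldRule): string | null {
  const text = rule.normalize === false ? raw : normalizeText(raw);
  if (rule.pattern === undefined) return text;
  const match = rule.pattern.exec(text);
  if (match === null) return null;
  const group = match[1];
  return group !== undefined ? group : match[0];
}

// Usage: pull a decimal price out of messy element text.
const priceRule: FieldRule = { selector: ".product .price", pattern: /(\d+\.\d{2})/ };
const price = postProcess("  $ 19.99 \n USD ", priceRule);
```

Returning null on a failed match, rather than an empty string, lets a later validation step tell "field missing" apart from "field present but empty".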

Output mapping uses JSONPath expressions to transform extracted arrays into structured JSON objects matching target schemas. Schema validation is performed using Ajv (Another JSON Schema Validator) with custom format validators for common data types like prices, dates, URLs, and phone numbers.
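To make the idea of custom format validators concrete, here is a minimal sketch using plain predicate functions. It is a dependency-free stand-in, not the skill's implementation: in a real setup, functions like these could be registered with Ajv through its `addFormat` method and referenced from a JSON Schema. The `formats` object, the `validateRecord` helper, and the specific patterns are assumptions made for this example; the JSONPath mapping step is not shown.

```typescript
// Hypothetical format predicates; the patterns are deliberately simple shape checks.
const formats = {
  // Plain decimal amount such as "19.99" (no currency symbol).
  price: (v: string): boolean => /^\d+(\.\d{1,2})?$/.test(v),
  // YYYY-MM-DD shape that the runtime can also parse; not a full calendar check.
  isoDate: (v: string): boolean =>
    /^\d{4}-\d{2}-\d{2}$/.test(v) && !Number.isNaN(Date.parse(v)),
  // Absolute http or https URL.
  httpUrl: (v: string): boolean => {
    try {
      const u = new URL(v);
      return u.protocol === "http:" || u.protocol === "https:";
    } catch (_err) {
      return false;
    }
  },
  // E.164-style phone number: "+" followed by up to 15 digits.
  phone: (v: string): boolean => /^\+[1-9]\d{1,14}$/.test(v),
};

type FormatName = keyof typeof formats;

// Check each field named in the schema against its format and collect error messages.
function validateRecord(
  record: Record<string, string>,
  schema: Record<string, FormatName>,
): string[] {
  const errors: string[] = [];
  for (const [field, format] of Object.entries(schema)) {
    const value = record[field];
    if (value === undefined) {
      errors.push(`${field}: missing`);
    } else if (!formats[format](value)) {
      errors.push(`${field}: not a valid ${format}`);
    }
  }
  return errors;
}
```

Collecting all errors instead of stopping at the first one makes it easier to see which selectors need adjusting when a page layout changes.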

Pagination handling supports multiple strategies: next-page link following, URL pattern increment, infinite scroll simulation via API endpoint detection, and cursor-based pagination. Rate limiting is built in with configurable delays and concurrent request limits using p-queue.
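Two of the pagination strategies listed above, URL pattern increment and cursor-based pagination, can be sketched roughly as follows, together with a fixed delay between requests. All names here (`pageUrls`, `CursorPage`, `collectByCursor`, the `{page}` placeholder) are hypothetical and chosen for illustration. The page fetcher is injected so the sketch needs no HTTP library, and the concurrency limiting that the skill delegates to p-queue is not reproduced.

```typescript
// URL pattern increment: substitute consecutive page numbers into a template.
function pageUrls(template: string, start: number, count: number): string[] {
  const urls: string[] = [];
  for (let page = start; page < start + count; page++) {
    urls.push(template.replace("{page}", String(page)));
  }
  return urls;
}

// Hypothetical response shape for a cursor-paginated endpoint.
interface CursorPage<T> {
  items: T[];
  nextCursor: string | null; // null means there are no more pages
}

// Cursor-based pagination: follow nextCursor until it is null or maxPages is reached,
// waiting delayMs between requests as a simple form of rate limiting.
async function collectByCursor<T>(
  fetchPage: (cursor: string | null) => Promise<CursorPage<T>>,
  delayMs: number,
  maxPages: number,
): Promise<T[]> {
  const all: T[] = [];
  let cursor: string | null = null;
  for (let i = 0; i < maxPages; i++) {
    const page: CursorPage<T> = await fetchPage(cursor);
    all.push(...page.items);
    if (page.nextCursor === null) break;
    cursor = page.nextCursor;
    await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
  }
  return all;
}
```

The `maxPages` cap is a safety choice for the sketch: it keeps a misbehaving endpoint that never returns a null cursor from looping forever.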

Best fit

When to reach for it

Best when the job fits Data Extraction & Transformation.
Works naturally with Codex setups.

Trust & provenance

Why this listing is credible

Built around the cheerio toolchain.
Trust status: Published.
30.3k GitHub stars on the linked upstream source.
19.6M/week npm downloads recorded.
License: MIT.
Last updated Mar 24, 2026.