Skill Detail

Turn messy document collections into structured rows with DocETL

Define repeatable extraction pipelines that pull fields from large document collections, normalize outputs, and audit failures across the corpus.

Data Extraction & Transformation Multi-Framework Security Reviewed

⭐ 3.7k GitHub stars

INSTALL WITH ANY AGENT

npx skills add agentskillexchange/skills --skill turn-messy-document-collections-into-structured-rows-with-docetl

Works best when you want a reusable capability, not another fragile one-off prompt.

View source Documentation

At a glance

Tools required

Python 3.10+, DocETL, document corpus, extraction configuration

Install & setup

Install DocETL from the project instructions, configure the extraction pipeline for your document set, then run the pipeline to emit normalized structured outputs and review failures.
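The "review failures" step can be sketched in plain Python. This is a minimal illustration, not DocETL's own API: it assumes the pipeline output has already been loaded as a list of JSON objects, and the field names in `REQUIRED_FIELDS` are hypothetical placeholders for whatever your extraction configuration emits.

```python
import json
from collections import Counter

# Hypothetical required columns; replace with the fields your pipeline extracts.
REQUIRED_FIELDS = ("doc_id", "title", "date")


def audit_rows(rows, required=REQUIRED_FIELDS):
    """Split rows into clean rows and failure records naming the missing fields."""
    clean, failures = [], []
    for row in rows:
        # Treat absent keys, None, empty strings, and empty lists as missing.
        missing = [f for f in required if row.get(f) in (None, "", [])]
        if missing:
            failures.append({"doc_id": row.get("doc_id"), "missing": missing})
        else:
            clean.append(row)
    return clean, failures


def failure_summary(failures):
    """Count how often each field is missing across the corpus."""
    return Counter(f for item in failures for f in item["missing"])


# Illustrative sample standing in for a pipeline's JSON output.
sample = json.loads("""
[
  {"doc_id": "a1", "title": "Lease agreement", "date": "2024-03-01"},
  {"doc_id": "a2", "title": "", "date": "2024-03-05"},
  {"doc_id": "a3", "title": "Invoice", "date": null}
]
""")

clean, failures = audit_rows(sample)
summary = failure_summary(failures)
```

Keeping failures as separate records, rather than silently dropping incomplete rows, makes it possible to see which fields fail most often before adjusting the extraction configuration.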

Author

UCB EPIC

Publisher

Organization

Last updated

Apr 15, 2026

Quick brief

Use DocETL when an agent needs to convert a pile of semi-structured documents into rows that downstream systems can trust. The agent can define extraction steps, normalize fields, track failures, and iterate on a repeatable document-to-structured-data pipeline instead of parsing each document ad hoc. The scope is deliberately narrow: document extraction and auditability, not a general-purpose document platform or LLM framework.
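As a rough illustration of what "normalize fields" and "track failures" can mean for a single extracted record, here is a small standalone Python sketch. It does not use DocETL's API; the `vendor`, `date`, and `amount` fields and the accepted date formats are assumptions chosen for the example.

```python
from datetime import datetime

# Hypothetical input formats; extend to match what your documents contain.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%b %d, %Y")


def normalize_date(text):
    """Return an ISO 8601 date string, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def normalize_amount(text):
    """Parse strings such as '$1,234.50' into a float, or None on failure."""
    cleaned = text.strip().replace("$", "").replace(",", "")
    try:
        return float(cleaned)
    except ValueError:
        return None


def normalize_row(raw):
    """Return (row, errors); errors lists the fields that could not be normalized."""
    row = {
        "vendor": " ".join(raw.get("vendor", "").split()),  # collapse whitespace
        "date": normalize_date(raw.get("date", "")),
        "amount": normalize_amount(raw.get("amount", "")),
    }
    errors = [name for name, value in row.items() if value in (None, "")]
    return row, errors


row, errors = normalize_row(
    {"vendor": "  Acme   Corp ", "date": "15/04/2026", "amount": "$1,234.50"}
)
bad_row, bad_errors = normalize_row(
    {"vendor": "Acme", "date": "sometime", "amount": "n/a"}
)
```

Returning the error list alongside the row, instead of raising, lets a whole corpus be processed in one pass while still recording which records need attention.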

Best fit

When to reach for it

Best when the job fits Data Extraction & Transformation.
Works naturally with Multi-Framework setups.
Requires Python 3.10+, DocETL, a document corpus, and an extraction configuration.
Installation is straightforward: install DocETL from the project instructions, configure the extraction pipeline for your document set, then run…

Trust & provenance

Why this listing is credible

Trust status: Security Reviewed.
3.7k GitHub stars on the linked upstream source.
Last updated Apr 15, 2026.
