PDF Table Extraction with Camelot

Extracts structured tables from PDF documents using Camelot's lattice and stream parsing algorithms. Handles merged cells, multi-page tables, and outputs to pandas DataFrames, CSV, and JSON with column type inference.

Data Extraction & Transformation | Claude Code | Published

Tool match: pandas ⭐ 3.7k GitHub stars

INSTALL WITH ANY AGENT

npx skills add agentskillexchange/skills --skill pdf-table-extraction-camelot

Works best when you want a reusable capability, not another fragile one-off prompt.

View source

At a glance

Author

Camelot

Last updated

Mar 24, 2026

Quick brief

The PDF Table Extraction skill uses the Camelot library to extract tabular data from PDF documents, including complex layouts that defeat simpler regex-based approaches. It supports lattice mode for tables with visible cell borders and stream mode for tables defined by whitespace alignment, and it selects between the two based on page analysis.
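The listing does not say how the skill chooses between the two modes. One simple way to approximate that choice is to try lattice first and fall back to stream when no tables are found. The sketch below assumes exactly that heuristic. `extract_tables` and its `read_pdf` parameter are hypothetical names introduced for illustration; only `camelot.read_pdf` with its `pages` and `flavor` arguments comes from Camelot itself.

```python
from typing import Any, Callable, Optional, Tuple


def extract_tables(
    pdf_path: str,
    pages: str = "1",
    read_pdf: Optional[Callable[..., Any]] = None,
) -> Tuple[str, Any]:
    """Try lattice first, then fall back to stream if nothing is found.

    Returns the flavor that was used and the tables it produced.
    This is an assumed heuristic, not the skill's documented behavior.
    """
    if read_pdf is None:
        # Imported lazily so the helper can be exercised without Camelot installed.
        import camelot

        read_pdf = camelot.read_pdf

    tables: Any = []
    for flavor in ("lattice", "stream"):
        tables = read_pdf(pdf_path, pages=pages, flavor=flavor)
        if len(tables) > 0:
            return flavor, tables
    # Neither flavor found a table; report the last attempt (stream).
    return "stream", tables
```

With Camelot installed, `extract_tables("report.pdf", pages="1-3")` would return the flavor that produced results along with the extracted tables; the file name here is a placeholder.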

How it works

What this skill actually does

The skill handles common PDF table challenges including merged cells spanning multiple rows or columns, tables that flow across page boundaries, rotated text in header rows, and nested sub-tables. Extracted data is output as pandas DataFrames with automatic column type inference (numeric, date, currency, categorical), and can be serialized to CSV, JSON, Excel, or Parquet formats.
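Camelot returns every cell as a string, so type inference has to happen after extraction. The listing does not describe the rules the skill applies. The sketch below is one possible approach using only the standard library; the function name, regular expressions, and accepted date formats are all assumptions made for illustration.

```python
import re
from datetime import datetime
from typing import Iterable

# Assumed patterns: a leading currency symbol, optional thousands separators.
CURRENCY_RE = re.compile(r"^[$€£]\s?-?\d[\d,]*(\.\d+)?$")
NUMERIC_RE = re.compile(r"^-?\d[\d,]*(\.\d+)?$")
# Assumed date layouts; extend this tuple for other conventions.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def _is_date(text: str) -> bool:
    """Return True if the text parses under any of the assumed date formats."""
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(text, fmt)
            return True
        except ValueError:
            continue
    return False


def infer_column_type(values: Iterable[str]) -> str:
    """Classify a column of raw cell strings, ignoring empty cells."""
    cells = [v.strip() for v in values if v and v.strip()]
    if not cells:
        return "categorical"
    if all(CURRENCY_RE.match(c) for c in cells):
        return "currency"
    if all(NUMERIC_RE.match(c) for c in cells):
        return "numeric"
    if all(_is_date(c) for c in cells):
        return "date"
    return "categorical"
```

Applied to an extracted table, this could be called once per column, for example `{col: infer_column_type(table.df[col]) for col in table.df.columns}`, before converting the columns to their final dtypes.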

Advanced features include accuracy scoring per extracted table to flag low-confidence results for manual review, visual debugging overlays that show detected cell boundaries on the original PDF, batch processing of multi-page documents with table continuation detection, and OCR fallback via Tesseract for scanned PDF pages where text extraction returns empty results. The skill supports coordinate-based table region specification for precise extraction from known page layouts.
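Flagging low-confidence tables could look roughly like the sketch below. Camelot attaches a `parsing_report` dictionary with an `accuracy` entry to each extracted table. The helper name `flag_low_confidence` and the threshold of 80 are assumptions, not values taken from the skill, and the sample objects stand in for real Camelot tables.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List, Tuple


def flag_low_confidence(
    tables: Iterable[Any], min_accuracy: float = 80.0
) -> List[Tuple[int, float]]:
    """Return (table index, accuracy) for tables scoring below the threshold."""
    flagged = []
    for index, table in enumerate(tables):
        # A missing accuracy value is treated as 0 so the table gets reviewed.
        accuracy = float(table.parsing_report.get("accuracy", 0.0))
        if accuracy < min_accuracy:
            flagged.append((index, accuracy))
    return flagged


# Stand-in objects that mimic the parsing_report attribute of Camelot tables.
sample_tables = [
    SimpleNamespace(parsing_report={"accuracy": 99.2, "page": 1}),
    SimpleNamespace(parsing_report={"accuracy": 61.5, "page": 2}),
]
needs_review = flag_low_confidence(sample_tables)
```

For region-based extraction, Camelot's `read_pdf` accepts a `table_areas` list of `"x1,y1,x2,y2"` strings in PDF coordinate space (top-left, then bottom-right), for example `camelot.read_pdf("report.pdf", flavor="stream", table_areas=["50,700,550,400"])`. The file name and coordinates in that call are placeholders.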

Best fit

When to reach for it

Best when the job fits Data Extraction & Transformation.
Works naturally with Claude Code setups.

Trust & provenance

Why this listing is credible

Built around the pandas toolchain.
Trust status: Published.
3.7k GitHub stars on the linked upstream source.
Last updated Mar 24, 2026.
