
Tabula PDF Table Extractor

Extracts structured tables from PDF documents using tabula-java, with lattice and stream detection modes. Outputs CSV, JSON, or pandas DataFrames, with automatic column type inference via python-tabula (tabula-py).


Data Extraction & Transformation MCP Security Reviewed

Tool match: pandas ⭐ 2k GitHub stars

INSTALL WITH ANY AGENT

npx skills add agentskillexchange/skills --skill tabula-pdf-table-extractor

Works best when you want a reusable capability, not another fragile one-off prompt.

View source

At a glance

Author

Tabula

Last updated

Mar 24, 2026

Quick brief

The Tabula PDF Table Extractor specializes in extracting tables from PDF documents with the tabula-java library, accessed through the python-tabula (tabula-py) wrapper. It handles both lattice-based tables (with visible gridlines) and stream-based tables (whitespace-aligned columns).

How it works

What this skill actually does

The skill automatically analyzes each PDF page to choose an extraction method: lattice mode for ruled tables and stream mode for borderless tables. When automatic detection fails, custom extraction areas can be specified as coordinate-based regions, and a visual preview mode highlights the detected table boundaries.
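A minimal sketch of how the mode and region choices could map onto tabula-py's read_pdf keywords (pages, multiple_tables, lattice, stream, guess, area). The helper names build_read_pdf_kwargs and extract_tables, and the "auto"/"lattice"/"stream" mode strings, are hypothetical and not this skill's API. In "auto" the sketch sets neither flag and simply defers to tabula-java's own default rather than reimplementing per-page detection. Area coordinates follow tabula-py's [top, left, bottom, right] order in PDF points.

```python
from __future__ import annotations

from typing import Any, Sequence

# Hypothetical mode names used by this sketch, not by tabula-py itself.
MODES = {"auto", "lattice", "stream"}


def build_read_pdf_kwargs(
    mode: str = "auto",
    area: Sequence[float] | None = None,
    pages: str | int = "all",
) -> dict[str, Any]:
    """Translate a mode name and optional region into tabula.read_pdf keywords."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {sorted(MODES)}, got {mode!r}")
    kwargs: dict[str, Any] = {
        "pages": pages,
        "multiple_tables": True,
        "lattice": mode == "lattice",  # ruled tables: cells bounded by drawn lines
        "stream": mode == "stream",    # borderless tables: columns split on whitespace
    }
    # "auto" leaves both flags False, deferring the choice to tabula-java.
    if area is not None:
        if len(area) != 4:
            raise ValueError("area must be [top, left, bottom, right] in PDF points")
        kwargs["area"] = list(area)
        kwargs["guess"] = False  # use the explicit region instead of auto-detection
    return kwargs


def extract_tables(path: str, mode: str = "auto", area=None, pages="all"):
    """Return a list of DataFrames; requires tabula-py and a Java runtime."""
    import tabula  # imported lazily so the helper above has no Java dependency

    return tabula.read_pdf(path, **build_read_pdf_kwargs(mode, area, pages))
```

For example, extract_tables("report.pdf", mode="stream", area=[100, 50, 400, 550], pages=2) would read one fixed region of page 2 as a borderless table ("report.pdf" is a placeholder). Keeping the keyword mapping separate from the tabula import makes it testable without Java installed.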

Post-extraction processing includes automatic column type inference (numeric, date, currency, percentage, text), header row detection and normalization, merged cell handling, and multi-page table concatenation. Output formats include CSV with proper escaping, JSON with nested structures, and pandas DataFrame objects for direct analytical use.
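The post-processing steps can be approximated in plain pandas. This is a hypothetical sketch, not the skill's implementation: normalize_headers, concat_pages, and infer_column are illustrative names. As simplifying assumptions, dates are recognized only in ISO YYYY-MM-DD form, only $, € and £ count as currency symbols, and multi-page fragments are joined only when they have the same column count.

```python
from __future__ import annotations

import re

import pandas as pd

# Number body such as "42", "1,234", or "1,234.50" (no leading sign).
NUM = r"(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d+)?"

# Checked in order; the first pattern that matches every non-empty cell wins.
PATTERNS = {
    "percentage": rf"-?{NUM}\s?%",
    "currency": rf"-?[$€£]\s?{NUM}",  # assumption: only $, € and £
    "numeric": rf"-?{NUM}",
    "date": r"\d{4}-\d{2}-\d{2}",  # assumption: ISO dates only
}


def normalize_headers(df: pd.DataFrame) -> pd.DataFrame:
    """snake_case the headers; blank ones become col_<position>."""
    names = []
    for i, col in enumerate(df.columns):
        name = re.sub(r"[^0-9A-Za-z]+", "_", str(col)).strip("_").lower()
        names.append(name or f"col_{i}")
    out = df.copy()
    out.columns = names
    return out


def concat_pages(frames: list[pd.DataFrame]) -> pd.DataFrame:
    """Stack per-page fragments of one table under the first fragment's headers."""
    frames = [f for f in frames if not f.empty]
    if not frames:
        return pd.DataFrame()
    header = list(frames[0].columns)
    aligned = []
    for f in frames:
        if f.shape[1] != len(header):
            raise ValueError(f"expected {len(header)} columns, got {f.shape[1]}")
        part = f.copy()
        part.columns = header
        aligned.append(part)
    return pd.concat(aligned, ignore_index=True)


def infer_column(s: pd.Series) -> tuple[str, pd.Series]:
    """Return (kind, converted values); unmatched columns stay as stripped text."""
    text = s.map(lambda v: "" if pd.isna(v) else str(v).strip())
    filled = text[text != ""]
    if filled.empty:
        return "text", text
    for kind, pattern in PATTERNS.items():
        if filled.str.fullmatch(pattern).all():
            if kind == "date":
                return kind, pd.to_datetime(text, format="%Y-%m-%d", errors="coerce")
            # Drop symbols and thousands separators; empty cells become NaN.
            digits = text.str.replace(r"[^0-9.\-]", "", regex=True)
            values = pd.to_numeric(digits, errors="coerce")
            return kind, values / 100 if kind == "percentage" else values
    return "text", text
```

Percentages are stored as fractions (12.5% becomes 0.125). Once the columns are converted, pandas' own to_csv(index=False) and to_json(orient="records") cover the CSV and JSON outputs.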

Batch processing supports directory scanning with file filtering, parallel extraction using concurrent.futures, and progress tracking. Integration with openpyxl enables direct Excel output with formatted headers, auto-sized columns, and conditional formatting for numeric data.
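Directory scanning and parallel extraction with concurrent.futures might look like the sketch below. find_pdfs and extract_batch are hypothetical names, and the extractor is passed in as a callable (for example lambda p: tabula.read_pdf(str(p), pages="all")) so the batching logic does not depend on tabula-py. It uses threads because a lambda or closure cannot be pickled for a process pool; if JVM-backed extraction does not parallelize well across threads in your setup, a ProcessPoolExecutor with a module-level function is the usual alternative.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Any, Callable, Iterable, Optional


def find_pdfs(root: str | Path, pattern: str = "*.pdf", recursive: bool = True) -> list[Path]:
    """Return files under root matching pattern, sorted for a stable order."""
    base = Path(root)
    candidates = base.rglob(pattern) if recursive else base.glob(pattern)
    return sorted(p for p in candidates if p.is_file())


def extract_batch(
    paths: Iterable[Any],
    extract: Callable[[Any], Any],
    max_workers: int = 4,
    on_progress: Optional[Callable[[int, int, Any], None]] = None,
) -> tuple[dict, dict]:
    """Run extract(path) concurrently; collect results and per-file errors."""
    results: dict = {}
    errors: dict = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(extract, p): p for p in paths}
        total = len(futures)
        for done, future in enumerate(as_completed(futures), start=1):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # one bad PDF should not abort the batch
                errors[path] = exc
            if on_progress is not None:
                on_progress(done, total, path)  # called from the main thread
    return results, errors
```

Failures are collected rather than raised, so a single malformed PDF is reported at the end instead of stopping the whole run.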

Best fit

When to reach for it

Best when the job fits Data Extraction & Transformation.
Works naturally with MCP setups.

Trust & provenance

Why this listing is credible

Built around the pandas toolchain.
Trust status: Security Reviewed.
2k GitHub stars on the linked upstream source.
Last updated Mar 24, 2026.
