Skill Detail

Extract invoice fields from vendor PDFs into structured records

Uses invoice2data to turn invoice PDFs into structured JSON, CSV, or XML using supplier-specific templates. This is for repeatable invoice field extraction and renaming workflows, not for full accounting system automation or generic OCR catalog listings.

Data Extraction & TransformationMulti-Framework

Data Extraction & Transformation Multi-Framework Security Reviewed

⭐ 2.1k GitHub stars

INSTALL WITH ANY AGENT

npx skills add agentskillexchange/skills --skill extract-invoice-fields-from-vendor-pdfs-into-structured-records Copy

Works best when you want a reusable capability, not another fragile one-off prompt.

View source Documentation

At a glance

Tools required

pdftotext or pdfminer or pdfplumber or OCRmyPDF or Tesseract or Google Cloud Vision

Install & setup

pip install invoice2data plus one supported input reader such as pdftotext, pdfminer, pdfplumber, OCRmyPDF, Tesseract, or Google Cloud Vision as documented upstream.

Author

invoice-x

Publisher

Open Source Project

Last updated

Apr 11, 2026

Quick brief

Tool used: invoice2data.

How it works

What this skill actually does

This skill is for agents that need to pull real accounting fields out of vendor invoices and hand back structured records that another system can trust. An agent can run invoice2data over one file or a folder, choose the right text extraction backend, apply built-in or custom templates, and emit normalized JSON, CSV, or XML containing fields such as invoice number, date, amount, currency, partner name, and line items when templates support them.

Invoke this instead of using the product manually when the user has recurring invoices from known suppliers and wants the extraction step automated. The agent can select or maintain supplier templates, run debug mode when a new layout appears, and slot the extracted data into downstream bookkeeping, approval, or reconciliation flows. That is much more useful than simply handing someone a PDF parser and asking them to do the rest.

The scope boundary keeps this skill-shaped. It is not an ERP, not an accounts-payable platform, and not a generic OCR or document-management listing. The job is specifically invoice field extraction from supplier documents into structured machine-readable records, using invoice2data’s template system and extraction backends.

Integration points are clear in upstream docs: OCRmyPDF, Tesseract, pdftotext, pdfminer, pdfplumber, and Google Cloud Vision can all serve as input readers; outputs can feed spreadsheets, finance pipelines, archive renaming jobs, or custom Python automations.

Best fit

When to reach for it

Best when the job fits Data Extraction & Transformation.
Works naturally with Multi-Framework setups.
Requires pdftotext or pdfminer or pdfplumber or OCRmyPDF or Tesseract or….
Installation is straightforward: pip install invoice2data plus one supported input reader such as pdftotext, pdfminer, pdfplumber, OCRmyPDF, Tesseract, or…

Trust & provenance

Why this listing is credible

Trust status: Security Reviewed.
2.1k GitHub stars on the linked upstream source.
Last updated Apr 11, 2026.

View source ↗ Documentation ↗