
Convert dense PDFs into LLM-ready text and page-aligned markdown with olmOCR

Use olmOCR when an agent needs to turn scanned or layout-heavy documents into clean markdown or text before chunking, search, extraction, or citation workflows.


Data Extraction & Transformation Multi-Framework Security Reviewed

⭐ 17.1k GitHub stars

INSTALL WITH ANY AGENT

npx skills add agentskillexchange/skills --skill convert-dense-pdfs-into-llm-ready-text-and-page-aligned-markdown-with-olmocr


Works best when you want a reusable capability, not another fragile one-off prompt.


At a glance

Tools required

Python 3.11, pip or conda, poppler-utils, optional NVIDIA GPU for local inference

Install & setup

Create a clean Python 3.11 environment, install the PDF rendering dependencies (poppler-utils), then install the package: pip install olmocr for remote inference, or pip install olmocr[gpu] for local GPU inference.
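A quick preflight check can confirm the prerequisites listed above before installing. The sketch below is illustrative and not part of olmOCR: `check_prereqs` is a hypothetical helper, and it assumes that poppler-utils puts `pdftoppm` and `pdfinfo` on PATH and that an NVIDIA driver install puts `nvidia-smi` on PATH. It only looks for executables and does not verify their versions.

```python
import shutil
import sys


def check_prereqs(min_python=(3, 11)):
    """Report which prerequisites from the listing appear to be present.

    Hypothetical helper: it only looks for executables on PATH and does not
    check poppler or GPU driver versions.
    """
    return {
        # The listing names Python 3.11; treating it as a minimum is an assumption.
        "python_ok": sys.version_info[:2] >= min_python,
        # Command-line tools shipped with poppler-utils.
        "pdftoppm": shutil.which("pdftoppm") is not None,
        "pdfinfo": shutil.which("pdfinfo") is not None,
        # Optional: only relevant for local GPU inference.
        "nvidia_smi": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for name, present in check_prereqs().items():
        print(f"{name}: {'found' if present else 'missing'}")
```

A missing `nvidia_smi` entry is not necessarily a problem: per the listing, the GPU is optional and only needed for local inference.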

Author

Allen Institute for AI

Publisher

Organization

Last updated

Apr 16, 2026

Quick brief

Use olmOCR when the job is to convert PDFs or image-based documents into readable markdown or plain text with natural reading order, table handling, and header or footer cleanup. Invoke it, rather than working directly on the original PDF, when downstream agent steps depend on clean text for retrieval, extraction, QA, or citation. The scope is deliberately narrow: this is a document linearization and OCR preprocessing workflow, not a general document platform and not merely a raw OCR model card.
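For citation workflows, converted text is most useful when each chunk can be traced back to a page. The sketch below assumes a converted record shaped like `{"text": ..., "attributes": {"pdf_page_numbers": [[start, end, page], ...]}}`, where each triple is a character span into `text` plus a page number. That layout is an assumption made for illustration, so check the actual output of your olmOCR version before relying on it. `split_by_page` is a hypothetical helper, not an olmOCR function.

```python
from typing import Dict


def split_by_page(record: dict) -> Dict[int, str]:
    """Group a record's text by page using character spans.

    Assumed (not confirmed) record layout:
      record["text"]                           -> full linearized text
      record["attributes"]["pdf_page_numbers"] -> [[start, end, page], ...]
    """
    text = record["text"]
    spans = record.get("attributes", {}).get("pdf_page_numbers", [])
    pages: Dict[int, str] = {}
    for start, end, page in spans:
        # Concatenate in case one page is covered by several spans.
        pages[page] = pages.get(page, "") + text[start:end]
    return pages


# Tiny synthetic record; offsets are computed rather than hard-coded.
page_one = "# Title\nIntro paragraph.\n"
page_two = "Second page body."
sample = {
    "text": page_one + page_two,
    "attributes": {
        "pdf_page_numbers": [
            [0, len(page_one), 1],
            [len(page_one), len(page_one) + len(page_two), 2],
        ]
    },
}

by_page = split_by_page(sample)
```

With a mapping like `by_page`, a downstream chunker can attach the page number to every chunk so that retrieved passages can be cited by page.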

Best fit

When to reach for it

Best when the job fits Data Extraction & Transformation.
Works naturally with Multi-Framework setups.
Requires Python 3.11, pip or conda, poppler-utils, optional NVIDIA GPU for…
Installation is straightforward: Create a clean Python environment, install required PDF rendering dependencies, then install the package with pip…

Trust & provenance

Why this listing is credible

Trust status: Security Reviewed.
17.1k GitHub stars on the linked upstream source.
Last updated Apr 16, 2026.

View source ↗