Zerox vs LlamaCloud vs MarkItDown: Which Document Parsing Skill Fits Your Workflow?
Document parsing is where many agent workflows either become useful or become messy. The hard part is not “turn this PDF into text.” The hard part is choosing the parsing path that preserves enough structure for review, keeps sensitive material under the right boundary, and gives the agent an output it can actually use.
On Agent Skill Exchange, three useful choices cover different parts of that job: Zerox, LlamaCloud Services, and MarkItDown. They overlap, but they are not interchangeable. Pick based on your source format, tolerance for managed infrastructure, and how much review evidence the final workflow needs.
| Input problem | Best starting skill | Why it fits | Review checkpoint |
|---|---|---|---|
| Scanned PDFs or image-heavy documents | Zerox | OCR-oriented conversion into Markdown that agents can inspect page by page | Compare extracted text against original pages before acting |
| Repeated document pipelines with indexing needs | LlamaCloud Services | Managed parsing and document knowledge workflows for retrieval-heavy systems | Validate parser settings, source access, and retrieval citations |
| Office files, web files, and simple PDFs needing quick Markdown | MarkItDown | Lightweight local conversion into LLM-friendly Markdown across many common file types | Spot-check tables, links, embedded media, and omitted context |
In Short
- Use Zerox when the main problem is OCR-based extraction from scanned, image-heavy, or visually complex PDFs.
- Use LlamaCloud Services when the workflow needs managed parsing, indexing, and retrieval around recurring document collections.
- Use MarkItDown when you need a pragmatic document-to-Markdown converter for common files before summarization, review, or lightweight analysis.
- For legal, healthcare, finance, real estate, or compliance documents, treat parsing as intake only. Keep human review before decisions, filings, approvals, or customer-facing outputs.
Who This Is For
This guide is for operators building agent workflows around documents: finance teams collecting invoices, legal teams preparing review packets, healthcare administrators organizing intake files, DevRel teams converting specs, and support teams turning knowledge files into answerable sources. If your agent needs to read documents before it can summarize, extract, reconcile, or route work, the parser choice matters.
ASE already has broader document workflow guidance in OCR vs Structured Extraction vs Archive Search and From PDF to Decision Packet. This article narrows the question to three concrete parsing skills that appear early in many practical stacks.
Decision Path
Start with the document, not the tool. A digitally generated PDF, a scanned contract, a messy spreadsheet, and a folder of recurring vendor files need different handling. The fastest wrong move is to send all of them through one converter and assume “Markdown” means “usable.”
If the source is scanned, image-heavy, or visually encoded, begin with Zerox. It fits best when the agent needs page-aware text extraction and the original document is not already clean text. This is common with signed PDFs, old forms, receipts, statements, and mixed scan bundles. The output still needs review, but the workflow starts from the right problem: recovering readable structure from visual documents.
If the source is part of a repeated knowledge pipeline, consider LlamaCloud Services. The value here is not only conversion: it is managed document parsing inside an ecosystem designed for indexing and retrieval. That fits teams that expect recurring uploads, document collections, query workflows, and retrieval-grounded answers. It may be more infrastructure than a one-off document needs, but it is a sensible route when parsing is part of a larger document system.
If the source is a common file that mostly needs clean Markdown, start with MarkItDown. Microsoft’s project is especially useful when the workflow spans PDFs, Office files, HTML, audio metadata, and other everyday formats. It is a good first pass for summaries, internal notes, and LLM-ready text where lightweight conversion beats a heavier pipeline.
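The decision path above can be sketched as a small routing function. This is an illustration only: the `DocumentTraits` fields and the function name are hypothetical, and the precedence (scanned input first, then recurring pipelines, then everything else) is an assumption taken from the order of the paragraphs, not a rule stated by any of the three projects.

```python
from dataclasses import dataclass


@dataclass
class DocumentTraits:
    """Hypothetical description of one incoming document."""
    is_scanned_or_image_heavy: bool = False
    is_recurring_collection: bool = False


def choose_starting_skill(doc: DocumentTraits) -> str:
    """Return the skill to try first, checking traits in the order discussed above."""
    if doc.is_scanned_or_image_heavy:
        # OCR is the hard part, so start from visual recovery.
        return "Zerox"
    if doc.is_recurring_collection:
        # Parsing is one step in a larger indexing and retrieval system.
        return "LlamaCloud Services"
    # Common file that mostly needs clean Markdown.
    return "MarkItDown"


print(choose_starting_skill(DocumentTraits(is_scanned_or_image_heavy=True)))
print(choose_starting_skill(DocumentTraits(is_recurring_collection=True)))
print(choose_starting_skill(DocumentTraits()))
```

A scanned document inside a recurring pipeline routes to Zerox here only because of the assumed precedence. A real workflow might reasonably combine both skills.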
Recommended ASE Skills
| Skill | Use it when | Avoid using it as |
|---|---|---|
| Extract OCR-ready Markdown from documents with Zerox | The input is scanned, image-heavy, or needs OCR before an agent can inspect it. | A final authority for legal, medical, financial, or compliance facts without source review. |
| Build managed document parsing pipelines with LlamaCloud Services | You need recurring parsing, indexing, and retrieval workflows around document collections. | A quick local converter for one-off files where managed setup adds unnecessary complexity. |
| MarkItDown Document-to-Markdown Converter by Microsoft | You need a local, lightweight path from common documents into agent-readable Markdown. | A layout-perfect extraction system for documents where tables, signatures, or visual evidence are decisive. |
| Evaluate document parsers for agent ingestion with ParseBench | You are comparing parser output quality before standardizing a workflow. | A replacement for domain review or acceptance testing on your own documents. |
| Turn messy document collections into structured rows with DocETL | You need parser output to become repeatable rows for downstream review, reconciliation, or routing. | A shortcut around source checks, parser evaluation, or human approval for sensitive records. |
| Data Extraction & Transformation category | You need alternatives such as Docling, Unstructured, Tesseract, OCRmyPDF, or DocETL. | A shortcut around checking source, permissions, and output evidence. |
How to Choose by Workflow
For finance operations, the main question is usually traceability. Vendor invoices, billing records, and statements need extracted fields, but the operator also needs to know where those fields came from. A lightweight converter may be enough for a readable invoice summary. OCR is better for scanned attachments. A managed pipeline makes sense when invoices arrive repeatedly and feed a review process like the Xero MCP, Google Sheets, and invoice intake stack.
For legal and compliance work, source preservation matters more than speed. A parser can prepare a review packet, but it should not become the decision-maker. Use the parsing output to route clauses, dates, exhibits, and obligations into a review queue. Keep the original file attached, preserve page references where possible, and connect the workflow to the boundaries described in Legal Ops & Compliance Skills.
For healthcare documentation, avoid clinical claims. Parsing can organize intake files, insurance forms, referral packets, and administrative documents, but the agent should not infer medical advice from partial extraction. If the documents are scans, OCR quality checks are mandatory. If they are recurring records in a controlled process, managed parsing may be worth the setup. The right frame is administrative assistance, as covered in Healthcare Documentation & Intake Skills.
For research, support, and publishing workflows, MarkItDown often earns the first try. It is simple, local, and direct. A team converting PDFs, HTML pages, Word docs, or slides into notes may not need a full parsing pipeline. The review risk is still real: tables can flatten, images can lose meaning, and source context can disappear. Use it to produce text for drafting, not as an automation step that nobody reviews.
What to Watch
- Tables and forms: Parsers often make confident-looking Markdown from rows that were visually ambiguous. Review totals, dates, and row alignment against the original.
- Missing context: Headers, footnotes, stamps, signatures, and page order can carry meaning. Do not treat plain text as the whole document.
- Managed-service boundaries: Before using a cloud pipeline, check data policy, retention, authentication, and whether the documents are allowed to leave your environment.
- Retrieval overconfidence: Good parsing does not guarantee good answers. Retrieval workflows still need citations, source snippets, and refusal behavior when evidence is thin.
- Regulated data: For finance, legal, healthcare, and compliance documents, the parser should create evidence for review, not approvals or advice.
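Some of these checks can be partly automated before a human looks at the output. As one example of the table check, the sketch below flags Markdown table rows whose cell count differs from the header row. It is a rough heuristic under stated assumptions: the function name is hypothetical, it only recognizes rows that begin with `|`, and it does not handle escaped pipes inside cells. It can point a reviewer at suspicious rows, but it cannot confirm that totals or dates are correct.

```python
def misaligned_table_rows(markdown: str) -> list[int]:
    """Return 1-based line numbers of table rows whose cell count differs from the header."""
    flagged = []
    expected = None  # cell count of the current table's header row
    for number, line in enumerate(markdown.splitlines(), start=1):
        stripped = line.strip()
        if not stripped.startswith("|"):
            expected = None  # left the table; the next table gets its own header
            continue
        # Drop exactly one leading and one trailing pipe, then split into cells.
        cells = stripped.removeprefix("|").removesuffix("|").split("|")
        if expected is None:
            expected = len(cells)
        elif len(cells) != expected:
            flagged.append(number)
    return flagged


sample = "\n".join([
    "| Item | Qty | Total |",
    "|---|---|---|",
    "| Widget | 2 | 10.00 |",
    "| Gadget | 3 |",  # a cell was lost during conversion
])
print(misaligned_table_rows(sample))  # [4]
```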
External References
The upstream projects are worth reading before you standardize the workflow: the Zerox GitHub repository, the LlamaCloud Services GitHub repository, the LlamaCloud documentation, and Microsoft’s MarkItDown repository. For sensitive workflows, pair those docs with your own data-handling requirements rather than assuming a parser choice is only an engineering decision.
FAQ
Is Zerox better than MarkItDown?
Not universally. Zerox is the better starting point when OCR and visual document recovery are the hard part. MarkItDown is usually simpler when the document already has recoverable structure and you mainly need Markdown for an agent.
When is LlamaCloud Services worth it?
Use it when parsing is part of a recurring document system: ingestion, indexing, retrieval, and agent answers over a collection. For a one-off file, a lighter local converter may be enough.
Can an agent act automatically on parsed document data?
For low-risk internal drafts, maybe. For money movement, legal obligations, healthcare context, compliance filings, or customer-facing decisions, no. Parsed output should feed a review packet with source evidence and a human checkpoint.
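A minimal sketch of that checkpoint, assuming a fail-closed rule where only explicitly listed categories may skip review. The `ParsedItem` record, the category names, and the queue labels are all hypothetical. The point is the shape: parsed text travels with its source file and page, and anything not known to be low risk goes to a person.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: only categories listed here may skip review; the names are illustrative.
LOW_RISK_CATEGORIES = {"internal_draft"}


@dataclass
class ParsedItem:
    """Hypothetical record tying extracted text back to its source."""
    source_file: str
    page: Optional[int]  # None when the parser did not preserve page numbers
    text: str
    category: str


def route(item: ParsedItem) -> str:
    """Fail closed: anything not explicitly low risk goes to a human reviewer."""
    if item.category in LOW_RISK_CATEGORIES:
        return "draft"
    return "human_review"


invoice = ParsedItem("vendor-invoice.pdf", 2, "Total due: 120.00", "money_movement")
notes = ParsedItem("meeting-notes.docx", None, "Action items ...", "internal_draft")
print(route(invoice))  # human_review
print(route(notes))    # draft
```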
What should I test before choosing a parser?
Test your hardest real documents: scanned pages, tables, signatures, handwritten notes, rotated pages, mixed file types, and long attachments. Compare outputs with ParseBench or a small internal review rubric before standardizing.
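A small internal rubric can start as a text-similarity score against a hand-checked reference transcription. The sketch below uses Python's standard `difflib` for this. The parser labels and sample strings are made up for illustration and are not real parser results. Character similarity is also a crude proxy: it says nothing about table structure or page order, so treat it as a first filter before manual review.

```python
from difflib import SequenceMatcher


def similarity(reference: str, parsed: str) -> float:
    """Similarity ratio in [0, 1] after collapsing whitespace differences."""
    ref = " ".join(reference.split())
    out = " ".join(parsed.split())
    return SequenceMatcher(None, ref, out, autojunk=False).ratio()


def rank_parsers(reference: str, outputs: dict[str, str]) -> list[tuple[str, float]]:
    """Sort parser outputs from most to least similar to the reference."""
    scores = [(name, similarity(reference, text)) for name, text in outputs.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hand-checked text for one hard page, plus two made-up parser outputs.
reference = "Invoice total: 120.00 due 2024-01-31"
outputs = {
    "parser_a": "Invoice  total: 120.00\ndue 2024-01-31",  # whitespace differs only
    "parser_b": "Invoice total: 12O.OO due 2024-Ol-3l",     # OCR-style character swaps
}
for name, score in rank_parsers(reference, outputs):
    print(f"{name}: {score:.2f}")
```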
