Apache Tika Document Extractor

Wraps the Apache Tika Server REST API to extract structured text from PDF, DOCX, PPTX, and 1,200+ other file formats. Outputs clean markdown with metadata preserved, using Tika's /rmeta/text endpoint in recursive parsing mode.

Data Extraction & Transformation Codex Security Reviewed

Tool match: tika, ⭐ 3.7k GitHub stars, Apache-2.0 license

INSTALL WITH ANY AGENT

npx skills add agentskillexchange/skills --skill apache-tika-document-extractor

Works best when you want a reusable capability, not another fragile one-off prompt.

View source

At a glance

Last updated

Mar 24, 2026

Quick brief

How it works

What this skill actually does

Overview

This skill integrates Apache Tika Server document extraction into production agent workflows. It handles authentication, rate limiting, and error recovery out of the box, so agents can focus on high-level task orchestration rather than low-level API management.
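The listing describes the skill as a wrapper around Tika Server's /rmeta/text endpoint, which returns one JSON object per parsed document (the container first, then any embedded files), with the extracted text under the "X-TIKA:content" key. The skill's own source is not shown here, so the following is only a minimal sketch of what such a call might look like. The server address (localhost on Tika's default port 9998), the function names, the selected metadata keys, and the markdown layout are all assumptions made for illustration.

```python
import json
import urllib.request

TIKA_URL = "http://localhost:9998"  # assumption: local Tika Server on its default port


def fetch_rmeta_text(path, base_url=TIKA_URL, timeout=60):
    """PUT a file to /rmeta/text and return the parsed JSON list of documents."""
    with open(path, "rb") as fh:
        body = fh.read()
    req = urllib.request.Request(
        f"{base_url}/rmeta/text",
        data=body,
        method="PUT",
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def _as_text(value):
    """Tika metadata values may be single strings or lists of strings."""
    if isinstance(value, list):
        return ", ".join(str(v) for v in value)
    return str(value)


def rmeta_to_markdown(entries, keep=("Content-Type", "dc:creator")):
    """Render each entry as a markdown section (hypothetical layout)."""
    sections = []
    for index, entry in enumerate(entries, start=1):
        title = entry.get("dc:title") or entry.get("resourceName") or f"Document {index}"
        lines = [f"## {_as_text(title)}", ""]
        # Preserve a few metadata fields as a bullet list above the text.
        for key in keep:
            if key in entry:
                lines.append(f"- {key}: {_as_text(entry[key])}")
        content = (entry.get("X-TIKA:content") or "").strip()
        if content:
            lines.extend(["", content])
        sections.append("\n".join(lines).rstrip())
    return "\n\n".join(sections)
```

With a Tika Server running, `rmeta_to_markdown(fetch_rmeta_text("report.pdf"))` would produce one section per parsed document. Keeping the rendering step separate from the HTTP call makes it possible to test the formatting without a server.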

Key Features

Automatic retry logic with exponential backoff for API rate limits
Structured output formatting compatible with downstream agent pipelines
Comprehensive error handling with actionable diagnostic messages
Configurable caching layer to reduce redundant API calls
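The retry and caching features listed above can be illustrated with a short sketch. This is not the skill's actual implementation, which the listing does not show. The `RateLimited` exception, the `with_backoff` helper, and the `ExtractionCache` class are hypothetical names, and the delay values are arbitrary defaults chosen for the example.

```python
import hashlib
import random
import time


class RateLimited(Exception):
    """Hypothetical error a client would raise on a rate-limit response."""


def with_backoff(call, retries=4, base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Run call(); on RateLimited, wait base_delay * 2**attempt plus jitter, then retry."""
    for attempt in range(retries + 1):
        try:
            return call()
        except RateLimited:
            if attempt == retries:
                raise  # out of retries: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Up to 10% jitter so parallel agents do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 10))


class ExtractionCache:
    """In-memory cache keyed by the SHA-256 digest of the file bytes."""

    def __init__(self):
        self._store = {}

    def get_or_extract(self, data, extract):
        key = hashlib.sha256(data).hexdigest()
        if key not in self._store:
            # Only cache misses reach the API, and those go through the retry wrapper.
            self._store[key] = with_backoff(lambda: extract(data))
        return self._store[key]
```

Keying the cache on content rather than filename means a renamed copy of the same document is served from the cache, while an edited file with the same name is re-extracted.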

Usage

Install the skill via the Agent Skill Exchange registry and configure it with your API credentials. It exposes a standardized interface that works across supported agent frameworks, with framework-specific optimizations applied automatically during initialization.

Best fit

When to reach for it

Best when the job fits Data Extraction & Transformation.
Works naturally with Codex setups.

Trust & provenance

Why this listing is credible

Built around the tika toolchain.
Trust status: Security Reviewed.
3.7k GitHub stars on the linked upstream source.
License: Apache-2.0.
Last updated Mar 24, 2026.
