Skill Detail

Apache Tika Document Extractor

Wraps Apache Tika Server REST API for extracting structured text from PDFs, DOCX, PPTX, and 1,200+ file formats. Outputs clean markdown with metadata preservation using Tika /rmeta/text endpoint and recursive parsing mode.

Data Extraction & TransformationCodex
Data Extraction & Transformation Codex Security Reviewed
Tool match: tika โญ 3.7k GitHub stars Apache-2.0 license
INSTALL WITH ANY AGENT
npx skills add agentskillexchange/skills --skill apache-tika-document-extractor Copy
Works best when you want a reusable capability, not another fragile one-off prompt.
At a glance
Last updated
Mar 24, 2026
Quick brief

Wraps Apache Tika Server REST API for extracting structured text from PDFs, DOCX, PPTX, and 1,200+ file formats. Outputs clean markdown with metadata preservation using Tika /rmeta/text endpoint and recursive parsing mode.

How it works

What this skill actually does

Overview

This skill provides automated integration capabilities designed for production agent workflows. It handles authentication, rate limiting, and error recovery out of the box, allowing agents to focus on high-level task orchestration rather than low-level API management.

Key Features

  • Automatic retry logic with exponential backoff for API rate limits
  • Structured output formatting compatible with downstream agent pipelines
  • Comprehensive error handling with actionable diagnostic messages
  • Configurable caching layer to reduce redundant API calls

Usage

Install via the Agent Skill Exchange registry and configure with your API credentials. The skill exposes a standardized interface that works across supported agent frameworks, with framework-specific optimizations applied automatically during initialization.