Marketplace category archive

Data Extraction & Transformation Skills

Explore live Data Extraction & Transformation skills across the current marketplace catalog.

142 live listings
10 frameworks in use
Live taxonomy archive

Category Skills

Browse the published marketplace skills currently assigned to this category.

Data Extraction & Transformation Security Reviewed

gron Greppable JSON Flattener

gron transforms JSON into discrete assignment statements, making it possible to grep through complex JSON structures using standard Unix tools. It reverses the process with ungron, turning filtered assignments back into valid JSON.

⭐ 14.4k gron
Custom Agents Data Extraction & Transformation
2w ago 👁 3 View skill →
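
The flattening idea described for gron can be sketched in a few lines of Python. This is a minimal illustration of the technique, not gron's own implementation: the function name `flatten`, the `json` root prefix, the sorted key order, and the identifier check are assumptions made for this sketch, and the real tool's output formatting may differ in detail.

```python
import json


def flatten(value, path="json"):
    """Yield one assignment-style line per node of a parsed JSON value.

    Hypothetical helper for illustration; it only mimics the general
    shape of greppable, one-path-per-line output.
    """
    if isinstance(value, dict):
        yield f"{path} = {{}};"
        for key in sorted(value):
            # Use dot notation for identifier-like keys, brackets otherwise.
            if key.isidentifier():
                child = f"{path}.{key}"
            else:
                child = f"{path}[{json.dumps(key)}]"
            yield from flatten(value[key], child)
    elif isinstance(value, list):
        yield f"{path} = [];"
        for index, item in enumerate(value):
            yield from flatten(item, f"{path}[{index}]")
    else:
        # Scalars (strings, numbers, booleans, null) are printed as JSON literals.
        yield f"{path} = {json.dumps(value)};"


doc = {"name": "gron", "tags": ["json", "grep"]}
lines = list(flatten(doc))

# Because every line carries its full path, a plain substring filter
# behaves like grep over the structure.
tag_lines = [line for line in lines if "tags" in line]
```

Each emitted line stands alone, which is what makes line-oriented tools usable on nested data; turning filtered lines back into JSON would be the reverse step and is not shown here.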
Data Extraction & Transformation Security Reviewed

fx Terminal JSON Viewer and Processor

fx is a terminal-based JSON viewer and processor that provides an interactive TUI for navigating, filtering, and transforming JSON, YAML, and TOML data. Built in Go with support for JavaScript/Python expressions, it replaces piping through jq for quick data exploration.

⭐ 20.4k fx ⬇ 206k/wk
Custom Agents Data Extraction & Transformation
2w ago 👁 3 View skill →
Data Extraction & Transformation Security Reviewed

Polars Blazing-Fast DataFrame Query Engine

Polars is an extremely fast DataFrame library written in Rust with Python, Node.js, and R bindings. This skill enables agents to leverage Polars for high-performance data manipulation, transformation, and analytical queries on structured datasets.

⭐ 37.9k polars MIT
Custom Agents Data Extraction & Transformation
2w ago 👁 3 View skill →
Data Extraction & Transformation Security Reviewed

qsv Blazing-Fast CSV Data Wrangling Toolkit

qsv is a high-performance command-line toolkit for querying, transforming, validating, and analyzing CSV and tabular data. Written in Rust, it provides over 50 commands for data wrangling tasks and supports Excel, Parquet, JSON, and other formats.

⭐ 3.6k qsv
Custom Agents Data Extraction & Transformation
2w ago 👁 11 View skill →
Data Extraction & Transformation Security Reviewed

Instructor Structured Data Extraction from LLMs

Instructor is a multi-language library for extracting structured, validated data from LLM outputs. It patches LLM client libraries to return Pydantic models (Python) or Zod schemas (TypeScript) instead of raw text, supporting 15+ providers including OpenAI, Anthropic, and Google.

⭐ 12.7k instructor
Custom Agents Data Extraction & Transformation
2w ago 👁 5 View skill →
Data Extraction & Transformation Security Reviewed

Evidence BI-as-Code SQL and Markdown Analytics Framework

Evidence is an open-source framework for building data products with SQL and Markdown. It generates interactive dashboards and reports as static sites, providing a code-driven alternative to drag-and-drop BI tools with version control and reproducibility built in.

⭐ 6.1k evidence
Claude Code Data Extraction & Transformation
2w ago 👁 3 View skill →
Data Extraction & Transformation Security Reviewed

Documind AI-Powered Structured Data Extraction from Documents

Documind is an open-source Node.js tool that uses AI to extract structured JSON data from PDFs and other documents. Define a custom schema for what you need, and Documind returns clean, typed data — supporting OpenAI and local LLM backends like Llama 3.2 Vision.

⭐ 1.5k documind ⬇ 14/wk
Custom Agents Data Extraction & Transformation
2w ago 👁 5 View skill →
Data Extraction & Transformation Security Reviewed

sqlite-utils Python CLI for SQLite Database Manipulation

sqlite-utils is a Python CLI utility and library by Simon Willison for manipulating SQLite databases. It lets you pipe JSON, CSV, or TSV data directly into SQLite, run in-memory SQL queries against files, configure full-text search, and perform schema transformations — all from the command line.

⭐ 2k sqlite-utils
Custom Agents Data Extraction & Transformation
2w ago 👁 3 View skill →
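
The "in-memory SQL queries against files" workflow can be approximated with Python's standard library alone. The sketch below does not use sqlite-utils; it relies on the built-in `sqlite3` and `csv` modules to show the underlying idea. The helper name `query_csv`, the default table name `data`, and the sample rows are hypothetical, and every value is loaded as text, so numeric comparisons need an explicit cast.

```python
import csv
import io
import sqlite3


def query_csv(csv_text, sql, table="data"):
    """Load CSV text into an in-memory SQLite table and run one query.

    Hypothetical helper for illustration. It assumes the first row is a
    header and that column names contain no double quotes.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)

    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(f'CREATE TABLE "{table}" ({columns})')
        # The remaining reader rows are inserted as-is, so all values are text.
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


# Made-up sample data, used only to exercise the helper.
sample = "name,stars\ngron,14400\nqsv,3600\nyq,15100\n"
rows = query_csv(
    sample,
    "SELECT name, CAST(stars AS INTEGER) FROM data "
    "WHERE CAST(stars AS INTEGER) > 5000 ORDER BY name",
)
```

The database exists only for the lifetime of the call, which suits one-off exploration; persisting the table to a file would mean connecting to a path instead of `:memory:`.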
Data Extraction & Transformation Security Reviewed

Anyquery Universal SQL Engine with MCP Integration

Anyquery is a SQL query engine that lets you run SQL against 40+ apps, files, and databases including GitHub, Notion, Chrome, and Apple Notes. Built on SQLite with MCP server support for connecting AI agents to structured data across services.

⭐ 1.7k anyquery
MCP Data Extraction & Transformation
2w ago 👁 5 View skill →
Data Extraction & Transformation Security Reviewed

Steampipe Zero-ETL SQL Cloud API Query Engine

Query cloud APIs, SaaS services, and infrastructure with standard SQL using Steampipe. Maps over 150 data sources (AWS, Azure, GCP, GitHub, Slack, and more) to PostgreSQL tables — no ETL pipelines needed.

⭐ 7.7k steampipe
Claude Code Data Extraction & Transformation
2w ago 👁 3 View skill →
Data Extraction & Transformation Security Reviewed

MinerU PDF-to-Markdown Document Parser

Transforms complex PDFs into LLM-ready markdown and JSON using MinerU, a high-accuracy document intelligence pipeline. Extracts text, tables, formulas, and images from scientific papers, reports, and scanned documents with layout-aware parsing.

⭐ 57.8k mineru
Claude Code Data Extraction & Transformation
2w ago 👁 4 View skill →
Data Extraction & Transformation Security Reviewed

dlt Python Data Load Tool

An open-source Python library that makes loading data from APIs, databases, and files into structured datasets simple and Pythonic. dlt automates schema inference, incremental loading, and normalization, supporting destinations like DuckDB, BigQuery, Snowflake, and Postgres.

⭐ 5.2k dlt
Custom Agents Data Extraction & Transformation
2w ago 👁 4 View skill →
Data Extraction & Transformation Security Reviewed

Meltano Declarative ELT Data Integration Engine

Meltano is an open-source, CLI-first ELT platform built on the Singer specification. It provides declarative, code-first data integration with 600+ connectors through Singer taps and targets, orchestrating data movement from APIs, databases, and files to warehouses and lakes.

⭐ 2.4k meltano
Custom Agents Data Extraction & Transformation
2w ago 👁 4 View skill →
Data Extraction & Transformation Security Reviewed

Docling AI Document Intelligence Pipeline

Docling is an IBM-backed open-source toolkit that converts PDF, DOCX, PPTX, XLSX, HTML, images, audio, and LaTeX files into structured formats for gen AI workflows. It features advanced PDF layout understanding, OCR, table extraction, and integrations with LangChain, LlamaIndex, and CrewAI.

⭐ 56.9k docling
Claude Code Data Extraction & Transformation
2w ago 👁 3 View skill →
Data Extraction & Transformation Security Reviewed

Marker PDF-to-Markdown Converter

Marker converts PDF, DOCX, PPTX, and image files to clean Markdown, JSON, and HTML with high accuracy. It handles tables, equations, code blocks, and multi-column layouts, with optional LLM-boosted extraction for maximum fidelity.

⭐ 33.2k marker
Claude Code Data Extraction & Transformation
2w ago 👁 6 View skill →
Data Extraction & Transformation Security Reviewed

dbt MCP Server for Data Pipeline Context

The official dbt MCP Server by dbt Labs provides Model Context Protocol tools for AI agents to interact with dbt projects, query the Semantic Layer, execute SQL, generate SQL from natural language, and explore data model lineage across dbt Core, Fusion, and Platform environments.

⭐ 526 dbt-mcp
MCP Data Extraction & Transformation
2w ago 👁 4 View skill →
Data Extraction & Transformation Security Reviewed

Dasel Multi-Format Data Selector and Modifier

Dasel (Data-Select) is a command-line tool and Go library for querying, modifying, and transforming structured data across JSON, YAML, TOML, XML, CSV, HCL, and INI formats using a unified selector syntax. It supports format conversion between all supported types.

⭐ 7.9k dasel
Claude Code Data Extraction & Transformation
2w ago 👁 3 View skill →
Data Extraction & Transformation Security Reviewed

xsv High-Performance CSV Toolkit

xsv is a fast CSV command-line toolkit written in Rust by Andrew Gallant (BurntSushi). It provides indexing, slicing, analyzing, splitting, joining, searching, sampling, and statistics operations on CSV files with exceptional speed and memory efficiency.

⭐ 10.8k xsv
Claude Code Data Extraction & Transformation
2w ago 👁 2 View skill →
Data Extraction & Transformation Security Reviewed

Miller CSV TSV JSON Data Processor

Miller (mlr) is a command-line tool for querying, shaping, and reformatting name-indexed data such as CSV, TSV, JSON, and JSON Lines. It combines the functionality of awk, sed, cut, join, and sort into a single tool purpose-built for structured data processing.

⭐ 9.8k miller
Claude Code Data Extraction & Transformation
2w ago 👁 4 View skill →
Data Extraction & Transformation Security Reviewed

yq YAML and Structured Data Processor

Process, query, and transform YAML, JSON, XML, CSV, TOML, and properties files from the command line using yq. Supports jq-like expressions for reading, updating, and converting between formats.

⭐ 15.1k yq
Claude Code Data Extraction & Transformation
2w ago 👁 3 View skill →
Data Extraction & Transformation Security Reviewed

Ingestr Cross-Database Data Copier

Copy data between any databases with a single CLI command using Ingestr. Supports 50+ sources and destinations including PostgreSQL, MySQL, BigQuery, Snowflake, DuckDB, MongoDB, and S3.

⭐ 3.4k notion
Claude Code OpenClaw Data Extraction & Transformation
2w ago 👁 3 View skill →
Data Extraction & Transformation Security Reviewed

Dagster Data Pipeline Orchestrator

Orchestrate data pipelines using Dagster, the cloud-native data orchestration platform. Define data assets as Python functions with automatic lineage tracking, scheduling, and observability.

⭐ 15.3k docker
Claude Code OpenClaw Data Extraction & Transformation
2w ago 👁 3 View skill →
Data Extraction & Transformation Security Reviewed

DuckDB SQL Analytics Agent

Run analytical SQL queries on local files (CSV, Parquet, JSON) using the DuckDB in-process database engine. Enables fast OLAP-style analysis without a server, directly from flat files on disk.

⭐ 37.1k parquet
Claude Code OpenClaw Data Extraction & Transformation
2w ago 👁 3 View skill →
Data Extraction & Transformation Security Reviewed

Snowflake Query Optimizer Agent

Analyzes and optimizes Snowflake SQL queries using the QUERY_HISTORY and ACCESS_HISTORY views in ACCOUNT_USAGE. Identifies expensive scans, recommends clustering keys, and monitors warehouse credit consumption via WAREHOUSE_METERING_HISTORY.

OpenClaw Data Extraction & Transformation
3w ago 👁 6 View skill →