Data Extraction & Transformation
Security Reviewed
gron transforms JSON into discrete assignment statements, making it possible to grep through complex JSON structures using standard Unix tools. It reverses the process with ungron, turning filtered assignments back into valid JSON.
⭐ 14.4k gron
Custom Agents Data Extraction & Transformation
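The flattening idea behind gron can be sketched in a few lines of Python. This is an illustrative approximation, not gron's own implementation: the function name `gron`, the sample document, and the rule for choosing dot versus bracket notation are assumptions made for the example.

```python
import json


def gron(value, path="json"):
    """Yield one assignment statement per node of a parsed JSON value."""
    if isinstance(value, dict):
        yield f"{path} = {{}};"
        for key in sorted(value):
            # Simplification: identifier-like keys use dot notation, others use brackets.
            child = f"{path}.{key}" if key.isidentifier() else f"{path}[{json.dumps(key)}]"
            yield from gron(value[key], child)
    elif isinstance(value, list):
        yield f"{path} = [];"
        for index, item in enumerate(value):
            yield from gron(item, f"{path}[{index}]")
    else:
        # Scalars (strings, numbers, booleans, null) are emitted as JSON literals.
        yield f"{path} = {json.dumps(value)};"


# Hypothetical sample document used only for this sketch.
document = json.loads('{"name": "Tom", "contact": {"email": "t@example.com", "tags": ["a", "b"]}}')
lines = list(gron(document))

# Because every line carries its full path, a plain substring match acts like grep.
email_lines = [line for line in lines if "email" in line]
print("\n".join(lines))
```

Each emitted line is self-describing, which is why line-oriented tools can filter the output without understanding JSON nesting.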
Security Reviewed
fx is a terminal-based JSON viewer and processor that provides an interactive TUI for navigating, filtering, and transforming JSON, YAML, and TOML data. Built in Go with support for JavaScript/Python expressions, it can stand in for piping through jq during quick data exploration.
⭐ 20.4k fx ⬇ 206k/wk
Custom Agents Data Extraction & Transformation
Security Reviewed
Polars is an extremely fast DataFrame library written in Rust with Python, Node.js, and R bindings. This skill enables agents to leverage Polars for high-performance data manipulation, transformation, and analytical queries on structured datasets.
⭐ 37.9k polars MIT
Custom Agents Data Extraction & Transformation
Security Reviewed
qsv is a high-performance command-line toolkit for querying, transforming, validating, and analyzing CSV and tabular data. Written in Rust, it provides over 50 commands for data wrangling tasks and supports Excel, Parquet, JSON, and other formats.
⭐ 3.6k qsv
Custom Agents Data Extraction & Transformation
Security Reviewed
Instructor is a multi-language library for extracting structured, validated data from LLM outputs. It patches LLM client libraries to return Pydantic models (Python) or Zod schemas (TypeScript) instead of raw text, supporting 15+ providers including OpenAI, Anthropic, and Google.
⭐ 12.7k instructor
Custom Agents Data Extraction & Transformation
Security Reviewed
Evidence is an open-source framework for building data products with SQL and Markdown. It generates interactive dashboards and reports as static sites, providing a code-driven alternative to drag-and-drop BI tools with version control and reproducibility built in.
⭐ 6.1k evidence
Claude Code Data Extraction & Transformation
Security Reviewed
Documind is an open-source Node.js tool that uses AI to extract structured JSON data from PDFs and other documents. Define a custom schema for what you need, and Documind returns clean, typed data — supporting OpenAI and local LLM backends like Llama 3.2 Vision.
⭐ 1.5k documind ⬇ 14/wk
Custom Agents Data Extraction & Transformation
Security Reviewed
sqlite-utils is a Python CLI utility and library by Simon Willison for manipulating SQLite databases. It lets you pipe JSON, CSV, or TSV data directly into SQLite, run in-memory SQL queries against files, configure full-text search, and perform schema transformations — all from the command line.
⭐ 2k sqlite-utils
Custom Agents Data Extraction & Transformation
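The idea of running in-memory SQL against a delimited file can be sketched with Python's standard library alone. This is not the sqlite-utils API: the helper `query_csv`, the table name `data`, and the sample rows are hypothetical and exist only to illustrate the load-then-query pattern.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for a CSV file on disk.
CSV_TEXT = """name,city,amount
alice,Oslo,10
bob,Lima,25
carol,Oslo,5
"""


def query_csv(csv_text, sql, table="data"):
    """Load CSV text into an in-memory SQLite table and run one query against it."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames
    conn = sqlite3.connect(":memory:")
    try:
        # Columns are created without types, so every value is stored as text.
        column_defs = ", ".join(f'"{name}"' for name in columns)
        conn.execute(f'CREATE TABLE "{table}" ({column_defs})')
        placeholders = ", ".join("?" for _ in columns)
        conn.executemany(
            f'INSERT INTO "{table}" VALUES ({placeholders})',
            [tuple(row[name] for name in columns) for row in reader],
        )
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


# Text values are cast explicitly before aggregation.
totals = query_csv(
    CSV_TEXT,
    "SELECT city, SUM(CAST(amount AS INTEGER)) FROM data GROUP BY city ORDER BY city",
)
print(totals)
```

Nothing is written to disk here; the database lives only for the duration of the call, which suits one-off exploration of a file.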
Security Reviewed
Anyquery is a SQL query engine that lets you run SQL against 40+ apps, files, and databases including GitHub, Notion, Chrome, and Apple Notes. Built on SQLite with MCP server support for connecting AI agents to structured data across services.
⭐ 1.7k anyquery
MCP Data Extraction & Transformation
Security Reviewed
Query cloud APIs, SaaS services, and infrastructure with standard SQL using Steampipe. Maps over 150 data sources (AWS, Azure, GCP, GitHub, Slack, and more) to PostgreSQL tables — no ETL pipelines needed.
⭐ 7.7k steampipe
Claude Code Data Extraction & Transformation
Security Reviewed
Transforms complex PDFs into LLM-ready markdown and JSON using MinerU, a high-accuracy document intelligence pipeline. Extracts text, tables, formulas, and images from scientific papers, reports, and scanned documents with layout-aware parsing.
⭐ 57.8k mineru
Claude Code Data Extraction & Transformation
Security Reviewed
An open-source Python library that makes loading data from APIs, databases, and files into structured datasets simple and Pythonic. dlt automates schema inference, incremental loading, and normalization, supporting destinations like DuckDB, BigQuery, Snowflake, and Postgres.
⭐ 5.2k dlt
Custom Agents Data Extraction & Transformation
Security Reviewed
Meltano is an open-source, CLI-first ELT platform built on the Singer specification. It provides declarative, code-first data integration with 600+ connectors through Singer taps and targets, orchestrating data movement from APIs, databases, and files to warehouses and lakes.
⭐ 2.4k meltano
Custom Agents Data Extraction & Transformation
Security Reviewed
Docling is an IBM-backed open-source toolkit that converts PDF, DOCX, PPTX, XLSX, HTML, images, audio, and LaTeX files into structured formats for gen AI workflows. It features advanced PDF layout understanding, OCR, table extraction, and integrations with LangChain, LlamaIndex, and CrewAI.
⭐ 56.9k docling
Claude Code Data Extraction & Transformation
Security Reviewed
Marker converts PDF, DOCX, PPTX, and image files to clean Markdown, JSON, and HTML with high accuracy. It handles tables, equations, code blocks, and multi-column layouts, with optional LLM-boosted extraction for maximum fidelity.
⭐ 33.2k marker
Claude Code Data Extraction & Transformation
Security Reviewed
The official dbt MCP Server by dbt Labs provides Model Context Protocol tools for AI agents to interact with dbt projects, query the Semantic Layer, execute SQL, generate SQL from natural language, and explore data model lineage across dbt Core, Fusion, and Platform environments.
⭐ 526 dbt-mcp
MCP Data Extraction & Transformation
Security Reviewed
Dasel (Data-Select) is a command-line tool and Go library for querying, modifying, and transforming structured data across JSON, YAML, TOML, XML, CSV, HCL, and INI formats using a unified selector syntax. It supports format conversion between all supported types.
⭐ 7.9k dasel
Claude Code Data Extraction & Transformation
Security Reviewed
xsv is a fast CSV command-line toolkit written in Rust by Andrew Gallant (BurntSushi). It provides indexing, slicing, analyzing, splitting, joining, searching, sampling, and statistics operations on CSV files with exceptional speed and memory efficiency.
⭐ 10.8k xsv
Claude Code Data Extraction & Transformation
Security Reviewed
Miller (mlr) is a command-line tool for querying, shaping, and reformatting name-indexed data such as CSV, TSV, JSON, and JSON Lines. It combines the functionality of awk, sed, cut, join, and sort into a single tool purpose-built for structured data processing.
⭐ 9.8k miller
Claude Code Data Extraction & Transformation
Security Reviewed
Process, query, and transform YAML, JSON, XML, CSV, TOML, and properties files from the command line using yq. Supports jq-like expressions for reading, updating, and converting between formats.
⭐ 15.1k yq
Claude Code Data Extraction & Transformation
Security Reviewed
Copy data between databases with a single CLI command using Ingestr. Supports 50+ sources and destinations including PostgreSQL, MySQL, BigQuery, Snowflake, DuckDB, MongoDB, and S3.
⭐ 3.4k notion
Claude Code OpenClaw Data Extraction & Transformation
Security Reviewed
Orchestrate data pipelines using Dagster, the cloud-native data orchestration platform. Define data assets as Python functions with automatic lineage tracking, scheduling, and observability.
⭐ 15.3k docker
Claude Code OpenClaw Data Extraction & Transformation
Security Reviewed
Run analytical SQL queries on local files (CSV, Parquet, JSON) using the DuckDB in-process database engine. Enables fast OLAP-style analysis without a server, directly from flat files on disk.
⭐ 37.1k parquet
Claude Code OpenClaw Data Extraction & Transformation
Security Reviewed
Analyzes and optimizes Snowflake SQL queries using the QUERY_HISTORY and ACCESS_HISTORY views in ACCOUNT_USAGE. Identifies expensive scans, recommends clustering keys, and monitors warehouse credit consumption via WAREHOUSE_METERING_HISTORY.
OpenClaw Data Extraction & Transformation