Marketplace category archive

Data Extraction & Transformation Skills

Explore live Data Extraction & Transformation skills across the current marketplace catalog.

142 live listings
10 frameworks in use
Live taxonomy archive

Category Skills

Browse the published marketplace skills currently assigned to this category.

Data Extraction & Transformation Security Reviewed

Tesseract OCR Document Extractor

Extracts structured text from scanned documents and images using Tesseract OCR with custom LSTM training data. Supports table detection via OpenCV contour analysis and PDF/A output generation.

ChatGPT Agents Data Extraction & Transformation
3w ago 👁 6

Data Extraction & Transformation Security Reviewed

PDF Table Extraction with Camelot

Extracts structured tables from PDF documents using Camelot's lattice and stream parsing algorithms. Handles merged cells, multi-page tables, and outputs to pandas DataFrames, CSV, and JSON with column type inference.

Claude Code Data Extraction & Transformation
3w ago 👁 3

Data Extraction & Transformation Security Reviewed

Apache Kafka Schema Registry Validator

Validates Avro, Protobuf, and JSON Schema compatibility against Confluent Schema Registry using the REST API. Enforces backward/forward/full compatibility modes and detects breaking schema evolution changes.

Claude Agents Data Extraction & Transformation
3w ago 👁 3

Data Extraction & Transformation Security Reviewed

GraphQL Schema Introspection Mapper

Introspects GraphQL APIs via the __schema query and maps type relationships, deprecated fields, and circular references. Generates SDL exports, TypeScript types via graphql-codegen, and schema diff reports between versions.

Cursor Multi-Framework Data Extraction & Transformation
3w ago 👁 3

Data Extraction & Transformation Security Reviewed

XML XSLT Transform Pipeline

Applies chained XSLT transformations to XML documents using lxml and Saxon-JS. Supports XSLT 3.0 streaming, XPath parameter injection, and multi-stage transform pipelines with intermediate validation.

ChatGPT Agents Data Extraction & Transformation
3w ago 👁 3

Data Extraction & Transformation Security Reviewed

Protocol Buffer Schema Generator

Infers Protocol Buffer (.proto) definitions from JSON samples using protobuf-compiler and grpcio-tools. Generates proto3 schemas with nested message types, enums, and gRPC service stubs.

Gemini Data Extraction & Transformation
3w ago 👁 1

Data Extraction & Transformation Security Reviewed

CSV Schema Validator & Auto-Fixer

Validates CSV files against JSON Schema definitions using AJV and csv-parse. Automatically detects and repairs type mismatches, missing required columns, and encoding issues with configurable strictness levels.

OpenClaw Data Extraction & Transformation
3w ago 👁 3

Data Extraction & Transformation Published

Parquet to PostgreSQL Loader

Reads Apache Parquet files using PyArrow and bulk-loads them into PostgreSQL via psycopg2 COPY protocol. Handles schema mapping, partitioned datasets, and incremental upserts with conflict resolution.

Claude Agents Data Extraction & Transformation
3w ago 👁 5

Data Extraction & Transformation Security Reviewed

Snowflake Query Profiler

Profiles and optimizes Snowflake SQL queries using the Snowflake Information Schema and Query History views. Identifies warehouse sizing issues, scanning inefficiencies, and recommends clustering keys.

ChatGPT Agents Data Extraction & Transformation
3w ago 👁 2

Data Extraction & Transformation Published

Parquet Column Mapper

Reads and transforms Apache Parquet file metadata and column statistics using PyArrow and the Parquet Thrift specification. Maps column types across Delta Lake, Iceberg, and Hudi table formats.

MCP Data Extraction & Transformation
3w ago 👁 4

Data Extraction & Transformation Security Reviewed

dbt Model Lineage Analyzer

Parses dbt project manifests and catalog artifacts to build complete data lineage graphs. Uses the dbt Cloud API v2 for run metadata and the dbt Core manifest.json for model dependency analysis.

Claude Agents Data Extraction & Transformation
3w ago 👁 3

Data Extraction & Transformation Security Reviewed

Apache Kafka Schema Extractor

Extracts and transforms Avro, Protobuf, and JSON Schema definitions from Confluent Schema Registry. Generates typed data models and validates schema compatibility using the Schema Registry REST API.

Codex Data Extraction & Transformation
3w ago 👁 3

Data Extraction & Transformation Security Reviewed

dbt Model Lineage Extractor

Extracts and visualizes dbt model dependency lineage using the dbt Core manifest.json artifact API. Parses ref() and source() calls to build DAG representations via the dbt Cloud API v2.

Claude Code Data Extraction & Transformation
3w ago 👁 4

Data Extraction & Transformation Security Reviewed

Pandas DataFrame Schema Validator

Validates Pandas DataFrame structures using the pandera library API and pa.DataFrameSchema definitions. Enforces column types, nullable constraints, and custom check functions via pandera.Check.

OpenClaw Data Extraction & Transformation
3w ago 👁 2

Data Extraction & Transformation Security Reviewed

Airbyte Connector Config Generator

Generates Airbyte source and destination connector configurations using the Airbyte API /v1/sources/create and /v1/destinations/create endpoints. Validates connection specs against the Airbyte Protocol.

Cursor Data Extraction & Transformation
3w ago 👁 6

Data Extraction & Transformation Security Reviewed

Apache Camel Route Data Mapper

Maps and transforms data between systems using Apache Camel route definitions and the Camel Component API. Supports XSLT, JSONPath, and DataFormat transformations via camel-core SDK.

Custom Agents Data Extraction & Transformation
3w ago 👁 4

Data Extraction & Transformation Security Reviewed

Parquet Column Pruning Optimizer

Optimizes Apache Parquet file reads using PyArrow column pruning and predicate pushdown. Analyzes query patterns to recommend row group sizing and dictionary encoding strategies.

Custom Agents Data Extraction & Transformation
3w ago 👁 4

Data Extraction & Transformation Security Reviewed

jq Pipeline Builder Agent

Constructs complex jq filter pipelines from natural language queries against JSON/NDJSON data streams. Uses jq built-in functions including path expressions, reduce, and SQL-style operators.

Gemini Data Extraction & Transformation
3w ago 👁 4

Data Extraction & Transformation Security Reviewed

Apache Avro Schema Evolution Agent

Manages Apache Avro schema evolution with compatibility checking via Confluent Schema Registry API. Validates forward, backward, and full compatibility across schema versions automatically.

MCP Data Extraction & Transformation
3w ago 👁 7

Data Extraction & Transformation Security Reviewed

Apache Tika Content Extraction Hub

Extracts text and metadata from 1400+ file formats via Apache Tika Server REST API. Handles PDF, DOCX, PPTX, email archives, and embedded document extraction with MIME type detection.

Custom Agents Data Extraction & Transformation
3w ago 👁 5

Data Extraction & Transformation Security Reviewed

Tesseract OCR Data Extractor

Extracts structured data from scanned documents using Tesseract OCR engine with LSTM models. Supports table detection via OpenCV contour analysis and outputs to CSV, JSON, or Pandas DataFrames.

Gemini Data Extraction & Transformation
3w ago 👁 5

Data Extraction & Transformation Security Reviewed

dbt Data Transformation Orchestrator

Manages dbt Core and dbt Cloud API workflows for SQL-based data transformations. Handles model lineage, incremental builds, and data quality tests with Great Expectations integration.

Claude Code Data Extraction & Transformation
3w ago 👁 1

Data Extraction & Transformation Security Reviewed

Scrapy Spider Data Pipeline

Builds and manages Scrapy web scraping spiders with custom item pipelines. Supports Splash rendering for JavaScript pages, rotating proxies via scrapy-rotating-proxies, and export to MongoDB or Elasticsearch.

Cursor Data Extraction & Transformation
3w ago 👁 2

Data Extraction & Transformation Security Reviewed

JSON-to-Avro Schema Transformer

Transforms JSON Schema definitions into Apache Avro schema format using the jsonschema and fastavro Python libraries. Handles nested objects, arrays, optional fields, and $ref resolution. Registers the resulting Avro schema to Confluent Schema Registry via the Schema Registry REST API.

OpenClaw Data Extraction & Transformation
3w ago 👁 2