Marketplace category archive

Data Extraction & Transformation Skills

Explore live Data Extraction & Transformation skills across the current marketplace catalog.

142 live listings
10 frameworks in use
Live taxonomy archive

Category Skills

Browse the published marketplace skills currently assigned to this category.

Data Extraction & Transformation Security Reviewed

Pandas DataFrame Pipeline Orchestrator

Chains pandas DataFrame transformations into reproducible pipelines using pipe() method composition. Handles missing data imputation with fillna() strategies, type coercion with astype(), and memory optimization via category dtypes.

⭐ 48.5k pandas BSD-3-Clause
ChatGPT Agents Data Extraction & Transformation
3w ago 👁 4 View skill →
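
The pipe() composition described in this listing can be sketched roughly as follows. This is a minimal illustration, not the skill's own code: the step functions (impute_missing, coerce_types, compress_categories) and the sample data are hypothetical names introduced here, and only standard pandas calls (pipe, fillna, astype) are used.

```python
import pandas as pd


# Hypothetical step functions: each takes a DataFrame and returns a new one,
# so they can be chained with DataFrame.pipe().
def impute_missing(df: pd.DataFrame, fill_values: dict) -> pd.DataFrame:
    """Fill missing values per column using a {column: value} mapping."""
    return df.fillna(value=fill_values)


def coerce_types(df: pd.DataFrame, dtypes: dict) -> pd.DataFrame:
    """Cast columns to the requested dtypes."""
    return df.astype(dtypes)


def compress_categories(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Store low-cardinality text columns as the memory-saving category dtype."""
    return df.astype({col: "category" for col in columns})


# Made-up sample data with one gap in each column.
raw = pd.DataFrame(
    {
        "region": ["north", "south", None, "north"],
        "units": [10.0, None, 3.0, 7.0],
    }
)

# Each pipe() call passes the previous result as the first argument.
clean = (
    raw.pipe(impute_missing, {"region": "unknown", "units": 0})
    .pipe(coerce_types, {"units": "int64"})
    .pipe(compress_categories, ["region"])
)
```

Because every step returns a new DataFrame, raw is left untouched and the same chain can be re-run on another input to get the same sequence of transformations.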
Data Extraction & Transformation Security Reviewed

Salesforce Bulk API Data Loader

Performs high-volume data operations using the Salesforce Bulk API 2.0. Creates ingest jobs with createJob(), uploads CSV batches, and monitors job status via getJobInfo() for millions of records.

⭐ 1.5k salesforce ⬇ 936.6k/wk MIT
Claude Code Data Extraction & Transformation
3w ago 👁 10 View skill →
Data Extraction & Transformation Security Reviewed

Apache Kafka Stream Transformer

Processes real-time event streams using KafkaJS consumer groups and transforms messages with configurable schemas. Handles partition rebalancing, offset commits, and dead-letter queue routing for failed transformations.

Cursor Data Extraction & Transformation
3w ago 👁 4 View skill →
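
The dead-letter routing idea in this listing can be illustrated without a broker. The sketch below is written in Python with in-memory lists standing in for Kafka topics; it does not use KafkaJS, and the names transform and process_batch, along with the event fields, are assumptions made for illustration only.

```python
import json


def transform(raw: bytes) -> dict:
    """Hypothetical transformation: parse JSON and normalize two fields."""
    event = json.loads(raw)
    return {"user_id": int(event["user_id"]), "action": str(event["action"]).lower()}


def process_batch(messages, transform_fn):
    """Transform (offset, payload) pairs, routing failures to a dead-letter list.

    Returns (transformed, dead_letters, next_offset). next_offset follows the
    Kafka convention of committing the offset of the next message to read.
    """
    transformed, dead_letters = [], []
    for offset, raw in messages:
        try:
            transformed.append(transform_fn(raw))
        except (ValueError, KeyError, TypeError) as exc:
            # Keep the original payload and the reason so the message can be replayed.
            dead_letters.append(
                {"offset": offset, "payload": raw, "error": f"{type(exc).__name__}: {exc}"}
            )
    next_offset = messages[-1][0] + 1 if messages else None
    return transformed, dead_letters, next_offset


# In-memory stand-in for one partition's messages (made-up data).
batch = [
    (0, b'{"user_id": "7", "action": "CLICK"}'),
    (1, b"not json"),
    (2, b'{"action": "view"}'),
]
ok, dlq, commit_at = process_batch(batch, transform)
```

Failed messages do not block the batch: the offset still advances past them, and the dead-letter entries carry enough context for later inspection.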
Data Extraction & Transformation Security Reviewed

Apache Kafka Schema Registry Extractor

Extracts and transforms Avro/Protobuf schemas from Confluent Schema Registry using the REST API. Generates TypeScript interfaces, JSON Schema, and data contract documentation from registered subjects.

⭐ 4k kafka ⬇ 2.5M/wk MIT ⚠ unmaintained
Claude Agents Data Extraction & Transformation
3w ago 👁 3 View skill →
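
To make the schema-to-interface step concrete, here is a minimal sketch that turns an Avro record schema into a TypeScript interface string. It assumes a response shaped like the Schema Registry's GET /subjects/{subject}/versions/latest payload, where the schema arrives as a JSON string; the helper names (ts_type, record_to_interface), the type mapping, and the sample subject are illustrative choices, not the skill's actual implementation.

```python
import json

# Assumed mapping from Avro primitives to TypeScript types.
# Note: mapping "long" to number can lose precision above 2**53.
AVRO_TO_TS = {
    "string": "string", "boolean": "boolean", "null": "null",
    "int": "number", "long": "number", "float": "number", "double": "number",
    "bytes": "Uint8Array",
}


def ts_type(avro_type) -> str:
    """Translate an Avro type (name, union list, or complex dict) to TypeScript."""
    if isinstance(avro_type, str):
        return AVRO_TO_TS.get(avro_type, avro_type)  # unknown names are type references
    if isinstance(avro_type, list):  # Avro union
        return " | ".join(ts_type(t) for t in avro_type)
    kind = avro_type["type"]
    if kind == "array":
        return f"Array<{ts_type(avro_type['items'])}>"
    if kind == "map":
        return f"Record<string, {ts_type(avro_type['values'])}>"
    if kind in ("record", "enum", "fixed"):
        return avro_type["name"]  # refer to the named type rather than inlining it
    return ts_type(kind)  # e.g. {"type": "string", "logicalType": "uuid"}


def record_to_interface(schema: dict) -> str:
    """Render a single Avro record schema as a TypeScript interface."""
    lines = [f"export interface {schema['name']} {{"]
    for field in schema["fields"]:
        lines.append(f"  {field['name']}: {ts_type(field['type'])};")
    lines.append("}")
    return "\n".join(lines)


# Made-up payload shaped like a registry response; "schema" is a JSON string.
registry_response = {
    "subject": "orders-value",
    "version": 1,
    "id": 1,
    "schema": json.dumps({
        "type": "record",
        "name": "Order",
        "fields": [
            {"name": "id", "type": "string"},
            {"name": "amount", "type": "double"},
            {"name": "note", "type": ["null", "string"]},
            {"name": "tags", "type": {"type": "array", "items": "string"}},
        ],
    }),
}
interface_source = record_to_interface(json.loads(registry_response["schema"]))
```

Nested records and enums are only referenced by name here; a fuller generator would also emit declarations for them.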
Data Extraction & Transformation Security Reviewed

Parquet Column Statistics Profiler

Profiles Apache Parquet files using pyarrow metadata APIs to extract column statistics, row group distributions, and encoding efficiency metrics without reading full datasets.

⭐ 387 parquet ⬇ 170.7k/wk MIT ⚠ unmaintained
ChatGPT Agents Data Extraction & Transformation
3w ago 👁 3 View skill →
Data Extraction & Transformation Security Reviewed

dbt Model Lineage Mapper

Parses dbt manifest.json and catalog.json to extract full model lineage graphs using the dbt Core artifact API. Generates interactive DAG visualizations and impact analysis for model changes.

⭐ 12.6k dbt NOASSERTION
OpenClaw Data Extraction & Transformation
3w ago 👁 3 View skill →
Data Extraction & Transformation Security Reviewed

Pandas DataFrame Schema Enforcer

Validates and transforms Pandas DataFrames using Pandera schema definitions with column-level dtype, nullable, and custom check constraints. Auto-generates Pandera schema code from sample DataFrames.

⭐ 48.5k pandas BSD-3-Clause
ChatGPT Agents Data Extraction & Transformation
3w ago 👁 2 View skill →
Data Extraction & Transformation Security Reviewed

jq JSON Stream Transformer

Constructs complex jq filter expressions for transforming JSON/NDJSON streams, including recursive descent, object construction, and reduce operations. Handles multi-gigabyte streams with jq's streaming parser.

MCP Data Extraction & Transformation
3w ago 👁 3 View skill →
Data Extraction & Transformation Security Reviewed

dbt Model Transformation Architect

Generates and validates dbt (data build tool) models, tests, and documentation for Snowflake, BigQuery, and Redshift. Parses dbt manifest.json to analyze DAG lineage and detect circular dependencies.

⭐ 12.6k dbt NOASSERTION
Claude Agents Data Extraction & Transformation
3w ago 👁 3 View skill →
Data Extraction & Transformation Security Reviewed

Apache Tika Document Parser

Extracts structured text, metadata, and embedded objects from PDFs, Office documents, and 1000+ file formats using the Apache Tika REST API. Outputs clean Markdown or JSON with XMP metadata preservation.

Gemini Data Extraction & Transformation
3w ago 👁 4 View skill →
Data Extraction & Transformation Security Reviewed

Camelot Advanced PDF Table Intelligence

Intelligent PDF table detection and extraction using Camelot-py with OpenCV-based lattice detection and morphological transformations. Handles rotated tables, merged cells, and multi-page spanning tables.

Cursor Data Extraction & Transformation
3w ago 👁 2 View skill →
Data Extraction & Transformation Security Reviewed

Cheerio DOM Extraction Pipeline

Builds configurable data extraction pipelines using Cheerio for server-side DOM parsing with CSS selector chains. Supports JSONPath output mapping, pagination following, and schema validation via Ajv.

⭐ 30.3k cheerio ⬇ 19.6M/wk MIT
Codex Data Extraction & Transformation
3w ago 👁 2 View skill →
Data Extraction & Transformation Security Reviewed

Apache Tika Document Parser Agent

Extracts text and metadata from 1000+ file formats using the Apache Tika Server REST API. Handles PDF OCR via Tesseract integration, Office document parsing, and email archive extraction with MIME detection.

Gemini Data Extraction & Transformation
3w ago 👁 4 View skill →
Data Extraction & Transformation Security Reviewed

Tabula PDF Table Extractor

Extracts structured tables from PDF documents using Tabula-java with lattice and stream detection modes. Outputs to CSV, JSON, or pandas DataFrames with automatic column type inference via tabula-py.

MCP Data Extraction & Transformation
3w ago 👁 2 View skill →
Data Extraction & Transformation Security Reviewed

dbt Model Dependency Analyzer

Analyzes dbt project DAGs using the dbt manifest.json artifact and the dbt Cloud API. Detects circular dependencies and orphaned models, and generates lineage visualizations.

⭐ 12.6k dbt NOASSERTION
Claude Code Data Extraction & Transformation
3w ago 👁 3 View skill →
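
The dependency checks named in this listing can be sketched against the manifest structure, where each entry under "nodes" carries a depends_on.nodes list of parent ids. The following is an illustrative sketch only: the function names, the sample manifest, and the working definition of "orphaned" (a node with no parents that nothing else references) are assumptions, not the skill's documented behavior.

```python
import json


def load_manifest(path: str) -> dict:
    """Read a manifest file such as target/manifest.json."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def build_parent_graph(manifest: dict) -> dict:
    """Map each node's unique_id to the list of ids it depends on."""
    return {
        unique_id: list(node.get("depends_on", {}).get("nodes", []))
        for unique_id, node in manifest.get("nodes", {}).items()
    }


def find_cycle(graph: dict):
    """Depth-first search; return one cycle as a list of ids, or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in graph}
    stack = []

    def visit(node):
        color[node] = GREY
        stack.append(node)
        for parent in graph.get(node, []):
            state = color.get(parent, BLACK)  # ids outside the graph (e.g. sources) are leaves
            if state == GREY:
                return stack[stack.index(parent):] + [parent]
            if state == WHITE:
                found = visit(parent)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


def find_orphans(graph: dict) -> list:
    """Assumed definition: nodes with no parents that no other node references."""
    referenced = {parent for parents in graph.values() for parent in parents}
    return sorted(n for n, parents in graph.items() if not parents and n not in referenced)


# Made-up manifest fragment for illustration.
sample_manifest = {
    "nodes": {
        "model.shop.stg_orders": {"depends_on": {"nodes": ["source.shop.raw.orders"]}},
        "model.shop.orders": {"depends_on": {"nodes": ["model.shop.stg_orders"]}},
        "model.shop.scratch": {"depends_on": {"nodes": []}},
    }
}
graph = build_parent_graph(sample_manifest)
cycle = find_cycle(graph)
orphans = find_orphans(graph)
```

With a real project, graph would come from build_parent_graph(load_manifest("target/manifest.json")). The recursive search is fine for small graphs; very deep DAGs may call for an iterative version.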
Data Extraction & Transformation Security Reviewed

Pandas DataFrame Pipeline Builder

Constructs data transformation pipelines using Pandas and the pipe() method chain pattern. Integrates with SQLAlchemy for database I/O and PyArrow for high-performance Parquet operations.

⭐ 48.5k pandas BSD-3-Clause
Cursor Data Extraction & Transformation
3w ago 👁 3 View skill →
Data Extraction & Transformation Security Reviewed

GraphQL Data Federation Agent

Federates data from multiple GraphQL and REST APIs using Apollo Federation gateway. Implements schema stitching with automatic type merging and the DataLoader pattern for N+1 prevention.

⭐ 20.3k graphql ⬇ 34.2M/wk MIT
OpenClaw Data Extraction & Transformation
3w ago 👁 2 View skill →
Data Extraction & Transformation Security Reviewed

Camelot PDF Stream Parser

Uses the Camelot library for advanced PDF table detection with lattice and stream parsing algorithms. Processes complex multi-page documents with OpenCV-based edge detection and outputs normalized DataFrames with cell-level confidence scores.

Claude Agents Data Extraction & Transformation
3w ago 👁 2 View skill →
Data Extraction & Transformation Security Reviewed

dbt Data Transform Orchestrator

Integrates with dbt Cloud Administrative API v2 to trigger and monitor data transformation jobs. Manages model runs, source freshness checks, and test execution through dbt API endpoints with Snowflake and BigQuery adapter support.

⭐ 12.6k dbt NOASSERTION
Custom Agents Data Extraction & Transformation
3w ago 👁 3 View skill →
Data Extraction & Transformation Security Reviewed

Apache Tika Document Extractor

Wraps the Apache Tika Server REST API to extract structured text from PDFs, DOCX, PPTX, and 1,200+ file formats. Outputs clean Markdown with metadata preservation using the Tika /rmeta/text endpoint and recursive parsing mode.

Codex Data Extraction & Transformation
3w ago 👁 5 View skill →
Data Extraction & Transformation Security Reviewed

Tabula PDF Table Extraction Agent

Uses the Tabula Java library via the tabula-py bindings to detect and extract tables from PDF documents. Supports both lattice and stream extraction modes with configurable area coordinates and outputs to pandas DataFrames or CSV.

MCP Data Extraction & Transformation
3w ago 👁 4 View skill →
Data Extraction & Transformation Security Reviewed

dbt Model Dependency Resolver

Analyzes dbt project DAGs to identify circular references, orphaned models, and suboptimal materialization strategies. Uses dbt-core manifest.json parsing with Jinja template resolution for accurate lineage tracking.

⭐ 12.6k dbt NOASSERTION
Claude Code Multi-Framework Data Extraction & Transformation
3w ago 👁 3 View skill →
Data Extraction & Transformation Security Reviewed

Pandas Profiling Report Generator

Generates comprehensive data quality reports using ydata-profiling (formerly pandas-profiling) with correlation analysis, missing value patterns, and cardinality detection. Exports interactive HTML dashboards and JSON summaries.

⭐ 48.5k pandas BSD-3-Clause
Codex Data Extraction & Transformation
3w ago 👁 3 View skill →
Data Extraction & Transformation Security Reviewed

Apache Spark DataFrame ETL Pipeline

Automates PySpark DataFrame transformations including schema inference, partition pruning, and Delta Lake merge operations. Integrates with AWS Glue Data Catalog and Apache Iceberg table formats for lakehouse architectures.

⭐ 43.1k spark Apache-2.0
OpenClaw Data Extraction & Transformation
3w ago 👁 4 View skill →