Data Extraction & Transformation
Security Reviewed
Chains pandas DataFrame transformations into reproducible pipelines using pipe() method composition. Handles missing data imputation with fillna() strategies, type coercion with astype(), and memory optimization via category dtypes.
⭐ 48.5k pandas BSD-3-Clause
ChatGPT Agents Data Extraction & Transformation
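The pipe() composition described in the entry above can be sketched as follows. This is a minimal illustration, not the listed tool's implementation: the helper names (fill_missing, coerce_types, to_category), the column names, and the sample values are all assumptions made up for the example.

```python
import pandas as pd


def fill_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Impute missing values: median for the numeric column, a placeholder for text."""
    out = df.copy()  # leave the caller's frame untouched
    out["amount"] = out["amount"].fillna(out["amount"].median())
    out["region"] = out["region"].fillna("unknown")
    return out


def coerce_types(df: pd.DataFrame, dtypes: dict) -> pd.DataFrame:
    """Cast columns to explicit dtypes with astype()."""
    return df.astype(dtypes)


def to_category(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Store low-cardinality columns as the category dtype."""
    return df.astype({col: "category" for col in columns})


# Hypothetical input: one missing number, one missing label, integers stored as text.
raw = pd.DataFrame(
    {
        "amount": [10.0, None, 30.0],
        "region": ["east", None, "east"],
        "units": ["1", "2", "3"],
    }
)

# Each step is a plain function, so the pipeline reads top to bottom
# and every stage can be tested on its own.
clean = (
    raw
    .pipe(fill_missing)
    .pipe(coerce_types, {"units": "int64"})
    .pipe(to_category, ["region"])
)
```

Because every step returns a new DataFrame, rerunning the chain on the same input gives the same result, which is what makes the pipeline reproducible.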
Security Reviewed
Performs high-volume data operations using the Salesforce Bulk API 2.0. Creates ingest jobs with createJob(), uploads CSV batches, and monitors job status via getJobInfo() for millions of records.
⭐ 1.5k salesforce ⬇ 936.6k/wk MIT
Claude Code Data Extraction & Transformation
Security Reviewed
Processes real-time event streams using KafkaJS consumer groups and transforms messages with configurable schemas. Handles partition rebalancing, offset commits, and dead-letter queue routing for failed transformations.
Cursor Data Extraction & Transformation
Security Reviewed
Extracts and transforms Avro/Protobuf schemas from Confluent Schema Registry using the REST API. Generates TypeScript interfaces, JSON Schema, and data contract documentation from registered subjects.
⭐ 4k kafka ⬇ 2.5M/wk MIT ⚠ unmaintained
Claude Agents Data Extraction & Transformation
Security Reviewed
Profiles Apache Parquet files using pyarrow metadata APIs to extract column statistics, row group distributions, and encoding efficiency metrics without reading full datasets.
⭐ 387 parquet ⬇ 170.7k/wk MIT ⚠ unmaintained
ChatGPT Agents Data Extraction & Transformation
Security Reviewed
Parses dbt manifest.json and catalog.json to extract full model lineage graphs using the dbt Core artifact API. Generates interactive DAG visualizations and impact analysis for model changes.
⭐ 12.6k dbt NOASSERTION
OpenClaw Data Extraction & Transformation
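The impact analysis mentioned in the entry above can be approximated with a breadth-first walk over the child_map section of manifest.json. This is a rough sketch under stated assumptions: the function name downstream_of is hypothetical, and the manifest below is a tiny hand-written stand-in rather than real dbt output.

```python
from collections import deque


def downstream_of(manifest: dict, unique_id: str) -> set:
    """Return every node reachable from unique_id through the manifest's child_map."""
    child_map = manifest.get("child_map", {})
    seen = set()
    queue = deque([unique_id])
    while queue:
        current = queue.popleft()
        for child in child_map.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Hand-written stand-in for json.load(open("target/manifest.json")).
manifest = {
    "child_map": {
        "model.shop.stg_orders": ["model.shop.orders"],
        "model.shop.orders": [
            "model.shop.revenue",
            "test.shop.not_null_orders_id",
        ],
        "model.shop.revenue": [],
        "model.shop.customers": [],
    }
}

# Everything that could be affected by a change to stg_orders.
impacted = downstream_of(manifest, "model.shop.stg_orders")
```

The same walk over parent_map instead of child_map would give the upstream lineage of a model.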
Security Reviewed
Validates and transforms pandas DataFrames using Pandera schema definitions with column-level dtype, nullable, and custom check constraints. Auto-generates Pandera schema code from sample DataFrames.
⭐ 48.5k pandas BSD-3-Clause
ChatGPT Agents Data Extraction & Transformation
Security Reviewed
Constructs complex jq filter expressions for transforming JSON/NDJSON streams, including recursive descent, object construction, and reduce operations. Handles multi-gigabyte streams with jq's streaming parser.
MCP Data Extraction & Transformation
Security Reviewed
Generates and validates dbt (data build tool) models, tests, and documentation for Snowflake, BigQuery, and Redshift. Parses dbt manifest.json to analyze DAG lineage and detect circular dependencies.
⭐ 12.6k dbt NOASSERTION
Claude Agents Data Extraction & Transformation
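Circular-dependency detection over a manifest, as described in the entry above, could look roughly like the depth-first search below. It reads each node's depends_on.nodes list from the manifest's nodes section. The function name find_cycle and the sample graphs are assumptions for illustration only, not the listed tool's code.

```python
from typing import List, Optional

WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, fully explored


def find_cycle(nodes: dict) -> Optional[List[str]]:
    """Return one dependency cycle as a list of unique_ids, or None if the graph is acyclic."""
    color = {uid: WHITE for uid in nodes}
    path: List[str] = []

    def visit(uid: str) -> Optional[List[str]]:
        color[uid] = GRAY
        path.append(uid)
        for parent in nodes[uid].get("depends_on", {}).get("nodes", []):
            if parent not in nodes:
                continue  # e.g. sources, which are stored outside "nodes"
            if color[parent] == GRAY:
                # Back edge: the cycle is the path from parent to here, closed on parent.
                return path[path.index(parent):] + [parent]
            if color[parent] == WHITE:
                found = visit(parent)
                if found:
                    return found
        path.pop()
        color[uid] = BLACK
        return None

    for uid in nodes:
        if color[uid] == WHITE:
            found = visit(uid)
            if found:
                return found
    return None


# Hand-written stand-ins for manifest["nodes"].
acyclic = {
    "model.a": {"depends_on": {"nodes": ["source.raw.orders"]}},
    "model.b": {"depends_on": {"nodes": ["model.a"]}},
}
cyclic = {
    "model.a": {"depends_on": {"nodes": ["model.b"]}},
    "model.b": {"depends_on": {"nodes": ["model.c"]}},
    "model.c": {"depends_on": {"nodes": ["model.a"]}},
}

no_cycle = find_cycle(acyclic)
cycle = find_cycle(cyclic)
```

The recursive form is kept for readability; a very deep DAG would need an explicit stack to stay under Python's recursion limit.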
Security Reviewed
Extracts structured text, metadata, and embedded objects from PDFs, Office documents, and 1000+ file formats using the Apache Tika REST API. Outputs clean Markdown or JSON with XMP metadata preservation.
Gemini Data Extraction & Transformation
Security Reviewed
Detects and extracts PDF tables using Camelot (camelot-py) with OpenCV-based lattice detection and morphological transformations. Handles rotated tables, merged cells, and tables that span multiple pages.
Cursor Data Extraction & Transformation
Security Reviewed
Builds configurable data extraction pipelines using Cheerio for server-side DOM parsing with CSS selector chains. Supports JSONPath output mapping, pagination following, and schema validation via Ajv.
⭐ 30.3k cheerio ⬇ 19.6M/wk MIT
Codex Data Extraction & Transformation
Security Reviewed
Extracts text and metadata from 1000+ file formats using the Apache Tika Server REST API. Handles PDF OCR via Tesseract integration, Office document parsing, and email archive extraction with MIME detection.
Gemini Data Extraction & Transformation
Security Reviewed
Extracts structured tables from PDF documents using tabula-java with lattice and stream detection modes. Outputs to CSV, JSON, or pandas DataFrames with automatic column type inference via tabula-py.
MCP Data Extraction & Transformation
Security Reviewed
Analyzes dbt project DAGs using the dbt manifest.json artifact and the dbt Cloud API. Detects circular dependencies, orphaned models, and generates lineage visualizations.
⭐ 12.6k dbt NOASSERTION
Claude Code Data Extraction & Transformation
Security Reviewed
Constructs data transformation pipelines using pandas and pipe() method chaining. Integrates with SQLAlchemy for database I/O and PyArrow for high-performance Parquet operations.
⭐ 48.5k pandas BSD-3-Clause
Cursor Data Extraction & Transformation
Security Reviewed
Federates data from multiple GraphQL and REST APIs using an Apollo Federation gateway. Implements schema stitching with automatic type merging and the DataLoader pattern for N+1 prevention.
⭐ 20.3k graphql ⬇ 34.2M/wk MIT
OpenClaw Data Extraction & Transformation
Security Reviewed
Uses the Camelot library to detect PDF tables with lattice and stream parsing algorithms. Processes complex multi-page documents with OpenCV-based edge detection and outputs normalized DataFrames with cell-level confidence scores.
Claude Agents Data Extraction & Transformation
Security Reviewed
Integrates with dbt Cloud Administrative API v2 to trigger and monitor data transformation jobs. Manages model runs, source freshness checks, and test execution through dbt API endpoints with Snowflake and BigQuery adapter support.
⭐ 12.6k dbt NOASSERTION
Custom Agents Data Extraction & Transformation
Security Reviewed
Wraps the Apache Tika Server REST API for extracting structured text from PDFs, DOCX, PPTX, and 1,200+ file formats. Outputs clean Markdown with metadata preservation using the Tika /rmeta/text endpoint and recursive parsing mode.
Codex Data Extraction & Transformation
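A call to the /rmeta/text endpoint named in the entry above might be prepared as in the sketch below, using only the standard library. The server address, the helper names (build_rmeta_request, extract_text), and the sample response body are assumptions for illustration. The request is built but not sent, so the snippet runs without a Tika server.

```python
import json
import urllib.request

TIKA_URL = "http://localhost:9998"  # assumption: a Tika Server on its default port


def build_rmeta_request(document: bytes, base_url: str = TIKA_URL) -> urllib.request.Request:
    """Prepare a PUT of the raw document bytes to the recursive-metadata text endpoint."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/rmeta/text",
        data=document,
        method="PUT",
        headers={"Accept": "application/json"},
    )


def extract_text(rmeta_body: str) -> list:
    """Pull the text of the container and each embedded document from an /rmeta response.

    The response is a JSON array with one metadata object per document;
    extracted text, when present, sits under the "X-TIKA:content" key.
    """
    records = json.loads(rmeta_body)
    return [record.get("X-TIKA:content", "") for record in records]


# Build (but do not send) a request for some placeholder bytes.
request = build_rmeta_request(b"%PDF-1.4 placeholder bytes")

# Illustrative response body, hand-written for this example.
sample_body = json.dumps(
    [
        {"Content-Type": "application/pdf", "X-TIKA:content": "Quarterly report"},
        {"Content-Type": "image/png"},
    ]
)
texts = extract_text(sample_body)
```

To send the request against a running server, pass it to urllib.request.urlopen and feed the decoded response body to extract_text.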
Security Reviewed
Uses the Tabula Java library via the tabula-py bindings to detect and extract tables from PDF documents. Supports both lattice and stream extraction modes with configurable area coordinates and outputs to pandas DataFrames or CSV.
MCP Data Extraction & Transformation
Security Reviewed
Analyzes dbt project DAGs to identify circular references, orphaned models, and suboptimal materialization strategies. Uses dbt-core manifest.json parsing with Jinja template resolution for accurate lineage tracking.
⭐ 12.6k dbt NOASSERTION
Claude Code Multi-Framework Data Extraction & Transformation
Security Reviewed
Generates comprehensive data quality reports using ydata-profiling (formerly pandas-profiling) with correlation analysis, missing value patterns, and cardinality detection. Exports interactive HTML dashboards and JSON summaries.
⭐ 48.5k pandas BSD-3-Clause
Codex Data Extraction & Transformation
Security Reviewed
Automates PySpark DataFrame transformations including schema inference, partition pruning, and Delta Lake merge operations. Integrates with AWS Glue Data Catalog and Apache Iceberg table formats for lakehouse architectures.
⭐ 43.1k spark Apache-2.0
OpenClaw Data Extraction & Transformation