Skill Detail

Parquet Column Mapper

Reads and transforms Apache Parquet file metadata and column statistics using PyArrow and the Parquet Thrift specification. Maps column types across Delta Lake, Iceberg, and Hudi table formats.


Data Extraction & Transformation · MCP · Security Reviewed

Tool match: parquet ⭐ 387 GitHub stars ⬇ 170.7k/wk npm MIT license ⚠ Repository looks unmaintained

INSTALL WITH ANY AGENT

npx skills add agentskillexchange/skills --skill parquet-column-mapper

Works best when you want a reusable capability, not another fragile one-off prompt.

View source

At a glance

Last updated

Mar 24, 2026

Quick brief

The Parquet Column Mapper skill uses the PyArrow library to read Apache Parquet file footers, row group metadata, and column chunk statistics without loading full datasets into memory. For each column chunk in each row group, it extracts min/max values, null counts, distinct counts, and encoding information. It maps Parquet logical and physical types to their equivalents in the Delta Lake, Apache Iceberg, and Apache Hudi table formats, and flags type compatibility issues that would arise during a format migration. Other features include partition statistics aggregation for directory-structured datasets, predicate pushdown simulation for query planning analysis, and bloom filter metadata extraction for point lookup optimization. The skill also supports bulk schema comparison across Parquet files in S3, GCS, and ADLS through fsspec-compatible file systems.

Best fit

When to reach for it

Best when the job fits Data Extraction & Transformation.
Works naturally with MCP setups.

Trust & provenance

Why this listing is credible

Built around the parquet toolchain.
Trust status: Security Reviewed.
387 GitHub stars on the linked upstream source.
170.7k/week npm downloads recorded.
License: MIT.
Last updated Mar 24, 2026.
