Skill Detail

Parquet Column Statistics Profiler

Profiles Apache Parquet files using pyarrow metadata APIs to extract column statistics, row group distributions, and encoding efficiency metrics without reading full datasets.

Data Extraction & Transformation · ChatGPT Agents · Security Reviewed

Tool match: parquet ⭐ 387 GitHub stars ⬇ 170.7k/wk npm MIT license ⚠ Repository looks unmaintained

Install with any agent

npx skills add agentskillexchange/skills --skill parquet-column-statistics-profiler

Works best when you want a reusable capability, not another fragile one-off prompt.

At a glance

Last updated

Mar 24, 2026

Quick brief

The Parquet Column Statistics Profiler skill performs lightweight metadata-only analysis of Apache Parquet files to extract comprehensive column statistics and storage efficiency metrics. Using pyarrow’s ParquetFile metadata API, it reads file footers and row group statistics without loading actual data into memory.

How it works

What this skill actually does

For each column, the skill extracts min/max values, null counts, distinct value estimates from Bloom filters, compression ratios, and the encodings in use (PLAIN, PLAIN_DICTIONARY/RLE_DICTIONARY, RLE, DELTA_BINARY_PACKED). It identifies columns with poor dictionary encoding efficiency (high cardinality relative to dictionary page size) and recommends changes to the encoding strategy.

Row group analysis reveals skew in how data is distributed across row groups, suggests row group sizes suited to your query patterns, and estimates predicate pushdown effectiveness, that is, how many row groups a reader could skip using their min/max statistics. The skill can also compare statistics across the partitions of a dataset to flag data quality issues such as NULL spikes and value range drift, as well as partition skew that hurts query performance.
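A minimal sketch of these row group checks, assuming the inputs have already been pulled from the footer: num_rows from each md.row_group(i), one (statistics.min, statistics.max) pair per column chunk (or None where statistics are absent), and per-partition null rates computed as null_count / num_rows. All three helpers and the 0.05 drift tolerance are hypothetical. The skip estimate models only row-group min/max pruning for a closed range predicate; engines such as Spark and Trino may prune further with page indexes or Bloom filters.

```python
from statistics import mean, pstdev
from typing import Any, Dict, List, Optional, Sequence, Tuple

# Hypothetical type: one (min, max) pair per row group, None if no statistics.
Range = Optional[Tuple[Any, Any]]


def row_group_skew(row_counts: Sequence[int]) -> float:
    """Coefficient of variation of rows per row group; 0.0 means perfectly even."""
    if not row_counts:
        return 0.0
    avg = mean(row_counts)
    return pstdev(row_counts) / avg if avg else 0.0


def pushdown_skip_fraction(ranges: Sequence[Range], lo: Any, hi: Any) -> float:
    """Share of row groups that min/max pruning could skip for lo <= col <= hi."""
    if not ranges:
        return 0.0
    # A group is skippable only if its whole [min, max] lies outside [lo, hi];
    # groups without statistics must always be read.
    skippable = sum(1 for r in ranges if r is not None and (r[1] < lo or r[0] > hi))
    return skippable / len(ranges)


def null_rate_drift(baseline: Dict[str, float], current: Dict[str, float],
                    tolerance: float = 0.05) -> List[str]:
    """Columns whose null rate rose by more than `tolerance` versus the baseline partition."""
    return sorted(c for c in current
                  if c in baseline and current[c] - baseline[c] > tolerance)
```

For example, ranges [(0, 99), (100, 199), (200, 299), None] with the predicate 150 <= col <= 160 give a skip fraction of 0.5: two groups fall entirely outside the range, while the group without statistics has to be read.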

Output includes formatted profiling reports with histograms, a storage efficiency scorecard, and actionable recommendations for Spark/Trino query optimization. Supports Parquet files on the local filesystem, S3 (via fsspec), GCS, and ADLS.

Best fit

When to reach for it

Best when the job fits Data Extraction & Transformation.
Works naturally with ChatGPT Agents setups.

Trust & provenance

Why this listing is credible

Built around the parquet toolchain.
Trust status: Security Reviewed.
387 GitHub stars on the linked upstream source.
170.7k/week npm downloads recorded.
License: MIT.
Last updated Mar 24, 2026.

View source ↗