Skill Detail

Parquet Column Pruning Optimizer

Optimizes Apache Parquet file reads using PyArrow column pruning and predicate pushdown. Analyzes query patterns to recommend row group sizing and dictionary encoding strategies.


Data Extraction & Transformation Custom Agents Security Reviewed

Tool match: parquet · ⭐ 387 GitHub stars · ⬇ 170.7k/wk npm · MIT license · ⚠ Repository looks unmaintained

INSTALL WITH ANY AGENT

npx skills add agentskillexchange/skills --skill parquet-column-pruning-optimizer

Works best when you want a reusable capability, not another fragile one-off prompt.

View source

At a glance

Last updated

Mar 24, 2026

Quick brief

The Parquet Column Pruning Optimizer reduces data lake query costs by analyzing SQL query patterns against Parquet file metadata and recommending column projection and predicate pushdown strategies. Using PyArrow to read the Parquet file footer metadata, it maps the columns a query references to Parquet column indices, calculates the potential I/O savings from column pruning, and identifies row groups that can be skipped by checking predicates against min/max statistics.

The agent also analyzes column cardinality and value distributions to recommend dictionary or plain encoding per column, row group sizes based on query selectivity patterns, and bloom filter configuration for high-cardinality columns. It supports Parquet files on local disk, S3, GCS, and HDFS through the fsspec filesystem abstraction, handles nested schema types (structs, lists, maps), and generates optimized PyArrow read configurations that set the filters and columns parameters. It also includes benchmark reporting that compares optimized and naive read performance.

Best fit

When to reach for it

Best when the job fits Data Extraction & Transformation.
Works naturally with Custom Agents setups.

Trust & provenance

Why this listing is credible

Built around the parquet toolchain.
Trust status: Security Reviewed.
387 GitHub stars on the linked upstream source.
170.7k/week npm downloads recorded.
License: MIT.
Last updated Mar 24, 2026.
