Scrapy Pipeline Manager

Manages Scrapy spider deployments via Scrapyd API with custom item pipelines for MongoDB ingestion, deduplication via MinHash LSH, and rotating proxy middleware configuration.

Research & Scraping | Claude Code | Security Reviewed

Tool match: scrapy (61.3k GitHub stars, BSD-3-Clause license)

INSTALL WITH ANY AGENT

npx skills add agentskillexchange/skills --skill scrapy-pipeline-manager

Works best when you want a reusable capability, not another fragile one-off prompt.

View source

At a glance

Author: scrapy

Last updated: Mar 24, 2026

Quick brief

The Scrapy Pipeline Manager skill orchestrates Scrapy spider deployments through the Scrapyd HTTP API. It handles egg packaging, project deployment, spider scheduling, and log retrieval across multiple Scrapyd nodes for distributed crawling.
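
As a rough illustration of the scheduling step, the sketch below builds requests against Scrapyd's `schedule.json` and `listjobs.json` endpoints using only the standard library. The helper names (`schedule_request`, `listjobs_url`, `pick_node`), the node URLs, and the hash-based node selection are assumptions made for this example; the listing does not say how the skill distributes spiders across nodes.

```python
import zlib
from urllib.parse import urlencode, urljoin
from urllib.request import Request


def _endpoint(node_url, name):
    # Normalise the base URL so urljoin appends instead of replacing a path segment.
    return urljoin(node_url.rstrip("/") + "/", name)


def schedule_request(node_url, project, spider, **spider_args):
    """Build (but do not send) a POST to Scrapyd's schedule.json endpoint."""
    payload = {"project": project, "spider": spider, **spider_args}
    return Request(
        _endpoint(node_url, "schedule.json"),
        data=urlencode(payload).encode("utf-8"),
        method="POST",
    )


def listjobs_url(node_url, project):
    """URL for Scrapyd's listjobs.json, which reports pending, running and finished jobs."""
    return _endpoint(node_url, "listjobs.json") + "?" + urlencode({"project": project})


def pick_node(nodes, spider):
    """Hypothetical placement rule: a stable hash of the spider name picks the node."""
    return nodes[zlib.crc32(spider.encode("utf-8")) % len(nodes)]


# Hypothetical node list; replace with real Scrapyd hosts.
NODES = ["http://crawler-1.example.com:6800", "http://crawler-2.example.com:6800"]
request = schedule_request(pick_node(NODES, "news"), "demo", "news", category="tech")
```

Passing `request` to `urllib.request.urlopen` would submit the job; Scrapyd replies with JSON that includes a job id, which can then be looked up through the `listjobs.json` URL.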

How it works

What this skill actually does

Custom item pipelines handle downstream data processing: MongoDB ingestion via PyMongo with automatic collection sharding, Elasticsearch indexing via the bulk API, and file download pipelines for media assets. Deduplication uses MinHash LSH (Locality-Sensitive Hashing) via the datasketch library to detect near-duplicates across crawl runs.
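
To make the deduplication idea concrete, here is a minimal sketch of MinHash signatures with LSH banding. It is a standard-library stand-in written for illustration, not the datasketch API and not the skill's own code; the class name `NearDuplicateIndex`, the word-shingle size, and the band and threshold values are all assumptions.

```python
import hashlib
from collections import defaultdict


def shingles(text, k=3):
    """Split text into overlapping k-word shingles."""
    words = text.lower().split()
    if len(words) <= k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def minhash(shingle_set, num_perm=64):
    """One minimum per salted hash function approximates one random permutation."""
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(4, "big")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for s in shingle_set
        ))
    return signature


class NearDuplicateIndex:
    """In-memory LSH index: signatures are cut into bands and each band is bucketed."""

    def __init__(self, num_perm=64, bands=16, threshold=0.7):
        self.num_perm = num_perm
        self.bands = bands
        self.rows = num_perm // bands
        self.threshold = threshold
        self.buckets = defaultdict(set)
        self.signatures = {}

    def _band_keys(self, signature):
        for b in range(self.bands):
            yield b, tuple(signature[b * self.rows:(b + 1) * self.rows])

    def check_and_add(self, key, text):
        """Return True if text looks like a near-duplicate; otherwise index it."""
        signature = minhash(shingles(text), self.num_perm)
        candidates = set()
        for band_key in self._band_keys(signature):
            candidates |= self.buckets.get(band_key, set())
        for other in candidates:
            matches = sum(a == b for a, b in zip(signature, self.signatures[other]))
            if matches / self.num_perm >= self.threshold:
                return True
        self.signatures[key] = signature
        for band_key in self._band_keys(signature):
            self.buckets[band_key].add(key)
        return False
```

An item pipeline could call `check_and_add(item_url, item_text)` and drop the item when it returns True. Because this index lives in memory, detecting duplicates across crawl runs would additionally require persisting the signatures between runs.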

The middleware stack includes rotating proxy support via scrapy-rotating-proxies with dead proxy detection, custom retry middleware with exponential backoff, and AutoThrottle configuration for polite crawling. The skill manages robots.txt compliance, generates crawl statistics dashboards, and supports Splash integration for JavaScript rendering through the scrapy-splash middleware.
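
A `settings.py` fragment along the following lines would enable the pieces named above. The setting names are those documented by Scrapy and scrapy-rotating-proxies; the proxy addresses and numeric values are placeholders, and `backoff_delay` is a hypothetical helper showing the delay schedule a custom retry middleware might use, not code taken from the skill.

```python
# settings.py fragment (illustrative values only)

# Respect robots.txt rules before fetching pages.
ROBOTSTXT_OBEY = True

# AutoThrottle adapts the download delay to observed server latency.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 1.0
AUTOTHROTTLE_MAX_DELAY = 30.0
AUTOTHROTTLE_TARGET_CONCURRENCY = 2.0

# scrapy-rotating-proxies: placeholder proxy addresses.
ROTATING_PROXY_LIST = [
    "proxy1.example.com:8000",
    "proxy2.example.com:8031",
]

DOWNLOADER_MIDDLEWARES = {
    "rotating_proxies.middlewares.RotatingProxyMiddleware": 610,
    "rotating_proxies.middlewares.BanDetectionMiddleware": 620,
}

RETRY_ENABLED = True
RETRY_TIMES = 5


def backoff_delay(retry_count, base=1.0, cap=60.0):
    """Hypothetical helper: delay doubles with each retry, up to a cap (seconds)."""
    return min(cap, base * (2 ** retry_count))
```

With these defaults the helper yields delays of 1, 2, 4, 8 seconds and so on, levelling off at 60 seconds.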

Best fit

When to reach for it

Best when the job fits Research & Scraping.
Works naturally with Claude Code setups.

Trust & provenance

Why this listing is credible

Built around the scrapy toolchain.
Trust status: Security Reviewed.
61.3k GitHub stars on the linked upstream source.
License: BSD-3-Clause.
Last updated Mar 24, 2026.