Skill Detail

Beautiful Soup Academic Paper Parser

Extracts structured citation data from academic repositories using BeautifulSoup4 with the lxml parser. Parses DOI metadata, author affiliations, and reference lists from PubMed, arXiv, and Semantic Scholar HTML pages.

Research & Scraping · MCP · Security Reviewed

Tool match: beautifulsoup · License: MIT License

INSTALL WITH ANY AGENT

npx skills add agentskillexchange/skills --skill beautifulsoup-academic-paper-parser

Works best when you want a reusable capability, not another fragile one-off prompt.

View source

At a glance

Last updated

Mar 24, 2026

Quick brief

This skill extracts structured bibliographic data from academic paper repositories using BeautifulSoup4 with the lxml parser for fast HTML processing. It handles the distinct DOM structures of major academic platforms, including PubMed, arXiv abstract pages, and Semantic Scholar.

How it works

What this skill actually does

Extraction targets include paper titles, abstracts, author names with affiliations, DOIs, publication dates, journal or conference names, and full reference lists with citation counts. The skill uses CSS selectors and find_all() with regex patterns to handle the varying HTML structures across platforms. Extracted DOIs are resolved through the Crossref API for metadata enrichment.
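The extraction approach described above might look roughly like the following sketch. It is not the skill's actual code: the sample HTML, the `parse_paper` and `crossref_url` helpers, and the `ol.references` selector are all hypothetical, and the `citation_*` meta tag names follow the convention many academic sites use for indexing, which a given page may or may not follow. The sketch defaults to Python's built-in `html.parser` so it runs without extra dependencies; pass `parser="lxml"` to match the parser the listing names.

```python
import re
from urllib.parse import quote

from bs4 import BeautifulSoup

# Hypothetical page fragment, made up for illustration only.
SAMPLE_HTML = """
<html><head>
<meta name="citation_title" content="A Hypothetical Paper on Parsing">
<meta name="citation_author" content="Doe, Jane">
<meta name="citation_author" content="Roe, Richard">
<meta name="citation_doi" content="10.1234/example.5678">
</head><body>
<ol class="references">
<li>Smith, J. Earlier work. <a href="https://doi.org/10.1234/ref.1">doi</a></li>
<li>Lee, K. Related work, no DOI.</li>
</ol>
</body></html>
"""

# Loose DOI matcher: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r'10\.\d{4,9}/[^\s"<>]+')


def parse_paper(html, parser="html.parser"):
    """Return title, authors, DOI, and references from one page (sketch)."""
    soup = BeautifulSoup(html, parser)

    def meta(name):
        # attrs= is needed because the `name` keyword means the tag name.
        tag = soup.find("meta", attrs={"name": name})
        return tag["content"].strip() if tag and tag.has_attr("content") else None

    # find_all() with a regex on the attribute value.
    author_tags = soup.find_all(
        "meta", attrs={"name": re.compile(r"^citation_author$")}
    )
    authors = [t["content"].strip() for t in author_tags if t.has_attr("content")]

    references = []
    # CSS selector for the (assumed) reference list markup.
    for li in soup.select("ol.references > li"):
        link = li.find("a", href=re.compile(r"doi\.org/"))
        match = DOI_PATTERN.search(link["href"]) if link else None
        references.append({
            "text": li.get_text(" ", strip=True),
            "doi": match.group(0) if match else None,
        })

    return {
        "title": meta("citation_title"),
        "authors": authors,
        "doi": meta("citation_doi"),
        "references": references,
    }


def crossref_url(doi):
    """Build the Crossref REST API URL for a DOI (no request is made here)."""
    return "https://api.crossref.org/works/" + quote(doi)
```

Calling `parse_paper(SAMPLE_HTML)` returns a plain dict, and `crossref_url(record["doi"])` gives the URL a separate HTTP client could fetch for enrichment. Keeping the network call out of the parser makes the parsing step easy to test against saved HTML.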

Output formats include BibTeX, RIS, and structured JSON following the CSL-JSON schema, for compatibility with reference managers such as Zotero and Mendeley. The crawler respects robots.txt directives and rate-limits itself with configurable delays between requests.
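As a rough illustration of the output and politeness steps, the sketch below maps a parsed record onto CSL-JSON field names, renders a BibTeX entry from it, and checks robots.txt rules with the standard library's `urllib.robotparser`. The input dict shape, the function names, the `example-paper-parser` user agent, and the one-second default delay are assumptions made for this example, not the skill's documented interface. RIS output is not covered.

```python
from urllib import robotparser


def to_csl_json(paper):
    """Map a parsed-paper dict (hypothetical shape) onto CSL-JSON fields."""
    item = {
        "id": paper["id"],
        "type": "article-journal",
        "title": paper["title"],
        "author": [{"family": fam, "given": giv} for fam, giv in paper["authors"]],
        "issued": {"date-parts": [list(paper["date"])]},
    }
    if paper.get("doi"):
        item["DOI"] = paper["doi"]
    if paper.get("journal"):
        item["container-title"] = paper["journal"]
    return item


def to_bibtex(item):
    """Render a CSL-JSON item as a BibTeX @article entry (no escaping)."""
    fields = {
        "author": " and ".join(
            f'{a["family"]}, {a["given"]}' for a in item["author"]
        ),
        "title": item["title"],
        "journal": item.get("container-title"),
        "year": str(item["issued"]["date-parts"][0][0]),
        "doi": item.get("DOI"),
    }
    body = ",\n".join(f"  {k} = {{{v}}}" for k, v in fields.items() if v)
    return f"@article{{{item['id']},\n{body}\n}}"


def load_rules(robots_txt):
    """Parse robots.txt text that the caller has already fetched."""
    rules = robotparser.RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


def delay_for(rules, user_agent, default_delay=1.0):
    """Use the site's Crawl-delay if present, else a configurable default."""
    delay = rules.crawl_delay(user_agent)
    return float(delay) if delay is not None else default_delay
```

A crawl loop would call `rules.can_fetch(user_agent, url)` before each request and `time.sleep(delay_for(rules, user_agent))` between requests. Going through CSL-JSON first means each additional export format only needs one converter from the same intermediate record.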

Best fit

When to reach for it

Best when the job fits Research & Scraping.
Works naturally with MCP setups.

Trust & provenance

Why this listing is credible

Built around the beautifulsoup toolchain.
Trust status: Security Reviewed.
License: MIT License.
Last updated Mar 24, 2026.
