paper-fetch
DOI → PDF resolver with 7-source fallback (Unpaywall, S2, arXiv, PMC, bioRxiv, publisher, Sci-Hub). Multi-agent, zero-deps Python.
paper-fetch — Download scientific paper PDFs by DOI
Resolve a DOI (or title) to a PDF via a 7-source fallback chain — Unpaywall → Semantic Scholar → arXiv → PubMed Central → bioRxiv/medRxiv → publisher direct → Sci-Hub mirrors. Pure Python stdlib, agent-native CLI with stable JSON envelopes.
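The fallback chain comes down to trying each source in order and stopping at the first one that yields a usable PDF URL. The sketch below shows only that control flow. The resolver functions and their names are hypothetical stand-ins, not the skill's actual internals; each real source would make its own HTTP calls.

```python
from typing import Callable, List, Optional, Tuple

# A resolver takes a DOI and returns a candidate PDF URL, or None on a miss.
# These are hypothetical placeholders for the real per-source lookups.
Resolver = Callable[[str], Optional[str]]


def resolve_pdf_url(
    doi: str, chain: List[Tuple[str, Resolver]]
) -> Tuple[Optional[str], Optional[str], List[str]]:
    """Try each (source_name, resolver) in order; return (source, url, errors)."""
    errors: List[str] = []
    for name, resolver in chain:
        try:
            url = resolver(doi)
        except Exception as exc:  # one failing source must not abort the chain
            errors.append(f"{name}: {exc}")
            continue
        if url:
            return name, url, errors
        errors.append(f"{name}: no PDF found")
    return None, None, errors


if __name__ == "__main__":
    def unpaywall(doi: str) -> Optional[str]:
        return None  # simulate "no open-access copy"

    def semantic_scholar(doi: str) -> Optional[str]:
        raise TimeoutError("timed out")  # simulate a network failure

    def arxiv(doi: str) -> Optional[str]:
        return "https://arxiv.org/pdf/1706.03762"  # simulate a hit

    chain = [("unpaywall", unpaywall), ("semantic_scholar", semantic_scholar), ("arxiv", arxiv)]
    print(resolve_pdf_url("10.48550/arXiv.1706.03762", chain))
```

Collecting per-source errors instead of discarding them lets a caller report *why* every source missed when the chain comes up empty.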
What it does
Resolve a DOI (or title) to a PDF
- 7-source fallback chain: Unpaywall → Semantic Scholar → arXiv → PubMed Central → bioRxiv/medRxiv → publisher direct (institutional opt-in) → Sci-Hub mirrors (last resort, on by default)
- Title-only input via --title: Crossref + Semantic Scholar resolution with confidence flags
- Auto-named output: {first_author}_{year}_{journal_abbrev}_{short_title}.pdf
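Here is a minimal sketch of how a {first_author}_{year}_{journal_abbrev}_{short_title}.pdf name could be assembled from metadata. The slug rules (ASCII folding, lowercasing, keeping the first five title words, and fallbacks such as "nd" and "untitled") are assumptions for illustration. The skill's own normalization may differ.

```python
import re
import unicodedata
from typing import Optional


def _slug(text: str, max_words: int = 0) -> str:
    """ASCII-fold, lowercase, and keep only [a-z0-9] words joined by '-'."""
    folded = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    words = re.findall(r"[a-z0-9]+", folded.lower())
    if max_words:
        words = words[:max_words]
    return "-".join(words)


def make_filename(
    first_author: str,
    year: Optional[int],
    journal_abbrev: str,
    title: str,
    title_words: int = 5,
) -> str:
    """Build {first_author}_{year}_{journal_abbrev}_{short_title}.pdf (hypothetical rules)."""
    parts = [
        _slug(first_author) or "unknown",
        str(year) if year else "nd",  # "nd" = no date
        _slug(journal_abbrev) or "nojournal",
        _slug(title, title_words) or "untitled",
    ]
    return "_".join(parts) + ".pdf"
```

Using "-" inside each component and "_" between components keeps the four fields separable by splitting on "_".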
Batch + agent-friendly
- --batch dois.txt or --batch - (stdin) for bulk download
- --idempotency-key replays the exact envelope on retry without network I/O
- --stream emits one NDJSON result per line as each DOI resolves
- Skips already-downloaded files unless --overwrite is given
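With --idempotency-key, a retry that reuses the same key gets back the stored envelope rather than touching the network again. One way to get that behavior is a small on-disk cache keyed by a hash of the key. The cache directory, file layout, and run_batch callback below are assumptions. This is not necessarily how fetch.py stores envelopes.

```python
import hashlib
import json
import os
from typing import Callable, Dict


def run_idempotent(key: str, cache_dir: str, run_batch: Callable[[], Dict]) -> Dict:
    """Return the cached envelope for `key` if present; otherwise run the batch and cache it."""
    os.makedirs(cache_dir, exist_ok=True)
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()  # safe filename for any key
    path = os.path.join(cache_dir, digest + ".json")
    if os.path.exists(path):
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)  # replay: no network I/O
    envelope = run_batch()
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump(envelope, fh)
    os.replace(tmp, path)  # rename into place so readers never see a half-written entry
    return envelope
```

The key is chosen by the caller (for example monday-review-batch), so an orchestrator that crashes mid-run can simply repeat the same command.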
Built-in correctness
- Stable JSON envelope on stdout, NDJSON progress on stderr, machine-readable schema subcommand
- TTY-aware default output format, typed exit codes (0/1/3/4) for orchestrator routing
- SSRF defense + %PDF magic-byte check + 50 MB size cap on every fetch
- Zero runtime dependencies: pure Python stdlib
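The %PDF magic-byte check and the 50 MB cap can both be enforced while the response is being streamed. That way an HTML landing page or an oversized file is rejected before it is fully buffered. Below is a minimal sketch over any file-like byte stream, such as the object urllib.request.urlopen returns. The function name, chunk size, and exception type are assumptions.

```python
from typing import BinaryIO

MAX_BYTES = 50 * 1024 * 1024  # 50 MB cap
CHUNK = 64 * 1024


class RejectedDownload(ValueError):
    """Raised when a response is not a PDF or exceeds the size cap (hypothetical name)."""


def read_pdf_capped(stream: BinaryIO, max_bytes: int = MAX_BYTES) -> bytes:
    """Read the stream in chunks; require a leading %PDF and stop at max_bytes."""
    buf = bytearray()
    checked = False
    while True:
        chunk = stream.read(CHUNK)
        if not chunk:
            break
        buf.extend(chunk)
        # Check the magic bytes as soon as four bytes have arrived.
        if not checked and len(buf) >= 4:
            if bytes(buf[:4]) != b"%PDF":
                raise RejectedDownload("missing %PDF header (likely an HTML landing page)")
            checked = True
        if len(buf) > max_bytes:
            raise RejectedDownload(f"exceeds size cap of {max_bytes} bytes")
    if not checked:
        raise RejectedDownload("response too short to be a PDF")
    return bytes(buf)
```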
Works with Claude Code, Codex, Hermes, OpenClaw, ClawHub, pi-mono, and SkillsMP — any agent that supports the Agent Skills format.
Discipline coverage
The skill is discipline-agnostic — it works for any field, not just life sciences or CS.
| Source | Discipline scope |
|---|---|
| Unpaywall | ✅ All disciplines (every Crossref DOI — humanities, social sciences, physics, chemistry, economics) |
| Semantic Scholar | ✅ All disciplines (cross-domain academic graph) |
| arXiv | Physics, math, CS, statistics, quant finance, economics, EE |
| PubMed Central | Biomedical only |
| bioRxiv / medRxiv | Biology / medicine preprints only |
| Sci-Hub | ✅ All disciplines (last resort) |
In practice, Unpaywall + Semantic Scholar alone find open-access (OA) copies in chemistry, materials science, economics, psychology, the humanities, and other fields, drawing on institutional repositories, SSRN, RePEc, and publisher-hosted OA versions.
Comparison
vs. native agent (no skill)
| Feature | Native agent | This skill |
|---|---|---|
| Resolve DOI to PDF | Ad-hoc web search | Deterministic 7-source chain |
| Title → DOI resolution | Manual | --title (Crossref + S2 fallback, confidence flags) |
| Batch download | ❌ | ✅ --batch dois.txt or --batch - |
| Consistent filenames | ❌ | ✅ author_year_journal_title.pdf |
| Machine-readable schema | ❌ | ✅ fetch.py schema |
| Structured output | ❌ | ✅ JSON envelope + NDJSON progress |
| Idempotent retries | ❌ | ✅ --idempotency-key |
| Typed exit codes | ❌ | ✅ 0/1/3/4 |
| SSRF + %PDF + size cap | ❌ | ✅ enforced |
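For --title, a Crossref lookup can be scored against the query, so a weak match gets flagged instead of silently trusted. This sketch uses Crossref's public /works endpoint with the query.bibliographic parameter and compares titles with difflib. The 0.9 threshold and the shape of the returned dict are assumptions, and the Semantic Scholar fallback is omitted.

```python
import difflib
import json
import urllib.parse
import urllib.request
from typing import Dict, Optional


def crossref_query_url(title: str, rows: int = 3) -> str:
    params = urllib.parse.urlencode({"query.bibliographic": title, "rows": rows})
    return "https://api.crossref.org/works?" + params


def _norm(s: str) -> str:
    return " ".join(s.lower().split())


def best_match(query: str, crossref_json: Dict, threshold: float = 0.9) -> Optional[Dict]:
    """Pick the item whose title is most similar to the query; flag low confidence."""
    best = None
    for item in crossref_json.get("message", {}).get("items", []):
        titles = item.get("title") or [""]  # Crossref returns titles as a list
        score = difflib.SequenceMatcher(None, _norm(query), _norm(titles[0])).ratio()
        if best is None or score > best["score"]:
            best = {"doi": item.get("DOI"), "title": titles[0], "score": round(score, 3)}
    if best is not None:
        best["confident"] = best["score"] >= threshold  # hypothetical cutoff
    return best


def resolve_title(title: str) -> Optional[Dict]:
    """Network call; Crossref asks clients to identify themselves, e.g. with a mailto."""
    req = urllib.request.Request(
        crossref_query_url(title),
        headers={"User-Agent": "paper-fetch-sketch/0.1 (mailto:you@example.com)"},
    )
    with urllib.request.urlopen(req, timeout=20) as resp:
        return best_match(title, json.load(resp))
```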
Prerequisites
- python3 (3.8+, stdlib only; no pip install needed)
- (Recommended) An Unpaywall contact email:
export UNPAYWALL_EMAIL=you@example.com
Without it, Unpaywall is skipped and the remaining 6 sources still work.
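UNPAYWALL_EMAIL matters because Unpaywall's v2 API requires an email parameter on every request. The sketch below covers that step: it reads the same variable and skips the source when it is unset. It takes url_for_pdf from best_oa_location (a documented field of the Unpaywall response) and falls back to the other OA locations. How fetch.py itself chooses among OA locations is not shown here.

```python
import json
import os
import urllib.parse
import urllib.request
from typing import Dict, Optional


def unpaywall_url(doi: str, email: str) -> str:
    return "https://api.unpaywall.org/v2/{}?{}".format(
        urllib.parse.quote(doi, safe="/"), urllib.parse.urlencode({"email": email})
    )


def pdf_from_record(record: Dict) -> Optional[str]:
    """Prefer best_oa_location, then any other OA location that exposes a PDF link."""
    best = record.get("best_oa_location") or {}
    if best.get("url_for_pdf"):
        return best["url_for_pdf"]
    for loc in record.get("oa_locations") or []:
        if loc.get("url_for_pdf"):
            return loc["url_for_pdf"]
    return None


def unpaywall_pdf(doi: str) -> Optional[str]:
    email = os.environ.get("UNPAYWALL_EMAIL")
    if not email:
        return None  # no email: skip Unpaywall and let the next source try
    with urllib.request.urlopen(unpaywall_url(doi, email), timeout=20) as resp:
        return pdf_from_record(json.load(resp))
```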
Installation
# Any agent (Claude Code, Cursor, Copilot, etc.)
npx skills add Agents365-ai/365-skills -g
# Claude Code only
> /plugin marketplace add Agents365-ai/365-skills
> /plugin install paper-fetch
Also published on SkillsMP and ClawHub — each handles updates through its own marketplace.
Usage
Just describe what you want:
> Download the AlphaFold2 paper PDF to ~/papers
> Fetch DOI 10.1038/s41586-020-2649-2
> Batch-download every DOI from dois.txt
> Find a PDF for "Attention Is All You Need" and save it
> Preview the resolved PDF URL for 10.1126/science.abj8754 without downloading
Or call the script directly:
# Single DOI
python skills/paper-fetch/scripts/fetch.py 10.1038/s41586-021-03819-2
# By title (resolved to DOI via Crossref + S2 fallback)
python skills/paper-fetch/scripts/fetch.py --title "Highly accurate protein structure prediction with AlphaFold"
# Dry-run preview (no download)
python skills/paper-fetch/scripts/fetch.py 10.1038/s41586-020-2649-2 --dry-run
# Batch with idempotency
python skills/paper-fetch/scripts/fetch.py --batch dois.txt --out ~/papers \
--idempotency-key monday-review-batch
# Pipe DOIs from another tool
echo 10.1038/s41586-021-03819-2 | python skills/paper-fetch/scripts/fetch.py --batch -
# Agent discovery
python skills/paper-fetch/scripts/fetch.py schema --pretty
The full flag reference and JSON envelope schema are in skills/paper-fetch/SKILL.md.
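An orchestrator that shells out to the script can run fetch.py, parse the JSON envelope from stdout, and branch on the exit code. This sketch rests on two assumptions. First, stdout is JSON whenever it is not a TTY, per the TTY-aware default described above. Second, only exit code 0 means full success. The specific meanings of 1, 3, and 4 and the envelope's field names are documented in SKILL.md and are not guessed here. The optional cmd parameter is a hypothetical hook for substituting another command.

```python
import json
import subprocess
import sys
from typing import Dict, List, Optional, Tuple

FETCH = "skills/paper-fetch/scripts/fetch.py"


def run_fetch(
    args: List[str], cmd: Optional[List[str]] = None, timeout: float = 600
) -> Tuple[int, Optional[Dict]]:
    """Run fetch.py with args; return (exit_code, parsed stdout envelope or None)."""
    base = cmd if cmd is not None else [sys.executable, FETCH]
    # stdout is captured (not a TTY), so the default format should be machine-readable.
    proc = subprocess.run(base + args, capture_output=True, text=True, timeout=timeout)
    try:
        envelope = json.loads(proc.stdout) if proc.stdout.strip() else None
    except json.JSONDecodeError:
        envelope = None  # e.g. a human-readable format was emitted instead
    return proc.returncode, envelope
```

Typical use would be code, envelope = run_fetch(["10.1038/s41586-021-03819-2", "--out", "papers"]), followed by routing on code before inspecting envelope.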
Institutional access (opt-in)
If your institution has a subscription, set PAPER_FETCH_INSTITUTIONAL=1 to enable the publisher-direct fallback. Your IP / cookies / EZproxy authorize the fetch; the skill adds a 1 req/s rate limiter to keep batch jobs within publisher ToS.
export PAPER_FETCH_INSTITUTIONAL=1
See plan/institutional-access.md for design details.
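A 1 req/s limit like the one described above for publisher-direct fetches can be as simple as remembering when the last request went out and sleeping for whatever remains of the interval. This is an illustrative sketch, not the skill's limiter. The clock and sleep functions are injectable only so the behavior can be checked without real waiting.

```python
import time
from typing import Callable, Optional


class MinIntervalLimiter:
    """Block so that successive wait() calls are at least `interval` seconds apart."""

    def __init__(
        self,
        interval: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous permitted request

    def wait(self) -> None:
        now = self._clock()
        if self._last is not None:
            remaining = self._last + self.interval - now
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()  # re-read after sleeping
        self._last = now
```

In a batch loop, limiter.wait() would be called immediately before each publisher request. A monotonic clock avoids surprises if the wall clock is adjusted mid-run.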
Known limitations
- Some publisher redirects return an HTML landing page instead of a PDF; the %PDF header check rejects them
- No browser automation: no CAPTCHA solving, no Playwright, no stealth
- SSRF defense rejects private IPs, non-http(s) schemes, non-80/443 ports, cloud metadata hosts
- 50 MB cap per PDF download
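The SSRF rules listed above translate fairly directly into a URL pre-check. Below is a minimal sketch built on urllib.parse, socket.getaddrinfo, and the ipaddress module. The metadata hostname set is an assumed example. A production check would also re-validate every redirect target and pin the resolved IP for the actual connection to guard against DNS rebinding.

```python
import ipaddress
import socket
from typing import List
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http": 80, "https": 443}
# Assumed example set of metadata hostnames; 169.254.169.254 is caught as non-global below.
METADATA_HOSTS = {"metadata.google.internal", "metadata", "instance-data"}


class UnsafeURL(ValueError):
    pass


def _resolve(host: str, port: int) -> List[str]:
    try:
        return [str(ipaddress.ip_address(host))]  # IP literal: no DNS lookup needed
    except ValueError:
        infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
        return [info[4][0] for info in infos]


def check_url(url: str) -> None:
    """Raise UnsafeURL if url breaks the scheme, port, host, or IP-range rules."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    if scheme not in ALLOWED_SCHEMES:
        raise UnsafeURL(f"scheme not allowed: {scheme!r}")
    port = parts.port or ALLOWED_SCHEMES[scheme]
    if port not in (80, 443):
        raise UnsafeURL(f"port not allowed: {port}")
    host = (parts.hostname or "").rstrip(".").lower()
    if not host or host in METADATA_HOSTS:
        raise UnsafeURL(f"host not allowed: {host!r}")
    for addr in _resolve(host, port):
        ip = ipaddress.ip_address(addr.split("%")[0])  # drop an IPv6 zone id if present
        # is_global is False for private, loopback, link-local, and other special ranges.
        if not ip.is_global or ip.is_multicast:
            raise UnsafeURL(f"{host} resolves to non-public address {ip}")
```

Checking scheme, port, and hostname before any DNS lookup means the cheapest rejections never touch the network.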
🔗 Related Skills
Part of the Agents365-ai research-skill family — pick the right tool for the job:
| Skill | Niche | When to use |
|---|---|---|
| semanticscholar-skill | Semantic Scholar API search | When you need to FIND papers before fetching |
| asta-skill | Same corpus via Ai2 Asta MCP | When your host supports MCP and you have an Asta API key |
| scholar-deep-research | 8-phase literature review pipeline | When you want a structured cited report, not just PDFs |
| zotero-research-assistant | Zotero library workflows | When references go into Zotero |
💬 Community
- Discord: https://discord.gg/79JF5Atuk
- WeChat: scan the QR code below
❤️ Support
If this skill helps you, consider supporting the author:
WeChat Pay · Alipay · Buy Me a Coffee · Give a Reward
👤 Author
Agents365-ai
- GitHub: https://github.com/Agents365-ai
- Bilibili: https://space.bilibili.com/441831884