4 Skills for Publication-Ready Research
A researcher’s skills library for the full publication pipeline
5 What Are Skills?
A skill is a reusable set of instructions that tells Claude how to perform a specific task. At its core, a skill is just a folder containing a SKILL.md file — a plain-text document written in markdown that describes what Claude should do, step by step. When Claude encounters a relevant task, it reads the skill and follows the instructions. You can also invoke a skill directly by typing /skill-name in the terminal.
Think of skills as saved expertise. Without a skill, you’d type the same detailed instructions every time you wanted Claude to, say, read a paper carefully or audit your code. With a skill, those instructions are written once and reused automatically.
There is no magic here. A skill is just a prompt — the same kind of text you would type into a conversation with Claude. You could paste the contents of any SKILL.md directly into a chat and get the same result. The only difference is that a skill file is saved to disk so you don’t have to retype it, and Claude can load it automatically when a relevant task comes up. If you can write clear instructions for a research assistant, you can write a skill.
Where skills live. Skills are stored in .claude/skills/ inside your project (available to that project only) or in ~/.claude/skills/ on your machine (available across all projects). Each skill gets its own folder. Note that .claude/ is a hidden folder (the dot makes it invisible by default) — use ls -a in the terminal or press Cmd+Shift+. in macOS Finder to see it:
.claude/skills/split-pdf/
├── SKILL.md ← the instructions Claude reads (required)
└── methodology.md ← optional supporting files
What SKILL.md contains. The file starts with a short YAML header (name, description) and then markdown instructions. The description helps Claude decide when to use the skill automatically. Here’s a minimal example:
---
name: check-data
description: Validate a dataset for common problems before analysis
---
When asked to check a dataset:
1. Report dimensions (rows × columns)
2. Flag missing values by variable
3. Check for duplicate observations
4. Summarize variable types and ranges

That’s it — no programming required. If you can write a checklist, you can write a skill.
Additional frontmatter fields. The name and description fields are the essentials. The full specification also supports disable-model-invocation (prevents auto-triggering), user-invocable, allowed-tools, model, context (set to fork for subagent execution), agent, argument-hint, effort, and hooks (skill-scoped hooks), among others. See Anthropic’s skills documentation for the complete reference.
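As a sketch, a frontmatter block using a few of these optional fields might look like the following. The field names come from the list above; the values and comments are illustrative, so check Anthropic’s documentation for the exact accepted syntax:

```yaml
---
name: check-data
description: Validate a dataset for common problems before analysis
disable-model-invocation: true   # illustrative: run only when invoked as /check-data
allowed-tools: Read, Bash        # illustrative: restrict what the skill may touch
argument-hint: "[path to dataset]"
---
```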
Skills are not limited to coding or academia. Any repeatable task where you’d give Claude the same instructions more than once is a candidate for a skill. A marketing team might have one for drafting campaign briefs. A legal team might have one for extracting clauses from contracts. The skills in this tutorial target empirical research, but the pattern works anywhere you want consistent output without re-explaining the task every time.
For full technical details, see Anthropic’s skills documentation.
6 When Are Skills Worth Building?
Not all skills are equally valuable. Before writing one, ask: would Claude do this well from a one-off prompt? If yes, you don’t need a skill — just prompt it. Skills earn their keep in three situations:
- Guardrail skills prevent mistakes Claude makes inconsistently — like slipping into causal language when interpreting OLS, or reading a 60-page PDF in one pass and hallucinating half the citations. The skill doesn’t teach Claude anything new; it enforces discipline.
- Workflow skills impose a systematic process Claude wouldn’t follow on its own — like classifying every referee comment into 5 categories, or running robustness checks across 7 dimensions instead of listing a few and stopping.
- Convenience skills just save typing — reminding Claude which R packages you prefer or what your YAML header looks like. These are the weakest candidates. If it’s just formatting preferences, put it in CLAUDE.md instead.
If a skill doesn’t fall into category 1 or 2, you probably don’t need it.
When you’re ready to build your first custom skill:
“Help me create a Claude Code skill. I want a skill called [name] that does [what]. I’ve been doing this manually — here’s what the process looks like: [describe steps]. Create the SKILL.md file in .claude/skills/[name]/ with frontmatter and instructions.”
7 MEMORY.md: How Claude Learns From Its Mistakes
Before we look at individual skills, there’s one piece of infrastructure to cover: MEMORY.md. You’ve been using it since Tutorial 2, and several skills in this tutorial depend on it.
7.1 The problem MEMORY.md solves
Claude has no memory between sessions. Every time you start a new conversation, it begins from scratch. If you corrected Claude yesterday — told it to use theme_bw() instead of theme_meridian(), or to never modify raw data files — it has already forgotten. Without intervention, you’ll make the same correction again tomorrow, and the day after that.
CLAUDE.md (Tutorial 2) solves part of this problem by giving Claude stable rules at the start of every session: who you are, what tools to use, how to behave. But CLAUDE.md is for rules you know in advance. Many of the most important lessons only emerge through use — the edge cases, the quirks, the project-specific conventions that you discover when Claude gets something wrong.
MEMORY.md captures those lessons. It lives alongside CLAUDE.md in your .claude/ folder and serves as a running log of corrections, preferences, and accumulated knowledge. If CLAUDE.md is your constitution, MEMORY.md is case law.
7.2 How it works
MEMORY.md is a plain-text file with short, tagged entries:
[LEARN:ggplot] Do not use theme_meridian() — use theme_bw() for all ggplot2 figures in this project.
[LEARN:quarto] Quarto RevealJS, not Beamer. Never generate .tex slide files.
[LEARN:estimation] Do not comment on whether coefficient estimates are "good" or "bad."
Focus on whether the specification is correct.
[LEARN:paths] Use relative paths in all code. Absolute paths break reproducibility across machines.
[LEARN:data] Never modify files in data/raw/. All cleaning outputs go to data/clean/.
[LEARN:referee2] Referee 2 never modifies author code. It only reads, runs,
and creates replication scripts in code/replication/.

Each entry uses the [LEARN:category] format — a short tag that makes it easy to scan, search, and group related lessons. The category is freeform: use whatever label makes sense for your project.
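Because entries follow a fixed [LEARN:category] pattern, the file is also easy to scan programmatically. A sketch of that idea (a hypothetical helper, not part of Claude Code):

```python
import re
from collections import Counter

def count_learn_tags(text):
    """Count [LEARN:category] tags in a MEMORY.md-style log."""
    tags = re.findall(r"\[LEARN:([^\]]+)\]", text)
    return Counter(tags)

# Illustrative MEMORY.md contents.
memory = """\
[LEARN:ggplot] Use theme_bw() for all figures.
[LEARN:paths] Use relative paths in all code.
[LEARN:ggplot] Never use theme_meridian().
"""

print(count_learn_tags(memory))  # Counter({'ggplot': 2, 'paths': 1})
```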
MEMORY.md is a convention, not a Claude Code feature. Unlike CLAUDE.md, which Claude Code reads automatically at session start, MEMORY.md has no special status in the system. Claude will only read it if you tell it to — which is why the Session Startup section in your CLAUDE.md (see Tutorial 2) explicitly instructs Claude to read it:
## Session Startup
At the start of every session:
1. Read README.md and this CLAUDE.md before doing anything else
2. Read MEMORY.md if it exists — it contains corrections from prior sessions
3. State the session goal and confirm alignment before writing code
4. Note any problems encountered — append to MEMORY.md

Without this instruction in CLAUDE.md, MEMORY.md is just a file sitting on disk. The startup protocol is what gives it power.
7.3 How corrections get added
MEMORY.md is not auto-generated. When Claude makes a mistake and you correct it, tell it to record the lesson:
Prompt:
“Add to MEMORY.md: [LEARN:ggplot] Do not use theme_meridian() — use theme_bw()”
Claude appends the entry. Next session, it reads the file at startup and avoids the same mistake. Over time, your MEMORY.md becomes a personalized knowledge base — a record of every quirk, preference, and correction that would otherwise be lost between sessions.
7.4 Why this matters for skills
Many of the skills in this tutorial produce outputs that depend on project conventions — which plotting theme to use, where to save files, how to name variables. Without MEMORY.md, a skill might generate correct code that uses the wrong conventions for your project. With it, Claude applies your accumulated corrections every time, so skills produce output that actually fits your workflow.
Where it comes from. The [LEARN] tag system comes from Pedro Sant’Anna’s CLAUDE.md template. His approach is to embed corrections directly in CLAUDE.md itself. I split it into a separate file for two reasons: (1) CLAUDE.md stays under the ~150 line limit even as corrections accumulate, and (2) MEMORY.md can grow indefinitely without crowding out the stable behavioral rules. You can share your CLAUDE.md as a template; your MEMORY.md is personal to your project’s history.
8 What This Tutorial Covers
This tutorial walks through a working skills library for the full publication pipeline — from reading papers through submission. The skills borrow from Scott Cunningham’s MixtapeTools, Hugo Sant’Anna’s Clo-Author, and my own experience with the academic workflow. I’ve adapted them for political science and applied economics.
The skills_library/ folder accompanying this tutorial has all of it: skills, project configuration, and institutional memory:
skills_library/
├── README.md ← what's in the folder and how to use it
├── .claude/
│ ├── CLAUDE.md ← project configuration (Tutorial 2)
│ ├── MEMORY.md ← lessons learned across sessions
│ └── skills/
│ ├── split-pdf/ ← Structured paper reading (guardrail)
│ ├── search-book/ ← Targeted search in large PDFs (convenience)
│ ├── editorial-review/ ← Multi-dimensional manuscript critique (workflow)
│ ├── revise-and-resubmit/ ← R&R management (workflow)
│ ├── presentation-builder/← Academic slide deck generation (workflow)
│ ├── referee2/ ← Adversarial project audit (workflow)
│ ├── robustness-battery/ ← Systematic specification testing (workflow)
│ └── regression-interpret/← Regression output interpretation (guardrail)
The .claude/ folder has the project configuration files from Tutorial 2: the CLAUDE.md that tells Claude who you are and how to behave, plus the MEMORY.md that captures corrections across sessions (explained above). Skills live in .claude/skills/ — this is where Claude Code looks for them. These configuration files and skills together are the foundation that makes Claude Code useful for research.
I’ll walk through four skills in detail and then briefly describe the rest.
9 Skill 1: split-pdf (Guardrail)
9.1 The problem
When you give Claude a full PDF, it tries to process everything at once. For short documents this works fine. For a 40-page economics paper with tables, equations, and appendices, it produces shallow summaries with hallucinated details — the kind where the paper is real, the method is roughly right, but the specific coefficient from Table 3 Column 4 is fabricated.
Literature reviews in good journals require you to actually engage with source material: correct attribution of findings, precise methodological descriptions, honest representation of what prior work does and doesn’t show. If you let Claude read papers in one pass and then write your lit review, you’ll get hallucinated citations, methodological mischaracterization, and overconfident claims about what sources actually found.
9.2 What the skill enforces
Three hard constraints that Claude would not follow on its own:
- Never read a full PDF. Split into 4-page chunks first.
- Read only 3 chunks at a time (~12 pages). After each batch, update running notes and pause.
- Wait for user confirmation before reading the next batch.
The pause-and-confirm protocol is what makes this work. It forces Claude to commit to an interpretation of pages 1–12 before seeing pages 13–24, which means it can’t retroactively revise early claims to sound more coherent with later material. This is closer to how you’d actually read a paper yourself.
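The chunking arithmetic itself is simple. A sketch in Python (hypothetical helper names; the actual skill has Claude do the splitting, this just illustrates the plan):

```python
def chunk_ranges(n_pages, chunk_size=4, batch_chunks=3):
    """Plan chunk_size-page chunks grouped into batches of batch_chunks."""
    chunks = [(start, min(start + chunk_size - 1, n_pages))
              for start in range(1, n_pages + 1, chunk_size)]
    # Group chunks into batches; the skill pauses after each batch.
    return [chunks[i:i + batch_chunks]
            for i in range(0, len(chunks), batch_chunks)]

# A 40-page paper: 10 four-page chunks, read in 4 batches (3 + 3 + 3 + 1).
batches = chunk_ranges(40)
print(len(batches))  # 4
print(batches[0])    # [(1, 4), (5, 8), (9, 12)]  -> first pause after page 12
```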
Structured extraction. As Claude reads, it collects information along 8 dimensions:
| # | Dimension | What to Look For |
|---|---|---|
| 1 | Research question | What and why |
| 2 | Audience | Which sub-community |
| 3 | Method | Identification strategy |
| 4 | Data | Source, unit, sample, period |
| 5 | Statistical methods | Estimator, key specifications |
| 6 | Findings | Coefficients, standard errors |
| 7 | Contributions | What’s new |
| 8 | Replication feasibility | Public data? Code archive? |
These dimensions capture what you need to build on or replicate the work, not just summarize it.
9.3 How to use it
Prompt:
“/split-pdf articles/acemoglu_2001.pdf”
Claude splits the PDF, reads the first ~12 pages, presents structured notes, and waits for your confirmation before continuing. The final output is a notes.md file in the split directory with specific data sources, variable names, and coefficient estimates — not vague summaries.
After reading all papers for a literature review, synthesize:
Prompt:
“I’ve now read 5 papers on colonial institutions and development. Synthesize the structured notes into a literature review identifying: points of agreement, methodological debates, and gaps my study addresses. Do NOT invent any citations.”
Always audit citations. Even with structured reading, verify that every paper exists, the author actually said what’s claimed, the findings are accurately reported, and table/column/figure references are correct. AI confidently produces citations where the paper is real but the specific numbers are fabricated. These are the hardest hallucinations to spot because they look plausible.
10 Skill 2: editorial-review (Workflow)
10.1 The problem
If you ask Claude to “review my paper,” you get a generic list of suggestions. Some are useful, some are hallucinated, and there’s no way to tell which is which.
10.2 What the skill does
The editorial-review skill gives you structure: 6 review modes, confidence scores on every finding, and an author-defense filter that stress-tests each criticism before including it in the report.
| Flag | What It Does |
|---|---|
| `--proofread` | Writing quality across 6 categories (structure, claims-evidence alignment, clarity, grammar, etc.) |
| `--consistency` | Internal consistency — arithmetic checks, abstract-to-results traceability, figure-text reconciliation, conceptual term drift, citation accuracy |
| `--methods` | Identification strategy audit with assumption stress-testing |
| `--scope` | Scope conditions, external validity, theory-empirics mismatch |
| `--peer JOURNAL_NAME` | Simulated blind review grounded in reference articles and journal guidelines |
| `--all` | Everything, followed by the author-defense filter |
The --consistency mode is worth explaining in more detail. It checks whether numbers in your text are arithmetically consistent with your tables (e.g., if you claim a 15% effect relative to a mean of 0.30, the implied treated value must be 0.345 — does that make sense?). It traces every quantitative claim in the abstract to a specific table or figure. It tracks whether your key conceptual terms shift meaning between sections. These are the errors that survive multiple rounds of co-author review and then get caught by Referee 2.
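The arithmetic behind that example can be made explicit. A sketch (the skill does this reasoning in prose, not code):

```python
# Claim in the text: a 15% effect relative to a control mean of 0.30.
control_mean = 0.30
claimed_effect_pct = 0.15

# The implied treated-group mean, if the claim is internally consistent.
implied_treated = control_mean * (1 + claimed_effect_pct)
print(round(implied_treated, 3))  # 0.345

# The check: does a treated mean of 0.345 appear in, or follow from,
# the reported table? If the table implies something else, the text
# and the table are arithmetically inconsistent.
```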
The --methods mode evaluates identification across 5 weighted dimensions: identification (35%), estimation (25%), inference (20%), robustness (15%), and replication (5%). It stress-tests every identifying assumption — asking whether it’s stated, testable, tested, and whether the paper’s defense is logically sufficient or merely restates the assumption. It has domain expertise in DiD (Callaway-Sant’Anna, Sun-Abraham, de Chaisemartin-D’Haultfoeuille), IV, RDD, Synthetic Control, and Event Studies.
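The weighting scheme implies a simple aggregate. A sketch with made-up sub-scores (the weights come from the rubric above; the scores are purely illustrative):

```python
# Weights from the --methods rubric described above (sum to 1.0).
weights = {"identification": 0.35, "estimation": 0.25,
           "inference": 0.20, "robustness": 0.15, "replication": 0.05}

# Hypothetical sub-scores on a 0-100 scale for one manuscript.
scores = {"identification": 70, "estimation": 85,
          "inference": 80, "robustness": 60, "replication": 90}

overall = sum(weights[k] * scores[k] for k in weights)
print(round(overall, 2))  # 75.25
```

Note how the 35% weight on identification means a weak identification strategy drags the overall score down even when everything else is strong.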
The --peer mode simulates a blind review for a target journal. Rather than relying on Claude’s general impressions of a journal, you feed it concrete material: 2–3 recent articles from the journal (read with /split-pdf first) and the journal’s submission guidelines. This grounds the review in what the journal actually publishes rather than what Claude thinks it publishes. It produces two independent referee reports (one on substance, one on methods) plus an editorial synthesis.
The author-defense filter. This is the part that matters most. After generating all criticisms, the skill switches perspective to the paper’s author and stress-tests each one. For each criticism, it asks: what is the strongest possible defense? Does it hold? Any criticism that turns out to be hallucinated, factually wrong, or below 30% confidence after defense gets dropped from the final report. So the report you actually read contains only criticisms that survived adversarial scrutiny, which cuts out a lot of noise.
10.3 How to use it
Prompt:
“/editorial-review Paper/main.tex --all”
The report is saved to quality_reports/full_review_[date].md. Each finding includes a confidence score and the result of the author-defense test. You can also run individual modes when you only need one thing — e.g., /editorial-review Paper/main.tex --methods for just the identification audit.
11 Skill 3: revise-and-resubmit (Workflow)
11.1 The problem
An R&R decision arrives with two referee reports totaling 8 pages of comments. Some require new analysis, some are clarifications, some are unreasonable, and some are typos. Without structure, revisions get chaotic. You lose track of what’s been addressed, you forget to respond to comment 2.7, and you spend three paragraphs diplomatically addressing a point the referee was simply wrong about.
11.2 What the skill does
It classifies every referee comment into one of five categories:
| Class | Meaning | Action |
|---|---|---|
| NEW ANALYSIS | Referee wants a test/table/figure that doesn’t exist | Flag for your approval, create analysis task |
| CLARIFICATION | Referee misunderstood or wants more explanation | Draft rewritten section |
| REWRITE | Referee wants structural changes | Draft restructured section |
| DISAGREE | The referee is wrong or the request is unreasonable | Draft diplomatic pushback, flag for your review |
| MINOR | Typo, formatting, citation | Draft the fix directly |
The classification drives three outputs:
- Revision tracker (quality_reports/revision_tracker_round1.md) — a checklist of every comment, classified and prioritized, with checkboxes
- Response letter (quality_reports/referee_response_round1.tex) — a LaTeX document quoting each referee comment in italics with your response in a distinct color, referencing specific page/section/table numbers
- Revised section drafts (in quality_reports/revised_sections/) — for CLARIFICATION and REWRITE items only, saved separately so you can review before incorporating
The diplomatic disagreement protocol. For DISAGREE items, the skill drafts a response that acknowledges the referee’s concern, presents evidence for your position, and offers a partial concession (like an additional robustness check). It never says “the referee is wrong.” Every DISAGREE item is flagged for your review. The skill drafts a response, but you decide whether to use it, concede, or rewrite.
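As a toy illustration only (the skill classifies by reading each comment in context, not by keyword matching), the five-way scheme might be sketched like this:

```python
# Toy keyword heuristic for the comment classes described above.
# DISAGREE is deliberately absent: deciding a referee is wrong
# requires judgment, so ambiguous comments fall through for review.
def classify_comment(comment):
    text = comment.lower()
    if any(k in text for k in ("typo", "misspell", "formatting")):
        return "MINOR"
    if any(k in text for k in ("additional test", "new table", "re-estimate")):
        return "NEW ANALYSIS"
    if any(k in text for k in ("restructure", "reorganize", "move section")):
        return "REWRITE"
    if any(k in text for k in ("unclear", "clarify", "explain")):
        return "CLARIFICATION"
    return "NEEDS REVIEW"

print(classify_comment("Please clarify the timing assumption."))  # CLARIFICATION
print(classify_comment("Table 2 has a typo in the header."))      # MINOR
```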
11.3 How to use it
Prompt:
“/revise-and-resubmit correspondence/referee_reports.pdf Paper/main.tex”
Claude reads the reports, classifies every comment, and produces all three outputs. You review the tracker, approve or modify the DISAGREE responses, and use the response letter as your starting point.
12 Skill 4: presentation-builder (Workflow)
12.1 The problem
When you ask Claude to turn a paper into slides, you get a bulleted summary of each section — one slide per section, bullet points under each, generic “Results” and “Conclusion” titles. The deck reads like a document reformatted as slides, not something you’d actually want to sit through.
12.2 What the skill does
The presentation-builder skill enforces design constraints that Claude would otherwise ignore. It builds on Scott Cunningham’s Rhetoric of Decks framework, adapted from Beamer to Quarto RevealJS. The rules:
- Assertion titles — “Colonial institutions explain 75% of income variation,” not “Results”
- One idea per slide — cognitive load is distributed evenly, not front-loaded
- Figures carry the argument — every chart answers one question, with direct labels (no legends requiring eye movement)
- Minimal bullet points — lists are converted to diagrams, tables, or prose (maximum 3 items if truly needed)
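A minimal Quarto RevealJS source consistent with these rules might look like the following. This is a sketch: the slide title, figure path, and options are illustrative, and the skill’s actual template lives in the skill folder:

```yaml
---
title: "Colonial institutions explain 75% of income variation"
format:
  revealjs:
    slide-number: true
---

## Institutions, not geography, predict incomes today

![](figures/institutions_income.png)
```

Note the assertion title on the slide: the heading states the claim, and the figure underneath carries the evidence.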
After building the deck, the skill runs a multi-agent review: one agent checks argument quality and narrative flow, another checks visual elements (figure sizing, label visibility, numerical accuracy). It also generates a proof PDF via decktape and reads it page by page to catch silent visual errors.
The full presentation philosophy (the Three Laws of slide design, the rhetoric framework, the MB/MC equivalence principle) is in the presentation-builder/methodology.md file. Read it if you want to understand the design theory or customize the skill.
12.3 How to use it
Prompt:
“/presentation-builder paper.tex”
Claude reads the paper, plans assertion titles, generates figures (R with theme_bw()), builds the Quarto RevealJS deck, compiles, runs the multi-agent review, and delivers a polished .html deck. You can customize for context:
Conference talk (20 min):
“/presentation-builder paper.tex — conference presentation for political scientists in comparative politics, 20 minutes, emphasize identification strategy and key results”
Job talk (45 min):
“/presentation-builder paper.tex — job talk, include roadmap slide, more time on data and robustness”
13 The Other Skills
The skills_library/.claude/skills/ folder contains four additional skills not covered in detail above. Each is a single SKILL.md file you can read, modify, and install:
- search-book — targeted search in large PDFs (books, dissertations, reports downloaded from Internet Archive). Extracts full text, builds a reusable page index, searches by keyword and context, and returns page numbers with surrounding passages. You pick which pages to read in detail — it doesn’t auto-read hundreds of pages. Handles scanned PDFs by flagging when OCR is needed.
- referee2 — adversarial 5-audit project review run in a fresh terminal to avoid context contamination. Checks code quality, cross-language replication (optional), directory structure and reproducibility, output automation, and econometrics. Produces a formal report with a preliminary assessment (Accept / Minor / Major / Reject).
- robustness-battery — takes your main regression specification and systematically generates alternative specifications across 7 dimensions (samples, controls, fixed effects, standard errors, estimators, functional forms, placebos), producing organized scripts, a summary table, and a specification curve dataset. Classifies results as robust, sensitive, or fragile.
- regression-interpret — a guardrail skill that prevents common interpretation errors: slipping into causal language for OLS, misreading log specifications, overstating significance, or ignoring that a large point estimate with a wide confidence interval is not a strong result. Produces a draft results paragraph with proper hedging.
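Crossing specification dimensions multiplies quickly, which is why robustness-battery automates the bookkeeping. A sketch of the combinatorics (the dimension values are illustrative, not the skill’s actual defaults):

```python
from itertools import product

# Illustrative choices along three of the seven dimensions.
samples = ["full", "drop_outliers", "post_1990"]
controls = ["baseline", "extended"]
std_errors = ["robust", "clustered_state", "wild_bootstrap"]

# Every combination is one specification to estimate.
specs = list(product(samples, controls, std_errors))
print(len(specs))  # 18 specifications from just three dimensions
print(specs[0])    # ('full', 'baseline', 'robust')
```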
14 Putting It Together: The Pre-Submission Checklist
Before submitting a paper, run these checks in sequence:
| Step | Skill | What to Ask | Time |
|---|---|---|---|
| 1 | referee2 (fresh terminal) | “/referee2 . R” | 20–30 min |
| 2 | editorial-review | “/editorial-review Paper/main.tex --all” | 15–20 min |
| 3 | robustness-battery | “/robustness-battery code/main_analysis.R R” | 10–15 min |
| 4 | presentation-builder | “/presentation-builder paper.tex” | 15–25 min |
Step 1 catches bugs: coding errors, reproducibility failures, hardcoded numbers. Step 2 catches paper-level weaknesses: arithmetic inconsistencies, identification gaps, scope problems, and it simulates a journal-specific referee. Step 3 catches fragile results, meaning specifications where the headline finding doesn’t survive reasonable alternatives. Step 4 produces the conference deck, turning the paper into a talk with proper rhetoric, figures, and multi-agent visual review.
After running the first three, you should have reports in quality_reports/ covering code, paper, and robustness. Address every finding before submitting. If referee2’s assessment is “Major Revisions,” fix the issues and re-run in a fresh terminal.
When the R&R comes back, use /revise-and-resubmit to classify the comments and draft your response letter.
15 The Skills Library Folder
The skills_library/ folder that comes with this tutorial is meant to be shared. It has everything you need to install the skills library and project configuration into your own Claude Code project. The README.md at the folder root explains what each file does and how to install.
To install:
- Copy the .claude/ folder into your project root (it contains skills, CLAUDE.md, and MEMORY.md)
- Edit .claude/CLAUDE.md to match your profile
- Edit or clear .claude/MEMORY.md (start fresh or keep the examples)
- Use Claude Code as usual — skills trigger automatically when relevant, or invoke them manually with /skill-name. To prevent a skill from auto-triggering, add disable-model-invocation: true to its YAML frontmatter.
16 Further Reading
The skills in this tutorial draw on two open-source projects and Anthropic’s documentation:
- MixtapeTools by Scott Cunningham — the original Referee 2 protocol, split-pdf skill, presentation philosophy, and project scaffolding tools
- Clo-Author by Hugo Sant’Anna — a full research pipeline with 10 slash commands, worker-critic agent pairs, and quality gates. Documentation at psantanna.com/claude-code-my-workflow
- Anthropic skills documentation — the technical reference for building and installing Claude Code skills
- Panjwani’s AI MBA — practical training on skills, MCP, and Claude Code workflows