2  Writing Your CLAUDE.md

A section-by-section guide to configuring Claude Code for research

3 What Is CLAUDE.md and Why Does It Matter?

Claude Code forgets everything between sessions. Every time you open a new terminal and type claude, it starts fresh: no memory of your project, your methods, your file locations, or the decisions you made yesterday. Scott Cunningham calls this the “amnesia problem”:

“Claude has amnesia — it forgets everything between sessions.”

— Cunningham, MixtapeTools workflow.md

The fix is a file called CLAUDE.md that Claude Code reads automatically at the start of every session. It tells Claude who you are, how to behave, where your files live, and what rules to follow. If you skip it, you spend the first five minutes of every session re-explaining context that hasn’t changed.

This tutorial walks through a complete, working CLAUDE.md line by line. The file draws on three sources:

  • Scott Cunningham (MixtapeTools) — research-specific sections like estimation philosophy, variable definitions, and the Referee 2 protocol. (Template on GitHub)
  • Pedro Sant’Anna (claude-code-my-workflow) — the “project constitution” approach: core principles, quality gates, [LEARN] tags, and the ~150 line limit. (Template on GitHub)
  • Chris Blattman (claudeblattman.com) — the general-purpose template: who you are, your tools, skill levels, and preferences. (Guide)

What follows is a CLAUDE.md built for an empirical social science researcher who works primarily in R with Quarto, stores projects in Dropbox, and cares about empirical rigor. Each section is explained: what it does, where the idea came from, and why it belongs in a file you want to keep under ~150 lines.

4 Where to Store It

Claude Code reads CLAUDE.md from three locations, at two levels:

~/.claude/CLAUDE.md             ← user-level (applies to ALL projects)
project/CLAUDE.md               ← project root (this project only)
project/.claude/CLAUDE.md       ← also project-level (this project only)

The user-level file (~/.claude/CLAUDE.md) is for identity and preferences that apply everywhere: your name, communication style, tool preferences. The project-level file holds project-specific rules like estimation philosophy, folder conventions, and quality gates. If both exist, Claude reads both.

For a single research project, either project-level location works. Cunningham puts it at the project root. Blattman uses the .claude/ subfolder. Pick one and be consistent.
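
To see which of these files exist on your machine, a short script can check the three paths. This is a minimal sketch: the function names are hypothetical, and it only tests whether the files exist. It does not reproduce how Claude Code itself loads them.

```python
from pathlib import Path


def candidate_claude_files(project_dir, home_dir=None):
    """Return the three CLAUDE.md locations listed above, in the same order."""
    home = Path(home_dir) if home_dir is not None else Path.home()
    project = Path(project_dir)
    return [
        home / ".claude" / "CLAUDE.md",     # user-level
        project / "CLAUDE.md",              # project root
        project / ".claude" / "CLAUDE.md",  # project-level, hidden subfolder
    ]


def existing_claude_files(project_dir, home_dir=None):
    """Keep only the candidate paths that exist on disk as regular files."""
    return [p for p in candidate_claude_files(project_dir, home_dir) if p.is_file()]
```

Calling `existing_claude_files(".")` from the project folder returns the paths that are actually present, which makes it easy to spot a project that accidentally has both project-level files.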

Seeing hidden folders

The .claude/ folder starts with a dot, which makes it hidden by default on macOS and Linux. You won’t see it in Finder or with a plain ls command. To see it:

  • Terminal: use ls -a (the -a flag shows hidden files)
  • macOS Finder: press Cmd + Shift + . to toggle hidden files on and off
  • VS Code: hidden files are shown by default in the Explorer sidebar

If you create a claude/ folder (without the dot), Claude Code will not find your configuration files. The dot matters.

Why hidden? On Unix systems (macOS, Linux), any file or folder whose name starts with . is treated as hidden. This is a convention for configuration and tool metadata: things the software needs in order to function but that you don’t need to see during normal work. You already have many hidden files and folders on your machine: .git/ (git internals, inside each repository), .ssh/ (SSH keys), .zshrc (shell config), .Rprofile (R startup). The .claude/ folder follows the same logic: it holds Claude Code’s configuration, not your research files.

Should you keep it hidden? Yes. Resist the temptation to rename it to a visible folder. Hidden folders stay out of your way in Finder and ls, and git handles them like any other path, so your .gitignore rules apply to them predictably. More importantly, Claude Code expects the dot: it will not detect configuration or skills in a claude/ folder without it. Learn the Cmd+Shift+. shortcut for Finder and ls -a for the terminal, and you’ll have no trouble accessing it when you need to.
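
If you prefer to check from a script rather than Finder, the sketch below lists dot-prefixed entries and flags the missing-dot mistake described above. Both function names are hypothetical, and the check only looks at folder names.

```python
from pathlib import Path


def hidden_entries(directory):
    """Names starting with a dot: roughly what `ls -a` shows beyond plain `ls`."""
    return sorted(p.name for p in Path(directory).iterdir() if p.name.startswith("."))


def has_misnamed_claude_folder(directory):
    """True when a visible claude/ folder exists but the hidden .claude/ does not."""
    d = Path(directory)
    return (d / "claude").is_dir() and not (d / ".claude").is_dir()
```

Unlike `ls -a`, `hidden_entries` does not list the special `.` and `..` entries, because `Path.iterdir()` skips them.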

5 The Complete File

Here is the full CLAUDE.md. The rest of this tutorial explains each section.

# CLAUDE.md — [YOUR NAME]

**Institution:** [YOUR INSTITUTION]
**Branch:** main

---

## Core Principles

- **Plan first** — enter plan mode before non-trivial tasks
- **Verify after** — run code, render output, confirm results before moving on
- **Design before results** — focus on specification correctness, not point estimates
- **Cross-replicate** — implement key analyses in R (primary) and Python (replication)
- **[LEARN] tags** — when corrected, note `[LEARN:category] wrong → right` in this file

---

## Estimation Philosophy

**Design before results.** During estimation and analysis:

- Do NOT express concern or excitement about point estimates
- Do NOT interpret results as "good" or "bad" until the design is intentional
- Focus entirely on whether the specification is correct
- Results are meaningless until we're confident the design is right
- Objectivity means being attached to getting the design right, not to any particular finding

---

## Communication Guidelines

- Refer to me as [YOUR FIRST NAME] in informal contexts, "the author" in formal outputs
- Be concise (whenever possible) and clear — I prefer concise and clear content over verbose explanations
- Do not hedge or pad responses with unnecessary caveats
- When suggesting code, include comments explaining non-obvious choices

---

## Tools and Software

| Tool | Use | Notes |
|------|-----|-------|
| R | Primary analysis, visualization (ggplot2, fixest, sf, dplyr, tidyr) | Always use |
| Python | Cross-language replication (statsmodels, linearmodels, pandas) | For verification |
| Quarto | Slide decks (RevealJS), documents, website | NOT Beamer |
| LaTeX | Paper drafts when needed | |
| Git + GitHub | Version control | Commit frequently with meaningful messages |

---

## Folder Structure and File Conventions

project/
├── code/R/          — R scripts (01_clean.R, 02_analysis.R, ...)
├── code/python/     — Replication scripts
├── data/raw/        — Original data (NEVER modify)
├── data/clean/      — Cleaned datasets
├── output/tables/   — Generated tables
├── output/figures/  — Generated figures

- Projects live in ~/Library/CloudStorage/Dropbox/ (e.g., Dropbox/my-research-project/)
- Use relative paths in all code — absolute paths break reproducibility
- Quarto theme: use _extensions/meridian.scss when available
- ggplot2: use theme_bw() as default (update if your project defines a custom theme)

---

## Quarto Conventions

- Slide decks use RevealJS format with embed-resources: true
- Section headers: # Title {background-color="#1e293b"}
- Exercises: {background-color="#b44527"}
- Use :::fragment for progressive disclosure
- Mermaid diagrams: always include %%{init:{...}}%% block with useMaxWidth: true
- Code chunks: echo: false, warning: false, message: false by default
- Maximum 6 bullets per slide, 10 words per bullet

---

## Project Overview and Status

See README.md for the current research question, method, data, and status.

---

## Quality Thresholds

| Score | Gate | Meaning |
|-------|------|---------|
| 80 | Commit | Good enough to save |
| 90 | Review | Ready for coauthor/self-review |
| 95 | Submit | Ready for journal or classroom |

---

## Referee 2 Correspondence

Status: [Not yet audited / In progress / Accepted]
Reports at: correspondence/referee2/

Critical Rule: Referee 2 NEVER modifies author code. It only reads, runs,
and creates replication scripts in code/replication/.

---

## Session Startup

At the start of every session:
1. Read README.md and this CLAUDE.md before doing anything else
2. Read MEMORY.md if it exists — it contains corrections from prior sessions
3. State the session goal and confirm alignment before writing code
4. Note any problems encountered — append to MEMORY.md

The file is 113 lines, well under Sant’Anna’s ~150 line recommendation. That limit is not arbitrary: Claude reads the entire file, but following 50 rules simultaneously is harder than following 15. Each additional rule constrains the output space and increases the chance of silent conflicts — where satisfying one rule means subtly violating another. A focused 120-line file where every rule gets followed beats a 400-line file where Claude quietly drops the ones that conflict with others. The sections below explain why each part is there.
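
The line budget is easy to check mechanically. The sketch below counts lines in a file and reports how much of the budget is left. The constant and function names are hypothetical, and 150 is used only because it is the approximate limit discussed above.

```python
from pathlib import Path

LINE_BUDGET = 150  # approximate guideline from the text, not a hard rule


def line_count(path):
    """Number of lines in the file, blank lines included."""
    return len(Path(path).read_text(encoding="utf-8").splitlines())


def budget_report(path, budget=LINE_BUDGET):
    """Return (lines, remaining); remaining is negative when the file is over budget."""
    n = line_count(path)
    return n, budget - n
```

A 113-line file gives `(113, 37)`, so there is room for a few [LEARN] corrections before the file needs pruning.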

6 Section-by-Section Explanation

6.1 Core Principles

## Core Principles

- **Plan first** — enter plan mode before non-trivial tasks
- **Verify after** — run code, render output, confirm results before moving on
- **Design before results** — focus on specification correctness, not point estimates
- **Cross-replicate** — implement key analyses in R (primary) and Python (replication)
- **[LEARN] tags** — when corrected, note `[LEARN:category] wrong → right` in this file

Where it comes from: Pedro Sant’Anna’s “project constitution” approach. His CLAUDE.md opens with core principles that govern Claude’s behavior across every session. The idea is that these are non-negotiable rules, not suggestions.

Why it’s here: These five rules address the failure modes I’ve seen most often in AI-assisted research:

  1. Plan first stops Claude from jumping straight into code that goes in the wrong direction. For anything beyond a quick fix, Claude should outline its approach before executing. This is Sant’Anna’s entry gate.

  2. Verify after means you don’t accept output on faith. After running a regression, rendering a Quarto deck, or generating a figure, Claude should confirm the output is correct before moving on.

  3. Design before results is Cunningham’s signature principle (see next section). It appears here as a one-line behavioral rule.

  4. Cross-replicate makes cross-language verification a default rather than something you remember to do at the end. Cunningham’s insight: “The mistakes Claude makes in R and Stata are largely independent. They’re orthogonal error vectors.” The same logic carries over to R and Python: a bug in one implementation is unlikely to be reproduced in the other, so running the analysis in both catches errors that single-language testing misses.

  5. [LEARN] tags come from Sant’Anna. When Claude gets corrected, it notes the correction directly in CLAUDE.md so the same mistake doesn’t recur. For example: [LEARN:R] fixest::etable() with booktabs=TRUE requires \usepackage{booktabs} in preamble.
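
The cross-replicate rule ends with a comparison step. As a rough sketch, suppose each implementation exports its point estimates as a mapping from term name to value (the export format, the function name, and the tolerances are all assumptions, not part of any of the source templates). The check then only asks whether the two sets agree:

```python
import math


def compare_estimates(r_results, py_results, rel_tol=1e-6, abs_tol=1e-9):
    """Return terms whose estimates disagree or that appear in only one set.

    Both arguments map a term name to a point estimate.
    """
    mismatches = {}
    for term in sorted(set(r_results) | set(py_results)):
        a = r_results.get(term)
        b = py_results.get(term)
        if a is None or b is None or not math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol):
            mismatches[term] = (a, b)
    return mismatches
```

An empty result means the two implementations agree within tolerance. In keeping with the design-before-results rule, the function reports disagreement only and says nothing about whether an estimate is large, small, or significant.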

6.2 Estimation Philosophy

## Estimation Philosophy

**Design before results.** During estimation and analysis:

- Do NOT express concern or excitement about point estimates
- Do NOT interpret results as "good" or "bad" until the design is intentional
- Focus entirely on whether the specification is correct
- Results are meaningless until we're confident the design is right
- Objectivity means being attached to getting the design right, not to any particular finding

Where it comes from: Scott Cunningham’s MixtapeTools template. This section is copied verbatim.

Why it’s here: I think this is the most important section in any research CLAUDE.md. If you leave it out, Claude will cheerfully tell you “Great news, the coefficient is significant!”, express concern about null results, and nudge you toward findings that look “interesting.”

With these lines in place, Claude focuses on whether the specification is correct: the right fixed effects, sample restrictions, standard errors. It treats results as outputs of a design, not as goals.

Why it’s separate from Core Principles: The estimation philosophy is detailed enough to need its own section. Core Principles gives the one-line version (“design before results”); this section spells out what that means during actual estimation.

6.3 Communication Guidelines

## Communication Guidelines

- Refer to me as [YOUR FIRST NAME] in informal contexts, "the author" in formal outputs
- Be concise (whenever possible) and clear — I prefer concise and clear content over verbose explanations
- Do not hedge or pad responses with unnecessary caveats
- When suggesting code, include comments explaining non-obvious choices

Where it comes from: Blattman’s template includes a “How I Prefer to Work” section. Cunningham’s template includes “Communication Guidelines” with collaborator names.

Why it’s here: Claude’s default communication style is verbose and hedging. It adds caveats like “It’s worth noting that…” and “However, it should be mentioned that…” which waste time for an experienced researcher. These four lines fix that. The name convention (your first name vs “the author”) matters for documents that might be shared.

6.4 Tools and Software

## Tools and Software

| Tool | Use | Notes |
|------|-----|-------|
| R | Primary analysis, visualization (ggplot2, fixest, sf, dplyr, tidyr) | Always use |
| Python | Cross-language replication (statsmodels, linearmodels, pandas) | For verification |
| Quarto | Slide decks (RevealJS), documents, website | NOT Beamer |
| LaTeX | Paper drafts when needed | |
| Git + GitHub | Version control | Commit frequently with meaningful messages |

Where it comes from: Blattman’s template uses a Tools and Software table with columns for Tool, Use, and File Location.

Why it’s here: Claude needs to know which tools to use. If you don’t tell it, it might suggest Stata code because you’re in economics, generate Beamer slides instead of Quarto, or default to matplotlib when you use ggplot2. The “Notes” column does a lot of work here:

  • “Always use” for R means Claude should never default to another language for primary analysis
  • “NOT Beamer” for Quarto prevents Claude from generating LaTeX slide code when you want Quarto RevealJS
  • “For verification” for Python makes clear that Python is a replication tool, not the primary language

Listing specific packages (fixest, sf, dplyr) tells Claude which ecosystem you work in and stops it from suggesting alternatives you don’t use.

6.5 Folder Structure and File Conventions

## Folder Structure and File Conventions

project/
├── code/R/          — R scripts (01_clean.R, 02_analysis.R, ...)
├── code/python/     — Replication scripts
├── data/raw/        — Original data (NEVER modify)
├── data/clean/      — Cleaned datasets
├── output/tables/   — Generated tables
├── output/figures/  — Generated figures

- Projects live in ~/Library/CloudStorage/Dropbox/ (e.g., Dropbox/my-research-project/)
- Use relative paths in all code — absolute paths break reproducibility
- Quarto theme: use _extensions/meridian.scss when available
- ggplot2: use theme_bw() as default (update if your project defines a custom theme)

Where it comes from: Cunningham’s MixtapeTools includes a /newproject command that generates a standard project scaffold. Sant’Anna’s template includes a full directory tree.

Why it’s here: This keeps Claude from creating files in the wrong places. I’ve seen it save figures to the project root instead of output/figures/, or drop an R script at the top level instead of code/R/.

The rules worth highlighting:

  • data/raw/ — NEVER modify. Raw data doesn’t get touched. All transformations happen in code, producing cleaned versions in data/clean/.
  • Relative paths only prevents the reproducibility failure where code works on your machine but breaks everywhere else because it has /Users/yourname/... hardcoded in it.
  • Numbering convention (01_clean.R, 02_analysis.R) makes execution order obvious.
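
The folder tree can be created in one step. This is a minimal sketch using the folder names from the tree above. The function name is hypothetical, and this is not Cunningham’s /newproject command.

```python
from pathlib import Path

# Folder names copied from the tree in the CLAUDE.md section above.
SCAFFOLD = [
    "code/R",
    "code/python",
    "data/raw",
    "data/clean",
    "output/tables",
    "output/figures",
]


def create_scaffold(project_root):
    """Create the scaffold folders (existing ones are left alone) and return all paths."""
    root = Path(project_root)
    folders = []
    for rel in SCAFFOLD:
        folder = root / rel
        folder.mkdir(parents=True, exist_ok=True)  # safe to rerun
        folders.append(folder)
    return folders
```

The paths are built relative to `project_root`, in line with the relative-paths rule, so the same call works on any machine.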

6.6 Quarto Conventions

## Quarto Conventions

- Slide decks use RevealJS format with embed-resources: true
- Section headers: # Title {background-color="#1e293b"}
- Exercises: {background-color="#b44527"}
- Use :::fragment for progressive disclosure
- Mermaid diagrams: always include %%{init:{...}}%% block with useMaxWidth: true
- Code chunks: echo: false, warning: false, message: false by default
- Maximum 6 bullets per slide, 10 words per bullet

Where it comes from: This section is specific to your workflow. Sant’Anna’s template includes Beamer Custom Environments and Quarto CSS Classes sections, which serve the same purpose for different tools.

Why it’s here: You produce all slide decks in Quarto RevealJS. If Claude doesn’t know your conventions, it will generate slides that don’t match your theme, use wrong background colors for section headers, forget to suppress code output, or produce slides that overflow. The embed-resources: true rule ensures the HTML file bundles all assets (images, fonts) so you can share it as a single file.

The Mermaid diagram rule (%%{init:{...}}%% block) prevents a rendering problem where diagrams come out too small or overflow the slide.
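
The bullet limits in this section can also be checked mechanically. The sketch below makes simplifying assumptions: every line starting with `# ` or `## ` begins a new slide, bullets start with `- ` or `* `, fenced code is skipped, and slides with identical titles are counted together. The function name is hypothetical, and the parser is much cruder than Quarto’s own.

```python
FENCE = "`" * 3  # marker that opens and closes a fenced code chunk


def check_slides(qmd_text, max_bullets=6, max_words=10):
    """Return (slide_title, problem) tuples for slides that break the limits."""
    problems = []
    bullet_counts = {}
    title = "(before first heading)"
    in_code = False
    for line in qmd_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(FENCE):
            in_code = not in_code  # toggle on every fence line
            continue
        if in_code:
            continue
        if stripped.startswith(("# ", "## ")):
            title = stripped.lstrip("#").strip()
            bullet_counts.setdefault(title, 0)
        elif stripped.startswith(("- ", "* ")):
            bullet_counts[title] = bullet_counts.get(title, 0) + 1
            words = len(stripped[2:].split())
            if words > max_words:
                problems.append((title, f"bullet has {words} words"))
    for slide, count in bullet_counts.items():
        if count > max_bullets:
            problems.append((slide, f"{count} bullets"))
    return problems
```

An empty list means every slide is within the limits. Long-bullet problems are reported first, followed by slides with too many bullets.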

6.7 Project Overview and Status

## Project Overview and Status

See README.md for the current research question, method, data, and status.

Where it comes from: Cunningham’s template has detailed Project Overview, Research Question, Data Sources, Identification Strategy, Current Status, and Key Decisions sections, all inside CLAUDE.md.

Why it’s a pointer, not prose: Research questions evolve. Data sources change. Status updates every session. If you put detailed project descriptions in CLAUDE.md, they go stale fast and bloat the file. The ~150 line budget means you can’t afford to waste space on content that changes weekly.

Instead, this one line tells Claude to read README.md, which it will do when you run the session startup routine:

Read all the markdowns. Read the key programs.
State the goals of this session.

The README.md can be as detailed and frequently updated as you need without affecting CLAUDE.md.

What goes in README.md (not CLAUDE.md):

  • Current research question and method
  • Data sources and time periods
  • Key decisions made (with dates and rationale)
  • Dropped analyses (with reasons)
  • Variable definitions
  • Sample restrictions
  • Current status and next steps

All of these change. CLAUDE.md holds behavioral rules that stay stable across sessions.

A concrete example. Here is what a README.md might look like for a difference-in-differences project studying the effect of a land reform on local public goods provision:

# Land Reform and Public Goods — README

## Research Question
Did the 1996 Colombian land reform increase local public goods provision
(schools, clinics, roads) in treated municipalities?

## Method
Difference-in-differences with staggered treatment adoption.
Primary estimator: Callaway & Sant'Anna (2021) via the `did` R package.
Robustness: Sun & Abraham, Borusyak et al.

## Data
- **Treatment**: Municipality-level reform implementation dates (MinAgricultura).
- **Outcomes**: School and clinic density from DANE census panels, 1990–2010.
- **Controls**: Pre-reform population, elevation, distance to department capital.
- Unit of observation: municipality-year. N ≈ 1,100 municipalities × 21 years.

## Key Decisions
- 2025-03-12: Dropped municipalities with population < 500 (measurement noise).
- 2025-03-18: Switched from TWFE to Callaway–Sant'Anna after heterogeneity diagnostic.

## Current Status
- Cleaning: done (01_clean.R)
- Main estimates: in progress (02_analysis.R)
- Robustness checks: not started
- Draft: outline stage

README holds what changes; CLAUDE.md holds what doesn’t. Claude reads both at session start, but only CLAUDE.md needs to stay short.

6.8 Quality Thresholds

## Quality Thresholds

| Score | Gate | Meaning |
|-------|------|---------|
| 80 | Commit | Good enough to save |
| 90 | Review | Ready for coauthor/self-review |
| 95 | Submit | Ready for journal or classroom |

Where it comes from: Pedro Sant’Anna’s quality gate system. His template uses 80 (Commit), 90 (PR-ready), 95 (Excellence). Hugo Sant’Anna’s Clo-Author uses the same thresholds with weighted aggregate scoring.

Why it’s here: Quality gates give you a shared vocabulary for “is this done?” Instead of “looks good” (which is vague), Claude can self-assess: “This code runs but has no error handling and uses hardcoded paths. I’d score it around 75, below the commit threshold.” The scoring works like this:

  • Critical issues: -100 to -15 points
  • Major issues: -5 to -3 points
  • Minor issues: -1 per instance

Start at 100, subtract. Nothing ships below 80.
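
The arithmetic is simple enough to write down. In this sketch, deductions are passed as positive point values, the score is floored at zero (an assumption; the source does not say what happens below zero), and the gate names come from the table above. The function names are hypothetical.

```python
# Gates from the Quality Thresholds table, highest first.
GATES = [(95, "Submit"), (90, "Review"), (80, "Commit")]


def quality_score(deductions):
    """Start at 100 and subtract each deduction (given as positive points)."""
    return max(0, 100 - sum(deductions))


def highest_gate(score):
    """Name of the highest gate the score clears, or None if it is below 80."""
    for threshold, name in GATES:
        if score >= threshold:
            return name
    return None
```

As a hypothetical illustration, one 15-point critical issue plus two 5-point major issues gives `quality_score([15, 5, 5])`, which is 75, and `highest_gate(75)` returns `None` because 75 falls below the commit threshold.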

6.9 Referee 2 Correspondence

## Referee 2 Correspondence

Status: [Not yet audited / In progress / Accepted]
Reports at: correspondence/referee2/

Critical Rule: Referee 2 NEVER modifies author code. It only reads, runs,
and creates replication scripts in code/replication/.

Where it comes from: Cunningham’s MixtapeTools Referee 2 protocol — a systematic audit system with five audits (code, cross-language replication, directory structure, output automation, econometrics).

Why it’s here: The line that matters is “Referee 2 NEVER modifies author code.” This has to be in CLAUDE.md so it’s loaded every session. If you leave it out, a Claude session running the Referee 2 audit might “helpfully” fix bugs it finds, which defeats the whole point of adversarial review. The referee reads, runs, and creates its own replication scripts. Only the author touches the author’s code.

The status tracker is just one line showing where you are in the audit process. The actual referee reports live in correspondence/referee2/ as standalone documents.
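
The “never modifies author code” rule can be checked after an audit session. The sketch below takes a list of changed paths, relative to the project root, and returns any that fall outside the referee’s folders. It assumes Referee 2 also writes its reports to correspondence/referee2/, which the rule itself does not state, and the function name is hypothetical.

```python
from pathlib import PurePosixPath

# Folders Referee 2 may write to (relative to the project root).
# correspondence/referee2 is an assumption: it is where the reports live.
REFEREE_WRITABLE = ("code/replication", "correspondence/referee2")


def disallowed_changes(changed_paths, allowed=REFEREE_WRITABLE):
    """Return the changed paths that fall outside the referee's writable folders."""
    roots = [PurePosixPath(a).parts for a in allowed]
    bad = []
    for raw in changed_paths:
        parts = PurePosixPath(raw).parts
        if not any(parts[:len(root)] == root for root in roots):
            bad.append(raw)
    return bad
```

The list of changed paths could come from `git diff --name-only` run after the session. Any path in the result is a file the referee should not have touched.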

6.10 Session Startup

## Session Startup

At the start of every session:
1. Read README.md and this CLAUDE.md before doing anything else
2. Read MEMORY.md if it exists — it contains corrections from prior sessions
3. State the session goal and confirm alignment before writing code
4. Note any problems encountered — append to MEMORY.md

Where it comes from: Adapted from Cunningham’s workflow.md session protocol, combined with Sant’Anna’s [LEARN] tag system.

Why it’s here: If you don’t have a startup routine, Claude jumps straight into work without reading project context. This section makes Claude read CLAUDE.md (behavioral rules), README.md (project state), and MEMORY.md (corrections from past sessions) before writing any code. The last line is what makes it self-correcting: mistakes from today get recorded so they don’t come back tomorrow. See the Session Routines section below for the ending routine, and Tutorial 4 for the full MEMORY.md system.

Note

MEMORY.md is a convention, not a Claude Code feature. Claude Code automatically reads CLAUDE.md at session start. It does not automatically read MEMORY.md — you must instruct Claude to do so, which is what this Session Startup section accomplishes.
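
Because MEMORY.md is just a convention, its format is up to you. As one possible shape, the sketch below appends a dated bullet per problem. The entry format and the function name are assumptions for illustration, not the MEMORY.md system covered in Tutorial 4.

```python
from datetime import date
from pathlib import Path


def append_memory_note(project_root, note, today=None):
    """Append one dated bullet to MEMORY.md, creating the file if needed."""
    stamp = (today or date.today()).isoformat()
    line = f"- {stamp}: {note.strip()}\n"
    path = Path(project_root) / "MEMORY.md"
    prefix = ""
    # Start on a fresh line if the existing file lacks a trailing newline.
    if path.exists() and path.read_text(encoding="utf-8")[-1:] not in ("", "\n"):
        prefix = "\n"
    with path.open("a", encoding="utf-8") as fh:
        fh.write(prefix + line)
    return line
```

Appending rather than rewriting keeps earlier corrections intact, which is the property the startup routine relies on.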

7 What Was Left Out (and Why)

Several sections from the source templates were left out on purpose:

Variable Definitions, Sample Restrictions, Dropped Analyses (from Cunningham): These all change as your project develops. After 20 key decisions, the decisions table alone would push CLAUDE.md past the 150-line limit. They belong in README.md. If Claude keeps getting a specific variable wrong, add a [LEARN] tag instead:

[LEARN:variables] pub_goods is school/clinic density from DANE, not government spending

Key Files (from Cunningham): File paths change as you add scripts. This is README.md territory.

Meeting Schedule (from Cunningham): Does not change Claude’s behavior.

My Skill Level (from Blattman): You know your skill level. Blattman includes this for beginners who need Claude to calibrate explanation depth. If you’re comfortable with R and the terminal, this section wastes lines.

Skills Quick Reference (from Sant’Anna): You don’t have custom skills installed yet. Add this table when you build your first skill.

Beamer Custom Environments (from Sant’Anna): You use Quarto, not Beamer.

Current Project State table (from Sant’Anna): Useful for his 6-lecture course production system, but overkill for a single research project. Status goes in README.md.

Learned Corrections section (from Sant’Anna): [LEARN] tags are noted in the Core Principles. When corrections happen, add them anywhere in the file; they don’t need a dedicated section.
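
Since [LEARN] tags can sit anywhere in the file, it helps to be able to collect them. The sketch below assumes each tag starts its own line, optionally after a list marker, in the form `[LEARN:category] note`. The pattern and the function name are assumptions based on the examples in this tutorial.

```python
import re

# A tag at the start of a line, optionally after "- " or "* ".
LEARN_PATTERN = re.compile(r"^\s*(?:[-*]\s+)?\[LEARN:([^\]\s]+)\]\s*(.+)$")


def extract_learn_tags(text):
    """Return (category, note) pairs for every line that starts with a LEARN tag."""
    tags = []
    for line in text.splitlines():
        match = LEARN_PATTERN.match(line)
        if match:
            tags.append((match.group(1), match.group(2).strip()))
    return tags
```

Requiring the tag at the start of the line means the Core Principles bullet, which mentions the tag format mid-sentence, is not picked up as a correction.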

8 Testing Your CLAUDE.md

After creating the file, start Claude Code in your project directory:

cd ~/Library/CloudStorage/Dropbox/my-research-project
claude

Then ask:

What do you know about me and my work?

Claude should reference:

  • Your name and institution
  • Your primary tools (R, Quarto)
  • Your estimation philosophy
  • Your folder structure conventions
  • Your quality thresholds

If it doesn’t, check:

  • The file is at the project root or in .claude/CLAUDE.md
  • The filename is exactly CLAUDE.md (case-sensitive)
  • The file is not empty
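
These three checks can be scripted. The sketch below looks in the project root and in .claude/, compares actual filenames so that a wrongly cased name is caught even on a case-insensitive filesystem, and flags empty files. The function name and the messages are hypothetical.

```python
from pathlib import Path


def diagnose_claude_md(project_dir):
    """Return a list of problems; an empty list means all three checks pass."""
    project = Path(project_dir)
    problems, found = [], []
    for folder in (project, project / ".claude"):
        if not folder.is_dir():
            continue
        for entry in folder.iterdir():
            if not entry.is_file() or entry.name.lower() != "claude.md":
                continue
            if entry.name != "CLAUDE.md":
                problems.append(f"filename case is wrong: {entry}")
            elif entry.stat().st_size == 0:
                problems.append(f"file is empty: {entry}")
            else:
                found.append(entry)
    if not found and not problems:
        problems.append("no CLAUDE.md at the project root or in .claude/")
    return problems
```

Run it as `diagnose_claude_md(".")` from the project folder. It checks only the project-level locations, not the user-level file in your home directory.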

You can also ask Claude to help you fill it in. In the terminal, type something like:

“Help me fill out my CLAUDE.md. I’m a [YOUR FIELD] professor who studies [YOUR TOPIC].”

Claude will ask follow-up questions and draft sections for you.

9 Session Routines

The CLAUDE.md file works best with consistent session routines, adapted from Cunningham’s workflow.md.

Session startup. The Session Startup section in your CLAUDE.md (see above) tells Claude what to read. To trigger it, start each session with a prompt like:

Read all the markdowns. Read the key programs.
State the goals of this session.

Session ending. Before closing, preserve context for tomorrow:

Update README.md with decisions made today.
Note unresolved issues for next session.
Commit everything and push to GitHub.

Tutorial 3 covers the git side of this routine, including committing, pushing, and branching.

10 The Three Source Templates

If you want to explore the three original templates this file draws from, along with Hugo Sant’Anna’s related Clo-Author:

| Source | What It Emphasizes | Link |
|--------|--------------------|------|
| Cunningham (MixtapeTools) | Estimation philosophy, variable definitions, Referee 2, dropped analyses | GitHub |
| Pedro Sant’Anna | Core principles, quality gates, [LEARN] tags, ~150 line limit | GitHub |
| Hugo Sant’Anna (Clo-Author) | Worker-critic pairs, research lifecycle, submission pipeline | GitHub |
| Blattman | General template, skill levels, preferences, voice files | Guide |

The progression: start with Blattman’s general template if you’re new to Claude Code, add Cunningham’s research sections when you begin empirical work, and adopt Sant’Anna’s governance principles when your workflow involves multi-step tasks or quality gates. This tutorial’s file pulls from all three, tuned for a working researcher.