1  Agentic AI for Social Science Research

What it is, why it matters, and how to get started

2 What Is Agentic AI?

Most researchers interact with AI through a chat interface — you type a question, get a response, and the conversation stays inside that browser tab. That’s fine for brainstorming or drafting text, but it can’t touch your files, run your code, or remember your project across sessions.

An agent is different. An agent is an AI that can act: it reads files on your computer, writes code, executes it, inspects the output, fixes errors, and iterates, all within your actual project directory. Research is not a conversation. It is a workflow: data cleaning, estimation, verification, writing, revision, submission. An agent can participate in that workflow in ways a chatbot cannot.

“An AI that does things is fundamentally more useful than an AI that says things.”

— Ethan Mollick, “A Guide to Which AI to Use in the Agentic Era”

A useful framework here is the distinction between models, apps, and harnesses:

| Layer | What It Is | Example |
|---|---|---|
| Model | The intelligence | Claude Opus 4.6, GPT-4o |
| App | The interface | claude.ai, ChatGPT |
| Harness | The system enabling autonomy | Claude Code, Codex |

The same model powering a chat interface becomes far more capable when placed inside a harness that gives it tools: file access, code execution, version control, and the ability to plan multi-step workflows. This tutorial series teaches you how to use that harness for empirical social science research.

3 Why This Matters for Research

Research is not product development. The difference matters for how you use AI tools:

| Aspect | Product Development | Research |
|---|---|---|
| Goal | Ship working code | Understand correctly |
| Error cost | Bug in production | Wrong conclusion in paper |
| Iteration | Fast, fix later | Slow, careful, get it right |
| Testing | Unit tests, CI/CD | Cross-language replication |
| Success | Does it run? | Does it mean what we think? |

Most AI coding tools were built for product development: they optimize for speed and working code. Research needs different defaults. Correctness over speed. Verification over confidence. Design before results. The workflow in this tutorial series is built around those priorities.

“Claude Code is not vibe coding. It is agentic coding.”

— Scott Cunningham, MixtapeTools

“Vibe coding” means accepting what the AI writes without scrutiny. “Agentic coding” means treating the AI as a thinking partner that you interrogate, verify, and direct. You use protocols to catch errors that informal review would miss.

But if AI requires verification, who does the verifying? That tension is worth sitting with before touching any tool.

4 The Expertise-Verification Paradox

Ethan Mollick identifies a core problem with AI-assisted work:

“AI proves most useful where expertise already exists to spot errors.”

— Mollick, “15 Times to Use AI, and 5 Not To”

You need domain expertise to verify AI output, but using AI may prevent you from developing that expertise in the first place: every task you delegate is one you never practice. This is especially dangerous in research, where the cost of an error is not a software bug but a published conclusion.

Mollick flags three cases where AI use is actively harmful:

  1. Learning and synthesis — AI bypasses the cognitive struggle that produces understanding
  2. High-accuracy tasks without verification — hallucinations are confident and specific
  3. Tasks where struggle enables breakthrough — the difficulty is the point

So what do you do? You use AI within structured protocols: cross-language replication, adversarial review in fresh terminals, formal audit trails, explicit quality gates. These become more important as AI improves, not less.

This paradox also has an ethical dimension. Before submitting any AI-assisted research, ask yourself three questions:

  1. Would I disclose this AI use to a referee?
  2. Could a colleague reproduce my AI-assisted steps?
  3. Did I verify every AI-generated claim?

If any answer is “no,” fix it first. The audit trail (git history, CLAUDE.md, MEMORY.md, quality reports) exists precisely so you can answer “yes” to all three. The three-question test is really a practical check on the expertise-verification paradox. It forces you to confirm that you actually understood and verified the AI’s contribution rather than trusting it passively.

The tutorials that follow teach protocols that make these answers defensible: CLAUDE.md for configuration and institutional memory, git for audit trails and safe experimentation, and a skills library that bakes verification steps into the workflow.

5 Two Modes of Working with AI

So how closely should you integrate AI into your work? Mollick draws a useful distinction between two modes.

Centaur mode is a clear division of labor. You decide the argument, the identification strategy, the interpretation. The AI handles formatting, compilation, figure generation, and mechanical tasks. Lower risk, and easier to verify.

Cyborg mode is deep integration, a fluid back-and-forth where the AI brainstorms, drafts, and iterates alongside you. Higher risk, harder to separate contributions, harder to verify.

Most of this tutorial series teaches Centaur mode. The protocols (CLAUDE.md, skills, adversarial review, cross-language replication) are about keeping human judgment and AI execution separate. As you get comfortable with the tools and verification protocols, you can start moving toward Cyborg mode for specific tasks like exploratory data analysis or brainstorming research questions.

6 The Research Workflow

Now for the actual process. The workflow combines Cunningham’s five-step research protocol with Blattman’s iterative PPRR loop into a single framework:

Dialogue → Code → Replicate → Verify → Document

Each step follows the same internal rhythm: Prompt → Plan → Review → Revise. You don’t fire off a single request and accept the result. You iterate within each step until the output meets your standards.

  1. Dialogue — start with an observation or question, not a code request. Discuss the substance before asking for implementation. Brain-dump your idea, have Claude structure it into an actionable plan, then stress-test the plan before committing to code.
  2. Code — implement in your primary language (R, Stata, Python). Use structured prompts: context, task, format, constraints. Review Claude’s plan before it writes, then review the code before running it.
  3. Replicate — implement key analyses in a second language. Compare to 6 decimal places for linear models; set tolerance upfront for nonlinear. This is the strongest guard against silent errors.
  4. Verify — run adversarial review in a fresh terminal (so Claude can’t see its own earlier work). Five audits, formal report, iterate until “Accept.” This is the Referee 2 protocol.
  5. Document — commit with meaningful messages, update CLAUDE.md and MEMORY.md, maintain audit trails. If it’s not documented, it didn’t happen.
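The tolerance check in the Replicate step can be sketched in a few lines. In practice the two coefficient vectors come from separate runs in separate languages (say, an R script and a Python script); here both are computed in Python purely to illustrate the comparison logic, with made-up data:

```python
# Illustrative sketch of the cross-language replication tolerance check.
# Both "implementations" live in one script here for demonstration only;
# the real protocol compares output from two independent languages.
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
y = X @ np.array([1.0, 0.5, -2.0]) + rng.normal(size=200)

# "Implementation A": least-squares solver
beta_a, *_ = np.linalg.lstsq(X, y, rcond=None)
# "Implementation B": normal equations
beta_b = np.linalg.solve(X.T @ X, X.T @ y)

# Linear models should agree to 6 decimal places; flag any term that doesn't.
tol = 1e-6
diffs = np.abs(beta_a - beta_b)
for name, d in zip(["const", "x1", "x2"], diffs):
    status = "OK" if d < tol else "MISMATCH"
    print(f"{name}: |diff| = {d:.2e} -> {status}")
```

For nonlinear models, replace `tol` with whatever tolerance you committed to before running either implementation, so the threshold cannot drift to accommodate a discrepancy.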

AI works best as iteration, not a single query.

7 The Tutorial Series

This tutorial series walks through a complete research workflow using Claude Code, from initial setup through journal submission. Each tutorial is a standalone HTML document you can read and follow independently, but they build on each other:

| Tutorial | Topic | What You’ll Learn |
|---|---|---|
| 1 (this one) | Introduction | What agentic AI is, why it matters, installation |
| 2 | CLAUDE.md | Configuring Claude Code for research (the amnesia solution) |
| 3 | Git | Version control for AI-assisted research workflows |
| 4 | Skills | A skills library for the full publication pipeline |
| 5 | AI Work Hygiene | Using agentic tools without losing your mind |

The tutorials use an accompanying skills_library/ folder containing a working skills library, project configuration (CLAUDE.md), and institutional memory (MEMORY.md). This folder is designed to be shared. You can copy the skills into your own project and customize the configuration for your profile.
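The tutorials don’t fix an exact layout for this folder; a hypothetical sketch, based only on the files named above, might look like:

```
skills_library/
├── CLAUDE.md    # project configuration (Tutorial 2)
├── MEMORY.md    # institutional memory
└── skills/      # individual skill files (Tutorial 4)
```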

8 Sources

This tutorial series draws on five researchers and Anthropic’s documentation. Cunningham provides the research-focused Claude Code workflow and cross-language replication philosophy; Pedro Sant’Anna contributes the “project constitution” approach to CLAUDE.md with quality gates and [LEARN] tags; Hugo Sant’Anna built Clo-Author, a full research pipeline with slash commands and worker-critic agent pairs; Blattman offers the most beginner-friendly installation and prompt-engineering guide for social scientists; and Panjwani provides practical training on skills, MCPs, and context window management. Full references and links appear in the Further Reading section at the end.

9 Installing Claude Code

Enough concepts. Let’s get Claude Code running on your machine.

9.1 Prerequisites

You need a computer (macOS, Linux, or Windows), an internet connection, and a paid Claude subscription or API key. The free plan does not include Claude Code. For current pricing, see claude.ai/pricing.

9.1.1 Required

| Tool | Purpose | Install |
|---|---|---|
| Claude Code | The agentic coding harness | See Step 1 below |
| Claude subscription | Pays for Claude Code usage | Claude Pro, Max, Teams, or Enterprise, or an API key from the Claude Console |
| Git | Version control and audit trail | brew install git (macOS) or apt install git (Linux) — see Tutorial 3 |

9.1.2 Optional

| Tool | Purpose | Install |
|---|---|---|
| Python | Cross-language replication | python.org or brew install python |
| LaTeX | Paper compilation | TeX Live or MacTeX |
Note

You don’t need everything on day one. Claude Code (required) is all you need for Tutorials 1 and 2. Git (required) is introduced in Tutorial 3. R, Quarto, and the optional tools become relevant in Tutorial 4 when you start using skills. Install them as you go.

9.2 Step 1: Install Claude Code

Open a terminal and run the installer for your operating system:

macOS or Linux:

curl -fsSL https://claude.ai/install.sh | bash

Windows (PowerShell):

irm https://claude.ai/install.ps1 | iex

The native installer requires no dependencies and auto-updates in the background.

Note

Alternative: npm installation. If you prefer to install via npm (e.g., for version pinning), you can run npm install -g @anthropic-ai/claude-code. This requires Node.js to be installed first. The npm method still works but is officially deprecated by Anthropic in favor of the native installer.

9.3 Step 2: First Run

Navigate to a project directory and start Claude Code:

cd ~/Dropbox/my-research-project
claude

Your browser will open for Anthropic authentication. Approve access, return to the terminal, and Claude Code is running. Try asking it something about your files:

What files are in my current folder?

Claude can now see your actual project. Type /exit to end the session.

Important

Always navigate to your project folder before running claude. Claude Code sees whatever directory you’re in when you launch it. If you start it from your home directory, it sees everything. If you start it from your project folder, it sees your project and reads your CLAUDE.md automatically.

9.4 Choosing a Model

Claude Code defaults to Sonnet on Pro and Team Standard plans and Opus on Max and Team Premium plans. For research work (analysis, writing, review), Opus is the better choice:

/model

Select Claude Opus from the list. Opus is slower and more expensive but noticeably better at reasoning, nuanced interpretation, and longer output. Use Sonnet for mechanical tasks (reformatting, file operations, simple code edits) and Opus for anything requiring judgment.

9.5 Useful Commands

| Action | How |
|---|---|
| Send a message | Type and press Enter |
| Multi-line message | Press \ then Enter (or Shift+Enter) |
| Queue messages while Claude works | Just type — they’ll be sent when it finishes |
| Scroll history | Up/Down arrow keys |
| Switch model | /model |
| Get help | /help |
| Exit | /exit or Ctrl + C |
| Compact context | /compact |
Note

Managing long sessions. Claude Code automatically compresses earlier conversation as the context window fills. For very long research sessions, you can manually run /compact to summarize the conversation so far and free up space. If a session becomes sluggish or Claude starts forgetting earlier instructions, this usually helps.

Warning

Compaction is lossy. When Claude compacts a conversation, whether automatically or via /compact, it summarizes the history but drops specific details: which approaches failed, constraints you stated three prompts ago, a reviewer’s formatting preferences you mentioned in passing. This is one more reason to keep sessions short, commit often, and put important constraints in CLAUDE.md or MEMORY.md rather than relying on conversation history. Those files are re-read at the start of every session and are immune to compaction.
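What belongs in those files is covered in Tutorial 2; as a flavor of the idea, a compaction-proof constraint file might contain entries like the following (all project details here are invented for illustration):

```markdown
# CLAUDE.md — project constitution (illustrative sketch, hypothetical project)

## Constraints (always apply)
- Primary language: R; replicate key estimates in Python to 1e-6.
- Never modify files in data/raw/; write all outputs to output/.

## Failed approaches (do not retry)
- [LEARN] County-level clustering produced singular fits; cluster at state level.
```

Because Claude Code re-reads this file at the start of every session, a constraint recorded here survives both compaction and the end of the session.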

ChatGPT vs. Claude: which should I use?

You don’t have to choose one exclusively. Each has strengths: Claude Code is the best current agentic harness for working inside your project directory; Claude is strong on long documents and nuanced reasoning; ChatGPT’s Deep Research mode is good for literature surveys; and ChatGPT handles image generation, which Claude does not. If you had to pick one tool for research, Claude with Claude Code is the better option right now, but there’s no reason not to use both. Note that OpenAI’s Codex uses AGENTS.md the way Claude Code uses CLAUDE.md — the concept is converging across platforms.

10 Further Reading

If you want to go deeper into the sources behind this tutorial series:

Scott Cunningham (Baylor University) — MixtapeTools. The original research-focused Claude Code workflow: estimation philosophy, the split-pdf reading protocol, cross-language replication as default verification, the Referee 2 adversarial audit, and a presentation philosophy built on the “Three Laws.” His workflow.md is the fullest statement of the “agentic coding” philosophy for empirical economics.

Pedro Sant’Anna (Emory University) — claude-code-my-workflow. The “project constitution” approach to CLAUDE.md: core principles, quality gates (80/90/95 numeric scoring), [LEARN] tags for institutional memory, and the ~150 line limit.

Hugo Sant’Anna (University of Alabama at Birmingham) — Clo-Author. A full research pipeline with 10 slash commands (/discover, /strategize, /analyze, /write, /review, /submit), worker-critic agent pairs, and an orchestrator loop with dependency graphs. Documentation at hsantanna.org/clo-author.

Chris Blattman (University of Chicago) — claudeblattman.com. The friendliest guide to Claude Code for social scientists: installation walkthroughs, prompt engineering advice, the PPRR loop, voice files for teaching AI your writing style, and the “embarrassment heuristic” for deciding when AI use is appropriate.

Aniket Panjwani — AI MBA. Practical training on skills, MCPs, and Claude Code workflows. Covers the skill-building process, skills-vs-MCPs comparison, git as the foundation of AI-assisted work, and context window management.

Anthropic — Claude Code documentation. The technical reference for skills, CLAUDE.md, and Claude Code configuration.