5 AI Work Hygiene
Using agentic tools without losing your mind
6 The Productivity Paradox
A software company called Avatra recently analyzed the digital activity of 164,000 workers across more than a thousand employers. They found that AI users spent more than twice as much time on email, messaging, and chat apps, while their focused, uninterrupted work fell 9% compared with non-users (Wall Street Journal, March 2026). More time on shallow tasks. Less time on deep work.
A caveat: Avatra measures digital activity — time-on-app, switching frequency — not productivity or cognitive depth. The finding is directional, not definitive. But it’s consistent with a pattern that Cal Newport has documented across decades of productivity technology: tools that promise to speed up annoying aspects of your job end up making you busier without producing more high-value output. This was true of email, mobile computing, and video conferencing. Newport calls it the digital productivity paradox (Newport, Deep Questions, 2026).
You’ve spent four tutorials learning powerful agentic tools. This tutorial is about what happens if you use them without discipline.
6.1 Trap 1: Speed Increases Throughput
When a tool makes a task faster, you don’t do the same number of tasks in less time — you do more tasks. The queue of shallow work is essentially infinite. Speed up how you handle items and more flood in.
Email is the canonical example. It made sending a message faster than calling or writing a memo. But faster messages meant more messages. Microsoft’s Work Trend Index finds that the average knowledge worker now checks an inbox once every two minutes. The task got faster; the total time spent on communication went up.
AI creates the same dynamic. You use it to draft emails, summarize meetings, generate reports — and more of each roll in behind the ones you just cleared. Higher throughput means more context switching, which exhausts cognitive capacity and degrades everything else you do that day.
6.2 Trap 2: Ease Produces “Work Slop”
When a tool reduces the effort required to produce something, quality drops — and the total work to reach a usable result often increases. The Harvard Business Review calls this work slop: “AI-generated work content that masquerades as good work but lacks the substance to meaningfully advance a given task” (HBR, 2026).
The mechanism: prompting AI for a draft takes less effort than writing from scratch. But the draft is often so generic that fixing it takes more total time than writing it properly. You trade 30 minutes of hard thinking for 10 minutes of prompting plus 45 minutes of cleanup — and the result is still worse.
A concrete example: you prompt Claude to draft the opening paragraph of your literature review. It produces something plausible — correct journal names, reasonable-sounding claims. But it attributes a finding on colonial institutions to Acemoglu when the finding was actually from Nunn. It describes a regression discontinuity design as difference-in-differences. It cites a working paper that was never published. Now you’re spending an hour cross-checking every sentence — time you wouldn’t have spent if you’d written the paragraph yourself, because you already know the literature. The AI lowered the activation cost of starting; it raised the total cost of finishing.
7 AI Brain Fry
The productivity paradox describes what happens to your output. A 2026 study from Boston Consulting Group describes what it feels like from the inside.
Julie Bedard and colleagues at the BCG Henderson Institute surveyed 1,488 workers and introduced the concept of AI brain fry: cognitive strain and mental fatigue from excessive interaction with or oversight of AI, beyond one’s capacity to process (Bedard et al., HBR, 2026). 14% of AI-using workers reported experiencing it. People described it vividly — “feels like I have 12 browser tabs open in my head” or “I’m working so hard to manage the tools that I’m not actually doing the work.”
A methodological note: this is a cross-sectional survey with self-reported outcomes, and BCG has a commercial interest in framing AI adoption as difficult. The precise percentages deserve wide confidence intervals. But the qualitative findings describe an experience many AI users will recognize.
Key findings:
Brain fry is not burnout. Burnout is emotional exhaustion — detachment, questioning your purpose. Brain fry is cognitive — your processing capacity is overwhelmed. The study found no correlation between the two. Using AI for repetitive tasks actually reduced burnout. The problem is a specific kind of use: intensive oversight of AI outputs, iteration without stopping rules, and managing multiple tools simultaneously.
The three-tool cliff. Workers using one to three AI tools reported productivity gains; those using four or more reported the opposite. The exact threshold shouldn’t be taken literally — heavier users may have more demanding jobs. But the mechanism is plausible: each additional tool adds learning, oversight, and switching costs. Adding a tool should require the same justification as adding a step to your research pipeline: does it move the scoreboard?
Isolation. AI work is single-player by default. You go back and forth with a chatbot, alone. The BCG data showed that when teams integrated AI into shared workflows — with defined handoffs between people — brain fry decreased. Shared use distributes cognitive load and creates natural checkpoints.
Historical precedent. When automated robots arrived on 1970s assembly lines, workers at GM’s Lordstown plant experienced alienation and disengagement — Lordstown syndrome. The resolution came not from better training on the robots but from giving workers agency over how the technology was deployed. The parallel to white-collar AI adoption is direct: the fix is not less technology, but more human control over how the technology is used.
8 Why We Keep Falling for It: Pseudo-Productivity
The traps above are obvious once you see them. So why do smart people keep walking into them?
Newport’s answer (Slow Productivity, 2024): knowledge work has no good productivity metric. In manufacturing, you count units per worker-hour. In research, everyone has a unique portfolio of opaque tasks. In the absence of real metrics, we default to visible effort as a proxy for useful output — what Newport calls pseudo-productivity. The busier you seem, the more productive you appear. This heuristic is so internalized that solo researchers feel guilty spending an afternoon reading and thinking rather than producing visible output.
This is the missing piece. The throughput trap and work slop aren’t just accidental — they’re rewarded under pseudo-productivity because they generate visible activity. AI tools amplify this: they make it trivially easy to produce a high volume of mediocre output, which looks like productivity to anyone not measuring actual outcomes. Add social media pressure, where people perform AI fluency by bragging about running “Claude swarms while they sleep,” and the incentives to mistake busyness for output get even stronger. The fix is Strategy 1 below.
9 Strategies for Researchers Using Agentic AI
The diagnosis is clear: AI tools can make you busier without making you better, exhaust your cognitive capacity, and isolate you from collaborators. Here are six strategies that address this — the first three from Newport, the rest informed by the BCG study.
9.1 Strategy 1: Use a Better Scoreboard
Measure the things that actually matter and track them over time. If a new tool doesn’t improve that score, the tool isn’t helping — regardless of how impressive individual sessions feel.
| Role | What to Measure |
|---|---|
| Pre-tenure faculty | Papers published per year |
| Graduate student | Chapters completed, papers submitted |
| Research team lead | Priority projects completed per quarter |
| Applied researcher | Quality of empirical results, replication success rate |
If you adopted a new tool last month and your output pace hasn’t changed (or has slowed), you have evidence — not just a feeling — that something isn’t working.
The pseudo-productivity test. Before adopting any new AI tool or workflow, ask: will this move the scoreboard, or will it just make me look busier? If you can’t articulate how it connects to your actual output metric, it’s probably pseudo-productivity.
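The scoreboard itself can be as simple as a dated log of completed outputs, tallied per year. A minimal sketch (the log format and the sample entries are hypothetical illustrations, not a prescribed tool):

```python
# A minimal scoreboard: one dated entry per completed output.
# The log format and the sample entries below are hypothetical.
from collections import Counter
from datetime import date

def output_pace(entries):
    """Count completed outputs per year: the metric the scoreboard tracks."""
    return Counter(day.year for day, _label in entries)

log = [
    (date(2025, 3, 1), "paper submitted"),
    (date(2025, 9, 12), "chapter completed"),
    (date(2026, 2, 20), "paper submitted"),
]

pace = output_pace(log)
print(pace)  # compare the pace before and after adopting a new tool
```

The point is not the tooling but the habit: a record you can consult a month after adopting a new AI tool, instead of relying on how productive the sessions felt.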
9.2 Strategy 2: Find the Real Bottleneck
Not all parts of your workflow constrain your output equally. Speeding up a non-bottleneck step feels productive but doesn’t increase the rate at which you produce results.
Newport uses a pointed example: social scientists who use Claude Code to generate plots in 20 minutes that would have taken 3 hours by hand. That’s a real time saving — but plot generation happens a couple of times over a paper that takes three months. It’s not the bottleneck.
He recalls a conversation with Adam Grant about what drives paper production in organizational management. Grant’s answer: getting access to interesting datasets. One good dataset from a company yields three or four papers. So Grant prioritized relationship-building and data access negotiations — not analysis speed.
This won’t generalize to every field. A computational social scientist working with public administrative data has a different bottleneck than an experimentalist who needs IRB approval. The point is that the bottleneck is almost never the mechanical step that AI speeds up most visibly. It might be:
- Reading deeply enough to identify the right question
- Developing the identification strategy that makes results credible
- Getting access to the right data
- Writing clearly enough to survive peer review
Claude Code is excellent at non-bottleneck tasks: data cleaning, code refactoring, figure formatting. That’s genuinely useful — but only if you invest the freed-up time in your actual bottleneck.
Exercise: identify your bottleneck. Think about the last paper you published. What step took longest or was hardest? Is Claude Code helping with that step, or with other steps that feel productive but don’t control your publication rate?
9.3 Strategy 3: Separate Deep from Shallow Work
Protect dedicated blocks of time for hard thinking where no AI tool is running. Confine AI-assisted work to separate blocks.
Even if your AI sessions trigger side effects — throughput traps, work slop, brain fry — those effects can’t infect the hours where you’re reading papers, developing theory, or writing prose. Separation contains the damage.
| Block | Activity | AI Role |
|---|---|---|
| Morning (deep) | Read papers, develop theory, write prose | None or minimal |
| Midday (shallow) | Data cleaning, code, formatting, admin | Claude Code with defined goals |
| Afternoon (deep or collaborative) | Review results, discuss with co-authors, revise | Human judgment on AI work products |
The specific hours don’t matter. What matters is that the deep blocks exist and are protected.
9.4 Strategy 4: Define Stopping Rules Before You Start
Brain fry is worst when workers iterate endlessly without knowing when they’re done. In the BCG study, marketing was hardest hit — 90% of skills disrupted by AI — and the core problem was the absence of natural stopping points. You can always prompt one more time.
For researchers, the same trap appears when refining a figure, iterating on a regression specification, or polishing an abstract. Each round seems slightly better, with no external signal to stop.
The fix: define success criteria before opening a Claude Code session. This is what the quality gates from Tutorial 2 and the verification skills from Tutorial 4 do:
- Quality gates (Tutorial 2): a score of 80 means commit, 90 means ready for review, 95 means ready for submission.
- Referee 2 protocol (Tutorial 4): five structured audits with a formal verdict (Accept / Minor / Major / Reject).
- Robustness battery (Tutorial 4): classifies results as robust, sensitive, or fragile.
The stopping rule principle. Before every Claude Code session, state what “done” looks like. Write it in your prompt: “I need a cleaned dataset with no missing values in the outcome variable and all controls logged, saved to data/clean/.” When the output meets that criterion, stop.
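A "done" criterion like the one above can often be checked mechanically rather than by feel. A minimal sketch, assuming hypothetical column names (an `outcome` field, and a `log_` prefix marking logged controls):

```python
# Mechanical check of a pre-stated stopping rule. The column names
# ("outcome", "log_<control>") are hypothetical illustrations.
def meets_stopping_rule(rows, controls):
    """Done = no missing outcome values and every control present in logs."""
    if not rows:
        return False
    no_missing = all(row.get("outcome") is not None for row in rows)
    controls_logged = all(f"log_{c}" in row for c in controls for row in rows)
    return no_missing and controls_logged

cleaned = [
    {"outcome": 1.2, "log_gdp": 7.1},
    {"outcome": 0.8, "log_gdp": 6.9},
]
meets_stopping_rule(cleaned, ["gdp"])  # criterion met: stop the session
```

When the check passes, the session is over; further prompting is polish, not progress.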
9.5 Strategy 5: Use AI as a Team, Not Solo
When teams integrate AI into shared workflows — with defined handoffs rather than solo use — brain fry decreases. Shared use distributes cognitive load and creates natural checkpoints.
For research teams:
- Share your CLAUDE.md. Co-authors working from the same configuration produce consistent output.
- Discuss AI outputs with collaborators. Don’t just accept Claude’s regression interpretation — present it to your co-author and debate it.
- Use git as a collaboration tool. When your co-author can see what Claude changed (via git diff), they can evaluate the AI's contribution.
- Split AI tasks across the team. One person runs the robustness battery; another reviews the output. Less efficient per task, but lower cognitive load per person.
9.6 Strategy 6: Aim AI at Drudgery, Not at Thinking
The BCG study found a clear split: AI used for repetitive tasks reduced burnout and freed time for human interaction. AI used for intensive oversight tasks — iterating on creative output, evaluating AI-generated content — caused brain fry.
For researchers:
| Good AI Targets (Drudgery) | Bad AI Targets (Thinking) |
|---|---|
| Data cleaning and reshaping | Choosing your identification strategy |
| File formatting and conversion | Interpreting ambiguous results |
| Bibliography management | Deciding what question to ask |
| Boilerplate code (folder setup, package loading) | Reading papers for deep understanding |
| Generating tables from existing results | Writing the argument of your paper |
Use AI for tasks where verification is mechanical — the data is clean or it isn’t, the table renders or it doesn’t. Keep the tasks where verification requires judgment for yourself. This connects to the Expertise-Verification Paradox from Tutorial 1: AI is most useful where you already have the expertise to spot errors, but using it may prevent you from developing that expertise.
10 Summary
Tutorials 1–4 taught you the tools. This tutorial is about the discipline to use them without letting them degrade your work or your well-being.
The productivity paradox is real. Tools that make tasks faster often make you busier without making you better. This has been true of every digital productivity tool since email — which is over 30 years old and still getting worse. AI is not exempt.
AI brain fry is distinct from burnout. It’s cognitive overload from excessive AI oversight, worst when you lack stopping rules, work solo, and juggle too many tools.
Pseudo-productivity is the trap. Visible busyness is not real output. Measure papers published, not prompts sent.
Six strategies protect you:
- Use a better scoreboard (measure what matters)
- Find the real bottleneck (speed up the constraining step, not the easy one)
- Separate deep from shallow work (protect time for hard thinking)
- Define stopping rules before you start (quality gates, not vibes)
- Use AI as a team, not solo (shared config, shared review)
- Aim AI at drudgery, not at thinking (clean data, don’t outsource judgment)
This is not just growing pains. The throughput trap didn’t resolve for email in 30 years. Individual discipline helps, but the real gains come from organizing your team and workflow around these tools — shared CLAUDE.md, shared skills, shared audit trails.
11 Sources
Cal Newport — “The Digital Productivity Paradox,” Deep Questions podcast, March 2026. Newport is a computer science professor at Georgetown and author of Deep Work (2016), A World Without Email (2021), and Slow Productivity (2024).
Julie Bedard et al. — “AI Brain Fry,” BCG Henderson Institute study of 1,488 workers, published in Harvard Business Review, March 2026.
| Source | Finding |
|---|---|
| Avatra (2026) | AI doubled communication time, reduced focused work by 9% (164,000 workers) |
| Harvard Business Review (2026) | Defined “work slop” — AI-generated content masquerading as good work |
| Microsoft Work Trend Index (2025) | Average knowledge worker checks inbox once every 2 minutes |