The Question
Is AI making the economy more productive — or are we just spending $700 billion to install electric motors in factories that haven’t redesigned their floor plans?
The Claim
AI adoption does not equal AI productivity. The bottleneck is organizational, not technical. Task-level gains are real and well-documented (12–40% improvement in controlled studies).[5][6][7] But they shrink to near zero at the macro level because AI touches only 5.7% of work hours,[8] most organizations haven’t redesigned workflows around it, and not all saved time is productively redeployed. 86% of executives across four countries report zero measurable productivity impact.[2]
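The dilution from task-level gains to macro impact is just multiplication. A back-of-envelope sketch (the redeployment fractions below are illustrative assumptions, not figures from the cited studies):

```python
# Why 12-40% task-level gains become ~0.25-0.5pp at the macro level.
AI_HOURS_SHARE = 0.057  # share of total work hours touched by AI

def macro_gain_pp(task_gain: float, redeployed: float) -> float:
    """Aggregate productivity impact in percentage points:
    task-level gain, diluted by hours share and by how much
    of the saved time is productively redeployed."""
    return task_gain * AI_HOURS_SHARE * redeployed * 100

# Even the optimistic case dilutes sharply:
print(f"{macro_gain_pp(0.40, 1.0):.2f}pp")  # 40% gain, all time redeployed -> 2.28pp
print(f"{macro_gain_pp(0.12, 0.4):.2f}pp")  # 12% gain, 40% redeployed     -> 0.27pp
print(f"{macro_gain_pp(0.40, 0.2):.2f}pp")  # 40% gain, 20% redeployed     -> 0.46pp
```

With partial redeployment, even the top of the task-gain range lands inside the 0.25–0.5pp band.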
We are in the electric-motor-installed-but-floor-plan-unchanged phase.[1] Personal computers didn’t boost productivity for years after widespread adoption. The real gains came when firms reorganized — new logistics, new business models, new roles. Much of the 1990s productivity boom came from retail (inventory management, supply chains), not Silicon Valley itself.
What would change my mind
Sustained quarterly productivity growth above 2.5% for 4+ quarters, driven by sectors using AI (not building it). The SF Fed’s underlying productivity measure (excluding investment-driven output) turning meaningfully positive. A “retail moment” where a non-tech industry shows macro-visible gains from AI-redesigned workflows.
Figure 1
The Adoption-Productivity Disconnect (2025)
41% of US workers use generative AI, but only 5.7% of total work hours involve it. 86% of executives across four countries report zero measurable productivity impact over three years. Productivity growth in 2025 was 1.9% — below the 2% long-run average.
Figure 2
Task-Level Gains vs Macro Impact (% improvement)
Individual task gains (12–40%) are real and well-documented. But they shrink dramatically at the macro level (0.25–0.5pp) because AI touches only 5.7% of work hours, and not all saved time is productively redeployed.
Where AI Works: The Closed Problem Framework
Two occupations show real traction: coding and customer service. Three factors link them (Vaziri, Gartner):[4] context-light tasks, easily verifiable output, and abundant training data.
Two-thirds of coders use AI weekly (Stack Overflow). GitHub Copilot has 26 million users. 85% of customer-service managers planned to experiment with AI in 2025 (Gartner). Teleperformance stock is down 75% since ChatGPT launched. One-third of Anthropic Claude queries are coding-related.
The pattern extends to cybersecurity: XBOW’s AI pen-testing swarm reached #1 on the world leaderboard, completing in hours what took weeks. It showed unexpected creativity — exfiltrating data from an image-only system by generating trick images. “Hallucination is an advantage — it does unexpected things. But there’s a crisp success criterion” (Oege de Moor). And to law: Garfield, the first regulated AI law firm in the UK, handles small claims because they’re a “chronological sequence of relatively closed problem spaces.” Complex litigation would fail.
Next targets: junior bankers and junior lawyers — a smaller population, but highly paid, and their early-career tasks are context-light.
Figure 3
The Closed Problem Framework: Where AI Works and Why
Three factors predict AI success (Vaziri, Gartner): context-light tasks, easily verifiable output, and abundant training data. Coding and customer service score high on all three. Strategy, creative work, and relationship management score low. The near future is expanding what counts as “closed.”
Boss Class Season 3: Six Episodes on AI at Work
Andrew Palmer — The Economist’s Bartleby columnist — spent Season 3 of Boss Class testing AI against his own work, interviewing the people building it, and confronting what it means for management. Each episode isolates a different dimension of the same question: can AI do your job?
The Fat Layer of Humans
Palmer tests AI on his own Bartleby columns and gets a nasty shock.
"Right now there's a very fat layer of humans. My argument is that layer gets thinner and thinner. And does it ever go to zero? I don't see why not." — Tom Blomfield, Y Combinator
Why Coding Leads the Frontier
Vibe coding and the democratization of software. "If even Andrew Palmer can write his own code, is this good news or a disaster?"
Coding leads because (1) AI engineers are themselves coders — they understand the domain, and (2) code is verifiable — test suites provide tight feedback loops. — Sarah Guo, Conviction VC
The Easy Button Problem
Workplace AI is littered with disappointments. Some bosses figured out how to make it work.
"People who are going to be the most successful are the people that can resist just hitting the easy button." People present AI slides, get asked questions, can't answer — they didn't do the work. — Hilary Gridley, Whoop
Gen AI vs Gen Z
AI threatens the rung of the ladder juniors use to learn. What replaces it?
"Your talent pipeline ended this summer." If senior people retire, there's no one to take over. Juniors did the grunt work that taught them how the business thinks. AI broke that deal. — Ethan Mollick, Wharton
Closed Problem Spaces
Where AI excels and where it still falls apart.
AI legal services work because small claims are "a chronological sequence of relatively closed problem spaces." Defined inputs, finite possibilities. Complex litigation would fail. — Garfield, first regulated AI law firm (UK)
The Human Defence
What's your unfair advantage over AI?
Lab experiment: people good at managing humans were also good at managing AI agents. The skills are highly correlated. Your edge isn't being smarter — it's being trusted, contextual, and accountable. — David Deming, Harvard
Your Questions on AI at Work
Gridley and Mollick answer listener questions. Key insights:
"Swiss cheese prompting" — layer validation checks. No single layer catches everything, but layers together reduce errors.
"Codify yourself" — Gridley built dozens of custom GPTs encoding her own judgment. Team gets 24/7 access to her standards.
Project managers are "at the most superpowered moment of their career" — AI agents respond well to spec documents and SOPs (Mollick).
The burnout loop: AI expands both what you're capable of and what's expected of you — and the gap between them grows, not shrinks.
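The Swiss-cheese logic can be made concrete with a toy error model (the per-layer catch rates below are illustrative assumptions, not numbers from the episode):

```python
# Toy model of "Swiss cheese prompting": stacked validation layers.
# Assumes layers catch errors independently, which is a simplification.

def residual_error_rate(catch_rates: list[float]) -> float:
    """Fraction of errors that slip through every validation layer."""
    leak = 1.0
    for rate in catch_rates:
        leak *= (1.0 - rate)
    return leak

# No single layer is great, but stacking them compounds:
print(f"{residual_error_rate([0.6]):.0%}")            # one layer    -> 40% slip through
print(f"{residual_error_rate([0.6, 0.5, 0.7]):.0%}")  # three layers -> 6% slip through
```

The point of the metaphor survives the toy model: each layer has holes, but the holes rarely line up.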
Five Failure Modes
From real enterprise AI adoption projects. These are not hypothetical — each one comes with a named company or named researcher who documented it.
The Exposure Trap
Showing '40% of tasks are AI-exposed' without telling anyone what to DO about it.
Percentages without actions. Managers feel threatened; workers check out.
The Validation Gap
AI produces output fast. Nobody checks it.
Deloitte Australia issued a partial refund to the federal government for a report littered with AI-generated errors (2025).
The Speed-Trust Tradeoff
Delivering AI work in a 'big reveal' kills credibility.
Glowforge's AI sales coach emailed summaries to reps — 'every single sales rep had routed it directly into the bin.' Only worked after redesign. (Dan Shapiro, CEO)
The Language Mismatch
Technical teams write methodology docs. Managers don't read methodology docs.
"Here's a bunch of wood, build a house" — Bret Taylor (OpenAI chair) on the current state of AI vendors.
The Power Law
85% of J&J's AI value from just 15% of applications.
The 'thousand flowers' phase produces mostly weeds. J&J switched to a central AI council. (Jim Swanson, CIO)
The Hidden Costs
Two costs beyond the model subscription: adaptation cost and human-in-the-loop cost.
"Humans in the loop are expensive. They inconveniently get in the way of the scalability story." — Rama Ramakrishnan, MIT
Figure 4
The 85/15 Power Law (J&J AI Portfolio)
The “thousand flowers” phase produces mostly weeds. J&J found 85% of AI value came from just 15% of applications. Implication: don’t democratize experiments without a filtering mechanism.
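The filtering mechanism J&J needed starts with a simple measurement: rank applications by value and check how concentrated the portfolio is. A sketch with synthetic, heavy-tailed portfolio values (illustrative numbers, not J&J data):

```python
# Measuring value concentration across an AI application portfolio.
# Values are synthetic and heavy-tailed, to mimic the "mostly weeds" pattern.
values = sorted(
    [100, 80, 60, 8, 7, 6, 5, 4, 3, 3, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1],
    reverse=True,
)

top_n = max(1, round(0.15 * len(values)))   # top 15% of applications
share = sum(values[:top_n]) / sum(values)   # their share of total value
print(f"top {top_n} of {len(values)} apps capture {share:.0%} of value")
# -> top 3 of 20 apps capture 83% of value
```

If this number comes out near 85%, "let a thousand flowers bloom" is wasting most of its budget, and a central council that kills the tail is the rational response.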
The Human Defence
David Deming (Harvard)[8] identifies three human edges that AI doesn’t touch:
Trust & Accountability
People don't trust AI-driven processes as much as human ones. In the Zapier experiment, employees rated identical answers lower when they believed the author was AI — even when it was actually the real CEO.
Taste & Judgment
AI optimizes for what you tell it. It doesn't know what to optimize for. "What AI isn't good at is figuring out what it's supposed to optimize over." — Deming
Versatility
"We use our intelligence all the time, but there are very few times when we only use that. It's usually in combination — physical skill, relationships, mentorship." — Deming
The managing-AI-is-managing-people finding: Deming’s lab experiment[3] randomly assigned people as managers of human teams, then AI agent teams. The people who were good at managing humans were also good at managing AI agents. The skills are highly correlated. Invest in management development, not “AI training” as a separate skill.
The chess analogy: Deep Blue beat Kasparov 30 years ago. Chess didn’t die — it’s more popular than ever. Engines changed how the game is taught: kids study 3x faster. But the shift in value went from memorization to reading opponents and creative play. The biggest risk wasn’t displacement. It was cheating. Same for enterprise AI.
The Entry-Level Question: Gen AI vs Gen Z
“Your talent pipeline ended this summer.” — Ethan Mollick.[3] If AI replaces the entry-level tasks that junior employees used to learn on, you lose the training pipeline. The people who become senior leaders in 10 years are learning on those tasks right now.
“Every mid-level manager has realized AI does a better job at writing the first draft deal memo and does it right away. And never cries.” Meanwhile, “every intern has realized ChatGPT will make them look smart. And they’d be stupid not to use it. So now it’s people passing AI content back and forth and no one’s learning anything.” — Mollick
But cutting junior hiring is the wrong response. More than 80% of Gen Z use AI at work (LSE).[9] They're the AI-native workforce. Historical precedent: "A lot of companies didn't fully enter the digital economy until a lot of their workforce had turned over." (Calhoon, Indeed). The answer is to redesign apprenticeships, not kill them. Create hybrid roles: legal tech partner, client innovation partner, legal coder. Be transparent about AI in recruitment — Shoosmiths explicitly embraced it and saw only a 10% application increase vs 40–50% elsewhere.
Figure 5
AI Usage at Work by Generation
“Think of Gen Z as Gen AI” — Challamel. Juniors are the AI-native workforce. Cutting entry-level hiring eliminates the people who know the tools best.
The Capital Picture
Big 5 US tech companies will spend $700 billion in capex in 2026 — more than the entire global oil & gas sector ($570B).[10] GPUs are being financialized: OneChronos is launching a compute auction market (June 2026, partnered with Paul Milgrom’s Auctionomics), Ornn is building a GPU price index with put options. Morgan Stanley estimates $680B in GPU depreciation over 4 years for Alphabet, Microsoft, Meta, and Oracle.
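For scale, the depreciation figure can be roughed out with straight-line arithmetic (an illustrative simplification — actual schedules vary by company, and the two groups of firms overlap but are not identical):

```python
# Rough scale check on Morgan Stanley's $680B / 4-year GPU depreciation estimate.
# Straight-line depreciation is an assumed simplification for illustration.
big5_capex_2026 = 700e9     # Big 5 US tech capex, 2026
gpu_depreciation = 680e9    # Alphabet, Microsoft, Meta, Oracle, over 4 years

annual_writeoff = gpu_depreciation / 4
print(f"${annual_writeoff / 1e9:.0f}B/year")                       # -> $170B/year
print(f"{annual_writeoff / big5_capex_2026:.0%} of a year's capex")  # -> 24%
```

Roughly a quarter of one year's Big 5 capex is being written off annually just to keep the GPU fleet's book value honest.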
India generates roughly 20% of the world’s digital data but hosts only 3% of data centre capacity.[11] The investment wave is massive: Adani ($100B by 2035), Alphabet ($15B), Microsoft ($20B). Maharashtra is offering 40% lower electricity rates for data centres, and the 2026 Budget includes a tax holiday for foreign DC owners until 2047.
The disconnect: The economy is financializing AI infrastructure before it’s clear AI makes the economy more productive. Same pattern as housing pre-2008 — securitizing the asset before understanding the underlying value.
So What?
If the bottleneck is organizational, the prescription is organizational. Not more AI tools. Not bigger models. Workflow redesign. Management development. Honest uncertainty about what’s coming instead of false confidence about what’s here.
Set business outcomes, not AI usage targets. Sierra charges clients only when the AI agent actually solves the problem (Bret Taylor).
Start with the pain, not the tool. Whoop went to lowest-adoption departments and watched what people hated doing (Gridley).
Expect the 85/15 power law. Build a filtering mechanism early (J&J).
Embed AI into existing workflows, not alongside them. The Glowforge lesson: a superior product that doesn't fit someone's day gets binned.
Follow the 5-step sequence: Access → Adoption → Proficiency → Ways of Working → Reorganization (Challamel, ex-Moderna, now at OpenAI).
Use the PURE framework for governance: Purposeful, Unsurprising, Respectful, Explainable. DBS Bank: $775M in AI gains, 12-year journey, 10,000+ reskilled.
Sources