CASE STUDY · Apr 2026

Provenance Theater: When the Safety Net Is Made of Paper

Six data explorers had provenance labels, round-number detectors, export guards, and a full DATA_VALIDATOR. Every CFPB complaint count was still fabricated. The infrastructure for rigor is not rigor. Calling the API is rigor.

data-quality · provenance · ai-tools · verification · failure-mode

6 explorers audited: BLS, FRED, Census, EDGAR, BTS, CFPB

2.3x CFPB error magnitude: Wells Fargo 2018 claimed 19,843 → actual 8,789

3 skills that already existed: data-smoothing, provenance, testifying-expert

0 API calls in the hardening pass: labels added, data never checked

The safety infrastructure was real; it just did not do the one thing that matters.

What looked right

Provenance labels (embedded-unverified). Every data object was tagged with a dataMode property the moment it entered the system. The tagging was correct. The data was not.

Colored badges (amber pills). Users could see which data was unverified. The UI enforced the distinction. The distinction was between two kinds of wrong.

Export guards (blocked on unverified). Exports were disabled when embedded data was selected. This prevented bad data from reaching deliverables. It did not prevent bad data from reaching the screen.

DATA_VALIDATOR (6 heuristic tests). Round-number detection, monotonic trend flags, volatility checks, source registry validation. The fabricated CFPB data passed every single one.
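For concreteness, here is a minimal sketch of the kind of heuristics such a validator runs. The function names and thresholds are my illustration, not the actual DATA_VALIDATOR code; the point is that a plausibly shaped fabrication sails through:

```python
def too_round(values, threshold=0.5):
    """Flag a series where most values are suspiciously round numbers."""
    return sum(1 for v in values if v % 1000 == 0) / len(values) > threshold

def perfectly_monotonic(values):
    """Flag a series that only ever moves in one direction."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    return all(d <= 0 for d in diffs) or all(d >= 0 for d in diffs)

# A Navient-style fabrication: declining, non-round, with one small wobble.
# It is shaped exactly like real data, which is why heuristics cannot catch it.
fabricated = [4312, 3987, 3541, 3620, 2897, 2410, 2103]
print(too_round(fabricated))            # False -- passes
print(perfectly_monotonic(fabricated))  # False -- passes
```

Heuristics test the shape of the data, and a generative model produces correctly shaped data by construction. Only a comparison against the source can fail it.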

The data was checked for plausibility, never for truth.

What was actually wrong

CFPB: entirely fabricated (off by 2.3x). All 8 companies had invented annual counts, product breakdowns, and state distributions. Navient showed 7 years of declining complaints. Real count: zero across all years.

EDGAR: Exxon revenue wrong ($10B gap). FY2023 listed as $334.7B; SEC EDGAR shows $344.6B. FY2024 was also wrong. The SEC has a free XBRL API. Nobody called it.

FRED: subtle drift (0.1–0.8 off). UNRATE 2022 off by 0.1 (3.6 vs. 3.7). Oil prices off on 5 of 10 annual values. Close enough to look right, wrong enough to undermine trust.

Verification artifacts, not more labels.

The structural fix

Pipeline scripts (data from APIs, not agents). A script fetches from the real API and outputs a dated JSON file. The HTML embeds from that file. The agent never types individual data values.

Verification receipts (show the curl). Every embedded value must trace to a specific API call. The receipt is the artifact: the URL, the response, the extraction. No receipt, no data.

Honest gaps (empty over plausible). For sources without APIs (BTS fares), cells stay empty with a link to the source, editable so a human can enter verified values. Never generate placeholder data.
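A minimal sketch of what one pipeline step could look like. The endpoint URL reflects my understanding of the public CFPB complaint search API, and the receipt's field names are my own invention; treat both as assumptions to adapt:

```python
"""Sketch of a pipeline step: wrap an API response in a verification receipt,
then write a dated JSON file the HTML embeds from."""
import hashlib
import json
from datetime import datetime, timezone

def make_receipt(url: str, params: dict, payload: dict) -> dict:
    """Bundle the response with the URL, the parameters, a fetch
    timestamp, and a hash of the raw payload."""
    raw = json.dumps(payload, sort_keys=True).encode()
    return {
        "source_url": url,
        "params": params,
        "fetched_at": datetime.now(timezone.utc).isoformat(),
        "sha256": hashlib.sha256(raw).hexdigest(),
        "data": payload,
    }

# In the real script this payload comes from a live HTTP call; here it is a
# placeholder so the receipt shape is visible.
payload = {"company": "WELLS FARGO & COMPANY", "year": 2018, "count": 8789}
receipt = make_receipt(
    "https://www.consumerfinance.gov/data-research/consumer-complaints/search/api/v1/",
    {"company": "WELLS FARGO & COMPANY", "date_received_min": "2018-01-01"},
    payload,
)
with open(f"cfpb_{datetime.now(timezone.utc).date()}.json", "w") as f:
    json.dump(receipt, f, indent=2)
```

The dated filename makes staleness visible, and the hash makes tampering with the embedded copy detectable.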

I built six data explorers for FTI research tools — BLS, FRED, Census, EDGAR, BTS, CFPB. Each one lets a consultant query government data without leaving the browser. They looked polished. They had provenance labels, export guards, round-number detectors, and a full DATA_VALIDATOR that ran six heuristic tests on every embedded dataset.

Then I audited the data against the actual APIs.

The CFPB explorer claimed Wells Fargo received 19,843 complaints in 2018. The real number from the CFPB API is 8,789. Off by 2.3x. Every single company in that explorer had fabricated annual counts, fabricated product breakdowns, and fabricated state distributions. Navient had seven years of declining complaints plotted on a convincing downward curve. The real number? Zero complaints under that entity name across all seven years. A complete fiction with a narrative arc.

The EDGAR explorer listed Exxon FY2023 revenue at $334.7 billion. The SEC’s XBRL API returns $344.6 billion. A $10 billion gap. The FRED explorer had oil prices off on 5 of 10 annual values and unemployment wrong by a tenth of a point in 2022. BLS and Census were the only two that checked out — BLS because it fetches live, Census because the embedded data happened to match.
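Drift that small only surfaces under an exact diff. A sketch of the comparison step, using the numbers above (the key names are mine):

```python
def audit_diff(embedded: dict, actual: dict) -> dict:
    """Return every key where the embedded value disagrees with the API value.
    No tolerance: 'close enough to look right' is exactly the failure mode."""
    return {k: (embedded[k], actual.get(k))
            for k in embedded if embedded[k] != actual.get(k)}

embedded = {"UNRATE_2022": 3.6, "XOM_FY2023_revenue_bn": 334.7}
actual   = {"UNRATE_2022": 3.7, "XOM_FY2023_revenue_bn": 344.6}
print(audit_diff(embedded, actual))
# {'UNRATE_2022': (3.6, 3.7), 'XOM_FY2023_revenue_bn': (334.7, 344.6)}
```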

The worst part is not that the data was wrong. The worst part is that a full hardening pass had been applied — provenance labels, colored badges, export blockers, validator code — and none of it caught anything. Because none of it called the APIs. The hardening pass built the infrastructure for verification without doing verification. It labeled every value ‘embedded-unverified’ and moved on, as if the label were the fix.

I already had three skills that should have prevented this: ai-data-smoothing (the fisheries incident), data-provenance-standalone (the tag-badge-guard pattern), and testifying-expert (‘could I defend this on the stand?’). The rules existed. The enforcement did not.

The lesson is the same one I keep relearning: the infrastructure for rigor is not rigor. A provenance label on fabricated data is worse than no label at all, because it creates the appearance of a system that has been checked. The only thing that counts is calling the API and showing the receipt. Everything else is theater.

BEFORE (provenance theater)
  Agent generates plausible data
  -> tags it 'embedded-unverified'
  -> adds amber badge to UI
  -> blocks export until verified
  -> DATA_VALIDATOR runs 6 heuristic tests
  -> all tests pass
  -> data is still fabricated

AFTER (verification artifacts)
  Script calls real API
  -> writes dated JSON with fetch metadata
  -> HTML embeds from JSON file
  -> agent never types data values
  -> receipt exists for every number

THE RULE
No data ships without a verification artifact.
"I tagged it unverified" is not a verification artifact.
It is an admission that you skipped verification.
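The rule is mechanically enforceable. A sketch of a pre-ship gate, under the assumption that receipts live in a dict keyed by value name (every name and the URL here are illustrative):

```python
def unbacked(embedded: dict, receipts: dict) -> list:
    """List every embedded value with no receipt, or whose receipt disagrees."""
    return [k for k, v in embedded.items()
            if receipts.get(k) is None or receipts[k].get("value") != v]

receipts = {"wells_fargo_2018": {"source_url": "https://example.com/api",
                                 "value": 8789}}
embedded = {"wells_fargo_2018": 19843,  # receipt disagrees -- blocked
            "navient_2018": 2103}       # no receipt at all  -- blocked
print(unbacked(embedded, receipts))
# ['wells_fargo_2018', 'navient_2018']
```

Wire this into the export path and "I tagged it unverified" stops compiling: a value either carries a matching receipt or it does not ship.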