Field note • Telemetry as product surface

What looks cumulative isn't always cumulative.

In the same week, two wrong-number incidents surfaced for me. One was a card on this site that kept showing my old marathon PR for two weeks after Boston. The other was a public Claude-Code usage leaderboard where about $480k of a $515k cumulative-spend total turned out to be inflation. The fixes were unrelated. The shape was the same.

Sibling note: numbers need sources · ohong/straude#119

Two incidents

3:09 → 3:07

Marathon PR card on this site, stuck at the pre-Boston value because the page bundled the JSON at deploy time.

18×

Per-day inflation in the straude leaderboard's Codex collector. Per-request snapshots were being summed as session-cumulative deltas.

Stale display

2 weeks

The /running stats card kept showing the pre-Boston values until the next site deploy.

Inflation removed

≈$480k

Across all affected straude users. Roughly 93% of the leaderboard's then-$515k cumulative-spend total.

Inflation factor

70× per session

The Codex collector treated a per-request token snapshot as a session-cumulative counter.

Surfaces hit

2 of 2

Same fault class on a static site card and on a public leaderboard. Different stacks, same shape.

Two incidents that did not look related.

One was a personal site. One was a community leaderboard. They surfaced six days apart.

Around May 1, the running stats card on the homepage of this site kept showing 3:09 / 6 / 0 of 7. By then I had run Boston, the canonical data file said 3:07 / 7 / 1 of 7, and the values on the page were almost two weeks behind the data file.

The problem was not a bug in the data. The data was right. The page used import races from "@/data/races.json", which bakes the JSON into the JavaScript bundle at deploy time. Updating the file did nothing for the live page until the next deploy, and I had not deployed since the update. So the page kept showing a frozen snapshot of a data file that had moved on.

Six days later, on May 7, the maintainer of straude — a public Claude-Code usage leaderboard — opened a PR titled “fix(codex): correct cumulative-delta inflation + retroactive repair”. The trigger was a profile card showing $7,821.52 for a single day's usage, off by an order of magnitude. The investigation revealed that the Codex collector was treating total_token_usage.input_tokens as a session-cumulative counter when it is, in fact, a per-request snapshot of the current context. Each time the IDE pruned context (system-prompt re-anchoring, cache eviction, summarization), the snapshot dropped. When it grew back, the legacy delta logic added the regrowth on top of the pre-prune peak.

Per-session: 70× too high. Per-day: 18× too high. Across the leaderboard: about $480k of the project's then-$515k cumulative-spend total turned out to be inflation. Roughly 93% of the public total had never existed.

Figure 1

The snapshot rises and falls. The misread cumulative only ever rises.

The shapes look unrelated until you notice that one was built from the other. The cumulative line on the right is what the consumer reports. The snapshot trace on the left is what the producer actually emits. Same data. Different beliefs about what it means.
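A minimal TypeScript sketch of the two traces, built from one made-up snapshot series. The accumulator is an assumption: a generic reset-aware delta routine of the kind normally used for real counters, not straude's actual collector. It reproduces the shape described above: after the drop, the regrowth is stacked on top of the pre-prune peak, so the misread line ends at 550 while the producer's latest value is 250.

```typescript
// Hypothetical sketch: one snapshot series, two readings of it.
// The values are invented; misreadAsCumulative is illustrative only.

// Producer: per-request snapshot of current context size (tokens).
// It rises as the conversation grows and drops when context is pruned.
const snapshots: number[] = [100, 200, 300, 50, 250];

// Consumer that (wrongly) believes the series is a cumulative counter.
// A drop is read as a counter reset, so the new value is added in full;
// otherwise only the increase since the previous reading is added.
function misreadAsCumulative(series: readonly number[]): number[] {
  const totals: number[] = [];
  let total = 0;
  let previous = 0;
  for (const current of series) {
    const delta = current >= previous ? current - previous : current;
    total += delta;
    totals.push(total);
    previous = current;
  }
  return totals;
}

const misread = misreadAsCumulative(snapshots);
console.log("snapshot:", snapshots.join(" ")); // snapshot: 100 200 300 50 250
console.log("misread: ", misread.join(" "));   // misread:  100 200 300 350 550
```

Nothing in the consumer is buggy as counter code. The error is entirely in which semantics it was handed.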

The shared shape.

A producer emits a number with one set of semantics. A consumer treats it with another.

The races.json file was a snapshot of the current state. The page treated it as if it were live and could ride the build pipeline forward. The Codex token field was a per-request snapshot of the conversation context. The collector treated it as if it were monotonically growing across the session.

Same gap, two stacks. A value that looks like it is accumulating, when it actually is not. The producer never promised monotonicity. The consumer assumed it. Tests checked that the value moved in the right direction, not that the semantics matched.

What makes this hard to catch is that growth is a comforting signal. A counter that goes up looks healthy. The break only shows up when something downstream — a delta, a sum, a leaderboard, a card — ends up far enough out of true that a person notices. By that point, every consumer that quietly accepted the value has been corrupted with it.

Treat “total” as an unverified claim, not a contract.

Where else this trap hides.

Once you see the shape, the same fault class is everywhere.

Linux network counters. The byte counters in /proc/net/dev can wrap at 32 bits, notably on 32-bit kernels. A monitoring tool that subtracts “previous” from “current” without a wraparound check will produce one giant negative number at every wrap, and at gigabit line rate a 32-bit byte counter wraps in well under a minute.
Rate-limit remaining. X-RateLimit-Remaining resets every window. Treat it as a snapshot, not a running tally.
Auto-increment IDs. A primary key sequence that has been reseeded — for a replication restore, an environment refresh, a manual migration — produces newer rows with lower IDs than older rows. Anything using ID as a time proxy will quietly invert.
File sizes after rotation. A log rotator truncates the active file. A “disk usage growth” metric that diffs successive sizes will show negative growth and then a brand-new climb that misses everything that was rotated out.
Calendar week numbers. ISO-8601 weeks 52, 53, and 1 do not behave the way intuition expects across year boundaries. Anything joining on week-of-year as a monotonic key will mis-sort once a year.
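Two of these traps have well-known consumer-side guards, sketched below in TypeScript. The function names are hypothetical. wrappingDelta assumes an unsigned counter of a known width that wraps at most once between samples; isoWeekKey computes the ISO-8601 week-numbering year together with the week, so sorting never relies on week-of-year alone.

```typescript
// Delta of an unsigned counter that wraps at `bits` bits.
// Assumes at most one wrap between samples. Plain numbers are exact here
// because 2^32 is far below Number.MAX_SAFE_INTEGER; a 64-bit counter
// would need bigint instead.
function wrappingDelta(previous: number, current: number, bits = 32): number {
  const modulus = 2 ** bits;
  return current >= previous ? current - previous : current + modulus - previous;
}

// Sortable ISO-8601 week key such as "2020-W53". Near January 1 the
// ISO year can differ from the calendar year, so the key carries both.
function isoWeekKey(date: Date): string {
  // Work in UTC on the calendar date only.
  const d = new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
  const dayOfWeek = d.getUTCDay() || 7;         // Monday = 1 ... Sunday = 7
  d.setUTCDate(d.getUTCDate() + 4 - dayOfWeek); // move to the Thursday of this ISO week
  const isoYear = d.getUTCFullYear();           // that Thursday's year is the ISO year
  const jan1 = Date.UTC(isoYear, 0, 1);
  const week = Math.ceil(((d.getTime() - jan1) / 86400000 + 1) / 7);
  return `${isoYear}-W${week < 10 ? "0" : ""}${week}`;
}
```

For example, wrappingDelta(4294967290, 5) returns 11 instead of a huge negative number. Sorted by week number alone, 2021-01-01 (week 53) lands after 2021-01-04 (week 1); their keys, 2020-W53 and 2021-W01, sort correctly as plain strings.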

None of these are exotic. They are common, well-documented, and well-warned-about. They keep showing up because the warning is usually on the producer's side, in a footnote, while the consumer is written somewhere else by someone who never read the footnote.

What to do about it.

The fix is not vigilance. The fix is a small contract at the boundary.

The cheap version: at every place a numeric field crosses a boundary between systems, force the producer to label it. { kind: "snapshot", value }, { kind: "cumulative", scope_key, value }, { kind: "delta", scope_key, since, value }. Refuse anything else at the consumer.
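One way to sketch that boundary in TypeScript is a discriminated union over the three labeled shapes, plus a parser that refuses anything unlabeled. The field names follow the text; the function name parseLabeled and the choice of a string for since (say, an ISO timestamp) are assumptions for illustration.

```typescript
// The three labeled shapes a producer may emit.
type Labeled =
  | { kind: "snapshot"; value: number }
  | { kind: "cumulative"; scope_key: string; value: number }
  | { kind: "delta"; scope_key: string; since: string; value: number };

// Consumer-side gate: accept only fully labeled values, refuse the rest.
function parseLabeled(raw: unknown): Labeled {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("refused: not an object");
  }
  const r = raw as Record<string, unknown>;
  const kind = r.kind;
  const value = r.value;
  const scopeKey = r.scope_key;
  const since = r.since;

  if (typeof value !== "number" || !isFinite(value)) {
    throw new Error("refused: missing or non-finite value");
  }
  switch (kind) {
    case "snapshot":
      return { kind: "snapshot", value };
    case "cumulative":
      if (typeof scopeKey !== "string") throw new Error("refused: cumulative needs scope_key");
      return { kind: "cumulative", scope_key: scopeKey, value };
    case "delta":
      if (typeof scopeKey !== "string" || typeof since !== "string") {
        throw new Error("refused: delta needs scope_key and since");
      }
      return { kind: "delta", scope_key: scopeKey, since, value };
    default:
      // A bare number, or a "total" with no declared semantics, stops here.
      throw new Error(`refused: unlabeled or unknown kind ${String(kind)}`);
  }
}
```

Because the result is a union, downstream code has to switch on kind before it can sum, diff, or display a value, so the semantics travel with the number.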

The expensive version: when the consumer assumes monotonicity, assert it. if (current < previous) throw. Either the assumption is right and the assertion never fires, or the assumption was wrong and the assertion catches the corruption two reads in instead of two weeks in.
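A minimal TypeScript sketch of that assertion, tracked per scope so independent sessions do not trip each other. The class name MonotonicGuard, the scope keys, and the error text are hypothetical.

```typescript
class MonotonicGuard {
  // Last accepted reading per scope (for example, per session).
  private readonly last = new Map<string, number>();

  // Returns the reading if it did not decrease; throws on the first decrease.
  check(scopeKey: string, current: number): number {
    const previous = this.last.get(scopeKey);
    if (previous !== undefined && current < previous) {
      throw new Error(
        `${scopeKey}: ${current} < previous ${previous}; source is not monotonic`,
      );
    }
    this.last.set(scopeKey, current);
    return current;
  }
}
```

Fed a context-size series such as 100, 200, 300, 50 for one session, the guard throws on the fourth read, the first point where the cumulative assumption is falsified, instead of letting a delta routine quietly absorb the drop.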

For my site, the small fix was a runtime fetch with cache: "no-store" instead of a static import, plus a canonical-metrics.json manifest so the same number cannot appear with two different values on two different surfaces. For the leaderboard, the small fix was reinterpreting the field as a snapshot and rebuilding cumulative server-side from the snapshot stream.
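A sketch of the site-side change in TypeScript. The URL /data/races.json, the Race shape, and the injected fetch function are assumptions; in a browser you would pass the global fetch. The cache: "no-store" request option tells fetch to go to the network without consulting or updating the HTTP cache.

```typescript
// Hypothetical data shape for one race entry.
interface Race {
  name: string;
  time: string;
}

// Minimal fetch signature, so the loader can be exercised without a network.
type FetchLike = (
  url: string,
  init: { cache: "no-store" },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Before (build time): `import races from "@/data/races.json"` inlines the
// file into the bundle, so the page shows whatever the file said at deploy.
// After (request time): read the file on every page load, bypassing the cache.
async function loadRaces(fetchImpl: FetchLike, url = "/data/races.json"): Promise<Race[]> {
  const res = await fetchImpl(url, { cache: "no-store" });
  if (!res.ok) throw new Error(`GET ${url} failed: ${res.status}`);
  const data = await res.json();
  if (!Array.isArray(data)) throw new Error(`${url} is not an array`);
  return data as Race[];
}
```

The trade is one extra request per page view in exchange for the displayed value and the data file never drifting apart between deploys.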

Both fixes are unsexy. Both close the same shape. The shape is the lesson.

Internal reference

Promoted to skills/failure-modes/non-monotonic-source-treated-as-monotonic.md in Jenn OS so future agent sessions can recognize the shape on first contact instead of debugging it from scratch. Sibling skills: split-sync-paths, metric-drift-across-surfaces, plausible-but-wrong-numbers.