Methodology

Make the pipeline legible and the maintenance work repeatable.

The best way to keep the visuals impressive is to make the underlying process boringly consistent. This page explains what the tracker stores, how winners are derived, and what a high-confidence maintenance loop should look like.

  • 87% tracked coverage: measured against the working manual catalog count.
  • 5% reviewed coverage: 0 rows are verified today.
  • 5% evidence coverage: future reruns should attach proof by default.
  • 0.92 average confidence: useful for triaging manual review order.

Model

What the tracker currently treats as canonical

The rules below are what keep the shared tracker library stable across pages.

Responses are the source of truth

Winners are derived from `responses.correct`, not trusted from stale `winner` labels.

Ties and stumpers are explicit

A tie means multiple correct answers. A stumper means nobody got it right.
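
The two rules above can be sketched as a single derivation over the responses. This is a minimal illustration, not the tracker's actual utility: the `QuizResponse` shape, the `panelist` field, and the `deriveOutcome` name are assumptions; only the `correct` flag comes from the rule itself.

```typescript
// Hypothetical response shape. Only `correct` is named on this page.
interface QuizResponse {
  panelist: string; // assumed field name
  correct: boolean;
}

type Outcome =
  | { kind: "winner"; winner: string }
  | { kind: "tie"; winners: string[] }
  | { kind: "stumper" };

// Derive the outcome from correctness flags alone, ignoring any stored winner label.
function deriveOutcome(responses: QuizResponse[]): Outcome {
  const winners = responses.filter((r) => r.correct).map((r) => r.panelist);
  const [first, ...rest] = winners;
  if (first === undefined) return { kind: "stumper" }; // nobody got it right
  if (rest.length === 0) return { kind: "winner", winner: first };
  return { kind: "tie", winners }; // multiple correct answers
}
```

Because the outcome is recomputed on every read, a corrected response changes the winner without anyone having to remember to update a label.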

Identity gaps stay visible

Known historical panelists and guest-name gaps are surfaced as QA issues instead of being erased from the story.

Milestones beat fake certainty

Until every appearance is modeled, the host timeline uses evidence-backed snapshots rather than guessed era boundaries.

Date-only reruns are not precise enough

Late-2023 and early-2024 Friday releases include same-day bonus episodes, so maintenance commands should target the episode title as well as the date.
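
To show why the date alone is ambiguous, here is a small sketch of selecting a rerun target by date and title together. The `CatalogEpisode` shape, the `selectEpisodes` helper, and the sample titles are made up for illustration; the real catalog rows and selection logic may differ.

```typescript
// Hypothetical catalog row; the real catalog carries more fields.
interface CatalogEpisode {
  title: string;
  date: string; // assumed ISO format, e.g. "2024-01-05"
}

// Match on the exact date AND a case-insensitive title fragment.
function selectEpisodes(
  catalog: CatalogEpisode[],
  date: string,
  titleContains: string,
): CatalogEpisode[] {
  const needle = titleContains.trim().toLowerCase();
  return catalog.filter(
    (ep) => ep.date === date && ep.title.toLowerCase().indexOf(needle) !== -1,
  );
}

// Two releases on the same Friday (illustrative titles only).
const sampleCatalog: CatalogEpisode[] = [
  { title: "Main episode", date: "2024-01-05" },
  { title: "Bonus: listener questions", date: "2024-01-05" },
];

const byDateOnly = selectEpisodes(sampleCatalog, "2024-01-05", ""); // matches both rows
const byDateAndTitle = selectEpisodes(sampleCatalog, "2024-01-05", "bonus"); // matches one row
```

A maintenance command can refuse to proceed unless exactly one episode matches, which turns an ambiguous rerun into a visible error.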

Some long episodes need a longer tail

If a known quiz episode comes back with no quiz found, rerun it with a larger `--tail-minutes` window before treating the result as authoritative.

Pipeline

The transcript process is a small toolchain, not one magic model call.

The point of naming every tool is portability: if this process moves to Money Talks, another podcast, or a private research dataset, the same chain can be audited step by step.

Source catalog

  • Economist/Acast RSS and subscriber catalog rows provide episode titles, dates, descriptions, and audio URLs.
  • A Playwright browser login is used only when subscriber catalog tokens expire and the catalog needs to be refreshed.

Local processing

  • Node.js scripts in `scripts/podcast-series` orchestrate catalog selection, resume logic, and batch writes.
  • `curl` downloads the audio; `ffprobe` checks duration; `ffmpeg` trims the last N minutes where the recurring segment normally lives.
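
The "last N minutes" trim comes down to simple arithmetic on the duration that `ffprobe` reports. The sketch below shows that calculation and one plausible `ffmpeg` argument list; the helper names are hypothetical and the flags the real scripts pass are not documented here, so treat the arguments as an assumption.

```typescript
interface TailWindow {
  startSeconds: number;
  lengthSeconds: number;
}

// Compute the clip window for the last `tailMinutes` of an episode.
// Episodes shorter than the tail are kept whole.
function tailWindow(durationSeconds: number, tailMinutes: number): TailWindow {
  if (!isFinite(durationSeconds) || durationSeconds <= 0) {
    throw new Error("duration must be a positive number of seconds");
  }
  if (!isFinite(tailMinutes) || tailMinutes <= 0) {
    throw new Error("tailMinutes must be positive");
  }
  const lengthSeconds = Math.min(durationSeconds, tailMinutes * 60);
  return { startSeconds: durationSeconds - lengthSeconds, lengthSeconds };
}

// Assumed invocation: seek with -ss, limit with -t, copy the stream without re-encoding.
function ffmpegTrimArgs(input: string, output: string, w: TailWindow): string[] {
  return [
    "-ss", w.startSeconds.toFixed(3),
    "-i", input,
    "-t", w.lengthSeconds.toFixed(3),
    "-c", "copy",
    output,
  ];
}
```

For a 60-minute episode and a 15-minute tail, the window starts at 2700 seconds and runs for 900 seconds. Widening `--tail-minutes` only moves the start earlier.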

AI extraction

  • OpenAI Whisper (`whisper-1`) transcribes the clipped audio window.
  • OpenAI GPT-4o converts the transcript into structured quiz JSON: question, answer, guesses, correctness, confidence, and notes.
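
Model output is only useful once it has passed a shape check. The sketch below is one possible reading of the structured quiz JSON: the top-level field names follow the list above, but the per-guess layout (`panelist`, `guess`, `correct`) and the 0 to 1 confidence scale are assumptions, and `parseQuizExtraction` is a hypothetical helper.

```typescript
interface QuizGuess {
  panelist: string; // assumed field name
  guess: string;
  correct: boolean; // "correctness", modeled per guess as an assumption
}

interface QuizExtraction {
  question: string;
  answer: string;
  guesses: QuizGuess[];
  confidence: number; // assumed to be on a 0..1 scale
  notes: string;
}

function isGuess(v: unknown): v is QuizGuess {
  if (typeof v !== "object" || v === null) return false;
  const g = v as Record<string, unknown>;
  return (
    typeof g["panelist"] === "string" &&
    typeof g["guess"] === "string" &&
    typeof g["correct"] === "boolean"
  );
}

// Parse raw model output and reject anything that does not match the schema.
function parseQuizExtraction(raw: string): QuizExtraction {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("expected a JSON object");
  }
  const o = data as Record<string, unknown>;
  const question = o["question"];
  const answer = o["answer"];
  const guesses = o["guesses"];
  const confidence = o["confidence"];
  const notes = o["notes"];
  if (typeof question !== "string" || typeof answer !== "string") {
    throw new Error("question and answer must be strings");
  }
  if (!Array.isArray(guesses) || !guesses.every(isGuess)) {
    throw new Error("guesses must be a list of {panelist, guess, correct}");
  }
  if (typeof confidence !== "number" || confidence < 0 || confidence > 1) {
    throw new Error("confidence must be a number between 0 and 1");
  }
  return {
    question,
    answer,
    guesses: guesses as QuizGuess[],
    confidence,
    notes: typeof notes === "string" ? notes : "", // notes treated as optional
  };
}
```

Failing loudly at this step keeps malformed rows out of the batch files instead of leaving them for the QA audit to find later.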

Storage and review

  • Batch JSON lands under `scripts/podcast-series/economist/checks-and-balance/batches` before being merged into `data/economist-podcasts.ts`.
  • TypeScript tracker utilities, the QA audit script, and the site pages turn those rows into visible data products and review queues.

Connector boundary

  • Transcript processing does not use Gmail, Calendar, Stripe, Vercel connectors, or private inbox data.
  • OpenAI is the model provider; the browser is only a catalog-refresh tool, not a hidden source of scoring truth.

Workflow

The maintenance loop now has one intended path.

These are the commands that should be reused for reruns and merges so corrections replace stale rows instead of quietly getting skipped.

Step 1
node scripts/podcast-series/economist/checks-and-balance/extract.mjs --catalog --include-tracked --from 2024-03-01 --title-contains "Welcome to New York" --tail-minutes 15
Step 2
node scripts/podcast-series/_merge-batch.cjs scripts/podcast-series/economist/checks-and-balance/batches/YYYY-MM-DD_batch.json --replace
Step 3
npm run podcast:economist:qa
Step 4
npx tsc --noEmit
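
The `--replace` flag in Step 2 is what makes a correction land. A minimal sketch of that behavior, assuming rows are keyed by date plus title: the `TrackedRow` shape, `rowKey`, and `mergeBatch` are hypothetical, and the real `_merge-batch.cjs` may key and merge differently.

```typescript
// Hypothetical row shape, reduced to what the merge needs.
interface TrackedRow {
  date: string;
  title: string;
  question: string;
}

interface MergeResult {
  rows: TrackedRow[];
  added: number;
  replaced: number;
  skipped: number;
}

// Date plus title, so same-day bonus episodes do not collide.
function rowKey(r: TrackedRow): string {
  return r.date + "::" + r.title;
}

function mergeBatch(existing: TrackedRow[], batch: TrackedRow[], replace: boolean): MergeResult {
  const rows = existing.slice(); // never mutate the caller's array
  const indexByKey: Record<string, number> = {};
  rows.forEach((r, i) => {
    indexByKey[rowKey(r)] = i;
  });
  let added = 0;
  let replaced = 0;
  let skipped = 0;
  for (const r of batch) {
    const key = rowKey(r);
    const at = indexByKey[key];
    if (at === undefined) {
      indexByKey[key] = rows.length;
      rows.push(r);
      added++;
    } else if (replace) {
      rows[at] = r; // the correction overwrites the stale row
      replaced++;
    } else {
      skipped++; // without replace, the stale row survives
    }
  }
  return { rows, added, replaced, skipped };
}
```

Returning the counts makes a silent skip visible: a rerun that reports `skipped: 1` and `replaced: 0` did not change the tracked row.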

Quality

A row is only right when it has receipts.

Model confidence is a triage signal. Verification means a human can trace the row back to the episode and recover why the data says what it says.

Review gates

  • Exact episode date and title match, especially when same-day bonus releases exist.
  • Transcript excerpt, timestamp or tail-window note, audio source, model, batch file, reviewer, and review date.
  • Named guest or substitute-host identity backed by episode-page lineup evidence and transcript speaker turns.
  • Question, answer, guesses, and correctness recoverable from the saved evidence.
  • Winner, tie, or stumper derived from `responses.correct`, not copied from model output.
  • Every catalog row marked as tracked quiz, verified no-quiz, needs extraction, or blocked source.
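
The evidence gate in this list can be checked mechanically before a human signs off. The sketch below assumes one optional string field per receipt; the field names and the `missingReceipts` helper are illustrative, not the QA audit script's actual schema.

```typescript
// Hypothetical evidence record: one optional field per receipt in the gate.
interface Evidence {
  transcriptExcerpt?: string;
  timestampOrTailNote?: string;
  audioSource?: string;
  model?: string;
  batchFile?: string;
  reviewer?: string;
  reviewDate?: string;
}

const REQUIRED_RECEIPTS: (keyof Evidence)[] = [
  "transcriptExcerpt",
  "timestampOrTailNote",
  "audioSource",
  "model",
  "batchFile",
  "reviewer",
  "reviewDate",
];

// List every receipt that is absent or blank.
function missingReceipts(evidence: Evidence): (keyof Evidence)[] {
  return REQUIRED_RECEIPTS.filter((key) => {
    const value = evidence[key];
    return value === undefined || value.trim() === "";
  });
}

// A row may be promoted to verified only when nothing is missing.
function canVerify(evidence: Evidence): boolean {
  return missingReceipts(evidence).length === 0;
}
```

Listing the missing receipts, rather than returning a bare yes or no, gives the reviewer a concrete to-do list for each row.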

Replication recipe

  1. Pick the recurring segment and define a narrow schema before touching the model prompt.
  2. Build a durable source catalog with title, date, URL, source type, and refresh instructions.
  3. Transcribe a bounded audio window or full episode, then store the transcript receipt separately from the extracted row.
  4. Extract to JSON, merge through typed data, and keep raw batches for reruns.
  5. Run the QA audit in normal mode during iteration and strict mode before public claims or downstream data use.
  6. Promote only reviewed rows into verified status; leave uncertainty visible instead of filling gaps with guesses.

Next Layer

What would make this feel truly host-ready

The architecture is more stable now, but the biggest remaining quality gain would come from richer evidence and explicit episode-status tracking.

  • Add an episode-status registry: verified no-quiz, needs extraction, blocked source, or tracked quiz.
  • Store transcript evidence snippets and timestamps for every verified row.
  • Promote real people and appearances into first-class records instead of inferring them from response slots.
  • Keep QA and archive views separate so the public-facing pages can stay elegant without hiding uncertainty.