March 2026

The Timestamp Problem

When someone reviews an AI output before it goes out, what are they actually checking? I’ve been asking that near the end of these meetings, after the frameworks and the policies. Someone stares at the mug in front of them. Someone else sees an urgent Teams message pop up on the screen, types two sentences, looks up.

The senior person says something about training programmes, or the review workflow, or the audit trail that captures every approval. Nobody answers the question. I’ve been in that room enough times now to know what the silence means.

It means: we don’t know. And more than that: we have built an entire governance architecture on top of that not-knowing, named it oversight, and moved on.

Here’s what the governance architecture actually captures. A timestamp. A name. A record that at 11.47 on a Tuesday, a human being was present in the loop. The timestamp is real, the loop is documented, the audit trail is complete. What it doesn’t capture is the only thing that matters: what happened in the seconds between the AI output appearing on the screen and the human clicking approve.

What often happens in those seconds is this. The output looks plausible. The tone is right. The first sentence reads well. There’s a deadline, there are twelve other things open on the screen, and the AI was probably (!) right the last forty times. I know this because I’ve scanned and clicked too. The timestamp can now record: reviewed.

This is not a failure of character, and it isn’t laziness exactly. It’s what humans do when they’re embedded in systems that have taught them, through repeated experience, that the system is reliable. This has been studied, named, replicated. Automation complacency: as system reliability increases, human vigilance decreases. The better the AI gets, the less people check it. The loop tightens exactly where you need it to loosen.

Training doesn’t fix this. The behaviour isn’t produced by a knowledge gap. It’s produced by the rational allocation of limited attention in a high-volume environment. You can run the training programme, add the checkbox, build the review workflow, and the person will complete the training, tick the box, follow the workflow, and still scan and click, because none of those interventions changed the underlying incentive structure, the cognitive load, or the fact that the AI was probably right the last forty times.

There’s a compounding problem that makes this worse. When approval chains have multiple layers, each layer assumes the previous one did the work. The second reviewer looks at the first reviewer’s approval, not the original output. The committee reviews the fact that two humans have already signed off. At no point does anyone return to the AI output itself. Accountability doesn’t accumulate up the chain. It dissolves. By the time it reaches the top, the timestamp is the only thing anyone is actually looking at.

The governance conversation about AI is almost entirely top-down: what the model should be permitted to do, what the policy should require, what the audit trail should capture. These are real questions and they matter. But they all assume that the human in the loop is doing something when they’re in the loop. They assume that review means review. And that assumption is sitting, unexamined, at the foundation of most of the AI governance work I see organisations doing right now.

When the silence in those rooms finally breaks, it usually goes to one of two places. The first is training: we need to teach people what good review looks like. The second is technology: better tooling, better interfaces, better flags that force attention. Both are reasonable. Neither touches the structural problem: we designed the workflow around the speed of the click, not the speed of the thought. Those are not the same speed.

The honest version of the answer, the one nobody says out loud, is something like: at the scale we’re operating at, genuine human review of every AI output isn’t possible. We know that. So what we’ve built is a process that looks like oversight, records like oversight, satisfies audit requirements like oversight, and produces the timestamp that proves oversight happened. What we haven’t done is ask what we actually need humans to be doing in that loop, and whether the loop as currently designed lets them do it.

The timestamp is not the problem. The timestamp is just the evidence. The problem is that we needed someone accountable, so we put a human in the loop, and didn’t ask whether the loop gave that human what they needed to actually be accountable: the time, the context, the decision rule, the definition of what they were supposed to be checking and why. We asked them to be present. We built a system that records their presence. And then we called it governance.

What I haven’t heard yet, in any of those rooms, is someone say: we know what good review actually looks like at this scale, and we’ve designed the workflow around that. I haven’t said it either. The mug is still there. The silence still comes back. That’s the question the timestamp can’t answer.

If this resonates, the diagnostic sits on the Frameworks page.