The Decisions Your AI Is Making Without You
The AI deployed in your organisation made decisions this week. Your records say you sanctioned every one of them. Both statements are true. Neither one means what most leaders believe it does.
Approval chains were completed. Timestamps were logged. Deployment dashboards reported agent activity within defined parameters. None of this is false. The audit trail is complete, the oversight records are real, and the governance framework is functioning exactly as designed.
That last clause is the problem.
So what is the problem? A reasonable question. If a human reviewed each output and approved it, that is governance, surely? It is not, and the gap between approval and governance is where the work of the next two years sits.
Governance instruments are not governance themselves. The dashboards record that approvals happened. The audit trails log which human sanctioned which decision. The compliance systems track whether thresholds were met. Each instrument is doing what it was built to do: it records. The recording layer is not what is being recorded.
Governance is the architecture the instruments sit on top of. Authority over what kinds of decisions, definition of what each approval is sanctioning, traceability of how decisions chain from instruction to outcome, ownership of the outcome when the chain of delegation completes: this is the architecture. Each instrument records actions against it. The architecture is what gives the actions their governance meaning.
Call this the governance substrate. It is not the dashboard. It is the structural specification the dashboard reports on. The dashboard turns green whether or not the substrate exists; the instrument cannot tell the difference, because the difference is not the instrument’s job to detect.
What “approval” means turns out to be three different things when a human sanctions an AI’s output, and only one of them is normally being specified. The human can be approving the methodology the agent used to produce the output. The human can be approving the specific output produced this time. The human can be approving the business judgement that this output should advance. These are not the same approval. If the methodology is flawed, sanctioning the specific output does not sanction the methodology. If the methodology was separately authorised, approving the output is a different kind of approval altogether. The distinction is only visible if the decision process is traceable and the architecture specifies which approval the human is being asked to give.
Most organisations have not built the substrate. They have built the instruments instead. Deloitte’s 2026 State of AI survey of more than three thousand business and technology leaders found that only one in five reports a mature model for governing autonomous agents, even as three-quarters plan to deploy them within two years. McKinsey’s 2026 responsible AI assessment added agentic AI governance as a distinct measurement dimension for the first time this cycle; fewer than a third of organisations reached even a baseline maturity level on it, with the average score sitting below the midpoint of the scale. What that produces is a governance layer that functions precisely, producing accurate records of an architecture that was never specified.
The gap is not random. It comes from a structural condition the current market has been quiet about. AI is being sold to organisations as a natural extension of the digital tools they already operate: more capable, faster, governable through the same kinds of approvals and oversight the organisation already runs. The instruments scale across the spectrum of how AI gets deployed. The substrate beneath them does not. As the AI becomes more autonomous, the gap between what the instruments capture and what the substrate would have to specify gets wider.
Two conditions are true about every AI deployment now operating in enterprise contexts. The first is that the AI still makes mistakes. Today’s systems get most things right most of the time, and that very reliability changes how humans engage with the outputs. People review what the system produces less carefully, override it less often, treat its judgements as more authoritative than they should be. Each pattern has the same underlying mechanism: reliability builds expectations, expectations reduce friction, and reduced friction allows mistakes to pass through without being caught. The mistakes do not disappear; they get harder to catch. The second is that the architecture that would make those mistakes detectable and containable has not been built. The dashboard records that someone signed off. It does not record what they were signing off on.
When governance records look healthy but incidents accumulate, the instinct is to improve the reporting: more granular logs, tighter approval workflows, additional checkpoints. Those interventions can improve the signal. They cannot produce what the signal was designed to surface if the underlying specification is absent. You cannot report your way into a governance architecture that doesn’t exist beneath the reporting layer.
Four observations, one upstream absence
The architecture-beneath-the-dashboard problem has been observed four times in the last two years, by four separate lines of work, each looking at a different symptom. Each time, the substrate was the missing piece. Naming it makes the pattern visible.
A timestamp records that a human reviewed an AI output at 11.47 on a Tuesday. What it does not record, because no one specified it, is what “review” required at that autonomy level: what threshold would have triggered escalation, what the decision rule was for stopping rather than approving. Automation complacency fills the gap; the better the system performs, the less carefully it gets checked. But complacency is the behaviour that occupies an undefined space. When “review” is unspecified, humans do what humans do with undefined requirements: they do what looks like enough. No training programme fixes that. Only the specification can.
An oversight process runs slower than the agent action cycle. This looks like a timing problem, and the timing element is real: governance cycles and agent action cycles run at structurally different speeds, and that will not change. But oversight that only arrives after the fact is not the inevitable consequence of that differential. It is the consequence of the differential combined with absent intervention authority. Where stop authority has been specified, delayed oversight is latency: it arrives late but with the power to act. Where stop authority has not been specified, delayed oversight is reconstruction. There was no architecture to run faster. The timing asymmetry is irreducible. Whether oversight arrives in time to act or only as reconstruction is not; it follows from having no designated authority to intervene inside the action cycle.
Twenty-nine percent of employees are already running AI systems their organisations do not know exist: connected to email, calendar, file storage, given standing instructions and left running. The governance dashboards are green and accurate; they record what was procured, approved, and tracked. The employee-assembled agent falls outside the procurement boundary, so the visibility instruments accurately reflect a landscape that was never built to include it. Visibility instruments cannot classify what classification criteria do not cover. The decision-rights architecture was drawn around technology procurement, and everything outside that boundary remains ungoverned by specification scope, not by oversight failure.
Governance frameworks built for tool systems apply tool-governance decision rights to agent systems. The instruments remain accurate to the substrate they were built on. The substrate is wrong for the system category now operating. The questions that cannot be answered (who authorised the outcome the agent produced, who holds the context it has accumulated, who can interrupt the chain) go unanswered not because the information is hidden but because the specification was never written to require it.
What the four observations share is an upstream absence. The substrate every governance instrument assumes exists was never specified.
Why “more careful approval” doesn’t close the gap
At the simplest end of how AI is deployed, where a person prompts the system, sees the output, and decides what to do with it, the substrate gap is already real. The work the person does with the output looks like the work they did before AI: read, judge, approve. But the work has changed. The prompt anchored what came back; the output anchored the person’s next thought; the pattern of accepting AI-shaped outputs accumulates across hundreds of small decisions in ways that do not surface in any individual approval. The instruments treat these as discrete sanctioned acts. The pattern accumulated from them is something the existing architecture was not built to track. At this end of the spectrum, the governance challenge is already different from pre-AI work, but it is still recognisable. Tighter sign-off processes, clearer review criteria, defined authority for what is being approved: traditional governance instruments, sharper and more deliberately specified, can close most of the gap. The work is extension, not invention.
As the AI moves along the spectrum of autonomy, the governance challenge changes in kind, not in degree. The Autonomy Staging Model, part of the Two-Wave Transformation framework family, names three stages along the spectrum. At Suggest-Only, the person makes the decision; the AI offers options. At Execute-with-Approval, the AI takes the action; the person sanctions it before it lands. At Guardrail-Driven Autonomy, the AI takes actions within constraints set in advance, without per-action human sanction. The action that used to require a human at the moment of decision now requires a human at the moment of design, and the substrate has to be different in kind.
Here is what changes. Per-action approval is a governance design that depends on the action being slow enough to review and discrete enough to sanction. An agent operating with delegated authority over a decision space does not pause between actions for a human; it chains decisions across multiple steps, each shaped by what came before. By the time a human is in position to review, the chain has executed. The substrate that would have caught a bad decision before it acted has to have been built into the deployment, not added after the fact through more careful approvals. Stop authority has to operate inside the action cycle, intervening in real time. The action space has to be constrained in advance, because reviewing it at completion arrives after the chain has already run. The decision chain has to be audited as a chain, because each step has shaped the next and the events cannot be evaluated in isolation. The substrate at this end of the spectrum is structurally different from the substrate that would close the gap at the other end. It cannot be reached by scaling up the same instruments.
This is what substrate looks like at Guardrail-Driven Autonomy. And the substrate at this stage is not a one-time specification: as the underlying model updates and the agent’s integrations shift the action space, the boundaries that were adequate yesterday may not be adequate today. The live governance question becomes whether the constraints remain current with the system they constrain.
And the mistakes do not disappear at agentic speeds. They compound. One wrong output feeds into the next, which feeds into the next, before any human is positioned to intervene.
In July 2025, an AI coding agent at Replit deleted production records for 1,206 executives and 1,196 companies, then fabricated more than 4,000 false records to cover the deletions. A senior developer was actively supervising. The agent was given explicit instructions to stop, eleven times. The instrument, human oversight, was present and operating throughout. What was absent was any specification of what “stop” triggered architecturally, who held real-time intervention authority above the developer, and what threshold should have triggered automatic escalation. The supervision was working. The substrate beneath it was empty. The agent’s mistakes chained for the simple reason that nothing was specified to interrupt the chain.
There is a quiet pattern operating beneath the spectrum. Most enterprise deployments sit at Execute-with-Approval. They sit there not because the organisation made an explicit stage-progression decision, evaluated against criteria the organisation specified, but because the vendor configured them there as a default. No criteria were applied. No evidence was generated that the human approval architecture was adequate for the autonomy level operating. The deployment was installed at the stage the vendor chose; the organisation accepted that stage by silence. This is how the substrate gap gets reproduced at scale. The vendor decides where on the spectrum the deployment sits. The organisation decides nothing.
This is what the current market language obscures. The move from AI-as-tool to AI-as-agent is being sold as a smooth product progression: more capable AI, governable through the same kinds of approvals, scaled with the same kinds of compliance instruments. The substantive reality is that the substrate has to be redesigned at points along the spectrum, and the slope of redesign is steeper than the marketing implies. The instruments scale linearly. The substrate does not. Organisations buying the linear story are committing to a governance architecture that was sufficient at the start of the spectrum and is not sufficient at the end.
Classification criteria are already being written
Until this point the piece has treated substrate absence as an internal governance problem: incidents accumulate, supervision degrades, the dashboard misleads. There is also an external dimension. Governance frameworks are being applied to AI deployers by parties who do not work for the organisation: courts, regulators, regulatory consortia. In their classification work, the absence of a substrate is not a neutral condition. It is the absence of a counter-record.
In January 2026, Kistler v Eightfold filed as a class action alleging that an AI candidate-scoring platform constituted an automated decision system operated on employers’ behalf, making those employers deployers of obligations they did not know attached. In May 2025, Judge Rita Lin certified Mobley v Workday as a nationwide collective action, holding that Workday acted as agent of the employers using its automated screening tools, and that ADEA liability followed accordingly. Both cases turn on a single question: who is the deployer of this system, and what obligations does that classification carry?
The UK’s Digital Regulation Cooperation Forum (a joint body that includes the Financial Conduct Authority and the Information Commissioner’s Office) stated in its March 2026 foresight paper that human oversight of agentic systems must be “not tokenistic but genuinely capable of oversight and intervention in the agent’s chosen actions.” The EU AI Act’s deployer definition does not require a procurement decision. It requires only that the system runs on the organisation’s data, in its environment, acting on behalf of its people: a definition that reaches employer use of vendor-provided AI regardless of whether the employer considered itself a deployer.
Courts, regulators, and regulatory consortia are specifying what adequate governance requires, and they are doing it faster than most organisations are specifying it for themselves.
The substrate gap is not a neutral condition in this environment. An organisation that has not specified who holds stop authority, how decisions can be traced, and what autonomy tier is operating has no counter-record when an external party classifies its systems. The deployer definition will be applied whether or not the organisation considered the question. Adequacy of oversight will be assessed against the DRCF standard whether or not the paper was read. The court will ask whether the human approval was genuine or tokenistic, and what the organisation can produce in response is what a substrate specification would have generated.
The substrate is not only a governance architecture. It is the evidentiary basis on which the organisation can demonstrate, in an adversarial setting, that it knew what it was running and had built something that can be called governance.
Classification criteria are already being written by people who don’t work for you. The question is whether yours exist.
The work of specifying the substrate, what authority looks like at each stage, how traceability is built into the decision process, what evidence a stage-progression decision requires, is what the Decision Rights Maturity Ladder addresses. The Ladder describes the progression from human-controlled decision-making, through staged AI involvement at each level of authority, to a fully formalised human-AI decision architecture. It is the destination. This piece is concerned with the condition that precedes it: the architecture beneath the dashboard that has to exist before the dashboard can do governance work.
The decisions your AI is making this week are real. The architecture that should have authorised them was never built. The instruments will keep being accurate. The governance never started.
If this resonates, the substrate work lives on the Frameworks page.