When a board asks its CIO what AI governance looks like, the answer is usually a policy document, an ethics framework or a list of principles. It is rarely a procurement standard. That gap is where the real risk lives.
Deploying an AI system is always a critical decision. Yet that decision is often made by a developer, sometimes by a vendor and rarely by someone accountable for the outcome. It comes down to one question: What kind of AI are we using? In most enterprises today, the honest answer is that we often don’t know, and we lack the language to find out.
Selecting an AI architecture is not a technical preference. It is a risk-calibrated governance decision that determines auditability, accountability and consequences. That decision determines not just the system behavior, but also who is legally and operationally accountable when it fails.
Treating architecture as a technical preference should end. Your use case’s risk should determine the form of AI you deploy, and your procurement language must reflect that.
Walk into any enterprise technology conversation today, and you will hear AI described as though it is a single category. Organizations talk about “deploying AI,” “governing AI,” “auditing AI”—as if the word names a uniform technology with consistent properties. In reality, it doesn’t.
The AI market currently contains at least three architecturally distinct types of system, each with radically different properties for governance, explainability and accountability:
• Probabilistic AI: Statistical pattern recognition engines (most large language models, generative AI, most modern ML). The same input can produce a different output across runs. Explainability is partial and model dependent. Causal attribution is difficult.
• Probabilistic AI with context orchestration (and controls): Probabilistic models operate within defined data sources and policy-routing logic. You can control what the model reasons from. You cannot guarantee identical outputs under identical conditions.
• Deterministic AI: Logic-governed, rule-based or ontology-driven systems. Every result can be verified against defined logic. Outputs are consistent and reproducible, with a full trace possible by design: what data, which rules, why that result.
In practice, most enterprise deployments combine these architectures: for example, a probabilistic model sitting inside a deterministic routing layer, fed by a retrieval pipeline and constrained by policy guardrails. The governance obligation is to know which components operate under which properties, and to hold each component to the controls its risk level demands.
The governance implications of these three types are not equivalent. Yet most AI procurement treats them as if they are.
When a system can produce two different answers to the same question under the same conditions, and both can be acted on, that is a warning sign: the system is not operating at the level of rigor that high-stakes decisions require.
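One way to make that property testable during an architecture review is to call the system repeatedly with the same input and count the distinct answers. The sketch below is a minimal illustration, not a validation method: `consistency_report`, `threshold_rule` and `make_varying_system` are hypothetical names, the 10,000 threshold is invented, and the "varying" system is a stand-in that simply alternates answers to imitate run-to-run variation.

```python
from collections import Counter
from itertools import cycle
from typing import Callable


def consistency_report(system: Callable[[str], str], prompt: str, runs: int = 5) -> dict:
    """Call `system` with the same prompt `runs` times and summarize agreement."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    outputs = [system(prompt) for _ in range(runs)]
    counts = Counter(outputs)
    top_count = counts.most_common(1)[0][1]  # size of the largest group of identical answers
    return {
        "distinct_outputs": len(counts),
        "agreement_rate": top_count / runs,
        "reproducible": len(counts) == 1,
    }


def threshold_rule(prompt: str) -> str:
    """Hypothetical deterministic rule: the same amount always gets the same answer."""
    return "approve" if int(prompt) < 10_000 else "refer"


def make_varying_system() -> Callable[[str], str]:
    """Stand-in for a system whose answer changes between runs (assumption for the demo)."""
    answers = cycle(["approve", "refer"])

    def varying(prompt: str) -> str:
        return next(answers)

    return varying
```

Run four times on the same input, the rule-based function reports one distinct output, while the alternating stand-in reports two with an agreement rate of 0.5. A real test would need far more runs and realistic inputs, but the question it asks is the same.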
Using probabilistic AI itself is not a governance failure. However, it becomes a problem when organizations use it regardless of use case risk and then attempt to retrofit governance onto a system that was not designed to support it.
The right sequence is the reverse. Start with a clear-eyed assessment of the risk tier of the use case. Then specify—in your requirements, in your contracts and in your architecture review—what level of explainability, traceability and accountability that tier demands. Then procure accordingly.
The architecture column is not a suggestion. It is a requirement. A high-risk use case—one involving consequential decisions about people, high financial exposure or outcomes that are difficult to reverse—demands deterministic levels of explainability. Those levels can be achieved through deterministic design or through rigorously bounded hybrid controls.
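To show how a tier-to-architecture requirement could be checked rather than just stated, here is a minimal sketch. Everything in it is an assumption for illustration: the tier labels ("low", "medium", "high"), the control names and the mapping itself are hypothetical and would come from your own risk framework. It checks controls rather than the architecture label, since a bounded hybrid may satisfy a high tier.

```python
from dataclasses import dataclass

# Hypothetical mapping from risk tier to the controls that tier demands.
REQUIRED_CONTROLS = {
    "low": {"architecture_disclosed"},
    "medium": {"architecture_disclosed", "bounded_data_sources", "output_logging"},
    "high": {
        "architecture_disclosed",
        "bounded_data_sources",
        "output_logging",
        "reasoning_trace",
        "reproducible_outputs",
    },
}


@dataclass(frozen=True)
class ComponentDisclosure:
    """What a vendor discloses about one component (illustrative fields only)."""
    name: str
    architecture: str  # e.g. "probabilistic", "hybrid" or "deterministic"
    controls: frozenset[str]


def control_gaps(component: ComponentDisclosure, risk_tier: str) -> list[str]:
    """Return the controls the tier requires that the component does not provide."""
    if risk_tier not in REQUIRED_CONTROLS:
        raise ValueError(f"unknown risk tier: {risk_tier!r}")
    missing = REQUIRED_CONTROLS[risk_tier] - component.controls
    return sorted(missing)
```

An empty list means the disclosed component meets the tier. A non-empty list is the gap to close in the contract or the design before procurement proceeds.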
Human oversight is the layer that makes AI governance real. But only when it is built to be real.
Consider what a human reviewer needs to provide meaningful oversight of an AI output. We hold colleagues to standards of trustworthiness: credibility, reliability, alignment of interests and integrity over time. The AI field has developed specific vocabulary for equivalent properties in systems. A reviewer needs to be able to interrogate four things:
• Transparency is a question about the model: What data was used, what methodology was applied, and were those methods the right, validated choices for this use case?
• Explainability is a question about the output: Not how the model generally works, but why it produced this specific decision for this specific person.
• Observability is a question about agent behavior over time: Is the system still performing correctly as conditions change, data shifts and it interacts with other systems?
• Robustness against adversaries is a question about cybersecurity: Did anyone tamper with the model after you last assessed it? Is it still operating within the boundaries of its original design and mandate?
Does your current AI architecture give a human reviewer access to any of that? If the system is probabilistic and the reviewer has no visibility into data lineage, no reasoning trace and no consistency guarantee—what exactly are they reviewing?
The answer, in most deployments, is the output. They are reviewing a number, a recommendation or a flag. They are not reviewing the reasoning. They are not reviewing the evidence. They are not, in any meaningful sense, in the loop.
You get more of the behaviors you measure. If your human reviewer is measured on throughput rather than quality of oversight, you have engineered a rubber stamp—and called it governance.
This practice is what liability laundering looks like. A human is placed in the process. Their sign-off is documented. The accountability box is checked. But the conditions for genuine oversight—the architectural transparency, the training, the authority to halt, the feedback mechanism—were not established.
The fix is to build the architecture that makes human oversight meaningful and to choose people with the domain expertise to understand how the AI is being used in its environment. Then measure those people on the right things.
Most organizations fall short in their procurement and governance language, not in their understanding of the technology. The organizations that get this right will be the ones that build the vocabulary, and the contractual teeth, to specify what they need from AI systems.
Four things need to change:
1. Require architecture disclosure in every AI procurement. Vendors should be required to specify whether their system is probabilistic, deterministic or a hybrid—and what the explainability properties of each component are.
They must also permit independent validation through audit, testing and continuous monitoring. “AI” is not a sufficient category for a procurement decision. “Probabilistic language model with retrieval-augmented generation” is.
2. Map use cases to risk tiers before selecting an architecture. Not every AI use case requires a deterministic architecture. A low-risk content summarization tool has different requirements than an AI system making credit decisions or triaging patient care.
The risk assessment should happen before vendor selection, not after deployment. And the required architecture—with its explainability and accountability properties—should be written into the requirements, not left to the vendor’s discretion.
3. Operationalize principles as functional and non-functional requirements, not statements of intent. “We are committed to explainable AI” is a statement of intent. Compare it with this: “At the Vigilant tier, the system must provide a full audit trail for each output, including source data with documented lineage and provenance, validated test-retest reliability and controls ensuring explanations are bound to the evidence record.”
That specification is a functional requirement. The difference is enforceability. One of these statements belongs in a contract. The other belongs in a press release.
4. Establish outcome baselines before deployment. You cannot measure whether an AI system is causing harm unless you establish a baseline before deploying it. Without pre-deployment outcome data, you can observe outputs but cannot attribute change—you have no way to distinguish the system’s effect from what would have happened anyway.
Every AI deployment should require a documented baseline of the outcomes it is intended to influence, against which post-deployment performance is continuously measured. Where a baseline does not exist, building one is the prerequisite—not a subsequent task.
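The baseline requirement in the fourth change can be sketched in a few lines. This is a minimal illustration under stated assumptions: `outcome_shift` and `breaches_tolerance` are hypothetical names, outcomes are assumed to be simple numeric measurements (for example 1 for an adverse outcome, 0 otherwise), and the tolerance is a value you would set yourself. A raw before-and-after difference is only the starting point for attribution, but without the baseline even that is unavailable.

```python
from statistics import mean


def outcome_shift(baseline: list[float], post_deployment: list[float]) -> dict:
    """Compare mean outcomes before and after deployment.

    Refuses to run without a baseline, mirroring the rule that
    building one is a prerequisite rather than a later task.
    """
    if not baseline:
        raise ValueError("no pre-deployment baseline: change cannot be attributed")
    if not post_deployment:
        raise ValueError("no post-deployment observations to compare")
    baseline_mean = mean(baseline)
    post_mean = mean(post_deployment)
    return {
        "baseline_mean": baseline_mean,
        "post_mean": post_mean,
        "absolute_change": post_mean - baseline_mean,
    }


def breaches_tolerance(shift: dict, tolerance: float) -> bool:
    """Flag the deployment for review when the change exceeds the agreed tolerance."""
    return abs(shift["absolute_change"]) > tolerance
```

For example, a baseline of `[1, 0, 0, 0]` and post-deployment outcomes of `[1, 1, 0, 0]` give means of 0.25 and 0.5, an absolute change of 0.25, which would breach a tolerance of 0.1.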
These standards are not aspirational. They are already law or binding guidance in several operating jurisdictions. The EU AI Act requires that high-risk AI systems have a documented risk management system (Article 9), together with human oversight measures (Article 14) and accuracy requirements (Article 15), before deployment.
In the United States, federal AI hiring guidance was withdrawn in January 2025. The underlying obligations under Title VII and the ADA remain enforceable, and employer liability for discriminatory AI outcomes is actively being litigated in federal court. The Federal Reserve’s SR 11-7 model risk guidance imposes ongoing validation and governance obligations on AI models used in financial-services decisions.
Organizations operating across these jurisdictions are not choosing whether to govern AI rigorously. They are choosing whether to do so proactively or in response to an enforcement action.
AI governance that lives only in policy documents stops at aspiration. Real governance is built into the architecture that gets procured, deployed and measured.
Most organizations have an AI ethics policy. Far fewer have AI systems that can fulfill the accountability obligations they have assumed—to their customers, their regulators and the people most affected by these decisions.
Risk should inform architecture. Architecture should be written into the requirements. Requirements should have contractual teeth.
The organizations that will lead on AI governance in the next five years are not the ones with the most sophisticated ethics frameworks. They are the ones that have translated those frameworks into the language of procurement, architecture and measurement—and held their vendors to it.
That work starts with learning to ask the right question: not “do we have AI governance?” but “does our AI architecture support the accountability that we have promised?”