
Why AI Governance Fails When It Starts with Models Instead of Data

This article does not attempt to define AI governance. Instead, it examines why current approaches fail and what that reveals about where control must exist. Most discussions begin with models, policies, prompts, and explainability, but data remains in the background. When information flows freely into large language models (LLMs) and retrieval‑augmented generation (RAG) systems across M365, Box, iManage, and other SaaS platforms, risk accumulates before any “governance” touches it.


AI adoption is exploding, but the security controls that matter most (classification, encryption, and usage policies enforced at the data layer) lag far behind. A recent survey of 461 security professionals found that 83% of organisations lack automated controls to prevent sensitive data from entering public AI tools, and 86% have no visibility into those data flows. Without enforceable data‑layer security, even the most sophisticated AI policy becomes advisory.



The Common Assumption (What Most Teams Believe)

Most AI governance programs assume that controlling the model controls the risk. They deploy policy libraries and ethical guidelines aligned with the OECD AI Principles, expecting them to stop exposure. Many organisations rely on employee training and click‑through warnings as their primary safeguards. The theory is that if you can monitor prompts, filter hallucinations, and audit model decisions, data leakage will decline.


This belief persists despite evidence to the contrary. Research shows that only 17% of organisations have implemented automated controls with data‑loss‑prevention (DLP) scanning. The vast majority rely on training or warnings without enforcement, leaving the door wide open for sensitive information to flow into AI systems. AI governance anchored at the model layer starts too late.


Why That Assumption Breaks Down

The assumption breaks down because data moves before governance applies:

  • Data flows precede controls: Employees paste Social Security numbers, intellectual property, and patient records into ChatGPT and other LLMs for convenience. These interactions occur across unstructured channels before any policy engine is triggered, often falling into the traps identified in the OWASP Top 10 for LLM Applications.

  • Controls act after exposure: Policies that surface as warnings or audits are reactive. One in five organisations issues warnings without monitoring or enforcement. By the time a model’s output is reviewed, sensitive data has already left the building.

  • Enforcement, not policy, is the vulnerability: Training alone cannot prevent individuals from uploading confidential documents. Enforcement must be automated at the data layer.

  • Velocity compounds risk: AI systems operate at machine speed. Real‑time policies require technical controls that classify and block sensitive fields before they reach the model, as the sketch below illustrates.
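
To make that last point concrete, here is a minimal sketch of a prompt‑time gate in Python. The patterns, the gate_prompt helper, and the block‑on‑detect behaviour are illustrative assumptions, not any specific vendor's DLP engine.

```python
import re

# Hypothetical prompt gate: patterns and policy are illustrative only.
SENSITIVE_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # US Social Security number
    "account_number": re.compile(r"\b\d{10,12}\b"),      # bank account number (assumed format)
    "api_key": re.compile(r"\b(?:sk|pk)_[A-Za-z0-9]{20,}\b"),
}

def gate_prompt(prompt: str) -> tuple[bool, list[str]]:
    """Classify an outbound prompt before it reaches the model.

    Returns (allowed, findings). If any sensitive field is detected,
    the prompt is blocked rather than forwarded to the LLM.
    """
    findings = [name for name, pattern in SENSITIVE_PATTERNS.items()
                if pattern.search(prompt)]
    return (len(findings) == 0, findings)

allowed, findings = gate_prompt("Client SSN is 123-45-6789, please draft a letter.")
if not allowed:
    print(f"Blocked before the prompt left the tenant: {findings}")
```

A production control would sit in the network path or the browser, use richer classifiers than regexes, and log every decision for audit, but the ordering is the point: classification and blocking happen before the model ever sees the data.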


What Actually Happens in the Real World

Consider a high-velocity environment such as a financial services institution (FSI). To meet rigorous response-time KPIs, staff may copy and paste sensitive client data, including Social Security numbers (SSNs) and account balances, directly into ChatGPT. While this improves individual productivity, it creates a massive "compliance debt": once data has been ingested by a public LLM, the exposure is effectively permanent.


Once ingested, data enters a "black box" where traditional data rights (the ability to track, retrieve, or delete it) effectively vanish. This creates a permanent state of exposure that even regulated sectors are struggling to contain. The fundamental issue is a technical lag: while AI adoption is ubiquitous, automated DLP scanning integrated into AI workflows remains the exception.


This lack of technical enforcement is mirrored at the leadership level. Despite the proliferation of AI policies, there is a distinct gap between having a "checklist" and running a fully operational governance program. In many cases, corporate boards lack formal oversight, treating AI as a siloed IT project rather than a fiduciary risk. In this environment, the enterprise's "crown jewels" travel into LLMs and RAG pipelines unchecked, and data leakage becomes continuous.


The Missing Control Layer: Data-Centric Zero Trust

The gap isn’t visibility; it’s the absence of enforceable controls at the data layer. Data‑centric Zero Trust requires that information be protected before it enters any model, a principle emphasized in the NIST AI Risk Management Framework (AI RMF 1.0).


Technologies such as Data Security Posture Management (DSPM) and selective encryption enforce “need‑to‑know” access controls on unstructured data. By automatically scanning for PII and intellectual property, they block unauthorised uploads and redact crown‑jewel fields.
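
As a rough illustration of what that scanning looks like at the data layer, the sketch below walks a hypothetical file share and classifies each document before it is allowed into an AI pipeline. The directory path, the regex patterns, and the scan_share helper are assumptions made for illustration; commercial DSPM tools parse many file formats and rely on trained classifiers rather than simple patterns.

```python
from pathlib import Path
import re

# Minimal DSPM-style posture scan (illustrative only).
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "iban": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
}

def scan_share(root: str) -> dict[str, list[str]]:
    """Return a map of file path -> sensitive data types found in that file."""
    findings: dict[str, list[str]] = {}
    for path in Path(root).rglob("*.txt"):
        text = path.read_text(errors="ignore")
        hits = [label for label, rx in PII_PATTERNS.items() if rx.search(text)]
        if hits:
            findings[str(path)] = hits  # candidates for blocking or redaction
    return findings

if __name__ == "__main__":
    for file, hits in scan_share("./shared_drive").items():
        print(f"{file}: contains {', '.join(hits)} -> restrict AI ingestion")
```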


This approach reverses the governance model: control follows the data.

  • Automated DLP engines intervene at the point of the prompt.

  • Selective encryption protects sensitive sections while preserving context (see the sketch after this list).

  • These controls provide the cryptographic evidence required by the EU AI Act and the ISO/IEC 42001 management standard.
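
The selective‑encryption idea can be sketched in a few lines, assuming Python and the open‑source cryptography library: only the detected sensitive spans are encrypted in place, so the surrounding text keeps its context for search or RAG indexing. The SSN pattern, the [ENC:...] marker format, and the selectively_encrypt helper are hypothetical; key management and format‑preserving options are out of scope.

```python
import re
from cryptography.fernet import Fernet  # pip install cryptography

SSN_RX = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def selectively_encrypt(text: str, fernet: Fernet) -> str:
    """Replace each sensitive span with an inline ciphertext marker (assumed format)."""
    def _encrypt(match: re.Match) -> str:
        token = fernet.encrypt(match.group().encode()).decode()
        return f"[ENC:{token}]"
    return SSN_RX.sub(_encrypt, text)

key = Fernet.generate_key()   # in practice the key lives in a key-management service
f = Fernet(key)
doc = "Client John Doe (SSN 123-45-6789) requested a balance transfer."
print(selectively_encrypt(doc, f))
# e.g. "Client John Doe (SSN [ENC:gAAAA...]) requested a balance transfer."
```

Because the ciphertext travels inside the document, the need‑to‑know boundary follows the data into whatever model or pipeline processes it, which is the reversal described above.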


In contrast, model‑level guardrails only catch problems after they happen.


Flip the Governance Model

Moving from model-centric to data-centric security is the only way to scale AI without compromising your crown jewels. Explore how Confidencial embeds Zero Trust controls directly into your unstructured data workflows. 



Frequently Asked Questions


Q: Why does traditional AI governance fail to stop data leaks? 

A: Traditional AI governance fails because it focuses on the model layer (prompts and filters) rather than the data layer. When organizations lack automated data controls, sensitive information is often ingested into LLMs or RAG pipelines before any policy engine can intervene.


Q: What is the difference between model-centric and data-centric AI security? 

A: Model-centric security focuses on monitoring user prompts and AI outputs. Data-centric security protects the information itself through classification, DSPM, and selective encryption, ensuring that security controls travel with the data regardless of which AI tool or model processes it.


Q: How can organizations prevent PII from entering public LLMs? 

A: Organizations can prevent PII exposure by implementing Data-Centric Zero Trust. This includes using Data Security Posture Management (DSPM) to identify sensitive data in Microsoft 365 or Box, and applying automated DLP scanning to redact or block regulated fields before they reach an AI prompt.

 
 
 
