Part IV: Engineering AI Products
Chapter 20

20.2 Data Leakage Prevention

In 2023, a legal firm used an AI assistant to review contracts. They discovered that sensitive deal terms were being retained in the model's context window and occasionally appearing in responses to unrelated queries. This data leakage incident cost them a client and led to regulatory scrutiny.

Section Overview

Data leakage in AI systems takes multiple forms: training data extraction, context window contamination, sensitive data exposure in prompts, and output filtering failures. Each requires different defensive measures. This section covers the mechanisms of leakage and practical controls to prevent it.

Training Data Extraction

LLMs can memorize and later reproduce training data. Attackers have demonstrated the ability to extract sensitive information, including phone numbers, email addresses, and personal identifiers, from model outputs. This poses severe risks when a model has been trained on sensitive data.

The Extraction Threat

Research has shown that LLMs can reproduce training data with high fidelity when prompted correctly. An attacker who knows they are querying a model trained on specific data can craft prompts to extract that data.

Defense Against Extraction

Defenses against extraction include:

- Data minimization: do not train on sensitive data unless necessary.
- Differential privacy: apply DP-SGD or similar techniques during training.
- Memory sanitization: use techniques to forget specific data points post-training.
- Output filtering: detect and block extraction attempts at inference time.
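The last of these can be sketched as a simple inference-time check: compare the model's output against a corpus of known-sensitive text and block responses that reproduce long verbatim spans. This is an illustrative sketch, not a production detector; the n-gram overlap approach and all names here are assumptions.

```python
def contains_verbatim_span(output, sensitive_corpus, n=8):
    """Return True if the output shares an n-word span with any sensitive text."""
    words = output.lower().split()
    # All n-word spans appearing in the model output.
    ngrams = {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}
    for doc in sensitive_corpus:
        doc_words = doc.lower().split()
        for i in range(len(doc_words) - n + 1):
            if " ".join(doc_words[i:i + n]) in ngrams:
                return True  # verbatim reproduction detected
    return False
```

A response for which this returns True would be blocked or redacted before delivery. Real systems tune the span length n: too short and paraphrases of common phrases trigger false positives, too long and partial reproductions slip through.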

Context Window Leakage

The context window is the most immediate leakage vector. Any data placed in the context is visible to the model and can influence outputs. In multi-user systems, context from one user's session can leak into another's.

The Context Isolation Problem

In shared AI systems, context isolation is critical. A user's sensitive document should never influence responses to other users' queries, even indirectly through model behavior changes.

HealthMetrics: Context Isolation

HealthMetrics processes health records through their AI assistant. Each user session maintains isolated context. When a user queries about treatment options, the system loads only that user's relevant records, not records from other users. The context is purged after the session ends.

Context Management Best Practices

Context management best practices include:

- Session isolation: each user session has independent context.
- Context expiration: automatically purge context after defined periods.
- Data minimization in context: include only the information necessary for the current query.
- Context partitioning: separate sensitive data into isolated context segments.
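The first two practices, session isolation and context expiration, can be combined in a small in-memory store. This is a minimal sketch under the assumption of a single-process system; the class and method names are illustrative, not any product's API.

```python
import time

class SessionContextStore:
    """Per-session context with lazy expiration; purged on access or session end."""

    def __init__(self, ttl_seconds=1800):
        self.ttl = ttl_seconds
        self._contexts = {}  # session_id -> (last_access_time, list of items)

    def append(self, session_id, item):
        _, items = self._contexts.get(session_id, (None, []))
        self._contexts[session_id] = (time.time(), items + [item])

    def get(self, session_id):
        entry = self._contexts.get(session_id)
        if entry is None:
            return []
        last_access, items = entry
        if time.time() - last_access > self.ttl:
            del self._contexts[session_id]  # expired: purge before returning
            return []
        return items

    def end_session(self, session_id):
        self._contexts.pop(session_id, None)  # purge immediately on session end
```

Because each session's items live under its own key and are deleted on expiry or logout, one user's documents can never be read into another user's context by this store.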

Sensitive Data in Prompts

Users often include sensitive information in their prompts without realizing it may be stored, logged, or used in unexpected ways. Your system must handle this data responsibly.

Data Classification for AI Systems

Data Classification Levels

- Public: can be included in training and shared in outputs.
- Internal: should not appear in training; limited to context for specific queries.
- Confidential: isolated context only, never in training, with strict access controls.
- Restricted: cannot be used in AI systems without explicit consent and additional controls.
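These four levels can be encoded as a single policy table that every AI workflow consults, rather than scattering the rules across services. The table and helper below are a sketch; the field names are assumptions.

```python
# Classification -> where the data may appear. "isolated" means context that is
# partitioned per session, per the levels described in the text.
POLICY = {
    "PUBLIC":       {"training": True,  "context": True,       "output": True},
    "INTERNAL":     {"training": False, "context": True,       "output": False},
    "CONFIDENTIAL": {"training": False, "context": "isolated", "output": False},
    "RESTRICTED":   {"training": False, "context": False,      "output": False},
}

def allowed_in_training(classification):
    """Return whether data of this classification may enter a training set."""
    return POLICY[classification]["training"]
```

Centralizing the policy this way means a change to the rules (for example, tightening Internal data) is made once and enforced everywhere the table is consulted.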

Prompt Data Scanning
def scanPromptForSensitiveData(prompt):
    # Extract candidate entities (names, identifiers, numbers) from the prompt.
    entities = extractEntities(prompt)

    for entity in entities:
        classification = lookupDataClassification(entity)

        if classification == "RESTRICTED":
            # Restricted data may never enter the AI system: reject the prompt.
            return {"status": "BLOCK", "reason": "Restricted data detected"}
        if classification == "CONFIDENTIAL":
            # Allowed in isolated context, but record the event for audit.
            logForAudit(entity, "confidential_data_in_prompt")

    return {"status": "ALLOW"}
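The scanner assumes extractEntities, lookupDataClassification, and logForAudit exist. A purely illustrative set of stand-ins is sketched below; a real system would use an entity recognizer and a classification service, not a single regex.

```python
import re

# Hypothetical stand-in: treat anything shaped like a US Social Security
# number as an entity. Real entity extraction covers many more types.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def extractEntities(prompt):
    return SSN_PATTERN.findall(prompt)

def lookupDataClassification(entity):
    # Assumption for this sketch: all SSN-shaped entities are restricted.
    return "RESTRICTED"

def logForAudit(entity, event):
    print(f"AUDIT {event}: {entity}")
```

With these stand-ins, a prompt containing "123-45-6789" yields one restricted entity and the scanner returns a BLOCK decision.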

Output Filtering

Even with careful input handling, models can generate outputs containing sensitive data. Output filtering validates model responses before they reach users.

PII Detection and Redaction

Implement real-time PII detection in outputs. Common sensitive data types include:

- Names, addresses, and phone numbers
- Social Security numbers and passport numbers
- Financial account numbers
- Medical record numbers
- Biometric identifiers

Practical Tip

Use dedicated PII detection libraries rather than relying on regex patterns alone. Names and contexts vary widely, and regex approaches generate both false positives and false negatives.
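A regex baseline can still serve as a cheap first pass in front of a dedicated library. The patterns below are deliberately narrow illustrations (US-style formats only), not a complete PII taxonomy.

```python
import re

# First-pass patterns for a few structured PII types; illustrative only.
PII_PATTERNS = {
    "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def redact_pii(text):
    """Replace matched PII with placeholders; return (clean_text, types_found)."""
    found = []
    for label, pattern in PII_PATTERNS.items():
        if pattern.search(text):
            found.append(label)
            text = pattern.sub(f"[{label} REDACTED]", text)
    return text, found
```

Note that these patterns catch only well-formed structured identifiers; free-text names and addresses are exactly the cases where a dedicated detection library earns its keep.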

DataForge: Output Filtering Pipeline

DataForge's enterprise search returns AI-generated summaries of documents. The output filtering pipeline scans each summary for PII before delivery. If detected, the PII is redacted and the query is logged for security review. The system also flags the user account for potential data handling training.
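A pipeline like DataForge's can be sketched as one function with the detection, redaction, logging, and account-flagging steps injected as callables. Every name here is illustrative; it shows the ordering of the steps, not DataForge's actual implementation.

```python
def filter_summary(summary, user_id, detect, redact, log_event, flag_user):
    """Scan an AI-generated summary for PII before delivering it to the user."""
    findings = detect(summary)
    if not findings:
        return summary  # nothing sensitive: deliver as-is

    clean = redact(summary, findings)
    # Log for security review, then flag the account for data handling training.
    log_event("pii_in_output", user_id=user_id, types=findings)
    flag_user(user_id, reason="data_handling_training")
    return clean
```

Injecting the steps keeps the pipeline testable: each stage can be swapped for a stub in tests or upgraded (say, to a stronger PII detector) without touching the control flow.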

Prevention Controls Summary

Data Leakage Prevention Checklist

- Classify data before using it in AI workflows.
- Implement session isolation for multi-user systems.
- Scan prompts for sensitive data before processing.
- Filter outputs for PII and sensitive information.
- Purge context after defined expiration periods.
- Apply differential privacy if training on internal data.
- Log all data handling events for audit compliance.
- Implement data retention policies for AI interactions.