Data leakage

The exposure of sensitive information to unauthorized users or external systems through model inputs, outputs, or logs.

When to use it

  • Handling PII, financial, or confidential customer data.
  • Indexing customer documents for RAG.
  • Allowing tool actions that touch private systems.

PM decision impact

Leakage risk affects legal exposure and sales. PMs must enforce tenant isolation, masking, and log redaction. Some protections add latency, and masking can reduce retrieval recall, so these tradeoffs need explicit decisions.

How to do it in 2026

Mask or tokenize sensitive fields before sending them to models, enforce tenant isolation in retrieval, and redact outputs and logs. Also add real-time PII detectors on prompts and responses, and auto-quarantine suspicious sessions for review.
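As a rough illustration of the masking step, the sketch below swaps matched values for placeholder tokens before a prompt leaves your system and restores them afterwards. The patterns, the token format, and the mask/unmask helper names are all assumptions made for this example. A production detector would need far broader coverage than two regular expressions.

```python
import re

# Hypothetical patterns for illustration only; real PII detection needs more than this.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ACCOUNT": re.compile(r"\b\d{10,16}\b"),
}

def mask(text):
    """Replace matches with placeholder tokens; return the masked text and a token map."""
    vault = {}
    for label, pattern in PATTERNS.items():
        def _sub(match, label=label):
            # Each match gets its own token so it can be restored later.
            token = f"<{label}_{len(vault) + 1}>"
            vault[token] = match.group(0)
            return token
        text = pattern.sub(_sub, text)
    return text, vault

def unmask(text, vault):
    """Restore original values in text that still contains the placeholder tokens."""
    for token, original in vault.items():
        text = text.replace(token, original)
    return text
```

The token map stays on your side of the boundary, so only the masked text would be sent to the model. Whether responses should be unmasked at all depends on who is allowed to see the original values.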

Example

After masking account numbers and enforcing per-tenant namespaces in the vector DB, a fintech copilot saw zero cross-tenant leaks in quarterly tests while maintaining 1.2 s p95 latency.
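A cross-tenant leak test of the kind mentioned above could look roughly like the sketch below. It uses a toy in-memory index with keyword matching as a stand-in for a real vector DB, and the TenantIndex class, the cross_tenant_leaks helper, and the canary strings are all hypothetical. The idea is to plant a unique marker in each tenant's data and confirm that no other tenant can retrieve it.

```python
from collections import defaultdict

class TenantIndex:
    """Toy in-memory stand-in for a vector DB with per-tenant namespaces."""

    def __init__(self):
        self._namespaces = defaultdict(list)

    def add(self, tenant_id, doc):
        self._namespaces[tenant_id].append(doc)

    def search(self, tenant_id, query):
        # Only the caller's namespace is scanned; other tenants' docs are never touched.
        docs = self._namespaces.get(tenant_id, [])
        return [d for d in docs if query.lower() in d.lower()]

def cross_tenant_leaks(index, tenant_id, foreign_canaries):
    """Return the canary strings from other tenants that this tenant can retrieve."""
    return [c for c in foreign_canaries if index.search(tenant_id, c)]
```

An empty list from cross_tenant_leaks is the passing result. Any returned canary points to a namespace or filter that did not hold.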

Common mistakes

  • Logging raw prompts/responses with PII.
  • Sharing indexes across tenants without strict filters.
  • Letting agents copy private content into public channels.
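For the first mistake, one possible guard is a redacting filter built on Python's standard logging module, sketched below. The PII_PATTERN regex and the RedactingFilter name are assumptions made for this example, and the pattern covers only email addresses and long digit runs.

```python
import logging
import re

# Hypothetical pattern: email addresses and 10-16 digit runs. Extend for real use.
PII_PATTERN = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}|\b\d{10,16}\b"
)

class RedactingFilter(logging.Filter):
    """Rewrite each log record so matched values never reach the log output."""

    def filter(self, record):
        # Format the message first so values passed as arguments are also scanned.
        message = record.getMessage()
        record.msg = PII_PATTERN.sub("[REDACTED]", message)
        record.args = ()
        return True  # keep the record, now redacted
```

Attaching the filter to a handler (handler.addFilter(RedactingFilter())) rather than to a single logger is usually the safer choice, because filters on a logger do not apply to records created by its child loggers.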

Last updated: February 2, 2026