Prompt injection

A user or document attempt to override system instructions, causing the model to act outside intended bounds.

When to use it

Exposing user-generated or external content to the model (RAG, emails, web).
Allowing tool use or actions based on model decisions.
Handling sensitive data or compliance requirements.

PM decision impact

Prompt injection threatens data safety and product integrity. PMs own mitigations: isolation, input cleaning, trust scoring, and confirmations. Costs include extra latency and possible refusals; benefits include preventing breaches and bad actions.

How to do it in 2026

Separate untrusted content from instructions; use markers and quoted blocks. Run input classifiers for jailbreak patterns. Add allowlists for tools and confirm high-impact actions with users. In 2026, use content-origin metadata in prompts so the model treats untrusted text as data, not instructions.

Example

A RAG chatbot started following hidden instructions in PDFs. After sandboxing retrieved text with provenance tags and adding an injection detector, malicious success dropped to <0.3% with negligible latency change.

Common mistakes

Mixing user content directly into system prompts without delimiting.
Allowing tool execution without confirming user intent.
Assuming provider safety settings alone stop injections.

Learn it in CraftUp

AI ethics & safety for PMs (blog)Start learning free in the CraftUp app

Last updated: February 2, 2026