Prompt injection

A user or document attempt to override system instructions, causing the model to act outside intended bounds.

When to use it

  • Exposing user-generated or external content to the model (RAG, emails, web).
  • Allowing tool use or actions based on model decisions.
  • Handling sensitive data or compliance requirements.

PM decision impact

Prompt injection threatens data safety and product integrity. PMs own mitigations: isolation, input cleaning, trust scoring, and confirmations. Costs include extra latency and possible refusals; benefits include preventing breaches and bad actions.

How to do it in 2026

Separate untrusted content from instructions; use markers and quoted blocks. Run input classifiers for jailbreak patterns. Add allowlists for tools and confirm high-impact actions with users. In 2026, use content-origin metadata in prompts so the model treats untrusted text as data, not instructions.

Example

A RAG chatbot started following hidden instructions in PDFs. After sandboxing retrieved text with provenance tags and adding an injection detector, malicious success dropped to <0.3% with negligible latency change.

Common mistakes

  • Mixing user content directly into system prompts without delimiting.
  • Allowing tool execution without confirming user intent.
  • Assuming provider safety settings alone stop injections.

Related terms

Learn it in CraftUp

Last updated: February 2, 2026