Retrieval-Augmented Generation (RAG)

Pairing LLM reasoning with external retrieval so responses cite up-to-date, relevant sources instead of relying on model memory.

When to use it

  • Content changes frequently (policies, catalog, release notes).
  • You need traceable answers with citations for compliance or trust.
  • Hallucination rates are unacceptably high, and fine-tuning is too slow or costly to fix them.

PM decision impact

RAG is the main lever for freshness and factuality. PMs decide what corpus to expose, how to chunk and rank it, and how much to trust the retrieved facts. Done well, RAG reduces support load and legal risk; done poorly, it slows responses and confuses users.

How to do it in 2026

Curate a clean, permissioned corpus. Choose chunking tuned to your queries, add re-ranking, and cap citations to the most relevant 3–5. Reserve prompt space for safety and instructions. In 2026, ship per-collection evals (answer correctness, citation precision) and degrade gracefully to model-only answers when retrieval fails.
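
A minimal sketch of that retrieve, re-rank, cap, and fall-back loop. The retrieve, rerank, and generate callables are placeholders for whatever vector store, cross-encoder, and LLM client your stack actually uses, not a specific library API.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Chunk:
    doc_id: str
    text: str

def answer_with_rag(
    query: str,
    retrieve: Callable[[str, int], list[Chunk]],        # placeholder retriever
    rerank: Callable[[str, list[Chunk]], list[Chunk]],  # placeholder re-ranker
    generate: Callable[[str], str],                     # placeholder LLM call
    max_citations: int = 3,
) -> dict:
    """Retrieve, re-rank, cap citations, and fall back to model-only on failure."""
    candidates = retrieve(query, 20)  # over-fetch, then narrow after re-ranking
    if not candidates:
        # Graceful degradation: answer from model memory and flag that no sources were used.
        return {"answer": generate(query), "citations": [], "grounded": False}

    top = rerank(query, candidates)[:max_citations]  # keep only the most relevant few
    context = "\n\n".join(f"[{i + 1}] {c.text}" for i, c in enumerate(top))
    prompt = (
        "Answer using only the sources below and cite them as [n].\n\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )
    return {
        "answer": generate(prompt),
        "citations": [c.doc_id for c in top],
        "grounded": True,
    }
```

In practice, the fallback branch should also log the miss so your evals can track how often retrieval comes up empty.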

Example

A release-notes assistant uses RAG over changelog docs and support macros. After adding re-ranking and limiting to three citations, correct-answer rate jumps from 71% to 88% while latency stays under 1.3 s at p95.
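
Metrics like the ones in this example are cheap to compute once you log per-request outcomes. A small sketch, assuming each eval record carries a correctness label, a latency, the cited document ids, and the ids a reviewer judged relevant; the record values below are made up for illustration.

```python
import statistics

def p95(values: list[float]) -> float:
    """Approximate 95th-percentile latency from per-request timings (seconds)."""
    ordered = sorted(values)
    index = max(0, int(round(0.95 * len(ordered))) - 1)
    return ordered[index]

def citation_precision(cited: list[str], relevant: set[str]) -> float:
    """Fraction of cited documents a reviewer judged relevant to the question."""
    return sum(1 for doc_id in cited if doc_id in relevant) / len(cited) if cited else 0.0

# Hypothetical eval records: (answer correct?, latency in seconds, cited ids, judged-relevant ids)
records = [
    (True, 0.9, ["changelog-42"], {"changelog-42"}),
    (False, 1.2, ["macro-7", "changelog-13"], {"changelog-13"}),
    (True, 1.1, ["changelog-13"], {"changelog-13"}),
]

correct_rate = sum(r[0] for r in records) / len(records)
latency_p95 = p95([r[1] for r in records])
avg_citation_precision = statistics.mean(citation_precision(r[2], r[3]) for r in records)
print(f"correct={correct_rate:.0%}  p95={latency_p95:.2f}s  citation_precision={avg_citation_precision:.0%}")
```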

Common mistakes

  • Indexing everything without access controls, leaking internal data (see the filtering sketch after this list).
  • Returning too many citations, overwhelming users and increasing latency.
  • Ignoring retrieval failure paths, leading to empty or hallucinated answers.
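
For the access-control point above, one common pattern is to attach an allow-list to each chunk at indexing time and filter retrieval results against the caller's groups before anything reaches the prompt. A minimal sketch, assuming group-based ACLs; the field and function names are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    doc_id: str
    text: str
    allowed_groups: frozenset[str]  # ACL attached when the document is indexed

def permitted(chunk: Chunk, user_groups: set[str]) -> bool:
    """A chunk is visible only if the caller shares at least one group with it."""
    return bool(chunk.allowed_groups & user_groups)

def filter_results(results: list[Chunk], user_groups: set[str]) -> list[Chunk]:
    """Drop retrieved chunks the caller is not entitled to see before prompting."""
    return [c for c in results if permitted(c, user_groups)]
```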

Last updated: February 2, 2026