Hallucination rate

The share of a model's responses that contain unsupported or factually incorrect claims, measured over a defined set of responses.
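
As a formula: hallucination rate = hallucinated responses ÷ total responses over an evaluation window. Below is a minimal sketch of that calculation in Python, assuming responses have already been judged by a human reviewer or a grounded judge; the field names and data are illustrative, not a standard schema.

    # Minimal sketch: overall hallucination rate over a labeled eval set.
    # Assumes each response has already been judged against source material;
    # the "intent" and "hallucinated" fields are illustrative assumptions.
    labeled = [
        {"intent": "billing",  "hallucinated": True},
        {"intent": "billing",  "hallucinated": False},
        {"intent": "shipping", "hallucinated": False},
        {"intent": "shipping", "hallucinated": False},
    ]

    rate = sum(r["hallucinated"] for r in labeled) / len(labeled)
    print(f"Hallucination rate: {rate:.0%}")  # -> 25%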

When to use it

  • Building assistants that must be trustworthy (support, compliance, finance).
  • Comparing retrieval, grounding, or prompt changes.
  • Reporting risk posture to stakeholders.

PM decision impact

Hallucinations erode user trust and drive downstream costs (support tickets, refunds). PMs must set an acceptable threshold per use case and re-measure the rate after every model, prompt, or retrieval change. Reducing hallucinations can also increase refusals or latency, so PMs have to balance safety against usability.

How to do it in 2026

Define what counts as a hallucination for each feature. Measure with grounded evals plus manual spot checks. Add UI cues (citations, confidence indicators) and fallbacks when confidence is low. In 2026, route high-risk intents to stricter prompts or smaller, more factual models to cut hallucinations without large cost increases.
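
Below is a minimal sketch of per-intent scoring against per-intent thresholds, consistent with routing high-risk intents to stricter handling; the intents, labels, and threshold values are illustrative assumptions, not recommendations.

    # Minimal sketch: per-intent hallucination rates checked against per-intent
    # thresholds, so riskier intents can be flagged for stricter prompts or a
    # more factual model. Intents, labels, and thresholds are illustrative.
    from collections import defaultdict

    labeled = [
        {"intent": "billing",  "hallucinated": True},
        {"intent": "billing",  "hallucinated": False},
        {"intent": "shipping", "hallucinated": False},
        {"intent": "shipping", "hallucinated": False},
    ]
    thresholds = {"billing": 0.02, "shipping": 0.10}  # tighter bar for higher-risk intents

    counts = defaultdict(lambda: [0, 0])  # intent -> [hallucinated, total]
    for r in labeled:
        counts[r["intent"]][0] += int(r["hallucinated"])
        counts[r["intent"]][1] += 1

    for intent, (bad, total) in counts.items():
        rate = bad / total
        verdict = "within threshold" if rate <= thresholds[intent] else "route to stricter handling"
        print(f"{intent}: {rate:.0%} -> {verdict}")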

Example

A billing bot’s hallucination rate drops from 14% to 4% after adding grounding and a refusal pattern for unknown SKUs, reducing refund-related tickets by 19%.

Common mistakes

  • Using a single global threshold when risk varies by intent.
  • Not separating hallucinations from policy violations—different mitigations apply.
  • Measuring only offline; real users phrase things differently.

Last updated: February 2, 2026