The share of a model's responses that contain unsupported or incorrect facts, out of total responses.
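As a minimal sketch of the measurement, the rate can be computed from a labeled eval set. The data shape and label values below are assumptions for illustration, not a prescribed schema:

```python
# Hypothetical labeled eval results: each response is graded by a human
# reviewer or a grading model as "supported", "unsupported", or "incorrect".
labeled_responses = [
    {"id": 1, "label": "supported"},
    {"id": 2, "label": "unsupported"},
    {"id": 3, "label": "incorrect"},
    {"id": 4, "label": "supported"},
]

def hallucination_rate(responses):
    """Share of responses containing unsupported or incorrect facts."""
    bad = sum(1 for r in responses if r["label"] in ("unsupported", "incorrect"))
    return bad / len(responses)

print(f"{hallucination_rate(labeled_responses):.1%}")  # 50.0% on this toy set
```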
Hallucinations erode trust and drive support costs (tickets, refunds). PMs must set acceptable thresholds by use case and re-measure after every model or prompt change. Reducing hallucinations can increase refusals or latency, so PMs balance safety against usability.
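One way to operationalize per-use-case thresholds is a release gate that re-runs the eval after every change. The threshold values and use-case names here are hypothetical:

```python
# Hypothetical per-use-case thresholds; a CI release gate could fail
# the build whenever a measured rate exceeds its use case's ceiling.
THRESHOLDS = {
    "billing": 0.05,      # high stakes: refunds, SLAs
    "general_faq": 0.10,  # lower stakes: tolerate more, favor coverage
}

def passes_gate(use_case: str, measured_rate: float) -> bool:
    return measured_rate <= THRESHOLDS[use_case]

assert passes_gate("billing", 0.04)
assert not passes_gate("billing", 0.07)
```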
Define what counts as a hallucination for each feature. Use grounded evals and spot checks. Add UI cues (citations, confidence indicators) and fallbacks when confidence is low. In 2026, route high-risk intents to stricter prompts or smaller, more factual models to cut hallucinations without large cost increases.
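A minimal sketch of that routing pattern, assuming a hypothetical intent classifier and confidence score; the intent names, prompts, and model identifiers are illustrative, not a specific vendor's API:

```python
# All names below (intents, prompts, model ids) are assumptions for
# illustration; answer_with() is a stub standing in for a real model call.
HIGH_RISK_INTENTS = {"billing", "refunds", "compliance"}
CONFIDENCE_FLOOR = 0.7

STRICT_PROMPT = "Answer only from the provided documents; refuse otherwise."
DEFAULT_PROMPT = "Answer helpfully."

def answer_with(query: str, model: str, prompt: str) -> str:
    # Stub: a real implementation would call the chosen model here.
    return f"[{model}] answer to: {query}"

def respond(query: str, intent: str, confidence: float) -> str:
    if confidence < CONFIDENCE_FLOOR:
        # Low confidence: fall back rather than risk a hallucination.
        return "I'm not certain about this one; routing you to a specialist."
    if intent in HIGH_RISK_INTENTS:
        # High-risk intent: stricter prompt and a smaller, more factual model.
        return answer_with(query, model="small-grounded", prompt=STRICT_PROMPT)
    return answer_with(query, model="default", prompt=DEFAULT_PROMPT)

print(respond("Why was I charged twice?", intent="billing", confidence=0.9))
```

Checking confidence before routing means a low-confidence answer never reaches the user at all, trading a few extra refusals for fewer hallucinations, the same safety-versus-usability balance noted above.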
A billing bot’s hallucination rate drops from 14% to 4% after adding grounding and a refusal pattern for unknown SKUs, reducing refund-related tickets by 19%.