The maximum response time you can spend across model calls, tools, and orchestration while meeting UX and business goals.
Latency shapes adoption and satisfaction. PMs allocate budget per step, decide where to cache, and when to stream. They trade quality against speed and cost. Clear budgets prevent scope creep and keep the experience snappy.
Set p95 targets per surface (e.g., 1.5 s chat, 4 s long-form). Break down where time is spent; cache deterministic steps. Stream partial results when helpful. In 2026, maintain a latency scoreboard by feature and block releases that exceed budgets unless justified.
A triage flow had p95 2.8 s. By precomputing embeddings nightly and capping tool retries, p95 fell to 1.6 s with no quality loss, lifting CSAT by 0.2 points.