How often model-invoked tools succeed, how they fail, and how gracefully the system recovers.
Reliability governs trust and cost. PMs set SLOs (success rate, p95 latency), decide retries and fallbacks, and influence whether humans must review. Better reliability lowers support cost and improves task completion. Poor reliability can negate model quality gains.
Instrument every tool call with status, latency, and arguments. Add retry rules per failure type, circuit breakers, and safe fallbacks. In 2026, track success by model/tool pair and auto-route to the most reliable path; expose incident alerts to product owners.
By capping retries to two, adding circuit breakers, and improving validation, a sales ops agent lifted tool success to 96% and cut cost per task 19% while keeping completion time within SLA.