Tool reliability

How often model-invoked tools succeed, how they fail, and how gracefully the system recovers.

When to use it

  • Tool calls are timing out, erroring, or returning partial data.
  • You see high cost or latency from repeated retries.
  • Critical business actions depend on tool calls (payments, provisioning).

PM decision impact

Reliability governs trust and cost. PMs set SLOs (success rate, p95 latency), define retry and fallback policies, and decide when a human must review results. Better reliability lowers support cost and improves task completion; poor reliability can negate model-quality gains.
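A minimal sketch of the two SLO metrics named above, computed from a list of logged tool calls. The record fields (`ok`, `latency_ms`) and the nearest-rank percentile method are illustrative assumptions, not a specific monitoring API.

```python
import math

def slo_snapshot(calls):
    """calls: list of dicts like {"ok": bool, "latency_ms": float}."""
    successes = sum(1 for c in calls if c["ok"])
    success_rate = successes / len(calls)
    latencies = sorted(c["latency_ms"] for c in calls)
    # Nearest-rank p95: the latency at or below which 95% of calls fall.
    rank = max(0, math.ceil(0.95 * len(latencies)) - 1)
    return {"success_rate": success_rate, "p95_ms": latencies[rank]}

# 19 fast successes plus one slow failure -> 95% success, p95 ignores the outlier.
calls = [{"ok": True, "latency_ms": 120 + i * 10} for i in range(19)]
calls.append({"ok": False, "latency_ms": 900})
print(slo_snapshot(calls))  # → {'success_rate': 0.95, 'p95_ms': 300}
```

Tracking both metrics matters: a tool can hit its success-rate target while its tail latency quietly blows the SLA.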

How to do it in 2026

Instrument every tool call with status, latency, and arguments. Define retry rules per failure type, add circuit breakers, and provide safe fallbacks. In 2026, track success rates per model/tool pair, auto-route requests to the most reliable path, and surface incident alerts to product owners.
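The instrumentation, retry, and circuit-breaker pieces above can be sketched in one wrapper. Everything here (`instrumented_call`, the log schema, the consecutive-failure breaker) is a hypothetical illustration, not a real library.

```python
import time

class CircuitBreaker:
    """Opens after `threshold` consecutive failures; rejects calls while open."""
    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = 0

    @property
    def open(self):
        return self.failures >= self.threshold

    def record(self, ok):
        self.failures = 0 if ok else self.failures + 1

def instrumented_call(tool, args, breaker, log, max_retries=2):
    """Call `tool(**args)`, logging status/latency/attempt for every try."""
    if breaker.open:
        log.append({"tool": tool.__name__, "status": "circuit_open"})
        return None
    for attempt in range(max_retries + 1):
        start = time.monotonic()
        try:
            result = tool(**args)
            breaker.record(ok=True)
            log.append({"tool": tool.__name__, "status": "ok",
                        "latency_ms": (time.monotonic() - start) * 1000,
                        "attempt": attempt})
            return result
        except TimeoutError:
            breaker.record(ok=False)
            log.append({"tool": tool.__name__, "status": "timeout",
                        "latency_ms": (time.monotonic() - start) * 1000,
                        "attempt": attempt})
    return None  # safe fallback: the caller decides what "degraded" looks like
```

Because every attempt lands in the log with the tool name attached, per-model/tool success rates fall out of the same records that drive the breaker.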

Example

By capping retries at two, adding circuit breakers, and tightening argument validation, a sales-ops agent lifted tool-call success to 96% and cut cost per task by 19% while keeping completion time within SLA.

Common mistakes

  • Treating all failures the same, instead of handling auth, validation, and timeouts differently.
  • Retrying blindly, inflating bills and frustrating users.
  • Not surfacing failures to users when action results are uncertain.
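One way to avoid the first mistake is to map each failure class to its own policy, so transient timeouts retry while auth and validation errors stop immediately. The failure-type names and policy fields below are illustrative assumptions.

```python
# Per-failure-type policies: only transient failures are worth retrying.
RETRY_POLICY = {
    "timeout":    {"retries": 2, "backoff_s": 1.0},  # transient: retry with backoff
    "validation": {"retries": 0, "surface": True},   # bad arguments: fix, don't retry
    "auth":       {"retries": 0, "escalate": True},  # expired credentials: re-auth first
}

def should_retry(failure_type, attempt):
    """attempt is 0-based; unknown failure types default to no retries."""
    policy = RETRY_POLICY.get(failure_type, {"retries": 0})
    return attempt < policy["retries"]
```

Retrying a validation error just replays the same bad arguments at full price, which is exactly the blind-retry cost problem the second bullet describes.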


Last updated: February 2, 2026