AI's 'March of Nines': Engineering for Dependability
8 Mar
Summary
- Reaching high AI reliability requires significant engineering effort beyond initial demos.
- Agentic workflows compound failures; each step's success probability is critical.
- Reliability is achieved by defining measurable SLOs and implementing nine key engineering controls.

The "March of Nines" highlights the substantial engineering effort required to achieve high reliability in AI systems, extending far beyond initial successful demonstrations. For enterprise applications, the gap between a functional demo and dependable software performance is critical for adoption.
Agentic workflows compound failures. If a workflow has n steps, each succeeding with probability p, the overall success rate is roughly p^n, which decays exponentially with n. Correlated outages and failures in shared dependencies can dominate unless specifically addressed.
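The compounding effect is easy to see with a back-of-the-envelope calculation (the step count and per-step probability below are illustrative, not from the article):

```python
def workflow_success(p: float, n: int) -> float:
    """Overall success of an n-step workflow where each step
    succeeds independently with probability p."""
    return p ** n

# Even a 95%-reliable step looks poor after 20 chained calls:
print(round(workflow_success(0.95, 20), 3))  # 0.358
```

Note the independence assumption: correlated failures (e.g. a shared dependency going down) make the real picture worse than this simple product suggests.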
To improve AI reliability, teams must define measurable Service Level Objectives (SLOs) and invest in controls that minimize variance. Service Level Indicators (SLIs) should track workflow completion, tool-call success, schema validation, policy compliance, latency, cost, and fallback rates.
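A few of the SLIs named above can be computed from per-workflow records along these lines (the record fields and metric names are assumptions for illustration, not a real schema):

```python
from dataclasses import dataclass

@dataclass
class WorkflowRecord:
    completed: bool        # did the workflow reach its goal?
    tool_calls: int        # number of tool invocations
    tool_failures: int     # how many of those failed
    latency_ms: float      # end-to-end latency
    used_fallback: bool    # did we fall back to a deterministic path?

def slis(records: list[WorkflowRecord]) -> dict[str, float]:
    n = len(records)
    total_calls = sum(r.tool_calls for r in records)
    latencies = sorted(r.latency_ms for r in records)
    return {
        "workflow_completion": sum(r.completed for r in records) / n,
        "tool_call_success": 1 - sum(r.tool_failures for r in records) / total_calls,
        "p95_latency_ms": latencies[min(int(0.95 * n), n - 1)],
        "fallback_rate": sum(r.used_fallback for r in records) / n,
    }
```

Each SLI can then be compared against its SLO target (e.g. workflow completion ≥ 99%) to decide where engineering effort should go next.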
Nine key strategies enhance AI dependability. These include constraining autonomy with explicit workflow graphs, enforcing contracts at all system boundaries, and layering validators for syntax, semantics, and business rules. Routing decisions should be risk-based, and tool calls engineered like distributed systems with robust error handling.
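Two of those strategies, layered validation and distributed-systems-style tool calls, can be sketched together. The payload fields, spend limit, and retry parameters below are hypothetical:

```python
import json
import time

def validate(payload: str) -> dict:
    """Layered validation: syntax, then semantics, then business rules."""
    data = json.loads(payload)                  # syntax: must be valid JSON
    if "amount" not in data:                    # semantics: required fields
        raise ValueError("missing 'amount'")
    if data["amount"] > 10_000:                 # business rule: spend limit
        raise ValueError("amount exceeds policy limit")
    return data

def call_with_retries(call, payload, attempts=3, backoff_s=0.5):
    """Treat a tool call like any remote call: bounded retries with
    exponential backoff, re-validating the response each time."""
    for attempt in range(attempts):
        try:
            return validate(call(payload))
        except (ValueError, TimeoutError):
            if attempt == attempts - 1:
                raise                           # exhausted: surface the failure
            time.sleep(backoff_s * 2 ** attempt)
```

Validation runs inside the retry loop on purpose: a syntactically valid but policy-violating response is treated the same way as a network failure, giving the model another chance to produce a compliant answer before the call is escalated.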
Predictable retrieval and a production evaluation pipeline are also crucial for identifying rare failures and preventing regressions. Investment in observability and operational response, including detailed tracing and runbooks, accelerates diagnosis and remediation. Finally, implementing an "autonomy slider" with deterministic fallbacks allows for safe scaling of AI capabilities.
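The "autonomy slider" idea can be made concrete as risk-based routing with a deterministic fallback. The thresholds and route names here are illustrative assumptions, not part of the article:

```python
def route(request_risk: float, model_confidence: float, autonomy: float) -> str:
    """Pick an execution path for a request.

    autonomy in [0, 1]: higher values let the agent act on riskier requests.
    Thresholds (0.8 confidence cutoff) are illustrative.
    """
    if request_risk <= autonomy and model_confidence >= 0.8:
        return "model"           # confident and within the slider: act autonomously
    if request_risk <= autonomy:
        return "deterministic"   # within the slider but low confidence: scripted fallback
    return "human"               # above the slider: escalate for review
```

Dialing `autonomy` up over time, as SLIs prove out, is what makes scaling AI capabilities safe rather than speculative.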
Enterprise insistence on these later 'nines' stems from the direct business risks associated with AI inaccuracy, including financial losses and reputational damage. Therefore, disciplined engineering practices are essential for closing reliability gaps and building trustworthy AI.



