Contemplating reliability: why less can be more
Intuitively, we want to build good applications whose reliability is pushed to the maximum. What we tend to overlook are the hidden drawbacks of aiming for overly high reliability, such as mental stress during an outage and hesitance toward rapid feature development. Accepting that failures may occur takes stress off DevOps teams and allows them to focus on what really matters to end users.
In this talk, we’ll outline a framework, based on error budgets, for how many failures DevOps teams should reasonably expect. While technical errors may abound, their impact on users may be negligible, so keeping an eye on end-user impact is essential. Measuring that impact shows how severe an incident really is and helps us stay sane during “unimportant” incidents. Moreover, when reliability surpasses expectations, we will see that we can allow for more “risky business” and innovate faster, improving the customer experience in another way.
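As a rough illustration of the error-budget idea, here is a minimal sketch in Python. It assumes a request-based SLO measured over a 30-day window, counts only failures that end users actually experienced, and maps the remaining budget to a release decision. The function names, the 30-day default and the 0.5 threshold are hypothetical choices for this example, not part of the framework presented in the talk.

```python
def allowed_downtime_minutes(slo_target: float, window_days: int = 30) -> float:
    """Time-based budget: minutes of full outage the SLO tolerates per window.

    Example: a 99.9% target over 30 days tolerates about 43.2 minutes.
    """
    return (1.0 - slo_target) * window_days * 24 * 60


def budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Request-based budget: fraction of the budget still unspent.

    1.0 means no budget used, 0.0 means exactly used up, negative means the SLO
    was missed. failed_requests should count only failures users experienced,
    not every technical error logged by the system.
    """
    allowed_failures = (1.0 - slo_target) * total_requests
    if allowed_failures <= 0:
        raise ValueError("slo_target must be below 1.0 and total_requests positive")
    return 1.0 - failed_requests / allowed_failures


def release_policy(remaining: float) -> str:
    """Hypothetical policy: spend surplus budget on riskier changes,
    slow down once the budget is gone. The 0.5 threshold is illustrative."""
    if remaining < 0:
        return "freeze risky releases; focus on reliability"
    if remaining > 0.5:
        return "budget to spare; ship experiments faster"
    return "proceed with normal release cadence"


if __name__ == "__main__":
    print(f"99.9% over 30 days: {allowed_downtime_minutes(0.999):.1f} min of downtime")
    # 250 user-facing failures out of 1,000,000 requests against a 99.9% SLO
    left = budget_remaining(0.999, 1_000_000, 250)
    print(f"budget remaining: {left:.0%} -> {release_policy(left)}")
```

With 250 user-facing failures against a budget of about 1,000, roughly three quarters of the budget is left, so under this hypothetical policy the team could afford riskier releases. Once failures exceed the budget, the same policy shifts the focus back to reliability.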