Date: 2024-04-29

Businesses rely every day on various systems and pieces of equipment to keep their operations running smoothly. But all systems inevitably require upkeep. The system in question could be intangible software, like an IT service network that has accumulated enough bugs to break an important feature, sending developers scrambling for a fix. Or it could be a piece of physical equipment, like an ice cream machine in a fast-food restaurant with a broken O-ring.

Eventually, everything breaks down, from multisite IT systems down to individual light bulbs. Unplanned downtime can have catastrophic consequences, and it’s up to facility maintenance engineers and technicians to plan ahead so that failures are rectified swiftly. The goal is to minimize downtime and thereby reduce the costs associated with lost productivity, lost revenue, and customer dissatisfaction.

Downtime can be minimized in many ways. For example, businesses can reduce the time it takes to repair a piece of equipment by keeping sufficient replacement parts accessible to technicians onsite. They can also study their repair processes to find faster ways to perform repairs or to notify technicians. Going further, they can invest in better-performing tools with longer lifespans to reduce the number of repairs needed.

But to understand how to improve the reliability of systems and components, we must first be able to measure it. Mean time to repair (MTTR), also known as mean time to recovery, and mean time between failures (MTBF) are two failure metrics commonly used in facilities maintenance to measure the reliability of systems or products. While the two metrics are related, they have different meanings and answer different questions.

First, let’s review MTBF.