Businesses rely every day on various systems and pieces of equipment to keep their operations running smoothly. But all systems inevitably require upkeep. It could be intangible software, like an IT service network that has accumulated enough bugs to break an important feature, sending developers scrambling for a fix. Or it could be a piece of physical equipment, like an ice cream machine in a fast food restaurant with a broken o-ring.

Eventually, everything breaks down, from multi-site IT systems down to individual light bulbs. Unplanned downtime can have catastrophic consequences, and it’s up to facility maintenance engineers and technicians to plan ahead so that swift measures are taken to rectify a failure. The goal is to minimize downtime, reducing the costs associated with lost productivity, revenue or customer dissatisfaction.

Downtime can be minimized in many ways. For example, businesses can aim to reduce the amount of time it takes to repair a piece of equipment by having sufficient replacement parts accessible to technicians on-site. Or, they can observe repair processes to find faster ways to perform repairs or quicker ways to notify technicians. Even further, they can make investments in better-performing tools with longer lifespans to reduce the number of repairs needed.

But in order to understand how to improve the reliability of systems and components, we first must be able to measure their reliability. Mean time to repair (MTTR)—also known as mean time to recovery—and mean time between failures (MTBF) are two failure metrics commonly used to measure the reliability of systems or products within the field of facilities maintenance. While these acronyms are related, they have different meanings and are used to answer different questions.

First, let’s review MTBF. 

What is mean time between failures (MTBF)?

MTBF is a key performance indicator (KPI) that represents the average time between two consecutive failures of a system or product. MTBF is a measure of reliability, and it is commonly used in the context of warranties, maintenance planning and product development. Note that MTBF, which refers to repairable items, is not to be confused with the closely related term mean time to failure (MTTF), which refers to assets that are non-repairable and need to be replaced rather than repaired.

The MTBF calculation uses the following formula:

MTBF = Total operating time / Number of failures over a given period

So, for example, if a product is used for 1,000 hours and it fails 3 times during that period, the MTBF would be: 1000 hours / 3 failures = 333.3 hours

This means that, on average, the product runs for about 333.3 hours between one failure and the next.
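The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not a standard library routine: the function name `mean_time_between_failures` and its parameters are hypothetical, and the inputs are the figures from the example (1,000 operating hours, 3 failures).

```python
def mean_time_between_failures(total_operating_hours: float, failure_count: int) -> float:
    """Return MTBF as total operating time divided by the number of failures."""
    # MTBF is undefined if no failures were observed in the period.
    if failure_count <= 0:
        raise ValueError("MTBF requires at least one recorded failure")
    return total_operating_hours / failure_count


# Figures from the worked example: 1,000 hours of use, 3 failures.
mtbf_hours = mean_time_between_failures(1000, 3)
print(f"MTBF: {mtbf_hours:.1f} hours")  # MTBF: 333.3 hours
```

Rejecting a failure count of zero is a design choice for this sketch: with no failures in the observation window, the data only shows that the MTBF is at least as long as the window.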

MTBF is useful in determining the expected lifetime of a product and can help manufacturers plan for maintenance or replacement. However, it does not take into account how much time it takes to repair a product after it fails, which can be an important consideration in some applications. 

That’s where MTTR comes in. 

What is mean time to repair (MTTR)? 

MTTR is the average time it takes to repair a system or product after it has failed. Where MTBF describes how often failures occur, MTTR describes how quickly a system or product can be returned to service once one does. MTTR typically includes the time it takes to notify maintenance teams, allow equipment to cool down for repair, fix the issue, reassemble any relevant equipment or systems and test before restarting production.

The goal of MTTR is to minimize the downtime caused by failures and reduce the costs associated with repairs. 

Here’s how to calculate MTTR:

MTTR = Total downtime / Total number of failures over a specific time

For example, if a system failed 5 times over the last year, resulting in 10 total hours of downtime (including repair time), the MTTR would be: 10 hours / 5 failures = 2 hours

This means that on average, it takes two hours to repair the system after a failure occurs.
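As a rough sketch, the same calculation can be run over a log of individual outages. The function name `mean_time_to_repair` and the per-failure durations below are hypothetical; the durations are chosen only so that they add up to the 10 hours across 5 failures used in the example.

```python
from typing import Sequence


def mean_time_to_repair(downtime_hours: Sequence[float]) -> float:
    """Return MTTR as total downtime divided by the number of failures."""
    # Each entry is the downtime (including repair time) for one failure.
    if not downtime_hours:
        raise ValueError("MTTR requires at least one recorded failure")
    return sum(downtime_hours) / len(downtime_hours)


# Hypothetical outage log: 5 failures totalling 10 hours of downtime.
outages = [1.5, 3.0, 2.0, 0.5, 3.0]
mttr_hours = mean_time_to_repair(outages)
print(f"MTTR: {mttr_hours:.1f} hours")  # MTTR: 2.0 hours
```

Keeping the per-failure durations, rather than only the total, also makes it easy to spot outliers, such as a single long repair that dominates the average.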

MTTR is useful in determining the efficiency of maintenance operations and can help identify areas where improvements can be made. 

Differences between MTBF and MTTR

Mean time between failures (MTBF) and mean time to repair (MTTR) answer different questions and have different applications. Both belong to a family of KPIs that includes mean time to respond, mean time to detect (MTTD) and mean time to acknowledge (MTTA), among others.

MTBF is a measure of how long a system or product is expected to operate before it fails, and it is used to plan for maintenance or replacement. MTTR is a measure of how long it takes to repair a system or product after it fails, and it is used to minimize downtime and reduce repair costs.

MTBF does not take into account the period of time it takes to repair a product after it fails, while MTTR does not take into account the total time between failures. 

How MTBF and MTTR work together

Across many use cases, both metrics may be used in tandem to get a more complete picture of the overall maintainability of a system or product. For example, in a manufacturing plant, MTBF might be used to determine the expected lifetime of a machine and plan for replacement, while MTTR might be used to optimize maintenance schedules for that machine and maximize total uptime. In the context of software development, MTBF might be used to measure the stability of a system and plan for updates or bug fixes, while MTTR might be used to optimize the development process and reduce the time it takes to fix issues.
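One widely used way to combine the two metrics, which the discussion above does not spell out, is steady-state availability: MTBF / (MTBF + MTTR), the fraction of time a system is expected to be running. The sketch below assumes that definition. The function name `availability` is hypothetical, and it reuses the figures from the two worked examples (an MTBF of 1,000 / 3 hours and an MTTR of 2 hours) purely for illustration, as if they described the same system.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Return steady-state availability as MTBF / (MTBF + MTTR)."""
    # Both inputs must be in the same unit; MTBF must be positive.
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR must be non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Illustrative inputs taken from the two worked examples.
uptime_fraction = availability(1000 / 3, 2)
print(f"Availability: {uptime_fraction:.1%}")  # Availability: 99.4%
```

The formula shows why both numbers matter: availability rises either when failures become rarer (higher MTBF) or when repairs become faster (lower MTTR).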

Manage assets to improve MTBF and MTTR

Improving MTBF and MTTR to reduce downtime can be a complex process that involves identifying and addressing the root causes of system failures, optimizing maintenance operations and implementing improvements in design and manufacturing processes.

Today, large organizations use Computerized Maintenance Management Systems (CMMSs) to help them manage their maintenance processes. A CMMS typically offers features like work order management, preventative maintenance scheduling, inventory management, asset management and reporting. 

IBM Maximo is enterprise asset management software that includes comprehensive CMMS capabilities. Maximo is a single, integrated cloud-based platform that uses artificial intelligence (AI), IoT and analytics to optimize performance, extend the lifecycle of assets and reduce the costs of outages. A related tool, IBM Instana Observability, offers full-stack observability, with the goal of helping users optimize and democratize incident prevention. 

Both of these products will give you the visibility into your assets and operations that you’ll need to make smarter, data-driven decisions, ultimately resulting in fewer breakdowns and less downtime.

Learn more about IBM Maximo Application Suite

Get started with IBM Instana Observability
