What is fault tree analysis (FTA)?

What is FTA?

Fault tree analysis (FTA) offers one approach to root cause analysis, identifying and analyzing the root of asset issues before equipment breaks down. FTA helps in manufacturing facilities, where understanding the potential causes of system failures is crucial to preventing them.

Fault tree analysis is a deductive, top-down approach to determining the cause of a specific undesired event within a complex system. It involves breaking down the root cause of a failure into its contributing factors and representing it through a graphical model called a fault tree, which helps managers and engineers identify potential failure modes and the probability of each failure mode, for safety and reliability analyses.

First developed in the early 1960s by Bell Laboratories to help the US Air Force understand potential flaws in the Minuteman missile system, FTA has been widely used across various industries, including the aerospace, nuclear power, chemical and automotive sectors, among others.

Maintenance managers might use fault tree analysis to:

Design and/or install a new system.
Make changes to existing systems.
Investigate system safety or system reliability.
Assess regulatory compliance.
Optimize maintenance budgets.

As manufacturing environments continue to evolve and become more complex, the need for effective risk management tools like FTA becomes increasingly important. Incorporating fault tree analyses into your organization's safety analyses and reliability engineering practices can help your organization gain deeper insights into potential causes of system failure. FTA can also help improve overall performance and reduce the likelihood of costly and potentially catastrophic incidents.

Performing a fault tree analysis

Performing a fault tree analysis is a complex process that involves seven key steps.

Step 1: Define the undesired event

Before running your analysis, you should clearly define the undesired event you want to analyze. This event should be specific and measurable, like a component failure or a system malfunction. It’s also important to define the event in clear, consistent terms, since it serves as the starting point for your fault tree diagram.

Step 2: Identify the contributing events and factors

Once you define the undesired event, you should start to identify the factors and events that might contribute to its occurrence. Contributing factors tend to fall into two broad categories: basic events and intermediate events.

Basic events, those that cannot be further broken down into simpler events, are the most fundamental events in a fault tree, representing the lowest level of events you can analyze. A basic event in a fault tree for a car accident, for example, might be "the driver loses control of the vehicle".

Intermediate events are located between the lower-level basic events and the top event (the primary undesired event being analyzed). Intermediate events are caused by other events in the fault tree and, in turn, cause other events. They represent higher-level events that can be analyzed further. Using the same car accident as an example, an intermediate event in the fault tree might be "tire blows out".

Be sure to consider both internal and external events, like component failures, human error and environmental conditions. At this stage of the analysis, you might need to consult subject matter experts or review historical data, incident reports and maintenance records.

Step 3: Construct the fault tree

Using standard gate symbols and event symbols, construct a graphical representation of the relationships between the undesired (or output) event and its contributing factors (also called input events). The fault tree should be organized hierarchically, with the undesired event at the top and the contributing factors branching out below it.

Laying out basic events is straightforward, since basic events sit at the bottom of the tree and are not broken down into other events. Including intermediate events is a bit more complex, as they require Boolean logic gates that indicate the relationships between top-level, intermediate and basic events.

There are two main types of logic gates used in fault trees: AND gates and OR gates.

AND gates: Use AND gates when all contributing events must occur simultaneously for the undesired event to occur. For example, if a system failure requires both a component failure and an operator error, an AND gate is used to connect the events in the fault tree.
OR gates: Use an OR gate when any one of the input events is sufficient to cause the output event. In other words, the output event happens if at least one of the input events connected to the OR gate happens. If, for instance, a system failure might result from either a component failure or an operator error, an OR gate would be used to connect the events.
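The two main gates can be expressed as simple probability rules. The following is a minimal sketch, assuming the input events are statistically independent; the function names and the example probabilities are hypothetical and chosen only for illustration.

```python
from math import prod


def and_gate(probabilities):
    """Probability that ALL independent input events occur."""
    return prod(probabilities)


def or_gate(probabilities):
    """Probability that AT LEAST ONE independent input event occurs."""
    return 1.0 - prod(1.0 - p for p in probabilities)


# Hypothetical probabilities for the two input events (illustration only).
p_component_failure = 0.01
p_operator_error = 0.05

p_both = and_gate([p_component_failure, p_operator_error])
p_either = or_gate([p_component_failure, p_operator_error])

print(f"AND gate: {p_both:.4f}")   # AND gate: 0.0005
print(f"OR gate:  {p_either:.4f}")  # OR gate:  0.0595
```

With the same inputs, the AND gate yields a much smaller probability than the OR gate, which is why requiring several failures at once (redundancy) tends to lower the likelihood of the output event.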

Though less commonly used, NOT gates, XOR gates, K/N gates and INHIBIT gates can also help identify specific relationships between input and output events.

NOT gates: NOT gates represent the inverse of an input event. If the input event does not occur, the output event will occur. These gates are less common in fault tree analysis, since they model the absence of an event or the occurrence of a complementary event.
XOR gates (Exclusive OR gates): Use an XOR gate when exactly one of the input events must occur for the output event to happen. If none or more than one of the input events occur, the output event will not happen.
K/N gates: K/N gates, also known as voting gates or threshold gates, are used when a specific number of the input events (K) out of all the possible input events (N) must occur for the output event to happen. K/N gates can help you illustrate more complex relationships in a fault tree analysis.
INHIBIT gates: Like an AND gate, an INHIBIT gate indicates that an output event will occur only if both the input event and a conditional event (a condition or restriction that can apply to any gate) occur.
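The logic of these less common gates can be sketched as Boolean functions. This is an illustrative sketch, not a standard library: the function names are hypothetical, and the K/N gate is assumed to fire when at least K of the N inputs occur, which is the usual reading of a voting gate.

```python
def not_gate(event):
    """Output occurs when the input event does NOT occur."""
    return not event


def xor_gate(inputs):
    """Output occurs when exactly one input event occurs."""
    return sum(bool(x) for x in inputs) == 1


def k_of_n_gate(k, inputs):
    """Output occurs when at least k of the n input events occur (assumption)."""
    return sum(bool(x) for x in inputs) >= k


def inhibit_gate(event, condition):
    """Output occurs when the input event occurs AND the condition holds."""
    return bool(event) and bool(condition)


# Hypothetical example: three redundant pumps; the system fails if 2 or more fail.
pumps_failed = [True, True, False]
print(k_of_n_gate(2, pumps_failed))            # True
print(xor_gate([True, True, False]))           # False: more than one input occurred
print(inhibit_gate(event=True, condition=False))  # False: condition not met
```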

Intermediate events can also include undeveloped events, which are events that aren’t fully understood or haven’t been fully analyzed.

Using the various available gates will help you create a comprehensive fault tree that captures the complex interactions between the various events and factors that precipitated the undesired event.

Building a fault tree is an iterative process, so you continue to break down contributing events into their basic sub-events until the events cannot be parsed out any further. As you get new information and/or system conditions change, you might need to make several adjustments to refine the fault tree.
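A fault tree built this way maps naturally onto a small recursive data structure. The sketch below is one possible representation, assuming only AND and OR gates; the `Event` class, its fields and the pump system events are hypothetical names used for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Event:
    """A node in the fault tree. A basic event has no gate and no inputs."""
    name: str
    gate: Optional[str] = None  # "AND", "OR", or None for a basic event
    inputs: List["Event"] = field(default_factory=list)

    def occurs(self, occurred: Set[str]) -> bool:
        """Return True if this event occurs, given the basic events that occurred."""
        if self.gate is None:
            return self.name in occurred
        results = [child.occurs(occurred) for child in self.inputs]
        if self.gate == "AND":
            return all(results)
        if self.gate == "OR":
            return any(results)
        raise ValueError(f"Unsupported gate: {self.gate}")


# Hypothetical tree: the top event sits at the root, basic events at the leaves.
both_pumps_fail = Event(
    "Both pumps fail",
    gate="AND",
    inputs=[Event("Primary pump fails"), Event("Backup pump fails")],
)
top_event = Event(
    "Pump system fails",
    gate="OR",
    inputs=[Event("Power lost"), both_pumps_fail],
)

print(top_event.occurs({"Primary pump fails"}))                       # False
print(top_event.occurs({"Primary pump fails", "Backup pump fails"}))  # True
print(top_event.occurs({"Power lost"}))                               # True
```

Refining the tree then amounts to replacing a leaf with a new gated `Event` whose inputs are the newly identified sub-events.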

Step 4: Gather failure data

In order to quantify the risks associated with the undesired event, you need to gather failure data (from historical records, industry databases, expert opinions, etc.) for the basic events in the fault tree. The failure data should be expressed as failure probabilities or failure rates, depending on the type of analysis you’re conducting.
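Failure rates and failure probabilities are related but not interchangeable. As a rough sketch, assuming a constant failure rate (the exponential model), a rate can be converted into the probability of failure over a given operating period. The function names and the MTBF figure below are hypothetical.

```python
import math


def rate_from_mtbf(mtbf_hours):
    """Constant failure rate (failures per hour) from mean time between failures."""
    return 1.0 / mtbf_hours


def failure_probability(failure_rate_per_hour, operating_hours):
    """Probability of at least one failure within the operating period,
    assuming a constant failure rate (exponential model)."""
    return 1.0 - math.exp(-failure_rate_per_hour * operating_hours)


# Hypothetical component: MTBF of 10,000 hours, analyzed over 1,000 hours.
rate = rate_from_mtbf(10_000)
p_fail = failure_probability(rate, 1_000)
print(f"Failure probability over 1,000 hours: {p_fail:.4f}")  # 0.0952
```

Components with wear-out or early-life behavior do not follow a constant rate, so this conversion is only a starting point for such data.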

Step 5: Perform the analysis

Once you construct the fault tree and gather the failure data, you perform the analysis, wherein you calculate the probability of the undesired event occurring and identify the most critical contributing factors. Use either a qualitative or a quantitative data analysis method.

A qualitative analysis focuses on understanding the structure of the fault tree and the relationships between events, and on identifying critical paths and minimal cut sets (the smallest combinations of basic events whose joint occurrence causes the undesired event). Qualitative analysis can help prioritize remedial actions and identify areas for further investigation.
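For small trees, minimal cut sets can be found by expanding the gates and discarding any cut set that contains a smaller one. The sketch below assumes a tree made only of AND and OR gates, written as nested `(gate, children)` tuples with basic events as strings; this representation and the example events are hypothetical, and dedicated FTA tools use more scalable algorithms.

```python
from itertools import product


def cut_sets(node):
    """Return all cut sets of a node as a set of frozensets of basic event names."""
    if isinstance(node, str):  # basic event
        return {frozenset([node])}
    gate, children = node
    child_sets = [cut_sets(child) for child in children]
    if gate == "OR":
        # Any cut set of any child is a cut set of the OR gate.
        return set().union(*child_sets)
    if gate == "AND":
        # Combine one cut set from every child.
        return {frozenset().union(*combo) for combo in product(*child_sets)}
    raise ValueError(f"Unsupported gate: {gate}")


def minimal_cut_sets(node):
    """Keep only the cut sets that have no proper subset among the cut sets."""
    all_sets = cut_sets(node)
    return {s for s in all_sets if not any(other < s for other in all_sets)}


# Hypothetical tree for illustration.
tree = ("OR", [
    "Power lost",
    ("AND", ["Primary pump fails", "Backup pump fails"]),
    ("AND", ["Power lost", "Operator error"]),
])

for cut_set in sorted(minimal_cut_sets(tree), key=len):
    print(sorted(cut_set))
```

Here "Power lost" alone is a minimal cut set (a single point of failure), so the combination of "Power lost" and "Operator error" is dropped as non-minimal.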

A quantitative methodology, on the other hand, involves calculating the probability of the undesired event occurring based on the failure probabilities of the basic events. Quantitative analysis can help inform risk management decisions and evaluate the effectiveness of proposed improvements.
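A quantitative calculation can be sketched by propagating basic event probabilities up through the gates. The example assumes independent basic events, no basic event repeated in more than one branch, and only AND and OR gates; the tree layout, event names and probabilities are hypothetical.

```python
from math import prod


def probability(node, basic_probabilities):
    """Probability of a node: a basic event name or a (gate, children) tuple."""
    if isinstance(node, str):  # basic event
        return basic_probabilities[node]
    gate, children = node
    child_probs = [probability(child, basic_probabilities) for child in children]
    if gate == "AND":
        return prod(child_probs)
    if gate == "OR":
        return 1.0 - prod(1.0 - p for p in child_probs)
    raise ValueError(f"Unsupported gate: {gate}")


# Hypothetical tree and failure probabilities (illustration only).
tree = ("OR", [
    "Power lost",
    ("AND", ["Primary pump fails", "Backup pump fails"]),
])
baseline = {
    "Power lost": 0.001,
    "Primary pump fails": 0.02,
    "Backup pump fails": 0.05,
}

p_top = probability(tree, baseline)
print(f"Top event probability: {p_top:.6f}")  # 0.001999

# Evaluate a proposed improvement: a more reliable backup pump.
improved = dict(baseline, **{"Backup pump fails": 0.025})
p_top_improved = probability(tree, improved)
print(f"Reduction from improvement: {p_top - p_top_improved:.6f}")
```

Rerunning the calculation with changed inputs, as in the last lines, is one simple way to compare proposed improvements before committing to them.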

Step 6: Interpret the results

After performing the analysis, it’s time to interpret your results and communicate any relevant information to the necessary stakeholders.

The results of a fault tree analysis depend on the quality of the input data and the assumptions made during the analysis. As such, you should view the results as a starting point for further investigation and validation, rather than a definitive conclusion.

Step 7: Implement improvements and monitor progress

Based on the findings of the fault tree analysis, implement preventive measures and improvements as necessary to eliminate or decrease the likelihood of the undesired event. Then monitor the performance of these improvements and continually update the fault tree to reflect any changes in system design, operating conditions or component performance, so that your tree remains accurate and useful to your organization.

Benefits of fault tree analysis

FTA provides a visual depiction of contributing factors and events that can lead to a system failure, making it easier to understand complex interactions between system components.
FTA allows you to calculate the probability of a failure event occurring, enabling better risk management and decision-making and helping teams be proactive about corrective actions.
Since you can analyze only one output event at a time, fault tree analysis helps teams stay organized as they assess system levels and work through effects analyses methodically.
Unlike some other approaches, such as failure mode and effects analysis (FMEA), FTA accounts for human error, which can help teams understand whether issues are related to deviations from standard operating procedure.
FTA identifies which failures are likeliest to occur, helping teams decide which issues require urgent attention.

Limitations of fault tree analysis

The accuracy and effectiveness of FTA rely heavily on the expertise of the analysts, their ability to identify relevant causes of failure, and their understanding of the complexities of the fault tree itself.
FTA is best suited for smaller system analyses. Large, complex systems typically require large, complex fault trees, making analysis time-consuming and challenging.
Failure data availability and quality determine the precision of the calculated probabilities in a fault tree.
Fault tree analysis allows you to examine only one top event at a time.
