What is failure mode and effects analysis (FMEA)?

Failure mode and effects analysis (FMEA), defined

Failure mode and effects analysis (FMEA) is a structured failure mitigation framework that identifies all possible failures for the components of a design, manufacturing process, service or product before they occur. 

FMEA analyzes the causes of failures, infers their effects and prioritizes corrective actions based on a risk ranking system. By focusing on the risks with the highest potential impact, organizations can improve reliability, reduce the costs of failure and strengthen operational resilience.

FMEA’s step-by-step procedure helps organizations proactively increase reliability and manage risk by focusing on the process failures with the highest projected impacts. Organizations can use FMEA to evaluate both existing processes and new ones before implementation, leading to stronger quality controls, enhanced brand perception and greater customer satisfaction and lifetime value.

Organizations are augmenting FMEA with AI in operations management to improve failure prediction and automate risk detection by using real-time operational data. FMEA is critical for risk governance, enabling organizations to act ahead of disruptions and make reliability a core component of service and product lifecycle management.

Types of failure mode and effects analysis (FMEA)

Several variants of failure mode and effects analysis (FMEA) have been developed to cover a range of industries and processes. Each type applies to a specific stage of the product and operational lifecycle, so organizations can direct risk analysis to where it matters most.

The types of FMEA include:

  • Design FMEA (DFMEA) focuses on the product design process and is used to minimize design flaws, such as unsuitable material choices or incorrect tolerances, before production begins.
  • Process FMEA (PFMEA) covers the manufacturing and assembly processes. It aims to prevent production defects caused by process variability, human error or equipment failure.
  • System FMEA (SFMEA) is a high-level evaluation of the interactions between an overall system and its subsystems. The goal of SFMEA is to identify systemic risks and prevent cascading failures.
  • Service FMEA applies PFMEA principles to service delivery transactions and workflows, with a focus on consistency, reliability and customer experience.
  • Software FMEA (SW-FMEA) analyzes software architecture and logic to identify potential failure modes, such as bugs, latency issues or system incompatibilities.
  • Machinery FMEA (MFMEA) focuses on machinery and equipment to reduce downtime and operational risk.
  • Monitoring and system response (FMEA-MSR) is a supplemental method that assesses the system’s ability to monitor performance and respond to failure during real-world use.
  • Failure mode, effects and criticality analysis (FMECA) extends standard FMEA with a quantitative criticality analysis to prioritize risks more precisely.

The seven steps of FMEA

FMEA guides organizations through a standardized, step-by-step methodology. The method was initially developed by the US military in the 1940s, and a globally recognized version is now defined in the 2019 AIAG-VDA FMEA Handbook.

AIAG-VDA FMEA combines the best practices of the US’s Automotive Industry Action Group (AIAG) and the German Association of the Automotive Industry (VDA). FMEA standardization facilitates consistency in risk evaluation and quality governance.

The seven AIAG-VDA FMEA steps are:

  1. Planning and preparation
  2. Structure analysis
  3. Function analysis
  4. Failure analysis
  5. Risk analysis
  6. Optimization
  7. Results

1. Planning and preparation

The first step in FMEA is to lay out a plan for the entire sequence and establish the scope of the study. Planning ahead aligns organizations and stakeholders, facilitates more effective quality outcomes and situates risk analysis within broader business objectives.

The planning phase itself is divided into five stages:

  1. State intent: Establish why the FMEA is taking place—whether the organization is assessing a new process, responding to a complaint or attempting to update its data.
  2. Start early: For maximum impact, FMEA should be conducted with plenty of time to implement corrective actions.
  3. Assemble the team: FMEA is a collaborative exercise requiring a cross-functional team for optimal outcomes. Teams typically include stakeholders from engineering, operations, quality and supply chain, with a facilitator guiding the process.
  4. Set expectations: Set the scope for each stage of the process, including the specific tasks that each stage will contain.
  5. Build the toolkit: Determine which tools, such as software applications and platforms, will be used to conduct the failure analysis. Teams often use FMEA worksheets, templates or applications to document findings.

2. Structure analysis

Structure analysis outlines the process steps being analyzed for failure potential. The team generates a flowchart for the process that shows its role in the greater system as well as any subsystems that stem from it. Structure analysis yields system-level visibility that can reveal interdependencies and help prevent cascading failures.

Workflow diagrams are often used to facilitate structure analysis. Through process mapping, the system, product or process is broken down into individual components or subsystems.

Structure analysis reveals the interdependencies between process steps—the components or subsystems—and shows how a single point of failure can affect the overall system or product.
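As a minimal sketch of this idea, the structure can be held as a simple parent map and walked upward to see which higher levels a single component failure can reach. The element names and the `PARENT` map are hypothetical examples, not part of any FMEA standard.

```python
from collections import defaultdict

# Hypothetical structure tree: each element points to the
# higher-level element it belongs to.
PARENT = {
    "connector pin": "electrical connector",
    "connector housing": "electrical connector",
    "electrical connector": "wiring harness",
    "wiring harness": "power system",
}


def affected_levels(element, parent=PARENT):
    """Return the higher-level elements a failure in `element` can reach."""
    chain = []
    while element in parent:
        element = parent[element]
        chain.append(element)
    return chain


def children_of(parent=PARENT):
    """Invert the parent map to list the components under each element."""
    tree = defaultdict(list)
    for child, owner in parent.items():
        tree[owner].append(child)
    return dict(tree)


print(affected_levels("connector pin"))
# ['electrical connector', 'wiring harness', 'power system']
```

A real structure analysis is usually a richer diagram with interfaces between elements; the parent map only captures the "belongs to" relationship.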

3. Function analysis

Function analysis establishes the purpose of each stage in the structure as outlined in the previous step. The team describes the function of each subsystem or component and traces the functional relationships between them.

Having a granular, functional understanding of the system or product will help the team members with brainstorming failure modes and conducting root cause analysis in later FMEA steps.

Function analysis helps bridge the gap between how a product or service works and whether it suits business requirements and customer needs.

4. Failure analysis

The fourth step of the FMEA process tasks the team with identifying and describing potential failures for each function from the third step:

  • Failure mode (FM) is the specific nature of the failure. FM describes any way in which the process fails to meet its function. For example, an electrical connector might fail to maintain a stable connection, disrupting the flow of electricity through the system.
  • Failure effect (FE) is how the failure cascades to higher levels in the structure. Effects can include safety consequences, regulatory noncompliance or simple inconveniences to the user.
  • Failure cause (FC) is the underlying cause of each failure. Causes can range from production errors and manufacturing defects to design flaws and environmental factors like extreme heat or cold. FMEA asks teams to describe causes as concretely as possible to aid further analysis and corrective action. Fault tree analysis is one method teams can use for identifying FC.

Failure analysis compiles a reusable knowledge base of failure modes, accelerating future risk assessments.
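One way to picture such a knowledge base is as a collection of records that link a function to its failure mode, effect and cause. The sketch below is illustrative only: the `FailureChain` class, its field names and the example cause are assumptions, not a format prescribed by the AIAG-VDA handbook.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureChain:
    """One function with its failure mode (FM), effect (FE) and cause (FC)."""
    function: str
    failure_mode: str
    failure_effect: str
    failure_cause: str


def record(knowledge_base, chain):
    """File a failure chain under its function so later studies can reuse it."""
    knowledge_base.setdefault(chain.function, []).append(chain)


knowledge_base = {}
record(knowledge_base, FailureChain(
    function="maintain a stable electrical connection",
    failure_mode="connector fails to maintain a stable connection",
    failure_effect="flow of electricity through the system is disrupted",
    # Hypothetical cause, chosen only to complete the example.
    failure_cause="extreme heat degrades the connector contact",
))

for chain in knowledge_base["maintain a stable electrical connection"]:
    print(chain.failure_mode, "<-", chain.failure_cause)
```

Keying the records by function mirrors the way failure analysis builds on the functions identified in the previous step.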

5. Risk analysis

After establishing all possible failure states, the team assesses the level of risk presented by each failure according to three metrics:

  • Severity (S) represents the magnitude of the failure’s effect on the system or product. Failures with greater effects, such as regulatory noncompliance or serious operator injury, are given higher severity ratings.
  • Occurrence (O) is the projected frequency of the failure. Teams use historical data, test results and expert knowledge to estimate the probability of occurrence for each failure.
  • Detectability (D) reflects the likelihood of the failure being detected before it has an effect. Failures that are more difficult to detect receive higher ratings than glaring failures that will be detected before causing harm.

Severity, occurrence and detectability are typically each rated on a scale of 1 to 10. Risk analysis turns potential failures into actionable priorities.

Action priority (AP) versus risk priority number (RPN)

Traditional FMEA methods determined risk rank through the risk priority number (RPN). The formula for calculating RPN multiplies severity, occurrence and detectability:

RPN = S × O × D
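The calculation can be sketched in a few lines. The example below assumes each rating is an integer from 1 to 10, and the two example failures are hypothetical. It also shows that very different rating combinations can produce the same RPN.

```python
def rpn(severity, occurrence, detectability):
    """Risk priority number: the product of the three ratings (each 1 to 10)."""
    ratings = {
        "severity": severity,
        "occurrence": occurrence,
        "detectability": detectability,
    }
    for name, value in ratings.items():
        if not isinstance(value, int) or not 1 <= value <= 10:
            raise ValueError(f"{name} must be an integer from 1 to 10, got {value!r}")
    return severity * occurrence * detectability


# Hypothetical ratings: a severe but rare failure and a minor but common one.
print(rpn(9, 2, 2))     # 36
print(rpn(3, 4, 3))     # 36
print(rpn(10, 10, 10))  # 1000, the largest possible value
```

Because both example failures score 36, RPN alone does not show that the first one has a far higher severity rating.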

AIAG-VDA FMEA introduced the concept of action priority (AP), a new risk rank method that characterizes risks by the urgency of corrective action required. This change strengthened the connection between risk severity and the need for corrective action, rather than asking teams to make that connection themselves.

AP assigns each failure one of three values:

  • High-priority (H) failures require immediate action.
  • Medium-priority (M) failures require corrective action, but less urgently than high-priority failures.
  • Low-priority (L) failures can be monitored and do not require corrective action.

Teams determine AP by consulting tables that contain the combinations of values for S, O and D. In a pivot toward a greater focus on safety, the tables weight severity most heavily: a failure with an S value of 9 or 10 is rated high priority across most combinations of O and D, although a very low occurrence rating can still lower its priority.
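A table lookup of this kind can be sketched as follows. The rows below are placeholders chosen for illustration and are deliberately incomplete; they are not the published AIAG-VDA table, which teams should load from the handbook itself. The `APRow` type and the function name are likewise assumptions.

```python
from typing import NamedTuple


class APRow(NamedTuple):
    """One table row: rating ranges for S, O and D and the resulting priority."""
    severity: range
    occurrence: range
    detectability: range
    priority: str  # "H", "M" or "L"


# Placeholder rows for illustration only, NOT the handbook table.
SAMPLE_AP_TABLE = [
    APRow(range(9, 11), range(6, 11), range(1, 11), "H"),
    APRow(range(4, 7), range(6, 8), range(2, 11), "M"),
    APRow(range(4, 7), range(2, 4), range(1, 11), "L"),
]


def action_priority(s, o, d, table=SAMPLE_AP_TABLE):
    """Return the priority of the first row whose ranges contain S, O and D."""
    for row in table:
        if s in row.severity and o in row.occurrence and d in row.detectability:
            return row.priority
    raise LookupError(f"no table row covers S={s}, O={o}, D={d}")


print(action_priority(9, 7, 3))  # H
print(action_priority(5, 3, 8))  # L
```

Raising an error for uncovered combinations, rather than defaulting to a low priority, keeps an incomplete table from silently downgrading a risk.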

6. Optimization

After prioritizing risks, the team identifies and plans corrective actions to address risks in order of rank. Optimization involves lowering AP ratings to more acceptable levels by making risks less severe, less frequent or more easily detectable. Corrective actions should lead to safety and reliability improvements.

Optimization is heavily dependent on cross-functional collaboration. Quality management hinges on process controls in design, manufacturing and other business functions. By systematically implementing corrective actions, organizations can reduce costs, improve reliability and shorten time-to-market.
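Working through risks in order of rank amounts to a sort. The sketch below orders hypothetical failures by AP rating; breaking ties by higher severity first is an assumption made for the example, not a rule taken from the handbook.

```python
# Lower number means the failure is handled earlier.
AP_ORDER = {"H": 0, "M": 1, "L": 2}


def work_queue(failures):
    """Sort failures by action priority, then by descending severity (assumed tie-break)."""
    return sorted(failures, key=lambda f: (AP_ORDER[f["ap"]], -f["severity"]))


# Hypothetical failures with already-assigned AP and severity ratings.
failures = [
    {"name": "connector loosens", "ap": "M", "severity": 6},
    {"name": "label misprinted", "ap": "L", "severity": 2},
    {"name": "insulation overheats", "ap": "H", "severity": 9},
    {"name": "fastener under-torqued", "ap": "M", "severity": 8},
]

for failure in work_queue(failures):
    print(failure["ap"], failure["name"])
```

After a corrective action is implemented, the team re-rates the failure and the queue can be rebuilt from the updated ratings.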

7. Results

The final stage of the FMEA process is where teams document their findings. Failure points, risks, corrective actions and outcomes all have a place in the FMEA worksheet. The results phase helps centralize and further deploy FMEA insights across the organization.
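As a rough illustration, a worksheet can be treated as a table with one row per failure and exported to CSV for sharing. The column names below are assumptions for the example; real FMEA worksheets and templates define their own layouts.

```python
import csv
import io

# Hypothetical worksheet columns, not a standard template.
COLUMNS = [
    "function", "failure_mode", "failure_effect", "failure_cause",
    "S", "O", "D", "AP", "corrective_action",
]


def worksheet_to_csv(rows):
    """Serialize worksheet rows (dicts keyed by COLUMNS) to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Example row with made-up ratings and a made-up corrective action.
example_row = {
    "function": "maintain a stable electrical connection",
    "failure_mode": "connector fails to maintain a stable connection",
    "failure_effect": "flow of electricity is disrupted",
    "failure_cause": "extreme heat degrades the contact",
    "S": 7, "O": 3, "D": 4, "AP": "M",
    "corrective_action": "specify a heat-rated connector",
}

print(worksheet_to_csv([example_row]))
```

Using `csv.DictWriter` with a fixed column list means a row with an unexpected key raises an error instead of silently changing the worksheet layout.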

Thorough documentation helps ensure organization-wide consistency and accountability and informs the creation of subsequent quality control plans and follow-up procedures, such as reliability-centered maintenance (RCM) plans. The benefits of RCM include better planning, reduced preventative maintenance and lower mean time to repair (MTTR).

Documentation also contributes to ongoing regulatory compliance. Organizations must often provide proof of risk identification and mitigation to earn required industrial certifications.

Benefits of FMEA

FMEA delivers numerous benefits at the enterprise level above and beyond quality improvement. These benefits include:

  • Reduced cost of failure through early risk detection
  • Improved brand perception and customer value
  • Greater product or service reliability and customer satisfaction
  • Stronger regulatory compliance and audit readiness
  • Quicker time-to-market through proactive risk mitigation
  • More cross-functional alignment and collaboration across teams

Implementing FMEA with digital tools like artificial intelligence (AI) equips organizations for predictive failure modeling and scalable continuous improvement.

Authors

Ivan Belcic

Staff writer

Ian Smalley

Staff Editor

IBM Think
