What is problem management?

Problem management is the discipline of proactively and reactively identifying IT issues and solutions to minimize any impact on an IT team’s ability to deliver services to end users.

Issues within IT services are called incidents and problems, which differ based on their frequency, impact, and solutions. In general, an incident that keeps recurring is considered a problem, one that is likely to continue if left unaddressed.

IT services comprise a complex system of interdependent applications, software, hardware, IT infrastructure and other technologies. Service disruptions can undermine an organization’s goal of continual service improvement and create serious reputational and financial issues, so IT teams need to prioritize solving problems efficiently and effectively.

Proactive problem management is an important component of an organization’s approach: the team identifies incidents and known problems and resolves errors before they cascade into larger ones.

Organizations can use automation to better manage the impact of incidents and problems, delivering improved services and more resilient applications that maximize uptime. This can reduce costs and improve decision-making. Problem management teams can also use templates, such as ones for escalation information and problem reviews, to reduce the staff time previously dedicated to key problem management tasks.

Organizations that are starting or improving a robust problem management process must also manage organizational change.[1] Robust problem management not only addresses incidents in an organization’s tech stack, but also compels that organization to explore better ways to address incidents across its operations. Effective problem management requires rigorous categorization and prioritization, which allow the business to address its most pertinent problems. Ideally, an organization will focus on solving major incidents and major problems first and then move on to incidents and problems that have a less pronounced impact on the business.
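
As a rough illustration of categorization and prioritization, the sketch below orders a backlog so that major problems come first. The Problem class, its fields, the 1-to-3 impact scale and the sample data are all hypothetical; they are not taken from ITIL or from any ITSM product.

```python
from dataclasses import dataclass


@dataclass
class Problem:
    title: str
    category: str        # hypothetical category label, e.g. "database"
    impact: int          # hypothetical scale: 1 = major, 2 = moderate, 3 = minor
    incident_count: int  # number of related incidents logged so far


def prioritize(problems):
    """Order problems by impact (major first), then by incident count (highest first)."""
    return sorted(problems, key=lambda p: (p.impact, -p.incident_count))


# Illustrative backlog; every entry is made up for the example.
backlog = [
    Problem("Slow report export", "application", 3, 4),
    Problem("Checkout timeouts", "database", 1, 9),
    Problem("Intermittent VPN drops", "network", 2, 6),
    Problem("Login failures", "identity", 1, 12),
]

for problem in prioritize(backlog):
    print(problem.impact, problem.category, problem.title)
```

Sorting on a tuple keeps the rule explicit: impact decides first, and the incident count only breaks ties between problems of equal impact. A real team would likely weigh other factors as well, such as urgency or affected users.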

Problem management vs. incident management

Problem management is intended to prevent incidents from recurring by addressing their root cause. It relates to incident management in that an incident that has occurred several times should be diagnosed and investigated as a problem or known error.

Incident management without problem management only addresses symptoms and not the underlying cause, leading to similar incidents occurring in the future. Effective problem management identifies a permanent solution to problems, decreasing the number of incidents an organization will have to manage in the future.

Ultimately, problem management seeks to understand the problem lifecycle, identify the root cause of the problem and fix the conditions[2] that led to its creation.

Reactive and proactive problem management

A problem management team can engage in either reactive or proactive problem management, depending on which incidents it has observed and what historical data it has. Reactive problem management is concerned with identifying the problem when it occurs and solving it as quickly as possible. The problem must occur before an organization can apply reactive problem management.

Proactive problem management involves more investigative work on why a problem is happening and building a solution to prevent it from happening again. This type of problem management is more concerned with identifying the root cause, so the team can deploy a lasting solution that helps to avoid future problems.

The role of ITSM and ITIL in problem management

Effective problem management is a core component of IT service management (ITSM). ITSM is how an organization ensures its IT services work the way its users and business need them to work. The goal of any organization’s ITSM strategy is to enable and maintain optimal deployment, operation and management of every IT resource.

Organizations often use several native and open-source approaches to implement ITSM, most notably the Information Technology Infrastructure Library (ITIL).

ITIL is the most widely adopted best-practices guidance framework for implementing and documenting ITSM. ITIL problem management uses ITIL processes to minimize the foundational work that needs to go into addressing any one problem. Many problems that organizations face, such as server outages and cybersecurity issues, have happened before to other organizations and can often be handled with standardized responses. Therefore, ITSM approaches often incorporate ITIL to minimize the amount of custom work required to solve IT problems.

Key problem management components

Most problem management approaches follow a similar pattern of detection, assessment, logging, analysis and resolution. Each component is necessary to ultimately solve the problem.

  • Problem detection: Automated systems or IT professionals identify recurring incidents that could be classified as a problem.
  • Problem assessment: This involves the identification and categorization of an incident as a problem record or as an unrelated issue that is unlikely to occur again.
  • Problem logging: The IT team logs the identified problem, often via a self-service platform, and tracks each occurrence. The resulting problem record is a comprehensive account of the problem, any related incidents, where and how the problem occurred, the root cause analysis and the solution. Logging also creates a known error record, which is entered in the known error database (KEDB). Advanced organizations often combine their problem management and knowledge management approaches.
  • Root cause analysis: The organization studies the underlying issues behind these problems and develops a roadmap to create a long-term solution.
  • Problem resolution: When an IT team understands the problem and its root cause, it can address the problem, a step also known as problem control. This can involve a quick or protracted response depending on the severity or complexity of the problem.
  • Error control: Finally, an organization that successfully diagnoses[3] and solves the problem through a workaround or a permanent resolution conducts error control for that problem.
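
The detection and logging steps above can be sketched in a few lines. This is a minimal illustration only: the Incident and ProblemRecord classes, the grouping key of service plus symptom, the threshold of three incidents and the dictionary standing in for a KEDB are all assumptions made for the example, not a description of any particular ITSM tool.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Incident:
    incident_id: str
    service: str   # affected service (hypothetical field)
    symptom: str   # short symptom label (hypothetical field)


@dataclass
class ProblemRecord:
    service: str
    symptom: str
    incident_ids: List[str] = field(default_factory=list)
    root_cause: Optional[str] = None   # filled in after root cause analysis
    workaround: Optional[str] = None   # filled in once a workaround is known


def detect_problems(incidents, threshold=3):
    """Group incidents by (service, symptom); groups that reach the
    threshold are returned as problem records."""
    groups = defaultdict(list)
    for incident in incidents:
        groups[(incident.service, incident.symptom)].append(incident.incident_id)
    return [
        ProblemRecord(service, symptom, ids)
        for (service, symptom), ids in groups.items()
        if len(ids) >= threshold
    ]


def log_known_error(kedb, problem, root_cause, workaround):
    """Complete the record and store it in the dictionary used here as a KEDB."""
    problem.root_cause = root_cause
    problem.workaround = workaround
    kedb[(problem.service, problem.symptom)] = problem


# Illustrative data; identifiers and causes are made up.
incidents = [
    Incident("INC-1", "checkout", "timeout"),
    Incident("INC-2", "checkout", "timeout"),
    Incident("INC-3", "search", "stale results"),
    Incident("INC-4", "checkout", "timeout"),
]

problems = detect_problems(incidents)
kedb: Dict[Tuple[str, str], ProblemRecord] = {}
log_known_error(kedb, problems[0], "unindexed orders table", "restart the connection pool")
print(kedb[("checkout", "timeout")])
```

Keying the known error store by service and symptom means a new incident with the same pair can be matched to an existing workaround with a single lookup. Real matching is rarely that clean, so the key is a simplification.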

Key problem management tools and processes

Problem management requires a well-thought-out approach so that a team allocates resources as efficiently as possible. Problem management teams can draw on several techniques to get to the root cause and create solutions that stop the problem from happening again.

Pareto Analysis

The Pareto principle, also known as the 80:20 rule, states that about 80% of problems arise from about 20% of the causes. Put another way, an overwhelming majority of problems stem from a few major causes. Therefore, addressing that 20% of causes will efficiently resolve many of the organization’s problems.
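
A Pareto analysis can be approximated by counting incidents per cause and keeping the most frequent causes until they cover the chosen share of incidents. The function name, the 0.8 cutoff parameter and the sample causes below are assumptions made for illustration.

```python
from collections import Counter


def pareto_causes(cause_per_incident, cutoff=0.8):
    """Return the most frequent causes that together account for at
    least `cutoff` of all incidents, most frequent first."""
    counts = Counter(cause_per_incident)
    total = sum(counts.values())
    selected = []
    cumulative = 0
    for cause, count in counts.most_common():
        selected.append(cause)
        cumulative += count
        if cumulative / total >= cutoff:
            break
    return selected


# One entry per incident, labeled with its diagnosed cause (made-up data).
causes = ["bad deploy"] * 5 + ["slow query"] * 3 + ["disk full"] + ["expired cert"]

top_causes = pareto_causes(causes)
print(top_causes)
```

In this sample, two of the four causes account for eight of the ten incidents, so those two are where a fix pays off most. Real data rarely splits at exactly 80:20, which is why the cutoff is left as a parameter.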

5 Whys Analysis

Problem management teams ask “why?” five times in succession, each time questioning the previous answer, to get to the root cause of a problem. Addressing user experience is a good example of the five whys in action.

  • Why #1: Users report substandard experience because of long load times.
  • Why #2: The website takes too long to load because of slow or overloaded servers.
  • Why #3: Inefficient code causes the server to run slowly.
  • Why #4: Poorly written database queries fetch unnecessary data, leading to inefficient code.
  • Why #5: The queries are overly complicated, with multiple nested subqueries, which ultimately contributes to the long load time.

Therefore, the DevOps team can solve the user experience problem by simplifying the query structure.
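
The chain above can be recorded as a simple ordered list, which makes it easy to attach to a problem record. This is only a sketch: the root_cause helper is hypothetical, and treating the last answer as the root cause is a convention of the technique rather than a guarantee.

```python
def root_cause(why_chain):
    """Return the last answer in the chain, treated as the candidate root cause."""
    if not why_chain:
        raise ValueError("the chain needs at least one answer")
    return why_chain[-1]


# The five answers from the user experience example, in order.
why_chain = [
    "Users report a substandard experience because of long load times",
    "The website takes too long to load because of slow or overloaded servers",
    "Inefficient code causes the servers to run slowly",
    "Poorly written database queries fetch unnecessary data",
    "The queries are overly complicated, with multiple nested subqueries",
]

for depth, answer in enumerate(why_chain, start=1):
    print(f"Why #{depth}: {answer}")
print("Candidate root cause:", root_cause(why_chain))
```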

Root Cause Analysis

Root cause analysis (RCA) is the quality management process by which an organization searches for the root of a problem, issue or incident after it occurs. An organization needs to find the root cause of a problem to solve it effectively.

Fishbone diagram

This is a tool for identifying the potential causes of a specific problem. It places the problem or effect at the “head” of the fish while using the “bones” to list potential causes organized by categories. It is like the five whys approach, but it adds more structure to the potential causes part of the cause analysis.

Benefits of problem management

The goal of problem management is to minimize downtime, increase efficiency and improve service delivery. Below are some of the more impactful benefits of problem management.

  • Enhanced security: Identifying the underlying cause of incidents is an important part of cyber risk management. Organizations that merely patch or resolve individual incidents without exploring their root cause may never discover significant security issues. Problem management teams can work in coordination with security professionals to understand which incidents and problems result from malicious actors or security flaws, both of which can create catastrophic problems for the organization.
  • Increased customer satisfaction: Customers have certain expectations about their services and are unhappy whenever they don’t receive the services that they expected or paid for. Organizations that cannot deliver service reliably risk losing customers and will struggle to recruit new ones. Customers can tolerate occasional disruptions to service, like the inability to access a website or application due to a traffic overload or a chatbot malfunctioning. However, they have less tolerance for sustained downtime or potential exposure to cybercriminal activity. Incidents that cascade into larger problems are likely to create unhappy customers. By prioritizing problem identification and problem resolution, organizations can minimize downtime and create happier customers.
  • Improved knowledge management: Long-standing organizations have likely encountered many types of tech incidents and problems throughout their history. Organizations that prioritize knowledge management, the process of identifying, organizing, storing and disseminating information in a knowledge base within an organization, as part of their problem management approach have a better chance of avoiding repeat incidents. By capturing this information in a problem record, organizations can create known error databases so they can avoid future incidents and create permanent solutions.
  • Increased productivity and employee satisfaction: Organizations have two options when pursuing problem management: reactive problem management or proactive problem management. IT employees join organizations to work on large problems and make meaningful differences. They do not want to repeatedly fix the same recurring problems without having the chance to work on higher-level opportunities. Organizations need sophisticated problem management to improve their IT operations and to keep employees happy and engaged.

Footnotes

1. Problem Management: A Practical Guide, Jim Bolton III and Buff Scott III, 2016

2. What is root cause analysis? A proactive approach to change management, CIO, 6 May 2022

3. Problem Management: Frequently Asked Questions, University of Minnesota