Question & Answer
Question
Techniques for troubleshooting problems
Answer
Techniques for troubleshooting problems
Troubleshooting is a systematic approach to solving a problem. The goal of troubleshooting is to determine why something does not work as expected and how to resolve the problem.
The first step in the troubleshooting process is to describe the problem completely. Problem descriptions help you and the IBM technical-support representative know where to start to find the cause of the problem. This step includes asking yourself basic questions:
- What are the symptoms of the problem?
- Where does the problem occur?
- When does the problem occur?
- Under which conditions does the problem occur?
- Can the problem be reproduced?
The answers to these questions typically lead to a good description of the problem, which can then lead you to a problem resolution.
What are the symptoms of the problem?
When starting to describe a problem, the most obvious question is "What is the problem?" This question might seem straightforward; however, you can break it down into several more-focused questions that create a more descriptive picture of the problem. These questions can include:
- Who or what is reporting the problem?
- What are the error codes and messages?
- How does the system fail? For example, is it a loop, hang, crash, performance degradation, or incorrect result?
Where does the problem occur?
Determining where the problem originates is not always easy, but it is one of the most important steps in resolving a problem. Many layers of technology can exist between the reporting and failing components. Networks, disks, and drivers are only a few of the components to consider when you are investigating problems.
The following questions help you to focus on where the problem occurs to isolate the problem layer:
- Is the problem specific to one platform or operating system, or is it common across multiple platforms or operating systems?
- Is the current environment and configuration supported?
If one layer reports the problem, the problem does not necessarily originate in that layer. Part of identifying where a problem originates is understanding the environment in which it exists. Take some time to completely describe the problem environment, including the operating system and version, all corresponding software and versions, and hardware information. Confirm that you are running within an environment that is a supported configuration; many problems can be traced back to incompatible levels of software that are not intended to run together or have not been fully tested together.
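Part of describing the environment can be automated. The following is a minimal Python sketch that assumes a Python 3.9 or later interpreter is available on the affected system. `describe_environment` and `package_versions` are hypothetical helper names. The package names you pass in are placeholders for whatever software your environment actually uses. The sketch records only what the Python standard library can see (the `platform` module and `importlib.metadata`), so you still need to collect middleware, driver, and firmware versions separately.

```python
import platform
from importlib import metadata


def describe_environment() -> dict[str, str]:
    """Collect basic OS, hardware, and runtime details for a problem report."""
    return {
        "os": platform.system(),             # e.g. "Linux", "Windows", "Darwin"
        "os_release": platform.release(),    # kernel or OS release string
        "os_version": platform.version(),    # more detailed OS build information
        "machine": platform.machine(),       # hardware architecture, e.g. "x86_64"
        "python": platform.python_version(), # version of the interpreter running this script
    }


def package_versions(names: list[str]) -> dict[str, str]:
    """Look up installed versions of the named Python distributions.

    Names are placeholders; anything not installed is reported as such
    instead of raising, so the report is always complete.
    """
    versions: dict[str, str] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions
```

Printing both dictionaries and attaching the output to the problem description makes it easier to compare your environment against the supported configurations.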
When does the problem occur?
Develop a detailed timeline of events leading up to a failure, especially for those cases that are one-time occurrences. You can most easily develop a timeline by working backward: Start at the time an error was reported (as precisely as possible, even down to the millisecond), and work backward through the available logs and information. Typically, you need to look only as far as the first suspicious event that you find in a diagnostic log.
To develop a detailed timeline of events, answer these questions:
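The backward walk through the logs can be sketched in Python. This minimal sketch makes several assumptions. First, the log entries from every source have already been parsed into timestamped records. Second, the clocks on the systems that wrote them are synchronized. Third, you supply your own rule for what counts as "suspicious". `LogEvent` and `backward_timeline` are hypothetical names, not part of any IBM tool. `datetime` keeps microsecond precision, which covers the millisecond-level timing mentioned above.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Iterable


@dataclass
class LogEvent:
    timestamp: datetime  # keep sub-second precision when the logs provide it
    source: str          # which log the entry came from, e.g. "app.log"
    message: str


def backward_timeline(
    events: Iterable[LogEvent],
    error_time: datetime,
    is_suspicious: Callable[[LogEvent], bool],
) -> list[LogEvent]:
    """Walk merged log events backward from the reported error time.

    Returns the events from the nearest earlier suspicious event up to (but
    not including) the error, oldest first. If nothing suspicious is found,
    every event before the error is returned.
    """
    # Merge all sources into one chronological list of events before the error.
    earlier = sorted(
        (e for e in events if e.timestamp < error_time),
        key=lambda e: e.timestamp,
    )
    timeline: list[LogEvent] = []
    # Work backward from the error; stop at the first suspicious event found.
    for event in reversed(earlier):
        timeline.append(event)
        if is_suspicious(event):
            break
    timeline.reverse()  # present the result oldest-first
    return timeline
```

The rule `lambda e: e.message.startswith(("WARN", "ERROR"))` is one simple, assumed way to flag suspicious entries. In practice, the rule depends on the message formats of your own logs.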
- Does the problem happen only at a certain time of day or night?
- How often does the problem happen?
- What sequence of events leads up to the time that the problem is reported?
- Does the problem happen after an environment change, such as upgrading or installing software or hardware?
Responding to these types of questions can give you a frame of reference in which to investigate the problem.
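If you have a list of the times at which the problem occurred, you can answer some of these questions with simple counts. This Python sketch assumes that you have already extracted the failure timestamps, for example from a diagnostic log. It also assumes that you know when the environment change happened. The function names are hypothetical. The counts show only patterns; they do not prove that a time of day or a change caused the problem.

```python
from collections import Counter
from datetime import datetime
from typing import Optional


def failures_by_hour(failure_times: list[datetime]) -> Counter:
    """Count failures per hour of day (0-23) to expose time-of-day patterns."""
    return Counter(t.hour for t in failure_times)


def mean_interval_seconds(failure_times: list[datetime]) -> Optional[float]:
    """Average gap between consecutive failures, or None with fewer than two."""
    ordered = sorted(failure_times)
    if len(ordered) < 2:
        return None
    gaps = [(later - earlier).total_seconds() for earlier, later in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)


def split_at_change(failure_times: list[datetime], change_time: datetime) -> tuple[int, int]:
    """Return (failures before, failures at or after) a known environment change."""
    before = sum(1 for t in failure_times if t < change_time)
    return before, len(failure_times) - before
```

For example, a count that clusters in a single hour might point to a scheduled job. A count that rises sharply after the change time makes the upgrade or installation worth examining first.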
Under which conditions does the problem occur?
Knowing which systems and applications are running at the time that a problem occurs is an important part of troubleshooting. These questions about your environment can help you to identify the root cause of the problem:
- Does the problem always occur when the same task is being performed?
- Does a certain sequence of events need to occur for the problem to surface?
- Do any other applications fail at the same time?
Answering these types of questions can help you explain the environment in which the problem occurs and correlate any dependencies. Remember that just because multiple problems might have occurred around the same time, the problems are not necessarily related.
Can the problem be reproduced?
From a troubleshooting standpoint, the ideal problem is one that can be reproduced. Typically, when a problem can be reproduced, you have a larger set of tools or procedures at your disposal to help you investigate. Consequently, problems that you can reproduce are often easier to debug and solve. However, problems that you can reproduce can have a disadvantage: If the problem is of significant business impact, you do not want it to recur. If possible, re-create the problem in a test or development environment, which typically offers you more flexibility and control during your investigation.
- Can the problem be re-created on a test system?
- Are multiple users or applications encountering the same type of problem?
- Can the problem be re-created by running a single command, a set of commands, or a particular application?
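When a single command is suspected, running it repeatedly on a test system shows how reliably the problem reproduces. This is a minimal Python sketch built on the standard `subprocess` module. `reproduction_rate` is a hypothetical helper. The sketch assumes that the command signals failure through a nonzero exit status or by hanging past a timeout. Not every application behaves that way, and some report failure only in their output or logs. The command in the usage line is a placeholder. Run the sketch in a test or development environment rather than in production.

```python
import subprocess
import sys


def reproduction_rate(command: list[str], attempts: int = 10, timeout: float = 60.0) -> float:
    """Run a command repeatedly and return the fraction of runs that failed.

    A run counts as a failure if it exits with a nonzero status or does not
    finish within `timeout` seconds (a possible hang).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    failures = 0
    for _ in range(attempts):
        try:
            result = subprocess.run(command, capture_output=True, timeout=timeout)
            if result.returncode != 0:
                failures += 1
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before raising, so the loop can continue.
            failures += 1
    return failures / attempts


if __name__ == "__main__":
    # Placeholder command: a child Python process that always exits with status 1.
    rate = reproduction_rate([sys.executable, "-c", "raise SystemExit(1)"], attempts=3)
    print(f"failed in {rate:.0%} of runs")
```

A rate of 1.0 suggests that the problem reproduces reliably with that command. A low but nonzero rate often points to timing or load conditions, so the questions about when and under which conditions the problem occurs become more important.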
Document Information
Modified date:
08 December 2018
UID
ibm10750893