Troubleshooting problems in Db2
The topics about troubleshooting Db2 problems provide a variety of information that can help you when you have problems associated with the Db2 for z/OS® product. IBM® Support personnel might ask you to refer to troubleshooting information when they help you with a specific problem.
Troubleshooting is a systematic approach to solving a problem. The goal of troubleshooting is to determine why something does not work as expected and how to resolve the problem.
- Most of the information from the Db2 Diagnosis Guide and Reference is now available without restriction in Troubleshooting problems in Db2 (this section) in the Db2 product documentation, or as a downloadable PDF manual: Troubleshooting for Db2.
- The remaining information is available only to customers who have a Db2 12 for z/OS license associated with their IBMid. See Db2 12 for z/OS Licensed Diagnosis Information.
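A useful first step in troubleshooting is to describe the problem completely by asking yourself these basic questions:
- What are the symptoms of the problem?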
- Where does the problem occur?
- When does the problem occur?
- Under which conditions does the problem occur?
- Can the problem be reproduced?
The answers to these questions typically lead to a good description of the problem, which can then lead you to a problem resolution.
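The five questions above can be kept as a simple checklist record so that gaps in a problem description are easy to spot. The following Python sketch is only an illustration: the `ProblemDescription` class, its field names, and the `unanswered` helper are hypothetical and are not part of Db2 or any IBM tool.

```python
from dataclasses import dataclass, fields
from typing import List, Optional

# Hypothetical mapping from record field to the question it answers.
QUESTIONS = {
    "symptoms": "What are the symptoms of the problem?",
    "location": "Where does the problem occur?",
    "timing": "When does the problem occur?",
    "conditions": "Under which conditions does the problem occur?",
    "reproducible": "Can the problem be reproduced?",
}


@dataclass
class ProblemDescription:
    """One free-text answer per question; None means not yet answered."""
    symptoms: Optional[str] = None
    location: Optional[str] = None
    timing: Optional[str] = None
    conditions: Optional[str] = None
    reproducible: Optional[str] = None


def unanswered(desc: ProblemDescription) -> List[str]:
    """Return the questions that still have no answer, in checklist order."""
    return [QUESTIONS[f.name] for f in fields(desc) if not getattr(desc, f.name)]


if __name__ == "__main__":
    # Illustrative answers only; not taken from a real incident.
    desc = ProblemDescription(
        symptoms="Application hangs after issuing a query",
        timing="Only during the nightly batch window",
    )
    for question in unanswered(desc):
        print("Still open:", question)
```

A record like this is only a note-taking aid. The sections that follow describe what kind of detail belongs in each answer.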
What are the symptoms of the problem?
- Who, or what, is reporting the problem?
- What are the error codes and messages?
- How does the system fail? For example, is it a loop, hang, crash, performance degradation, or incorrect result?
Where does the problem occur?
Determining where the problem originates is not always easy, but it is one of the most important steps in resolving a problem. Many layers of technology can exist between the reporting and failing components. Networks, disks, and drivers are only a few of the components to consider when you are investigating problems.
- Is the problem specific to one platform or operating system, or is it common across multiple platforms or operating systems?
- Are the current environment and configuration supported?
If one layer reports the problem, the problem does not necessarily originate in that layer. Part of identifying where a problem originates is understanding the environment in which it exists. Take some time to completely describe the problem environment, including the operating system and version, all corresponding software and versions, and hardware information. Confirm that you are running within an environment that is a supported configuration; many problems can be traced back to incompatible levels of software that are not intended to run together or have not been fully tested together.
When does the problem occur?
Develop a detailed timeline of events leading up to a failure, especially for those cases that are one-time occurrences. You can most easily develop a timeline by working backward: Start at the time an error was reported (as precisely as possible, even down to the millisecond), and work backward through the available logs and information. Sometimes, you need to look only as far as the first suspicious event that you find in a diagnostic log.
- Does the problem happen only at a certain time of day or night?
- How often does the problem happen?
- What sequence of events leads up to the time that the problem is reported?
- Does the problem happen after an environment change, such as upgrading or installing software or hardware?
Responding to these types of questions can give you a frame of reference in which to investigate the problem.
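The backward walk through the logs described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a Db2 utility: the log line layout (`<ISO timestamp> <severity> <message>`), the severity names, the 30-minute window, and the function names are all hypothetical.

```python
from datetime import datetime, timedelta

# Assumed (hypothetical) log line layout: "<ISO timestamp> <severity> <message>"
def parse_line(line):
    stamp, severity, message = line.split(" ", 2)
    return datetime.fromisoformat(stamp), severity, message


def first_suspicious_before(lines, reported_at,
                            window=timedelta(minutes=30),
                            suspicious=("WARN", "ERROR")):
    """Walk backward from the reported time and return the first
    suspicious entry encountered, or None if the window has none."""
    entries = sorted((parse_line(line) for line in lines), key=lambda e: e[0])
    for stamp, severity, message in reversed(entries):
        if stamp >= reported_at:
            continue  # skip the reported error itself and anything later
        if reported_at - stamp > window:
            break     # stop once we are outside the look-back window
        if severity in suspicious:
            return stamp, severity, message
    return None


if __name__ == "__main__":
    # Made-up sample entries for illustration only.
    sample = [
        "2024-01-01T10:00:00.000 INFO job started",
        "2024-01-01T10:04:59.250 WARN lock wait exceeded",
        "2024-01-01T10:05:00.100 INFO retrying",
        "2024-01-01T10:05:00.500 ERROR request failed",
    ]
    reported = datetime.fromisoformat("2024-01-01T10:05:00.500")
    print(first_suspicious_before(sample, reported))
```

Starting from the reported time with millisecond precision keeps the search anchored to the failure, and stopping at the first suspicious entry reflects the point that you sometimes need to look no further back than that event.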
Under which conditions does the problem occur?
- Does the problem always occur when the same task is being performed?
- Does a certain sequence of events need to occur for the problem to surface?
- Do any other applications fail at the same time?
Answering these types of questions can help you explain the environment in which the problem occurs and correlate any dependencies. Remember that problems that occur around the same time are not necessarily related.
Can the problem be reproduced?
- Can the problem be re-created on a test system?
- Are multiple users or applications encountering the same type of problem?
- Can the problem be re-created by running a single command, a set of commands, or a particular application?