Describing your problem

Stabilized feature: The Service Flow Runtime and Service Flow Modeler capabilities in IBM Developer for z/OS 14.2.3 are stabilized. Consider exposing and orchestrating applications as API services by using z/OS Connect Enterprise Edition or CICS web services, or by writing web applications in Java or Node.js. See also Stabilization notices and discontinued functions.

Troubleshooting is a systematic approach to solving a problem. The first step in the troubleshooting process is to describe the problem completely. Without a problem description, neither you nor IBM® can know where to start to find the cause of the problem.

This step includes asking yourself basic questions, such as the ones in the following sections. The answers to these questions typically lead to a good description of the problem, which is the best way to begin problem resolution.

What are the symptoms of the problem?

  • Who, or what, is reporting the problem?
  • What are the error codes and messages?
  • How does the system fail? For example, is it a loop, hang, crash, performance degradation, or incorrect result?
  • What is the business impact of the problem?

Where does the problem occur?

Determining where the problem originates is not always easy, but it is one of the most important steps in resolving a problem. The following questions can help you focus on where the problem occurs and isolate the problem area.

  • Is the problem specific to one platform or operating system, or is it common across multiple platforms or operating systems? For example, is the service requester on a different platform?
  • Are the current environment and configuration supported? For example, are you using one of the supported interfaces when trying to access the service flow?
  • Were there any problems with the service flow when it was modeled? Errors introduced when the service flow was modeled can manifest as problems at run time.
  • Is the problem specific to one service flow or server adapter?

Remember that, even though one area might report the problem, the problem does not necessarily originate there. Part of identifying where a problem originates is understanding the environment in which it exists. Take some time to completely describe the problem environment, including the operating system, its version, all corresponding software and versions, and hardware information. Confirm that you are running in an environment that is a supported configuration; many problems can be traced back to incompatible levels of software that are not intended to run together or have not been fully tested together.

When does the problem occur?

Develop a detailed timeline of events leading up to a failure, especially for those cases that are one-time occurrences. You can most easily do this by working backward: Start at the time an error was reported (as precisely as possible, even down to the millisecond), and work backward through the available logs and information. Typically, look only as far as the first suspicious event that you find in a diagnostic log; however, finding that event is not always easy and takes practice. Knowing when to stop looking is especially difficult when multiple layers of technology are involved, and when each has its own diagnostic information.

To develop a detailed timeline of events, try to answer these questions:

  • Does the problem happen only at a certain time?
  • How often does the problem happen?
  • What sequence of events leads up to the time that the problem is reported?
  • Does the problem happen after an environment change, such as upgrading or installing software or hardware?

Responding to questions like these can help to provide you with a frame of reference in which to investigate the problem.
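The working-backward approach described earlier in this section can be sketched in code. The following Python example is a minimal illustration only. It assumes that you have already parsed each diagnostic log into timestamped events and that all logs share one clock and time zone; the LogEvent type, the log sources, and the is_suspicious predicate are hypothetical and are not part of any IBM product. The sketch merges events from several logs, keeps those at or before the time the error was reported, orders them newest first, and stops at the first event that the predicate flags.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class LogEvent:
    # Hypothetical representation of one parsed diagnostic log entry.
    timestamp: datetime  # assumes all logs use the same clock and time zone
    source: str          # which log the entry came from (hypothetical names)
    message: str


def events_before(logs: Iterable[List[LogEvent]], error_time: datetime) -> List[LogEvent]:
    """Merge all logs and return events at or before error_time, newest first."""
    merged = [event for log in logs for event in log if event.timestamp <= error_time]
    merged.sort(key=lambda event: event.timestamp, reverse=True)
    return merged


def first_suspicious(
    logs: Iterable[List[LogEvent]],
    error_time: datetime,
    is_suspicious: Callable[[LogEvent], bool],
) -> Optional[LogEvent]:
    """Work backward from the error and stop at the first flagged event."""
    for event in events_before(logs, error_time):
        if is_suspicious(event):
            return event
    return None  # nothing suspicious found before the error


if __name__ == "__main__":
    t = datetime.fromisoformat
    # Hypothetical entries from two separate logs.
    server_log = [
        LogEvent(t("2024-01-01T10:00:00.000"), "server", "started"),
        LogEvent(t("2024-01-01T10:05:00.250"), "server", "WARN pool exhausted"),
        LogEvent(t("2024-01-01T10:05:01.000"), "server", "ERROR request failed"),
    ]
    adapter_log = [
        LogEvent(t("2024-01-01T10:04:59.900"), "adapter", "WARN slow response"),
        LogEvent(t("2024-01-01T10:06:00.000"), "adapter", "after the error"),
    ]
    error_time = t("2024-01-01T10:05:01.000")
    for event in events_before([server_log, adapter_log], error_time):
        print(event.timestamp.isoformat(), event.source, event.message)
    found = first_suspicious([server_log, adapter_log], error_time,
                             lambda e: e.message.startswith("WARN"))
    print("First suspicious event:", found)
```

Merging the logs before sorting matters when multiple layers of technology each keep their own diagnostic information: reading each log separately hides how their events interleave in the timeline.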

Under which conditions does the problem occur?

Knowing what other systems and applications are running at the time that a problem occurs is an important part of troubleshooting. The following questions about your environment can help you identify the root cause of the problem:

  • Does the problem always occur when the same task is being performed?
  • Does a certain sequence of events lead to the problem?
  • Do any other applications fail at the same time?
  • Which version of the tooling did you use to model your service flows? Have you regenerated the service flows with a later version of the tooling?

Answering these types of questions can help you explain the environment in which the problem occurs and correlate any dependencies. Remember that problems that occur around the same time are not necessarily related.

Can the problem be reproduced?

From a troubleshooting standpoint, the "ideal" problem is one that can be reproduced. Typically, when a problem can be reproduced, you have more tools or procedures at your disposal to help you investigate. Consequently, problems that you can reproduce are often easier to debug and solve. If the problem has a significant business impact, recreate it in a test or development environment, which typically offers you more flexibility and control during your investigation.

  • Can the problem be recreated on a test machine?
  • Are multiple users or applications encountering the same type of problem?
  • Can the problem be recreated by running a single command, a set of commands, a particular application, or a stand-alone application?