Troubleshooting errors and failures in workflow instances

Using the Workflow Inspector, you can take corrective actions on workflow instances that failed or are in an error state, either one at a time or in bulk across multiple locations. The best corrective action depends on the nature of the failure.

Failures are of two types: system failures and failures due to invalid or corrupted data. A system failure occurs, for example, when the network connection is down or the power fails. A data failure can occur when the execution environment changes after the instance migrates and those changes cause data to become invalid or corrupted.

To troubleshoot an error in a failed workflow instance:

  1. Open the skill catalog and click the Workflow Admin button. Then open the Workflow Inspector.

  2. Filter the search to find failed workflow instances only. Click Search.

    You can apply other filters to narrow the results, which makes the failures easier to process. For example, you might want to process the failures for one workflow at a time.

  3. In the results section of the Workflow Inspector, select a failed instance.

  4. In the details section, start by reviewing what caused the failure. Next to the status, click Error details.

  5. In the window that opens, review the error details, including the Java trace. To save the error details to a file, click Export Error.

  6. Based on what caused the failure and what information you find in the error details, take the appropriate action for the instance, its tasks, or its activities. For example:

    • If a system going offline caused the error and the system is now running, click Retry failed steps.
    • If a task is assigned to a user who is not valid, select the task, select Reassign to user, and then assign the task to an appropriate user.
    • If the data for the instance, task, or activity is not valid, edit it to set it to an appropriate value.
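
The decision in step 6 can be pictured as a lookup from the cause of the failure to a corrective action. The following Python sketch is illustrative only: the FailureCause categories and the suggest_action function are hypothetical names made up for this example, not part of any Workflow Inspector API. The returned strings paraphrase the examples in the list above, and "Edit the data" is a description rather than a button label.

```python
from enum import Enum, auto


class FailureCause(Enum):
    # Hypothetical categories, taken from the examples in step 6.
    SYSTEM_OFFLINE = auto()
    INVALID_ASSIGNEE = auto()
    INVALID_DATA = auto()


# Maps each cause to the corrective action that step 6 describes.
SUGGESTED_ACTION = {
    FailureCause.SYSTEM_OFFLINE: "Retry failed steps",
    FailureCause.INVALID_ASSIGNEE: "Reassign to user",
    FailureCause.INVALID_DATA: "Edit the data",
}


def suggest_action(cause: FailureCause, system_running: bool = True) -> str:
    """Return a suggested corrective action for a failure cause."""
    # Retrying only makes sense once the offline system is running again.
    if cause is FailureCause.SYSTEM_OFFLINE and not system_running:
        return "Wait until the system is running, then retry"
    return SUGGESTED_ACTION[cause]
```

For example, suggest_action(FailureCause.INVALID_ASSIGNEE) returns "Reassign to user". In practice the cause is something you determine yourself by reading the error details; the sketch only shows how the choice of action follows from it.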

Fixing the problem might take more than one action. For example, before you skip a task, you might want to edit its output data. A skipped task does not update its output data, so editing the data first ensures that it is valid for the steps that come later in the flow.

Tip: If you suspect that multiple instances failed because of a common error, you can select those workflow instances and take a bulk action on all of them. For example, if you are sure that a network problem caused 50 workflow instances to fail, you can select those instances and then click the Retry failed steps action. If the bulk action that you want to take is not available, then the action is not available for at least one of the selected instances.
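
The availability rule in the tip amounts to a set intersection: an action can be applied in bulk only if every selected instance offers it. The following Python sketch models that rule under stated assumptions. The bulk_actions function, the instance names, and the per-instance action sets are hypothetical and do not come from the product.

```python
def bulk_actions(available_by_instance: dict[str, set[str]]) -> set[str]:
    """Return the actions that every selected instance offers."""
    action_sets = list(available_by_instance.values())
    if not action_sets:
        # Nothing selected, so no bulk action applies.
        return set()
    # An action qualifies only if it appears in all of the sets.
    return set.intersection(*action_sets)


# Hypothetical selection: instance-2 does not offer "Reassign to user".
selected = {
    "instance-1": {"Retry failed steps", "Reassign to user"},
    "instance-2": {"Retry failed steps"},
    "instance-3": {"Retry failed steps", "Reassign to user"},
}

common = bulk_actions(selected)
print(sorted(common))
```

Here only "Retry failed steps" remains, because one instance in the selection does not offer the other action. Removing that instance from the selection would make both actions available again.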
