Items stuck in the merge workflow step

We have recently seen a pattern where items stay in the "Merge" step for an extended time. How can we check whether these items are stuck or are only waiting for a split item entry to join before moving to the next step? If items are stuck, how can we move them to the next step, and how can we find the root cause?

Symptoms

An item appears in the merge step when any of the feeder steps complete processing. The item stays there, waiting for the remaining feeder steps to complete, and the merge step maintains an internal count of how many feeder steps have completed processing. When all prior steps have completed processing, the item moves on to the next step. Therefore, an item is stuck only if all feeder steps have completed processing and the item is still present in the merge step.

Diagnosing the problem

The following are some ways to identify a stuck item:
  1. Manually review all the steps from item split to merge and check whether an entry exists for that item in the prior steps. If it does, then the merge step is waiting for that entry before sending the item through and is hence not stuck. If an entry doesn't exist in prior steps, then it is stuck.
  2. The product maintains an audit log of all activities in the workflow. The log is kept in the CEH table, which stores, among other things, the following information in database columns:
    • 2.1. CEH_DATE: Time the event happened
    • 2.2. CEH_ENTRY_KEY: Primary key of item
    • 2.3. CEH_WFL_NAME: Name of the workflow
    • 2.4. CEH_USER_NAME: The user who moved the item
    • 2.5. CEH_STEP_PATH: Name of the workflow step
    • 2.6. CEH_EVENT: The event that happened, such as BEGINSTEP, ENDSTEP, RESERVE_ACTIVE_LOCK, RELEASE_ACTIVE_LOCK, and so on
    Therefore, when an item enters a workflow step, a BEGINSTEP event is registered for that item. If x input steps feed the merge step, the item waits for x BEGINSTEP events before it moves to the next step, which is recorded as an ENDSTEP event. If there are fewer than x BEGINSTEP events, the item is not stuck and is just waiting for one of the split entries to join. If it has x BEGINSTEP events and is still in the merge step, then it is stuck. You can use the following SQL to get this information:
    SELECT * FROM CEH WHERE CEH_ENTRY_KEY = '<Primary Key of Item>' ORDER BY CEH_DATE DESC;
  3. The number of split item entries that have reached the merge step is saved in the CAE_DATA column of the CAE table. For a merge step with x feeding steps, a CAE_DATA value lower than x means the item is not stuck, while a value of x with the item still in the workflow step indicates that it is stuck.
  4. We can also get the equivalent of the number of BEGINSTEP events fired, or the CAE_DATA value, programmatically by using the getEntryMergeState function. We can run this function for one or all items in the workflow step. If the value returned is x (x being the number of feeder steps) and the item is still in the merge step, then the item is stuck. Usage: int CollaborationArea::getEntryMergeState(Entry entry, String stepPath)
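
The BEGINSTEP counting rule from method 2 can be sketched outside the product as follows. This is a minimal Python illustration, not product code. The rows are hypothetical stand-ins for the result of the CEH query, the helper name classify_merge_entry is invented for this example, and it assumes that the item is still in the merge step and that the rows cover only its current pass through that step.

```python
def classify_merge_entry(ceh_rows, merge_step_path, feeder_count):
    """Return "stuck" or "waiting" for an item that is still in the merge step.

    ceh_rows: iterable of dicts with CEH_STEP_PATH and CEH_EVENT keys,
              as returned by the CEH query for a single item (hypothetical shape).
    feeder_count: x, the number of steps that feed the merge step.
    """
    begin_events = sum(
        1
        for row in ceh_rows
        if row["CEH_STEP_PATH"] == merge_step_path
        and row["CEH_EVENT"] == "BEGINSTEP"
    )
    # x BEGINSTEP events with the item still in the step means nothing is pending.
    return "stuck" if begin_events >= feeder_count else "waiting"


# Hypothetical audit rows: three of four feeder steps have reached the merge step.
sample_rows = [
    {"CEH_STEP_PATH": "Sample Merge", "CEH_EVENT": "BEGINSTEP"},
    {"CEH_STEP_PATH": "Sample Merge", "CEH_EVENT": "BEGINSTEP"},
    {"CEH_STEP_PATH": "Sample Merge", "CEH_EVENT": "BEGINSTEP"},
    {"CEH_STEP_PATH": "Split41", "CEH_EVENT": "BEGINSTEP"},
]
print(classify_merge_entry(sample_rows, "Sample Merge", 4))
```

With a fourth BEGINSTEP row for the merge step, the same call would report the item as stuck.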

Resolving the problem

You can use any of the following methods to move an item through to the next step:
  1. Open the merge step and manually move the stuck item to the next step.
  2. You can use code to move an item through to the next step by using the following function: HashMap CollaborationArea::moveEntryToNextStep(Entry entry, String stepPath, String exitValue)

    The above method posts a request to move the entry from the specified stepPath to the next step for the given exitValue. It returns a hash map of item primary key to string of validation errors (which can be zero-length). The move will take place after the current transaction has committed.

  3. If multiple items are stuck, then you can move all of them in one pass through code. The following sample code loops through all the items in a collaboration area workflow step, checks for items whose merge state (the CAE_DATA equivalent) is equal to x (x being the number of feeder steps), and then moves them to the next step:
    var sColAreaName = "<COLLABORATION_AREA_NAME>";
    var sStepPath = "<Path_To_Workflow_Step>";
    var sStepAction = "DONE";
    var sAttribPath = "Attribute_Path_For_Primary_Key";
    var iFeederCount = 4;       // x: 4 is only an example, replace with the number of feeder steps
    var hmResult;
    var oColArea = getColAreaByName(sColAreaName);
    var oEntrySet = oColArea.getEntriesInStep(sStepPath);
    // Loop through every entry that is currently in the merge step
    forEachEntrySetElement(oEntrySet, oEntry)
    {
        if(null != oEntry)
        {
            // Number of feeder steps that have completed for this entry (CAE_DATA equivalent)
            var iEntryMergeState = oColArea.getEntryMergeState(oEntry, sStepPath);
            out.writeln("INFO:: Merge state for Entry PK ["+oEntry.getEntryAttrib(sAttribPath)+"] is: "+ iEntryMergeState);
            // All feeder steps are done but the entry is still here, so treat it as stuck
            if( null != iEntryMergeState && iEntryMergeState == iFeederCount)
            {
                hmResult = oColArea.moveEntryToNextStep(oEntry, sStepPath, sStepAction);
                // A non-empty map holds validation errors keyed by item primary key
                if(null != hmResult && hmResult.size() > 0)
                {
                    out.writeln("ERROR:: Result of moveEntryToNextStep() call: "+hmResult);
                }
            }
        }
    }
If you identify a stuck item, use the following approach to find the root cause:
  1. Query the CEH table to look for anomalous entries for the stuck item, for example, automated steps that have a BEGINSTEP event but no ENDSTEP event. You can use the same SQL as mentioned earlier:
    SELECT * FROM CEH WHERE CEH_ENTRY_KEY = '<Primary Key of Item>' ORDER BY CEH_DATE DESC;
  2. If you do not find suspicious entries in the CEH audit trail, then note the time stamps of the relevant events, such as the BEGINSTEP events of the merge step and the ENDSTEP events of the feeder steps. The preceding SQL returns them in the CEH_DATE column.
  3. Analyze the logs (especially the workflow engine logs and custom logging) and search for errors or warnings around those time stamps.

    If you do not find a descriptive error message, then raise the logging level to debug and repeat the preceding process. Refer to the knowledge center for instructions on raising the logging level to debug.
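
The check in step 1, looking for steps that have a BEGINSTEP event but no ENDSTEP event, can be sketched as follows. This is a minimal Python illustration under stated assumptions: the rows are hypothetical stand-ins for the CEH query result, and the helper name steps_missing_endstep and its ignore_steps parameter are invented for this example. The merge step itself is excluded because it legitimately records several BEGINSTEP events for one ENDSTEP event.

```python
from collections import Counter


def steps_missing_endstep(ceh_rows, ignore_steps=()):
    """Return the step paths that have more BEGINSTEP than ENDSTEP events."""
    open_count = Counter()
    for row in ceh_rows:
        step = row["CEH_STEP_PATH"]
        if step in ignore_steps:
            continue
        if row["CEH_EVENT"] == "BEGINSTEP":
            open_count[step] += 1
        elif row["CEH_EVENT"] == "ENDSTEP":
            open_count[step] -= 1
    # A positive balance means the step was entered but never left.
    return sorted(step for step, balance in open_count.items() if balance > 0)


# Hypothetical audit rows for one item: Split21 was entered but never completed.
audit_rows = [
    {"CEH_STEP_PATH": "Split11", "CEH_EVENT": "BEGINSTEP"},
    {"CEH_STEP_PATH": "Split11", "CEH_EVENT": "ENDSTEP"},
    {"CEH_STEP_PATH": "Split21", "CEH_EVENT": "BEGINSTEP"},
    {"CEH_STEP_PATH": "Sample Merge", "CEH_EVENT": "BEGINSTEP"},
]
print(steps_missing_endstep(audit_rows, ignore_steps=("Sample Merge",)))
```

Any step that this check reports is a good starting point for the time stamp and log review described in steps 2 and 3.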

Merge Type Workflow Step

Symptoms

A merge step ensures all of the incoming steps are completed for that entry and then creates a single merged item before forwarding it to the next workflow step. If x number of steps point to the merge step, then x copies of the entry must reach this merge step before this item can move to the next step.

Workflow steps can be broadly categorized into two main types: "User steps" and "Automated". User steps are ones where the user (or a script) must go to the workflow step, make the required modifications to one or more items, and then move them to the next step by selecting an Exit value. Automated steps are ones where items move through and on to the next step without any user interaction. The purpose of these types of steps is to take a predefined action or do logical checks by using code in the IN and OUT functions of the workflow step, and then move the items to the next step.
A merge type step is a special type of automated step. During setup, you specify more than one entry point (multiple steps feeding entries to this step), and the step combines all these entries and outputs a consolidated form of the item.
Note: The Admin user can move items out of a merge step even when not all of the inputs to the merge have arrived.

A step is defined as a merge step by setting the "Type" field to "Merge" in the workflow step definition page. You can reach the workflow step definition page by using the following path: Data Model Manager > Workflows > Workflow Console > Select a workflow to open and click Add Step or open an existing step.

Split steps are not a separate step type. We define them by specifying multiple entries in the "Next Step" column for that step in the workflow definition screen.

These splits and merges are used for division of labor, so that users can work on distinct workflow activities in parallel, which increases efficiency and reduces bottlenecks.
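
Because the number of feeder steps (x) drives every stuck-item check, it can help to derive it from the workflow definition instead of counting by hand. The following is a minimal Python sketch under stated assumptions: the dictionary is a hypothetical, simplified model of the "Next Step" column (one list of targets per step, ignoring exit values), the step names "Initial" and "SUCCESS" are hypothetical, and the helper name feeder_steps is invented for this example.

```python
def feeder_steps(next_steps, merge_step):
    """Return the steps whose "Next Step" entries include the merge step."""
    return sorted(
        step for step, targets in next_steps.items() if merge_step in targets
    )


# Hypothetical workflow: "Initial" is a split because it lists several next steps.
workflow = {
    "Initial": ["Split11", "Split21", "Split31", "Split41"],
    "Split11": ["Sample Merge"],
    "Split21": ["Sample Merge"],
    "Split31": ["Sample Merge"],
    "Split41": ["Sample Merge"],
    "Sample Merge": ["SUCCESS"],
}

feeders = feeder_steps(workflow, "Sample Merge")
print(len(feeders), feeders)
```

The length of the returned list is the value of x to compare against the merge state of an item.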

Resolving the problem

An item appears in the merge step when any of the feeder steps complete processing. For example, if the steps Split11, Split21, Split31, and Split41 feed a merge step named "Sample Merge", an item appears in "Sample Merge" as soon as any one of them completes. The entry in "Sample Merge" then waits for the rest of the feeder steps to complete processing, and once all of them finish, the item moves through to the next step. These feeder steps can be of any type: user steps or automated. At any time while the merge step is waiting for input, a user can manually (or through a script) open the item in the merge step and move the entry through to the next step. The item moves through without introducing any data inconsistencies, but changes made in feeder steps that have not yet completed processing are lost.

While the item is waiting, the merge step maintains an internal count of how many feeder steps have completed processing. We can get this count by querying the CAE_DATA column in the CAE table. For a merge step, the count is 1 when the first feeder step completes processing, and it increases by 1 each time another feeder step completes. The increments continue until all the feeder steps have completed, which is 4 in the preceding example. When the count reaches 4, the item moves through to the next step.

We can also get the state of the item (CAE_DATA value) programmatically by using the following function:

int CollaborationArea::getEntryMergeState(Entry entry, String stepPath)
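
The counting behavior described above can be modeled with a small simulation. This is a minimal Python sketch that only mirrors the description in this document. It is not how the product implements the merge step, and the class name MergeCounter and its members are invented for this example.

```python
class MergeCounter:
    """Toy model of the count that a merge step keeps for one item."""

    def __init__(self, feeder_count):
        self.feeder_count = feeder_count  # x, the number of feeder steps
        self.arrived = 0                  # plays the role of the CAE_DATA value

    def feeder_completed(self):
        """Record one feeder step completing.

        Returns True once every feeder step has arrived, which is the point
        at which the item is expected to move on to the next step.
        """
        self.arrived += 1
        return self.arrived >= self.feeder_count


counter = MergeCounter(feeder_count=4)
for feeder in ["Split11", "Split21", "Split31", "Split41"]:
    ready = counter.feeder_completed()
    print(feeder, "completed, count =", counter.arrived, "ready =", ready)
```

In this model, an item whose count equals the feeder count but which has not moved on corresponds to a stuck item.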