IBM Support

What is the purpose of BPRecovery? Example for Auto Resume.

Technical Blog Post


Abstract

What is the purpose of BPRecovery? Example for Auto Resume.

Body

Purpose of BPRecovery

When a Sterling B2B Integrator instance fails abnormally (the JVM crashes or is killed via hardstop.sh), the WorkFlowEngine (WFE) does not have an opportunity to synchronize the database. Business Processes that are in an ACTIVE, HALTING or WAITING_ON_IO state will therefore remain that way indefinitely (these are referred to as Active Hung processes), and the UI will not offer any actions to repair them, since operating on an in-flight BP is not safe. BPRecovery attempts to synchronize the database and the WFE without impacting any newly executing BPs.

Actions of BPRecovery

To detect the set of Active Hung processes (candidates), the BPReportService obtains the list of processes in an ACTIVE, HALTING or WAITING_ON_IO state from the database. This set is then compared to the list of threads, the messages in the queues and the ActivityData entries (objects in memory that can be associated with an in-flight process). The comparison is done 3 times with a 10-second sleep between each check. If a candidate makes it through all 3 checks (no matching thread, queued message or ActivityData entry is found), it is considered active hung. Next, the recovery process collects and adds the set of BPs that were stopped due to a normal (soft) shutdown; this is the final list of candidates.
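The detection loop described above can be sketched in a few lines. This is only an illustration in Python, not the product's implementation: the function name `find_active_hung`, the `in_flight_ids` callback and the injectable `sleep` parameter are hypothetical. Only the three states, the three checks and the 10-second pause come from the description.

```python
import time
from typing import Callable, Dict, Iterable, Set

# States that the text says are treated as recovery candidates.
CANDIDATE_STATES = {"ACTIVE", "HALTING", "WAITING_ON_IO"}


def find_active_hung(
    db_states: Dict[int, str],
    in_flight_ids: Callable[[], Iterable[int]],
    checks: int = 3,
    pause_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Set[int]:
    """Return the IDs that look in-flight in the database but show no
    in-memory activity in any of the checks.

    db_states and in_flight_ids are hypothetical stand-ins for the
    database query and for the lookup of threads, queued messages and
    ActivityData entries.
    """
    candidates = {bp for bp, state in db_states.items() if state in CANDIDATE_STATES}
    for attempt in range(checks):
        # Drop every candidate that still has in-memory evidence of activity.
        candidates -= set(in_flight_ids())
        if attempt < checks - 1:
            sleep(pause_seconds)  # pause only between checks
    return candidates
```

A process that is seen in memory during any one of the checks is dropped from the candidate set, so only processes that are missing in all three checks are reported.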



The candidates are then passed through the BPStateFilter service, which sorts them into 4 lists by the recovery level from each process's Workflow Definition (WFD). The possible recovery levels are INTERRUPT_MANUAL, AUTO_RESUME, AUTO_RESTART and AUTO_TERMINATE.



Processes in the INTERRUPT_MANUAL list are halted, and a user must inspect each process and take some action (restart, resume or terminate) manually.

Processes in the AUTO_RESUME list are re-executed, repeating the step after the last fully persisted row in the database.

Each process in the AUTO_RESTART list is terminated, and the data from the initial step is reused in a new instance of the same WFD (back at step 0).

Each process in the AUTO_TERMINATE list is terminated and will no longer be executable.
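As a rough illustration of the sorting step, the sketch below groups candidates into one list per recovery level. The function name `sort_by_recovery_level` and the `(workflow_id, recovery_level)` tuple shape are assumptions made for the example. Only the four level names and their actions come from the text.

```python
from typing import Dict, Iterable, List, Tuple

# Recovery levels named in the text, with a one-line summary of each action.
RECOVERY_ACTIONS = {
    "INTERRUPT_MANUAL": "halt; a user must restart, resume or terminate manually",
    "AUTO_RESUME": "re-execute from the step after the last fully persisted row",
    "AUTO_RESTART": "terminate, then start a new instance at step 0 with the initial data",
    "AUTO_TERMINATE": "terminate; the process is no longer executable",
}


def sort_by_recovery_level(
    candidates: Iterable[Tuple[int, str]],
) -> Dict[str, List[int]]:
    """Group hypothetical (workflow_id, recovery_level) pairs into one list per level."""
    lists: Dict[str, List[int]] = {level: [] for level in RECOVERY_ACTIONS}
    for workflow_id, level in candidates:
        if level not in lists:
            raise ValueError(f"unknown recovery level: {level}")
        lists[level].append(workflow_id)
    return lists
```

Every level always has a list, even an empty one, so later stages can iterate over all four without special cases.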



The 4 lists of candidates then move to the BPMarkService, where INTERRUPT_MANUAL processes get an Interrupt mark, AUTO_RESUME processes get an Interrupt Auto mark, and AUTO_RESTART and AUTO_TERMINATE processes receive Terminate marks. A mark is a placeholder row in the database that tells GIS tools that the state of the BP has changed. The AUTO_RESUME and AUTO_RESTART lists are then passed to the BPStartService, which resumes or restarts the BPs in the lists one at a time until a maximum number (cumulative across both lists) has been added back to the system. This BPStart throttle is defined by the maxAutorecoveryCount property (the default is 10). If more BPs exist in the lists than the BPStartService can resume or restart, the rest are left for the next cycle of the process.
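The marking and throttling rules above can be illustrated with a small sketch. The names `MARKS` and `start_with_throttle` are hypothetical, and processing the resume list before the restart list is an assumption, since the text does not state the order. The mark names and the cumulative limit of 10 follow the description.

```python
from typing import List, Tuple

# Mark applied per recovery level, as described in the text.
MARKS = {
    "INTERRUPT_MANUAL": "Interrupt",
    "AUTO_RESUME": "Interrupt Auto",
    "AUTO_RESTART": "Terminate",
    "AUTO_TERMINATE": "Terminate",
}


def start_with_throttle(
    resume_ids: List[int],
    restart_ids: List[int],
    max_auto_recovery_count: int = 10,  # mirrors the maxAutorecoveryCount default
) -> Tuple[List[Tuple[str, int]], List[Tuple[str, int]]]:
    """Split the resume/restart work into (started_now, deferred_to_next_cycle).

    The limit is cumulative across both lists. Handling resumes first is an
    assumption made for this sketch.
    """
    work = [("resume", bp) for bp in resume_ids] + [("restart", bp) for bp in restart_ids]
    return work[:max_auto_recovery_count], work[max_auto_recovery_count:]
```

With 8 processes to resume and 5 to restart, for example, this sketch starts 10 in the current cycle and defers the remaining 3 restarts to the next one.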



Finally, the MessageRecoveryService checks the PRODUCED_MESSAGE table for any unclaimed messages and attempts to match each message to the list of registered consumers. This is just a second chance to bootstrap the standard Produce/Consume finder system processes.

A global lock is requested via the LockService to ensure that only one copy of the BPRecovery service is active at a time. BPRecovery is implemented by the system process Recover.bpml, which is scheduled to run every 45 minutes by the Schedule_BPRecovery schedule.
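The single-active-copy guarantee can be pictured with a non-blocking lock. The sketch below is only an in-process analogy: it uses Python's `threading.Lock`, whereas the product's LockService is described as a global lock, and the function name `run_recovery_once` is hypothetical.

```python
import threading
from typing import Callable

# Hypothetical in-process stand-in for the global lock.
_recovery_lock = threading.Lock()


def run_recovery_once(recover: Callable[[], None]) -> bool:
    """Run recover() only if no other recovery run currently holds the lock.

    Returns True if recovery ran, False if it was skipped because another
    run is already active.
    """
    if not _recovery_lock.acquire(blocking=False):
        return False  # another copy is active; skip this cycle
    try:
        recover()
        return True
    finally:
        _recovery_lock.release()
```

A run that cannot obtain the lock simply returns without doing any work, which matches the idea that a skipped cycle is picked up by the next scheduled run.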

Important Services Involved in BPRecovery

BPReportService

BPStateFilter

BPMarkService

BPStartService

LostMessageService

Example

1. We used a two-node Sterling B2B Integrator cluster in the lab to test the scenario.

2. Create a sample Business Process and set both the recovery level and the softstop recovery level to "Auto Resume".

3. The example Business Process we used to test the Auto Resume recovery setting:



<process name="swaroopa_bprecoverytest.bpml">
  <sequence name="main">
    <operation name="1">
      <participant name="TestSleepService"/>
      <output message="Xout">
        <assign to="SLEEP_INTERVAL">60000</assign>
        <assign to="." from="*"></assign>
      </output>
      <input message="Xin">
        <assign to="." from="*"></assign>
      </input>
    </operation>
  </sequence>
</process>



4. Run the Business Process (swaroopa_bprecoverytest.bpml) on node 1 and shut down node 1 while it is still executing.


5. Log in to node 2 and verify that the token node has switched to node 2.

6. On node 2, verify that the swaroopa_bprecoverytest Business Process is still in an active state and that its execution node shows as node 1. At this point the Business Process is not really active; it shows as active because we issued a hardstop while it was running on node 1, and that state persisted in the database.



7. The following actions happen the next time the BP Recovery schedule runs on node 2:


8. First, the BP report service identifies swaroopa_bprecoverytest as active hung. The process data is shown below.

image

9. The BP state filter service finds that swaroopa_bprecoverytest is set to auto resume, based on the recovery level we set for the BP.


10. The BP Mark service then marks it as Interrupted_Auto.


11. Finally, the BP start service starts (resumes) the business process, which then runs to completion.

Reviewed by: Vince Tkac


UID

ibm11121775