IBM Support

Importance of dataSweeper in IBM Sterling B2B Integrator

Technical Blog Post


This blog explains what the dataSweeper utility is and how it is used in IBM Sterling B2B Integrator (SBI) to clean up orphan records. It is important to understand how the dataSweeper works before using it.

What is dataSweeper:
From time to time, Sterling B2B Integrator (SBI) customers run into the issues listed below. To address them, engineering provided the dataSweeper tool. The dataSweeper can be run directly from the command line or on a schedule. Depending on the operating system, either dataSweeper.sh or dataSweeper.cmd is provided. Both the command line and the schedule give a more controlled, systematic way to correct these issues.
1. Database growth problems
- Documents not getting purged from SBI
- Workflows stuck at 10 year life spans
2. Database performance issues
- Removal of unnecessary correlation set records.
- Performance stats table.
3. Data integrity
- Invalid workflow context data
- Documents not in sync with workflow or correlation rows not in sync.
4. SBI stability
- Invalid workflow context data
- Documents not in sync with workflow or correlation rows not in sync.
Thus, understanding the dataSweeper and running it as needed helps clean up orphan records.
1. DataSweeper Business Process Schedule:
- BP Name: Schedule_DataSweeper
- Runs on schedule
- Runs every Monday at 1 AM by default.
The DataSweeper service identifies orphan records and processes them in batches. The service parameters configured by default are shown below.
<operation name="Service">
<participant name="DataSweeper"/>
<output message="Xout">
<assign to="batchSize">5000</assign>
<assign to="autocorrect">TRUE</assign>
<assign to="maxIterations">1000</assign>
<assign to="sweeperTimeout">1080000</assign>
<assign to="sweeperTimeoutThreshold">36000000</assign>
<assign to="." from="*"></assign>
</output>
<input message="Xin">
<assign to="." from="*"></assign>
</input>
</operation>
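When reviewing or changing these defaults, it can help to read the literal parameter values out of the BPML fragment rather than by eye. The following is a minimal sketch, assuming the fragment has been copied into a string. The helper name read_sweeper_params is hypothetical and not part of SBI. It only parses XML text and does not interpret the units or meaning of any value.

```python
import xml.etree.ElementTree as ET

# The default DataSweeper service operation, copied from Schedule_DataSweeper.
DEFAULT_SERVICE_BPML = """
<operation name="Service">
  <participant name="DataSweeper"/>
  <output message="Xout">
    <assign to="batchSize">5000</assign>
    <assign to="autocorrect">TRUE</assign>
    <assign to="maxIterations">1000</assign>
    <assign to="sweeperTimeout">1080000</assign>
    <assign to="sweeperTimeoutThreshold">36000000</assign>
    <assign to="." from="*"></assign>
  </output>
  <input message="Xin">
    <assign to="." from="*"></assign>
  </input>
</operation>
"""


def read_sweeper_params(bpml_text):
    """Return the literal <assign> values of the output message as a dict.

    Hypothetical helper for illustration only. Pass-through assigns
    (those with a 'from' attribute) carry no literal value and are skipped.
    """
    root = ET.fromstring(bpml_text.strip())
    params = {}
    for assign in root.findall("./output/assign"):
        if assign.get("from") is None and assign.text:
            params[assign.get("to")] = assign.text.strip()
    return params


if __name__ == "__main__":
    for name, value in read_sweeper_params(DEFAULT_SERVICE_BPML).items():
        print(f"{name} = {value}")
```

Running the same helper on an edited copy of the BP makes it easy to confirm that only the intended parameter changed.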
Execution steps and status report:

[Images not reproduced: execution steps and status report]

The sweeper status shows "OK" for each sweeper when no orphans are found. Otherwise, it lists the orphan-related details. More sweepers are available, but these are the ones that are enabled out of the box and can safely be run while SBI is online.
There are a few other important sweepers that are not enabled out of the box and can be run only when SBI is down. The important ones are listed below; they should be scheduled during a database maintenance period or outage.
-missingArchiveInfoSweeper
-missingArchiveInfoSweeper.DOCUMENT
-missingDocumentLifespansSweeper
-unassociatedRowSweeper
-unassociatedRowSweeper.WF_INACTIVE
2. The dataSweeper command line option:
These sweepers can be run using the command line options. A dataSweeper script is available in the <SBI_INSTALL>/bin directory. To see the usage details, run the dataSweeper script with the -help option.
[Image not reproduced: dataSweeper -help usage output]

You can run a specific sweeper, or run the script with just the -detailedReport option to get a report on all out-of-the-box sweepers:
1. Running a specific sweeper:

[Image not reproduced: command line for running a specific sweeper]

2. Running all out-of-the-box enabled sweepers from the command line:

[Image not reproduced: command line for running all out-of-the-box sweepers]

Note: IBM support always recommends running the sweeper with the -detailedReport option first, reviewing the report, and only THEN running with the -autoCorrect option. Do NOT use the -autoCorrect option without first getting the detailed report and reviewing it with support.
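The report-then-correct order in the note can be enforced in a wrapper script. The following is a minimal sketch that only builds the command line and does not run anything. The function name sweeper_command, the report_reviewed flag, and the /opt/sbi install path are hypothetical. The script names and the -detailedReport and -autoCorrect options are the ones named in this blog; confirm the exact syntax with -help before use.

```python
from pathlib import PurePosixPath, PureWindowsPath


def sweeper_command(install_dir, auto_correct=False, report_reviewed=False,
                    windows=False):
    """Build the dataSweeper command line as a list of arguments.

    Hypothetical helper for illustration only. It refuses to build an
    -autoCorrect command unless the caller states that the detailed
    report has already been reviewed.
    """
    if auto_correct and not report_reviewed:
        raise ValueError(
            "Get the -detailedReport output and review it with support "
            "before using -autoCorrect"
        )
    # dataSweeper.cmd on Windows, dataSweeper.sh elsewhere.
    path_cls = PureWindowsPath if windows else PurePosixPath
    script = "dataSweeper.cmd" if windows else "dataSweeper.sh"
    flag = "-autoCorrect" if auto_correct else "-detailedReport"
    return [str(path_cls(install_dir) / "bin" / script), flag]


if __name__ == "__main__":
    # Step 1: report only. "/opt/sbi" is a placeholder install directory.
    print(sweeper_command("/opt/sbi"))
    # Step 2: only after the report has been reviewed with support.
    print(sweeper_command("/opt/sbi", auto_correct=True, report_reviewed=True))
```

The returned list can be passed to a process launcher such as subprocess.run once the command has been checked against the -help output.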
If you are running the out-of-the-box dataSweeper schedule or command line and the dataSweeper has been running for a long time, you can terminate it. This happens at times when there is a huge number of orphans and database performance is poor. There is no issue with terminating it manually: the DataSweeper finishes the batch commit it is working on at the time and then terminates.
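The reason manual termination is safe can be illustrated with a small model of batch processing. The following is a conceptual sketch only, not the DataSweeper implementation: the sweep function, the in-memory list of orphan IDs, and the stop flag are all hypothetical. It shows the general pattern of checking for a stop request only between batches, so a batch that has started is always committed in full.

```python
import threading


def sweep(orphans, batch_size, max_iterations, stop_event, commit):
    """Remove orphans in batches, honoring a stop request between batches.

    Conceptual model only. 'orphans' is a list that is consumed in place,
    and 'commit' is a callback that stands in for a database batch commit.
    Returns the number of batches committed.
    """
    iterations = 0
    while orphans and iterations < max_iterations:
        batch = orphans[:batch_size]
        commit(batch)              # a started batch always completes
        del orphans[:batch_size]
        iterations += 1
        if stop_event.is_set():    # stop is checked only after the commit
            break
    return iterations


if __name__ == "__main__":
    stop = threading.Event()
    committed = []

    def commit(batch):
        committed.append(list(batch))
        if len(committed) == 2:    # simulate an operator stopping the run
            stop.set()

    remaining = list(range(12))
    batches = sweep(remaining, batch_size=5, max_iterations=1000,
                    stop_event=stop, commit=commit)
    print(batches, committed, remaining)
```

In this model, whatever is left in the list after a stop is simply picked up by the next run.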
Also, depending on the issue reported, IBM support may request the dataSweeper report with DEBUG mode enabled to troubleshoot issues with the out-of-the-box sweepers that run on the schedule. To enable DEBUG mode, edit the Schedule_DataSweeper BP and add the debugMode assign statement shown below.
<operation name="Service">
<participant name="DataSweeper"/>
<output message="Xout">
<assign to="batchSize">5000</assign>
<assign to="autocorrect">TRUE</assign>
<assign to="debugMode">TRUE</assign>
<assign to="maxIterations">1000</assign>
<assign to="sweeperTimeout">1080000</assign>
<assign to="sweeperTimeoutThreshold">36000000</assign>
<assign to="." from="*"></assign>
</output>
<input message="Xin">
<assign to="." from="*"></assign>
</input>
</operation>
The DEBUG report helps show which driver (SELECT, INSERT, UPDATE, DELETE) was used to identify and clean up the orphan records. If you have any questions, please post them here or open a PMR with IBM support.
