Configuring notifications and actions for system maintenance

IBM® Business Automation Workflow systems can become overloaded if you don't tune them and perform regular maintenance on artifact types that accumulate over time, such as process instances, named snapshots, unnamed snapshots, task instances, and durable messages. Without tuning or regular maintenance, systems eventually slow down and, in extreme cases, experience outages. To help prevent such problems, the system by default logs warning messages in the SystemOut.log file when the number of instances of these artifact types exceeds the configured thresholds. If the system is in a critical state, it disables installation of snapshots until the issue is addressed. To increase or decrease this visibility, you can customize the notifications and actions according to your system's current tuning, sizing, usage, and requirements.

Before you begin

Before you customize the notifications, and especially before you change the warning and critical thresholds, make sure that your system is sized and tuned to perform as expected with the increased number of artifacts. Run load tests in your pre-production environment to verify that the current sizing and tuning can cope with the increased number of artifacts, or to identify the tuning actions that are needed. For information about tuning multiple areas of your system, see Tuning and Tuning IBM Business Automation Workflow.

The following areas are important for tuning considerations:

After you have made sure that your system can cope with a higher number of artifacts, you can override the default values for the notification and action settings. To do so, use the updateBPMConfig command to add or modify the settings in the 100Custom.xml file.

About this task

You can use the following settings to control the artifact types that you want to monitor, when to log messages, and the actions to take when the amount of data exceeds the thresholds. The settings are enabled by default. To disable them, set enabled to false.

system-maintenance-monitor
If you want to enable the monitor, ensure that this value is true.
log-on-server-start
If you want the monitor to send status to the SystemOut.log file when the server starts, ensure that this value is true.
log-interval
Set the frequency at which the monitor logs information. The value is specified in minutes. The default value is 1440 minutes (24 hours).
monitor
Specify the artifact type that you want to monitor and its attributes:
  • type: The artifact type that you want to monitor
  • servertype (optional): For named and unnamed snapshots, specify the server location where you want to monitor the artifact type. For a workflow server, set the value to PS. For a workflow center, set the value to PC. To specify both locations, ensure that the value is set to ALL, which is the default value.
The predefined monitors are NAMED_SNAPSHOTS, UNNAMED_SNAPSHOTS, PROCESS_INSTANCES, TASK_INSTANCES, and DURABLE_MESSAGES.
critical-threshold
Specify the maximum number of instances of an artifact type that are allowed to accumulate before the system enters a critical state and requires immediate maintenance. When the number of instances exceeds the critical threshold, the monitor notifies you with SystemOut.log file messages at the specified interval. By default, it also disables snapshot installation when the number of named snapshots exceeds the critical threshold. The default critical thresholds for the predefined monitors are as follows:
  • NAMED_SNAPSHOTS on a workflow server: 125
  • NAMED_SNAPSHOTS on a workflow center: 500
  • UNNAMED_SNAPSHOTS: 10000
  • PROCESS_INSTANCES: 600000
  • TASK_INSTANCES: 2400000
  • DURABLE_MESSAGES: 8000000
warning-threshold
Specify the number of instances of an artifact type that are allowed to accumulate before you need to consider performing system maintenance. When the number of instances exceeds the warning threshold, the monitor notifies you with SystemOut.log file messages at the specified interval. The default warning thresholds for the predefined monitors are as follows:
  • NAMED_SNAPSHOTS on a workflow server: 100
  • NAMED_SNAPSHOTS on a workflow center: 400
  • UNNAMED_SNAPSHOTS: 5000
  • PROCESS_INSTANCES: 400000
  • TASK_INSTANCES: 1600000
  • DURABLE_MESSAGES: 5000000
prevent-lifecycle-action
To disable installing or importing of snapshots when the number of instances of an artifact type exceeds the critical threshold, add this element in the 100Custom.xml file, and specify INSTALL as the value. INSTALL is the only valid value for this element.

On a workflow server, specifying the INSTALL value disables snapshot installation. INSTALL is the default value for the NAMED_SNAPSHOTS monitor. To enable installation, replace the monitor element in the 100Custom.xml file, and omit this setting.

On a workflow center, specifying the INSTALL value disables importing of snapshots.
show-message-on-lifecycle-action
When the number of instances of an artifact type exceeds the warning threshold, the monitor displays a warning each time a snapshot is installed on a workflow server or imported on a workflow center. INSTALL is the default and only valid value. To disable the message, replace the monitor element in the 100Custom.xml file, and omit this setting.
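
The relationship between the two thresholds and the two actions can be pictured with a small Python sketch. It is only an illustration of the rules described above, not the product's implementation. The names DEFAULT_THRESHOLDS, classify, and install_blocked are hypothetical, the "ALL" key for monitors without a server location is an illustrative choice, and the sketch assumes that "exceeds" means strictly greater than the threshold.

```python
# Default thresholds as listed in this topic: (warning, critical).
# The dictionary layout and the "ALL" key are illustrative assumptions.
DEFAULT_THRESHOLDS = {
    ("NAMED_SNAPSHOTS", "PS"): (100, 125),
    ("NAMED_SNAPSHOTS", "PC"): (400, 500),
    ("UNNAMED_SNAPSHOTS", "ALL"): (5000, 10000),
    ("PROCESS_INSTANCES", "ALL"): (400000, 600000),
    ("TASK_INSTANCES", "ALL"): (1600000, 2400000),
    ("DURABLE_MESSAGES", "ALL"): (5000000, 8000000),
}


def classify(count, warning, critical):
    """Return 'CRITICAL', 'WARNING', or 'OK' for an artifact count.

    Assumption: "exceeds" means strictly greater than the threshold.
    """
    if count > critical:
        return "CRITICAL"
    if count > warning:
        return "WARNING"
    return "OK"


def install_blocked(state, prevent_lifecycle_action):
    """Installation is blocked only in a critical state with INSTALL set."""
    return state == "CRITICAL" and prevent_lifecycle_action == "INSTALL"


warning, critical = DEFAULT_THRESHOLDS[("NAMED_SNAPSHOTS", "PS")]
state = classify(130, warning, critical)
print(state, install_blocked(state, "INSTALL"))
```

With the default workflow server thresholds, 130 named snapshots fall into the critical state, and installation is blocked because the monitor specifies INSTALL. If the prevent-lifecycle-action setting is omitted, the same count is still reported as critical but installation stays enabled.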
To change the values for these settings, override them by adding or updating the settings in your 100Custom.xml files. For example, to add the settings to a 100Custom.xml file, add the following elements under the <properties> element and modify the values as needed:
<server>
   <system-maintenance-monitor merge="replace" enabled="true">
     <log-on-server-start merge="replace">true</log-on-server-start> 
     <log-interval merge="replace">1440</log-interval>

     <monitor type="NAMED_SNAPSHOTS" merge="replace" enabled="true" servertype="PS">
          <critical-threshold>125</critical-threshold>
          <warning-threshold>100</warning-threshold>
          <prevent-lifecycle-action>INSTALL</prevent-lifecycle-action>
     </monitor>

     <monitor type="NAMED_SNAPSHOTS" merge="replace" enabled="true" servertype="PC">
          <critical-threshold>500</critical-threshold>
          <warning-threshold>400</warning-threshold>
          <prevent-lifecycle-action>INSTALL</prevent-lifecycle-action>
     </monitor>     

     <monitor type="UNNAMED_SNAPSHOTS" merge="replace" enabled="true" servertype="PC">
          <critical-threshold>10000</critical-threshold>
          <warning-threshold>5000</warning-threshold>
          <show-message-on-lifecycle-action>INSTALL</show-message-on-lifecycle-action>
     </monitor>

     <monitor type="PROCESS_INSTANCES" merge="replace" enabled="true">
          <critical-threshold>600000</critical-threshold>
          <warning-threshold>400000</warning-threshold>
          <show-message-on-lifecycle-action>INSTALL</show-message-on-lifecycle-action>     
     </monitor>

     <monitor type="TASK_INSTANCES" merge="replace" enabled="true">
          <critical-threshold>2400000</critical-threshold>
          <warning-threshold>1600000</warning-threshold>
          <show-message-on-lifecycle-action>INSTALL</show-message-on-lifecycle-action>
     </monitor>

     <monitor type="DURABLE_MESSAGES" merge="replace" enabled="true">
          <critical-threshold>8000000</critical-threshold>
          <warning-threshold>5000000</warning-threshold>
          <show-message-on-lifecycle-action>INSTALL</show-message-on-lifecycle-action>
     </monitor>
   </system-maintenance-monitor>
</server>
Note: If no action is defined for a specific monitor, no action occurs. Either <prevent-lifecycle-action> or <show-message-on-lifecycle-action> must be present under a specific monitor.

If no thresholds are defined for a monitor, no action occurs, because the monitor uses the thresholds to trigger its actions. <critical-threshold>, <warning-threshold>, or both settings must be defined for the corresponding action to work.
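
Before you deploy a hand-edited file, it can help to check each monitor against the two rules in the preceding note. The following Python sketch is one possible way to do that with the standard xml.etree.ElementTree module. It is not an IBM tool. The function name check_monitors and the SAMPLE fragment are hypothetical, and the sketch checks only the rules stated in this topic.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the second monitor deliberately has no action element.
SAMPLE = """
<server>
  <system-maintenance-monitor merge="replace" enabled="true">
    <monitor type="NAMED_SNAPSHOTS" merge="replace" enabled="true" servertype="PS">
      <critical-threshold>125</critical-threshold>
      <warning-threshold>100</warning-threshold>
      <prevent-lifecycle-action>INSTALL</prevent-lifecycle-action>
    </monitor>
    <monitor type="TASK_INSTANCES" merge="replace" enabled="true">
      <critical-threshold>2400000</critical-threshold>
    </monitor>
  </system-maintenance-monitor>
</server>
"""

ACTIONS = ("prevent-lifecycle-action", "show-message-on-lifecycle-action")
THRESHOLDS = ("critical-threshold", "warning-threshold")


def check_monitors(xml_text):
    """Return a list of human-readable problems found in the monitor elements."""
    problems = []
    root = ET.fromstring(xml_text)
    for monitor in root.iter("monitor"):
        name = monitor.get("type", "<no type>")
        # Rule 1: at least one action element must be present.
        if not any(monitor.find(tag) is not None for tag in ACTIONS):
            problems.append(name + ": no action element, so no action occurs")
        # Rule 2: at least one threshold element must be present.
        if not any(monitor.find(tag) is not None for tag in THRESHOLDS):
            problems.append(name + ": no threshold element, so no action occurs")
        # INSTALL is the only valid value for either action element.
        for tag in ACTIONS:
            action = monitor.find(tag)
            if action is not None and (action.text or "").strip() != "INSTALL":
                problems.append(name + ": " + tag + " must be INSTALL")
    return problems


for problem in check_monitors(SAMPLE):
    print(problem)
```

For the sample fragment, the sketch reports that the TASK_INSTANCES monitor has no action element. ET.fromstring also raises a ParseError for input that is not well-formed XML, for example an attribute value that is wrapped in typographic quotation marks instead of straight ones.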

For information about changing the 100Custom.xml files, see The 100Custom.xml file and configuration. For information about the individual 100Custom.xml files that must be updated and their locations, see Location of 100Custom configuration files.

However, to consistently and reliably change the values of these settings in all of the 100Custom.xml files in your Business Automation Workflow deployment environment, use the updateBPMConfig command as described in the following procedure:

Procedure

  1. Stop the servers for Workflow Server and Workflow Center.
  2. Start the scripting client in disconnected mode as described in the topic updateBPMConfig command.
  3. For each property, run the following commands to simultaneously update all affected servers:
    wsadmin> AdminTask.updateBPMConfig( [ '-create', '/server' ] )
    wsadmin> AdminTask.updateBPMConfig( [ '-create', '/server/system-maintenance-monitor' ] )
    wsadmin> AdminTask.updateBPMConfig( [ '-create', '/server/system-maintenance-monitor/@merge', '-xNodeValue', 'replace' ] )  
    wsadmin> AdminTask.updateBPMConfig( [ '-create', '/server/system-maintenance-monitor/@enabled', '-xNodeValue', 'true_or_false' ] )
    wsadmin> AdminTask.updateBPMConfig( [ '-create', '/server/system-maintenance-monitor/log-on-server-start', '-xNodeValue', 'true_or_false' ] )
    wsadmin> AdminTask.updateBPMConfig( [ '-create', '/server/system-maintenance-monitor/log-interval', '-xNodeValue', 'interval_value' ] )
    wsadmin> AdminTask.updateBPMConfig( [ '-create', '/server/system-maintenance-monitor/monitor' ] )
    wsadmin> AdminTask.updateBPMConfig( [ '-create', '/server/system-maintenance-monitor/monitor/@merge', '-xNodeValue', 'replace' ] )
    wsadmin> AdminTask.updateBPMConfig( [ '-create', '/server/system-maintenance-monitor/monitor/@type', '-xNodeValue', 'artifact_type' ] )
    wsadmin> AdminTask.updateBPMConfig( [ '-create', '/server/system-maintenance-monitor/monitor/@enabled', '-xNodeValue', 'true_or_false' ] )
    wsadmin> AdminTask.updateBPMConfig( [ '-create', '/server/system-maintenance-monitor/monitor/@servertype', '-xNodeValue', 'server_type' ] )
    wsadmin> AdminTask.updateBPMConfig( [ '-create', '/server/system-maintenance-monitor/monitor/critical-threshold', '-xNodeValue', 'critical_threshold_value' ] )
    wsadmin> AdminTask.updateBPMConfig( [ '-create', '/server/system-maintenance-monitor/monitor/warning-threshold', '-xNodeValue', 'warning_threshold_value' ] )
    wsadmin> AdminTask.updateBPMConfig( [ '-create', '/server/system-maintenance-monitor/monitor/prevent-lifecycle-action', '-xNodeValue', 'application_action' ] )
    wsadmin> AdminTask.updateBPMConfig( [ '-create', '/server/system-maintenance-monitor/monitor/show-message-on-lifecycle-action', '-xNodeValue', 'application_action' ] )
    wsadmin> AdminConfig.save()
    Replace the true_or_false variable with either true or false and other variables such as interval_value, artifact_type, server_type, critical_threshold_value, warning_threshold_value, and application_action with their respective values.
    Note: The previous commands assume that the properties are not yet defined in the 100Custom.xml file. If they are already defined, use the -update option instead of -create for these updateBPMConfig commands.
  4. Restart the servers.
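
Because step 3 repeats the same command pattern for every setting, you might prefer to generate the command lines for a monitor rather than type them. The following Python sketch only builds the command text in the format that is shown in step 3. It does not call wsadmin or AdminTask. The function name build_monitor_commands and its parameters are hypothetical, and the sketch covers only a subset of the monitor settings.

```python
BASE = "/server/system-maintenance-monitor/monitor"


def build_monitor_commands(artifact_type, critical, warning, option="-create"):
    """Build updateBPMConfig command lines for one monitor as plain strings.

    The option argument is '-create' for new settings or '-update' for
    settings that already exist, as described in the note in step 3.
    """
    settings = [
        ("", None),  # the monitor element itself, without a value
        ("/@merge", "replace"),
        ("/@type", artifact_type),
        ("/@enabled", "true"),
        ("/critical-threshold", critical),
        ("/warning-threshold", warning),
    ]
    lines = []
    for path, value in settings:
        args = [option, BASE + path]
        if value is not None:
            args += ["-xNodeValue", str(value)]
        quoted = ", ".join("'" + arg + "'" for arg in args)
        lines.append("AdminTask.updateBPMConfig( [ " + quoted + " ] )")
    # Save once after all settings for the monitor are written.
    lines.append("AdminConfig.save()")
    return lines


for line in build_monitor_commands("PROCESS_INSTANCES", 600000, 400000):
    print(line)
```

You can review the printed lines and then paste them into the scripting client that you started in step 2. Generating text instead of running the commands directly keeps the sketch independent of the wsadmin environment and leaves the final check to you.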

Results

The recommended way of updating the 100Custom.xml files is by running the updateBPMConfig command. However, if the updates are unsuccessful, you can manually update the files by following the steps in the topic Creating a 100Custom.xml configuration file.