Process summarizing and performance issues

Data displayed by the IBM® Sterling Control Center Monitor Web Console’s Recent file transfer activity dashboard widget and by the Completed Processes, Queued Processes, and Completed File Transfer views comes directly from the IBM Sterling Control Center Monitor summary database tables:
  • ROLL_UP
  • CC_PROCESS
  • CC_PROCESS_DVG
  • CC_FILE_TRANSFER
  • CC_FILE_TRANSFER_DVG

The IBM Sterling Control Center Monitor ProcessSummaryService takes the detailed data written to the EVENTS table and other ancillary tables and summarizes it for insertion into the summary tables. The IBM Sterling Control Center Monitor web console queries those summary tables to build its displays and views.

To see how the ProcessSummaryService is performing, check the IBM Sterling Control Center Monitor engine log for the metrics it writes once an hour.

Example Engine Log
****** Process Summary Metrics for the Past Hour ******
0   Events retrieved by ProcessSummaryWorker to do summarization = 13085, Max value = 431, Average = 24
0   Summarized processes for All server types = 1948
0   Summarized processes for B2B Integrator/Sterling File Gateway servers = 1942
0   Summarized processes for Sterling Connect:Direct servers = 6
0   Summarized transfers for All server types = 664
0   Summarized transfers for B2B Integrator/Sterling File Gateway servers = 64
0   Summarized transfers for Sterling Connect:Direct servers = 600
1   Milliseconds in ProcessSummaryWorker = 52468, Max value = 4633, Average = 97
1a  Milliseconds running query to get processes to be summarized = 2059, Max value = 1391, Average = 3
1b  Milliseconds performing EntityUtil.preLoad() = 1126, Max value = 520, Average = 2
1c  Milliseconds processing results of query to get processes to be summarized = 0, Max value = 0, Average = 0
1d  Milliseconds to getEvents for summarization = 37046, Max value = 4603, Average = 68
1d1 Milliseconds in getEvents to execute SQL to retrieve events for summarization = 36086, Max value = 4603, Average = 66
1d2 Milliseconds in getEvents to construct events from result set for summarization = 894, Max value = 141, Average = 1
1d3 Milliseconds in getEvents to get database connection used to retrieve events = 49, Max value = 16, Average = 0
1e  Milliseconds performing getEventsByProcess = 171, Max value = 140, Average = 0
1f  Milliseconds sorting the events prior to passing them to the summarizer = 32, Max value = 16, Average = 0
1g  Milliseconds summarizing All processes and transfers = 3956, Max value = 265, Average = 2
1g1 Milliseconds summarizing B2B Integrator/Sterling File Gateway processes and transfers = 3802, Max value = 265, Average = 1
1g1 Milliseconds summarizing Sterling Connect:Direct processes and transfers = 154, Max value = 31, Average = 25
1h  Milliseconds adding SQL to insert data into CC_FILE_TRANSFER = 238, Max value = 79, Average = 0
1i  Milliseconds committing database changes to CC_PROCESS and CC_FILE_TRANSFER = 7224, Max value = 1594, Average = 13
1j  Milliseconds to updateStatistics (Updates to ROLL_UP) = 475, Max value = 62, Average = 0
2   Milliseconds summarizing All transfers = 138, Max value = 16, Average = 0
2   Milliseconds summarizing B2B Integrator/Sterling File Gateway transfers = 0, Max value = 0, Average = 0
2   Milliseconds summarizing Sterling Connect:Direct transfers = 138, Max value = 16, Average = 0
2a  Milliseconds handling tag mapping for All transfers = 31, Max value = 16, Average = 0
2a1 Milliseconds handling tag mapping for B2B Integrator/Sterling File Gateway transfers = 0, Max value = 0, Average = 0
2a1 Milliseconds handling tag mapping for Sterling Connect:Direct transfers = 31, Max value = 16, Average = 0
2b  Milliseconds getting DVG names associated with transfers being summarized = 0, Max value = 0, Average = 0

The values are ordered by the label on the left. A letter suffix, for example 1a, 1b, 1c, typically marks a breakdown of the value above it, in this case 1. Likewise, 1d1, 1d2, and 1d3 break down the value for 1d.
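The labelling scheme lends itself to simple parsing. The following is a minimal sketch, not part of the product, that assumes each metric line looks exactly like the example above (a real engine log may carry timestamps or other prefixes that would need stripping first). The names `METRIC_RE`, `parse_metrics`, and `parent_label` are made up for this illustration.

```python
import re

# One metric line: label, description, total, and optional max/average.
# The "0" count lines may carry only a total.
METRIC_RE = re.compile(
    r"^(?P<label>\d+[a-z]?\d*)\s+(?P<name>.+?) = (?P<total>\d+)"
    r"(?:, Max value = (?P<max>\d+), Average = (?P<avg>\d+))?\s*$"
)

def parse_metrics(lines):
    """Return one dict per recognized metric line; other lines are skipped."""
    metrics = []
    for line in lines:
        match = METRIC_RE.match(line.strip())
        if not match:
            continue  # banner line or anything that is not a metric
        row = match.groupdict()
        for key in ("total", "max", "avg"):
            row[key] = int(row[key]) if row[key] is not None else None
        metrics.append(row)
    return metrics

def parent_label(label):
    """Return the label one level up (1d1 -> 1d, 1d -> 1), or None at top level."""
    number, letter, sub = re.fullmatch(r"(\d+)([a-z]?)(\d*)", label).groups()
    if sub:
        return number + letter
    if letter:
        return number
    return None

# A few lines copied from the example engine log.
sample = [
    "****** Process Summary Metrics for the Past Hour ******",
    "0   Events retrieved by ProcessSummaryWorker to do summarization = 13085, Max value = 431, Average = 24",
    "0   Summarized processes for All server types = 1948",
    "1   Milliseconds in ProcessSummaryWorker = 52468, Max value = 4633, Average = 97",
    "1d  Milliseconds to getEvents for summarization = 37046, Max value = 4603, Average = 68",
    "1d1 Milliseconds in getEvents to execute SQL to retrieve events for summarization = 36086, Max value = 4603, Average = 66",
]

for row in parse_metrics(sample):
    print(row["label"], "parent:", parent_label(row["label"]), "total:", row["total"])
```

Once parsed this way, each breakdown value can be compared against its parent to see which step accounts for most of the time.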

Always pay attention to the time spent doing database I/O above. Specifically, the times for:
  • Milliseconds running query to get processes to be summarized
  • Milliseconds in getEvents to execute SQL to retrieve events
  • Milliseconds committing database changes to CC_PROCESS and CC_FILE_TRANSFER
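To put those three values in proportion, it can help to express them as a share of the total time in ProcessSummaryWorker (label 1). The sketch below is illustrative only: the numbers are copied by hand from the example engine log above, and `db_io_share` is a hypothetical helper, not a product function.

```python
# Totals in milliseconds, copied from the example engine log.
worker_total_ms = 52468  # 1   Milliseconds in ProcessSummaryWorker
db_io_ms = {
    "1a  query to get processes to be summarized": 2059,
    "1d1 SQL to retrieve events for summarization": 36086,
    "1i  commit to CC_PROCESS and CC_FILE_TRANSFER": 7224,
}

def db_io_share(io_totals, worker_total):
    """Return the fraction of worker time spent in the listed database steps."""
    if worker_total <= 0:
        raise ValueError("worker_total must be positive")
    return sum(io_totals.values()) / worker_total

share = db_io_share(db_io_ms, worker_total_ms)
print(f"Database I/O: {sum(db_io_ms.values())} ms of {worker_total_ms} ms ({share:.1%})")
```

In the example log these three steps account for well over half of the worker time, which is why they are the first values to watch.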

The queries to get processes to be summarized run against the CC_PROCESS table. Long times here indicate that the IBM Sterling Control Center Monitor summary database tables are not being maintained, that they hold so much data that the database server performs poorly, or both. The same applies to the time spent committing database changes to CC_PROCESS and CC_FILE_TRANSFER.

The SQL used by getEvents to retrieve events runs against the EVENTS table. Long times here could indicate that too many days of summary table data are being retained. Process numbers on Connect:Direct servers only go up to 99,999, after which numbering starts again at one. When enough days of data are retained that process numbers on monitored Connect:Direct servers repeat, IBM Sterling Control Center Monitor cannot distinguish between separate instances of a process whose names and numbers are identical when it retrieves all events associated with processes. It then combines the events of multiple processes, which causes longer than normal summary times and increased memory usage.
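The effect of repeated process numbers can be illustrated with a small sketch. Everything here is hypothetical: the event tuples, the process name, and the grouping key are invented for illustration and do not reflect the product's internal data model. The estimate of days until numbers repeat also assumes that numbers are handed out sequentially at a steady daily rate, which real workloads may not follow.

```python
from collections import defaultdict

MAX_PROCESS_NUMBER = 99_999  # Connect:Direct limit stated in the text

# Hypothetical, simplified events: (process_name, process_number, day, kind)
events = [
    ("PAYROLL", 42, "2024-01-05", "start"),
    ("PAYROLL", 42, "2024-01-05", "end"),
    ("PAYROLL", 42, "2024-03-20", "start"),  # same number reused after wrapping
    ("PAYROLL", 42, "2024-03-20", "end"),
]

def group_by_name_and_number(rows):
    """Group events by (name, number) only, so runs sharing both end up together."""
    grouped = defaultdict(list)
    for name, number, day, kind in rows:
        grouped[(name, number)].append((day, kind))
    return dict(grouped)

def days_until_wrap(process_numbers_per_day):
    """Rough days before numbers repeat, assuming a steady sequential rate."""
    if process_numbers_per_day <= 0:
        raise ValueError("process_numbers_per_day must be positive")
    return MAX_PROCESS_NUMBER / process_numbers_per_day

groups = group_by_name_and_number(events)
for key, grouped_events in groups.items():
    print(key, "->", len(grouped_events), "events from what were two separate runs")

print(f"At 5000 process numbers per day, numbers repeat after about {days_until_wrap(5000):.0f} days")
```

Two distinct runs collapse into a single group of four events, which is the kind of merging the paragraph above describes. Comparing the estimated wrap interval with the number of days retained gives a rough idea of whether repeats are likely.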

Also compare the values for 1d1 (Milliseconds in getEvents to execute SQL) and 1d2 (Milliseconds in getEvents to construct events). If 1d2 is larger than 1d1, it indicates that IBM Sterling Control Center Monitor performance is limited either by the amount of memory allocated to it or by the processor/CPU of the server on which it is running.
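That comparison is easy to automate once the two totals have been read from the log. The function below is a hypothetical helper written for this illustration; it only encodes the rule stated above and makes no further diagnosis.

```python
def check_get_events(sql_ms, construct_ms):
    """Apply the 1d1 versus 1d2 rule of thumb to two totals in milliseconds."""
    if construct_ms > sql_ms:
        return "1d2 exceeds 1d1: memory allocation or CPU may be limiting performance"
    return "1d2 does not exceed 1d1: no memory or CPU signal from this comparison"

# Totals from the example engine log: 1d1 = 36086, 1d2 = 894
print(check_get_events(36086, 894))
```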

TIP

The following queries may be run to ascertain the counts of processes to be summarized and the counts of processes that have been summarized.

Processes yet to be summarized

  • SELECT COUNT(*) FROM CC_PROCESS WHERE PROC_STATUS IN ('upgraded', 'ended', 'user completed ns')
  • SELECT STARTED_DAY, COUNT(*) FROM CC_PROCESS WHERE PROC_STATUS IN ('upgraded', 'ended', 'user completed ns') GROUP BY STARTED_DAY ORDER BY STARTED_DAY

Processes summarized

  • SELECT COUNT(*) FROM CC_PROCESS WHERE PROC_STATUS IN ('failed', 'success', 'user completed', 'errors')
  • SELECT STARTED_DAY, COUNT(*) FROM CC_PROCESS WHERE PROC_STATUS IN ('failed', 'success', 'user completed', 'errors') GROUP BY STARTED_DAY ORDER BY STARTED_DAY
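To compare the two counts side by side, the queries can be wrapped in a small script. The sketch below uses Python's built-in sqlite3 module with an in-memory table purely as a stand-in so that it runs anywhere; against a real installation you would connect to the production database with the appropriate driver instead. The column types and the sample rows are assumptions made for the illustration, and `count_by_status` is a hypothetical helper.

```python
import sqlite3

# Status values taken from the queries above.
PENDING = ("upgraded", "ended", "user completed ns")
SUMMARIZED = ("failed", "success", "user completed", "errors")

def count_by_status(conn, statuses):
    """Count CC_PROCESS rows whose PROC_STATUS is one of the given values."""
    placeholders = ", ".join("?" for _ in statuses)
    sql = f"SELECT COUNT(*) FROM CC_PROCESS WHERE PROC_STATUS IN ({placeholders})"
    return conn.execute(sql, statuses).fetchone()[0]

# Stand-in table with invented rows; column types are assumed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CC_PROCESS (PROC_STATUS TEXT, STARTED_DAY TEXT)")
conn.executemany(
    "INSERT INTO CC_PROCESS VALUES (?, ?)",
    [
        ("ended", "2024-03-01"),
        ("upgraded", "2024-03-01"),
        ("success", "2024-03-01"),
        ("failed", "2024-02-29"),
        ("success", "2024-02-29"),
    ],
)

pending = count_by_status(conn, PENDING)
summarized = count_by_status(conn, SUMMARIZED)
print(f"yet to be summarized: {pending}, summarized: {summarized}")
```

A pending count that keeps growing from one run to the next would suggest that summarization is not keeping up with incoming processes.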

Several engine properties may be used to affect the performance of the ProcessSummaryService logic, including:
  • PROCESS_SUMMARY_THREADS

    Default: 4

  • PROCS_TO_SUMMARIZE_AT_ONCE

    Default: 50. Max allowed: 255

  • MAX_PROCESSES_TO_SUMMARIZE_AT_ONCE

    Default: 5000. Available in v6.1.2.1 and above
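Before changing these properties, it can be useful to sanity-check the intended values against the documented defaults and limits. The sketch below is a hypothetical checker, not a product tool. The defaults and the maximum of 255 come from the list above, while the requirement that values be positive integers is an assumption of this example.

```python
# Defaults and limits as listed above; None means no maximum is documented here.
DOCUMENTED = {
    "PROCESS_SUMMARY_THREADS": {"default": 4, "max": None},
    "PROCS_TO_SUMMARIZE_AT_ONCE": {"default": 50, "max": 255},
    "MAX_PROCESSES_TO_SUMMARIZE_AT_ONCE": {"default": 5000, "max": None},
}

def check_overrides(overrides):
    """Return a list of human-readable problems found in proposed values."""
    problems = []
    for name, value in overrides.items():
        spec = DOCUMENTED.get(name)
        if spec is None:
            problems.append(f"{name}: not one of the properties listed")
            continue
        # Assumption: values must be positive integers.
        if not isinstance(value, int) or isinstance(value, bool) or value <= 0:
            problems.append(f"{name}: expected a positive integer, got {value!r}")
            continue
        if spec["max"] is not None and value > spec["max"]:
            problems.append(f"{name}: {value} exceeds the maximum of {spec['max']}")
    return problems

proposed = {"PROCESS_SUMMARY_THREADS": 8, "PROCS_TO_SUMMARIZE_AT_ONCE": 300}
for problem in check_overrides(proposed):
    print(problem)
```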

In theory, the larger the value for PROCESS_SUMMARY_THREADS, the better the ProcessSummaryService logic will perform, but actual performance depends on the server IBM Sterling Control Center Monitor is running on and the amount of memory allocated to it. The larger the thread count, the higher the memory requirements. The same is true of the number of processes summarized at once.

TIP

Of the two engine properties that tune how the ProcessSummaryService works, PROCESS_SUMMARY_THREADS and PROCS_TO_SUMMARIZE_AT_ONCE, try increasing the PROCESS_SUMMARY_THREADS value first.

MAX_PROCESSES_TO_SUMMARIZE_AT_ONCE limits how many processes the query used to find processes ready to be summarized returns at one time.