Once you have optimized performance at low volumes, you can increase
the scale and volume to validate that your application continues to
meet your performance requirements.
To scale your deployment successfully, you need to understand the
business process and the design so that you know where your
implementation is concurrent and scalable, where it is not, and
under what conditions it is scalable. For example, unless you are
using fragmentation, mapping a large file with many transactions
(inbound or outbound) is a serial process.
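To illustrate the fragmentation point, the sketch below contrasts mapping a file as one serial unit with mapping it fragment by fragment. The record format, fragment size, and map_transaction function are hypothetical stand-ins, not FTM's actual mapping logic:

```python
# Illustrative sketch: mapping a large multi-transaction file as one
# unit is inherently serial, whereas fragmenting the file first lets
# independent workers map the fragments concurrently. map_transaction
# is a hypothetical stand-in for real per-transaction mapping work.
from concurrent.futures import ThreadPoolExecutor

def map_transaction(record):
    # Stand-in for the per-transaction mapping transform.
    return record.upper()

records = [f"txn-{i}" for i in range(1000)]

# Serial: one mapper walks the whole file, one transaction at a time.
serial_result = [map_transaction(r) for r in records]

# Fragmented: split into chunks that independent threads can process.
def map_fragment(fragment):
    return [map_transaction(r) for r in fragment]

fragments = [records[i:i + 250] for i in range(0, len(records), 250)]
with ThreadPoolExecutor(max_workers=4) as pool:
    mapped = pool.map(map_fragment, fragments)
parallel_result = [r for fragment in mapped for r in fragment]

assert parallel_result == serial_result  # same output, divisible work
```

The point is structural rather than a Python performance claim: fragmentation turns one indivisible unit of work into units that extra threads can actually share.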
You also need to consider which message flows do which work, which
can sensibly be scaled, and which cannot. Taking the sample application
as an example, the Heartbeat flow should not be scaled because it
raises only one event per elapsed time interval, so additional threads
provide no benefit. The LogErrorFlow is used to log errors; because
errors should not normally occur, and fast processing may not be
essential when they do, scaling this flow makes little sense.
The work breakdown information collected from FTM level 1 trace
for a single-instance deployment should be used as input for any scaling
activity. From this information you should be able to build an approximate
picture of how much time is spent in each message flow, which
directly indicates how much CPU resource each flow requires. In the
context of the sample application, approximately 30% of the elapsed
time is spent in the physical transmission wrappers and approximately
70% in the event processing flow, implying that the event processing
flow needs more than twice the CPU time. It is important to note that
in the sample application the physical transmission flow has 3
message-processing input queues used during normal message processing,
which implies 3 threads of execution. When we consider a suitable
scaling factor between event processing and physical transmission, we
should therefore allocate at least 6 threads for event processing. In
a typical FTM deployment it is usual to deploy more thread instances
for event processing than for transmission processing. A potentially
suitable deployment model is shown below, in which the deployment is
separated into multiple execution groups and additional thread
instances are added to the event processing flow:
Figure 1. Multiple instance deployment example
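The thread-allocation arithmetic above can be sketched as follows. The percentages and queue count are those of the sample application, but the helper function itself is a hypothetical illustration, not part of FTM:

```python
# Sketch of the thread-allocation reasoning: given the fraction of
# elapsed time spent in each flow and the threads already implied by
# the transmission flow's input queues, derive a minimum thread count
# for event processing. Illustrative helper only, not an FTM API.
import math

def event_processing_threads(event_pct, transmission_pct,
                             transmission_threads):
    # "More than twice the CPU time" -> take the whole multiple of the
    # work ratio as a conservative scaling factor.
    multiplier = math.floor(event_pct / transmission_pct)
    return multiplier * transmission_threads

# Sample application: ~70% event processing, ~30% transmission,
# 3 transmission input queues => 3 threads of execution.
print(event_processing_threads(70, 30, 3))  # prints 6
```

Flooring the work ratio keeps the estimate conservative; measured throughput, not arithmetic alone, should decide the final thread counts.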
Performance of this deployment configuration should then be
evaluated using the same approach as in the single-instance case. If
the application is scaling well, you would expect the above deployment
to run 6 times faster (in TPS) than the single-instance use case. If
the performance of the application has not scaled as expected, first
check that you are not saturating any system resources such as CPU,
disk I/O, memory, or database locks, and then review the FTM level 1
trace and database analysis (db2top).
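As a simple sanity check, the measured speedup can be compared against the ideal linear speedup. The TPS figures below are hypothetical placeholders; substitute the measurements from your own single-instance and scaled runs:

```python
# Sketch: compare measured scaling against ideal linear speedup.
# The TPS figures are hypothetical; use your own measurements.

def scaling_efficiency(single_tps, scaled_tps, scale_factor):
    """Return (speedup, efficiency) relative to ideal linear scaling."""
    speedup = scaled_tps / single_tps
    efficiency = speedup / scale_factor
    return speedup, efficiency

speedup, efficiency = scaling_efficiency(single_tps=50.0,
                                         scaled_tps=240.0,
                                         scale_factor=6)
print(f"Speedup: {speedup:.1f}x, efficiency: {efficiency:.0%}")
# Efficiency well below 100% suggests a resource or serialization
# bottleneck worth investigating with FTM trace and db2top.
```

An efficiency near 100% means the deployment is scaling linearly; a markedly lower figure is the cue to start the resource and trace analysis described above.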
The objective here is to compare the work breakdown costs in the scaled
version against the costs in the single instance, looking for work
items whose performance degrades as the application scales. If
problem areas are identified, the artifacts in question need to be
analyzed so that the problem can be understood and resolved. It
is also worth monitoring the queue depths used by the application to
see whether any queues stand out as bottlenecks in the process.
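The work-breakdown comparison can be sketched as a simple per-item cost ratio. The flow names, costs, and 1.2x threshold below are hypothetical; the real figures come from the FTM level 1 trace analysis:

```python
# Sketch: compare per-item work breakdown costs between the
# single-instance and scaled runs, flagging items whose cost grows
# disproportionately under scaling. All names, costs (ms per unit of
# work), and the threshold are hypothetical placeholders.

single = {"EventProcessing": 70.0, "TransmissionWrapper": 30.0}
scaled = {"EventProcessing": 95.0, "TransmissionWrapper": 32.0}

def degrading_items(single, scaled, threshold=1.2):
    """Return items whose per-unit cost grew by more than threshold-x."""
    flagged = {}
    for item, base_cost in single.items():
        ratio = scaled[item] / base_cost
        if ratio > threshold:
            flagged[item] = ratio
    return flagged

for item, ratio in degrading_items(single, scaled).items():
    print(f"{item}: cost grew {ratio:.2f}x under scaling")
```

Items flagged this way, together with any queues whose depth keeps growing, are the natural starting points for deeper analysis of the affected artifacts.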