Once you have optimized performance at low volumes, you can increase
the scale and volume to validate that your application continues to
meet your performance requirements.
To scale your deployment successfully, you need to understand the
business process and the design so that you know where your
implementation is concurrent and scalable, where it is not, and
under what conditions it is scalable. For example, unless you are
using fragmentation, mapping a large file with many transactions
(inbound or outbound) is a serial process.
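To illustrate the fragmentation point, the sketch below contrasts mapping a file as one serial unit with mapping it fragment by fragment. The record format, fragment size, and map_transaction function are hypothetical stand-ins, not FTM's actual mapping logic:

```python
# Illustrative sketch: mapping a large multi-transaction file as one
# unit is inherently serial, whereas fragmenting the file first lets
# independent workers map the fragments concurrently. map_transaction
# is a hypothetical stand-in for real per-transaction mapping work.
from concurrent.futures import ThreadPoolExecutor

def map_transaction(record):
    # Stand-in for the per-transaction mapping transform.
    return record.upper()

records = [f"txn-{i}" for i in range(1000)]

# Serial: one mapper walks the whole file, one transaction at a time.
serial_result = [map_transaction(r) for r in records]

# Fragmented: split into chunks that independent threads can process.
def map_fragment(fragment):
    return [map_transaction(r) for r in fragment]

fragments = [records[i:i + 250] for i in range(0, len(records), 250)]
with ThreadPoolExecutor(max_workers=4) as pool:
    mapped = pool.map(map_fragment, fragments)
parallel_result = [r for fragment in mapped for r in fragment]

assert parallel_result == serial_result  # same output, divisible work
```

The point is structural rather than a Python performance claim: fragmentation turns one indivisible unit of work into units that extra threads can actually share.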
You also need to consider which message flows do which work, which
can sensibly be scaled, and which cannot. Taking the sample application
as an example, the Heartbeat flow should not be scaled because it
raises only one event per elapsed time interval, so additional threads
provide no benefit. The LogErrorFlow is used to log errors; because
errors should not normally occur, and fast processing may not be
essential when they do, scaling this flow makes little sense.
The work breakdown information collected from FTM level 1 trace
for a single-instance deployment should be used as input for any scaling
activity. From this information you should be able to build an approximate
picture of how much time is spent in each message flow, which
directly indicates how much CPU resource each flow requires. In the
context of the sample application, approximately 30% of the elapsed
time is spent in the physical transmission wrappers and approximately
70% in the event processing flow, implying that the event processing
flow needs more than twice the CPU time. It is important to note that
in the sample application the physical transmission flow has 3
message-processing input queues used during normal message processing,
which implies 3 threads of execution. When we consider a suitable
scaling factor between event processing and physical transmission, we
should therefore allocate at least 6 threads for event processing. In
a typical FTM deployment it is usual to deploy more thread instances
for event processing than for transmission processing. A potentially
suitable deployment model is shown below, in which the deployment is
separated into multiple execution groups and additional thread
instances are added to the event processing flow:
Figure 1. Multiple instance deployment example
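The thread-allocation arithmetic above can be sketched as follows. The percentages and queue count are those of the sample application, but the helper function itself is a hypothetical illustration, not part of FTM:

```python
# Sketch of the thread-allocation reasoning: given the fraction of
# elapsed time spent in each flow and the threads already implied by
# the transmission flow's input queues, derive a minimum thread count
# for event processing. Illustrative helper only, not an FTM API.
import math

def event_processing_threads(event_pct, transmission_pct,
                             transmission_threads):
    # "More than twice the CPU time" -> take the whole multiple of the
    # work ratio as a conservative scaling factor.
    multiplier = math.floor(event_pct / transmission_pct)
    return multiplier * transmission_threads

# Sample application: ~70% event processing, ~30% transmission,
# 3 transmission input queues => 3 threads of execution.
print(event_processing_threads(70, 30, 3))  # prints 6
```

Flooring the work ratio keeps the estimate conservative; measured throughput, not arithmetic alone, should decide the final thread counts.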
Performance of this deployment configuration should then be
evaluated using the same approach as in the single-instance case. If
the application is scaling well, you would expect the above deployment
to run 6 times faster (in TPS) than the single-instance use case. If
the performance of the application has not scaled as expected, first
check that you are not saturating any system resources such as CPU,
disk I/O, memory, or database locks, and then review the FTM level 1
trace and database analysis (db2top).
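As a simple sanity check, the measured speedup can be compared against the ideal linear speedup. The TPS figures below are hypothetical placeholders; substitute the measurements from your own single-instance and scaled runs:

```python
# Sketch: compare measured scaling against ideal linear speedup.
# The TPS figures are hypothetical; use your own measurements.

def scaling_efficiency(single_tps, scaled_tps, scale_factor):
    """Return (speedup, efficiency) relative to ideal linear scaling."""
    speedup = scaled_tps / single_tps
    efficiency = speedup / scale_factor
    return speedup, efficiency

speedup, efficiency = scaling_efficiency(single_tps=50.0,
                                         scaled_tps=240.0,
                                         scale_factor=6)
print(f"Speedup: {speedup:.1f}x, efficiency: {efficiency:.0%}")
# Efficiency well below 100% suggests a resource or serialization
# bottleneck worth investigating with FTM trace and db2top.
```

An efficiency near 100% means the deployment is scaling linearly; a markedly lower figure is the cue to start the resource and trace analysis described above.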
The objective here is to compare the work breakdown costs in the scaled
version against the costs in the single instance, looking for work
items whose performance degrades as the application scales. If
problem areas are identified, the artifacts in question need to be
analyzed so that the problem can be understood and resolved. It
is also worth monitoring the queue depths used by the application to
see whether any queues stand out as bottlenecks in the process.
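The work-breakdown comparison can be sketched as a simple per-item cost ratio. The flow names, costs, and 1.2x threshold below are hypothetical; the real figures come from the FTM level 1 trace analysis:

```python
# Sketch: compare per-item work breakdown costs between the
# single-instance and scaled runs, flagging items whose cost grows
# disproportionately under scaling. All names, costs (ms per unit of
# work), and the threshold are hypothetical placeholders.

single = {"EventProcessing": 70.0, "TransmissionWrapper": 30.0}
scaled = {"EventProcessing": 95.0, "TransmissionWrapper": 32.0}

def degrading_items(single, scaled, threshold=1.2):
    """Return items whose per-unit cost grew by more than threshold-x."""
    flagged = {}
    for item, base_cost in single.items():
        ratio = scaled[item] / base_cost
        if ratio > threshold:
            flagged[item] = ratio
    return flagged

for item, ratio in degrading_items(single, scaled).items():
    print(f"{item}: cost grew {ratio:.2f}x under scaling")
```

Items flagged this way, together with any queues whose depth keeps growing, are the natural starting points for deeper analysis of the affected artifacts.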