Summary
This paper describes in detail how IBM® FileNet® P8 5.1 was deployed and tuned in a virtualized environment on Linux® on System z®, on an IBM z196 model 2817-M66 with 16 CPs and 30 GiB central storage, connected to an IBM System Storage® DS8800 Model 951.
It shows how the whole stack is configured to create a high-performing system, including typical System z features such as HyperPAV, z/VM® features such as virtual networks, various other Linux features, and the FileNet P8 middleware.
The paper then describes the test workload, composed of a realistic mix of the most common Content Engine (CE) and Process Engine (PE) transactions, and provides test results from scaling this workload over a wide range of transaction rates, varying the number of system processors to evaluate how CPU sizing influences performance.
The overall conclusion of the tests is that FileNet P8 5.1 shows excellent scalability on Linux on System z, with throughput increasing linearly until the CPUs of the CE/PE guest are nearly fully utilized. When scaling the CPUs, the throughput increase shows nearly perfect scalability characteristics over the full workload range: doubling the CPUs of the CE/PE guest allows a doubled workload to be handled. Overall, the system shows very good symmetric multiprocessing (SMP) behaviour, meaning that regardless of how many virtual CPUs are configured for the CE/PE guest, the CPU capacity used for a given workload is the same. Additional CPUs do not cause additional CPU cost due to increased management effort in the Linux kernel.

As illustrated in Figure 1, the normalized throughput achieved with 8 CE/PE CPUs was about 35 times that of the lowest load tested with 1 CE/PE CPU, which serves as the base throughput normalized to 1.
The paper provides a full set of performance and scalability test results. In general, the average response times improve considerably as more CPUs are available on the CE/PE guest; the best response times are achieved with 8 CPUs. In the workload range where the CPUs limit the throughput (usually above 80% CPU load), the response times start to increase. Adding further CPUs to such a CPU-bound system brings the response times down again and allows further throughput increases. Keeping in mind that more virtual CPUs on the CE/PE guest at the same workload level do not cause additional CPU load, but do yield shorter response times, this indicates that the system benefits from a higher degree of parallelism.
The important message derived from these results is that, for these types of workload, the response times over a wide range of workload levels can easily be controlled by the number of CPUs available on the CE/PE guest. The number of CPUs on the database system plays a minor role for the tested workload, with two CPs being sufficient for all scenarios, and for the lower workload levels even just one CPU was sufficient.
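As a minimal illustration of how the CPU count of a Linux guest can be adjusted (not part of the tested configuration, and assuming spare CPUs are defined as standby for the guest), the standard Linux tools chcpu (from util-linux) or the sysfs interface can be used to bring CPUs online or take them offline:

    # List the CPUs known to the Linux guest and their state (online/standby)
    lscpu -e

    # Bring standby CPUs 2 and 3 online to increase the CE/PE guest capacity
    chcpu -e 2,3

    # Equivalent low-level interface via sysfs for a single CPU
    echo 1 > /sys/devices/system/cpu/cpu2/online

    # For low workload levels, CPUs can be set offline again
    chcpu -d 3

Note that only CPUs defined for the virtual machine, either in its z/VM directory entry or added dynamically with CP DEFINE CPU, can be varied online in this way.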