Dedicated server versus shared server processes

This test shows the impact on transaction throughput when changing between Oracle dedicated server processes and Oracle shared server processes.

With dedicated server processes, the CPU load increased to full system utilization. Therefore, another test was done with an additional CPU for each Oracle RAC node. The Oracle RAC dedicated server and shared server processes are described in this section.

Figure 1 shows the transaction throughput when switching from shared to dedicated Oracle server processes.
Figure 1. Transaction throughput when switching the Oracle server processes from shared to dedicated
Bar graph of the normalized transaction throughput when switching from shared to dedicated Oracle server processes. The x-axis has three bars: (1) Shared processes with 2 CPUs, (2) Dedicated processes with 2 CPUs, (3) Dedicated processes with 3 CPUs. The y-axis is the Normalized transaction throughput, ranging from 0% to 350%. For the three bars, the values are: 100%, 225%, and 310% respectively.

Observation

By changing the Oracle RAC server processes from shared to dedicated, transaction throughput increased by 125%. Adding a third CPU to each Oracle RAC node increased transaction throughput by another 86%, compared with the measurement of the shared server processes with 2 CPUs.

Conclusion

This test scenario demonstrates that higher throughput can be obtained with dedicated server processes configured for the Oracle RAC servers than with shared server processes. With this change, the CPU load increased to full system utilization. Adding CPU capacity in both Oracle RAC servers further improves the throughput with dedicated server processes.

This effect is attributed to our workload with a moderate number of clients (up to 80), where each client is 100% active and generates a continuous workload that easily keeps one Oracle server process busy. If several of these workload-generating clients share a server process, that server process becomes a bottleneck.

For this type of workload pattern, the use of dedicated server processes is recommended. This workload pattern is typically produced by a batch job, an application server, or a connection concentrator.

The opposite scenario occurs with connections that have a low utilization, such as those from manual interactions, combined with a very high number of users (1000 or more). Here, the use of dedicated server processes results in thousands of under-utilized server processes, with the corresponding memory requirements.
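
Whether a connection gets a dedicated or a shared server process can be controlled per connection by the SERVER parameter of the connect descriptor (see also the note before Figure 2). The following is a minimal sketch, using the python-oracledb driver with placeholder host, service, and credential values, of how a client can request a dedicated server explicitly; replacing SERVER=DEDICATED with SERVER=SHARED routes the session through a dispatcher instead.

# Sketch only: the driver choice, host name, service name, and credentials
# are assumptions, not part of the measured configuration.
import oracledb

dsn_dedicated = (
    "(DESCRIPTION="
    "(ADDRESS=(PROTOCOL=TCP)(HOST=racnode1.example.com)(PORT=1521))"
    "(CONNECT_DATA=(SERVICE_NAME=orcl)(SERVER=DEDICATED)))"
)

conn = oracledb.connect(user="appuser", password="placeholder", dsn=dsn_dedicated)
print(conn.version)   # verify the connection; this session uses a dedicated server
conn.close()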

Therefore, it is important to see how memory usage scales when the number of users with dedicated server processes increases. Figure 2 shows the sum of the amount of memory used on both nodes, as reported by the kbmemused value from the sadc and sar data.
Note: If the SERVER parameter is not set, then shared server configuration is assumed. However, the client uses a dedicated server if no dispatchers are available.
Figure 2. Memory usage when scaling workload generating users (100% active) with dedicated Oracle server processes
Bar graph of the amount of memory used from both nodes, as reported from the kbmemused value from the sadc and sar data. The x-axis has four bars, representing the number of users: 20, 40, 60, and 80. The y-axis is the total memory used in MB, ranging from 0 to 18000 MB. For the four bars, the values are: 14200 MB, 15800 MB, 16100 MB, and 16100 MB respectively.
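
The per-node values behind Figure 2 come from the kbmemused column reported by the sysstat tools (sadc and sar). As a rough sketch of how one such sample can be read on a node, assuming standard sysstat output and Python as the scripting language, the following script runs sar -r once and extracts kbmemused; the values from both Oracle RAC nodes are then summed as in the chart.

# Sketch only: assumes the sysstat 'sar' command is installed and in the PATH.
import subprocess

def kbmemused_kb():
    # One 1-second sample; 'sar -r' reports memory utilization, including kbmemused.
    out = subprocess.run(["sar", "-r", "1", "1"],
                         capture_output=True, text=True, check=True).stdout
    rows = [line.split() for line in out.splitlines() if line.strip()]
    header = next(r for r in rows if "kbmemused" in r)
    average = next(r for r in rows if r[0] == "Average:")
    # Index from the right so that extra timestamp tokens in the header row
    # do not shift the column position.
    offset = len(header) - header.index("kbmemused")
    return int(average[len(average) - offset])

print(f"{kbmemused_kb() / 1024:.0f} MB in use on this node")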

Observation

With the first increment, going from 20 users to 40, the memory usage increased from approximately 14 GB to 15.6 GB (approximately 11%). The next increment of 20 users caused only a slight increase in overall memory usage, and with the final increment the amount of memory used stayed constant. Each Oracle RAC node has 16 GB of memory, for a total of 32 GB, of which about 16 GB is in use, indicating that plenty of free memory is available on both systems.

Conclusion

This chart shows that after overall memory usage increases to accommodate 40 users, the total usage levels off. One parameter that limits the increase in memory utilization is the fixed size of the SGA, which seems to be fully utilized with 60 users. Another factor that might explain the declining additional memory requirement is increasing contention for locks. The amount of memory needed for additional server processes is not necessarily limited by these parameters, because it is primarily Linux® operating system memory that is not managed by Oracle.
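
One way to check how fully the fixed-size SGA is used at a given load level is to compare the total SGA size with the free memory that its pools still report. The following is a minimal sketch of such a check against the standard V$SGA and V$SGASTAT views; the python-oracledb driver and the connection details are assumptions.

# Sketch only: user, password, and DSN are placeholders.
import oracledb

conn = oracledb.connect(user="system", password="placeholder",
                        dsn="racnode1.example.com/orcl")
cur = conn.cursor()

cur.execute("SELECT SUM(value) FROM v$sga")
total_bytes, = cur.fetchone()
cur.execute("SELECT SUM(bytes) FROM v$sgastat WHERE name = 'free memory'")
free_bytes, = cur.fetchone()

print(f"SGA size: {total_bytes / 1024**2:.0f} MB, "
      f"free inside the SGA pools: {free_bytes / 1024**2:.0f} MB")
conn.close()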

It seems that the operating system memory needed for these processes soon becomes constant because of optimizations in the operating system. An example of this Linux optimization is the memory used for the binaries: it is allocated only once, regardless of the number of processes, and only the data structures that are unique to each process occupy individual pages. The memory related to the user data is kept in the SGA (which has a limited size). All these factors explain why the amount of memory used does not increase indefinitely.
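
On Linux, the effect of this sharing can be made visible by comparing the shared and private memory of each Oracle server process in /proc/<pid>/smaps: pages shared between processes (for example, the program text and the mapped SGA) occupy physical memory only once, while the private pages grow with each additional dedicated server process. The following sketch illustrates the idea; the filter on the process command name is an assumption and must be adapted to the local Oracle process naming.

# Sketch only: requires permission to read /proc/<pid>/smaps of the Oracle processes.
import glob

def smaps_total_kb(pid, prefixes):
    # Sum the given smaps fields (reported in kB) for one process.
    total = 0
    with open(f"/proc/{pid}/smaps") as f:
        for line in f:
            if line.startswith(prefixes):
                total += int(line.split()[1])
    return total

for comm_path in glob.glob("/proc/[0-9]*/comm"):
    pid = comm_path.split("/")[2]
    try:
        # Hypothetical filter: adjust to the naming of the Oracle server processes.
        if "oracle" not in open(comm_path).read():
            continue
        shared = smaps_total_kb(pid, ("Shared_Clean:", "Shared_Dirty:"))
        private = smaps_total_kb(pid, ("Private_Clean:", "Private_Dirty:"))
        print(f"pid {pid}: shared {shared} kB, private {private} kB")
    except (FileNotFoundError, PermissionError, ProcessLookupError):
        continue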

Figure 3 shows how the two topmost wait events develop as the number of users is scaled.
Figure 3. Cluster wait time and lock contention when scaling workload generating users (100% active) with dedicated Oracle server processes
Graph of the two topmost wait events as the number of users is scaled. The x-axis shows the number of users, with values of 20, 40, 60, and 80. The y-axis shows the wait time in seconds, ranging from 0 to 60000 seconds. There are two lines: (1) Cluster wait time, and (2) Row lock contention. For (1), the values are 5000, 11000, 20000, and 22000 seconds for the four user counts (20, 40, 60, 80). For (2), the values are 0, 12000, 32000, and 51000 seconds for the same user counts.
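
The two curves in Figure 3 correspond to Oracle wait statistics: the cluster wait time can be taken from the events in the Cluster wait class, and the row lock contention is typically reported as the enq: TX - row lock contention event. The following is a minimal sketch of how such totals can be read from GV$SYSTEM_EVENT across both instances; the python-oracledb driver and the connection details are assumptions, and the centisecond TIME_WAITED values are converted to seconds.

# Sketch only: user, password, and DSN are placeholders.
import oracledb

conn = oracledb.connect(user="system", password="placeholder",
                        dsn="racnode1.example.com/orcl")
cur = conn.cursor()
cur.execute("""
    SELECT wait_class, event, SUM(time_waited) / 100 AS seconds_waited
    FROM   gv$system_event
    WHERE  wait_class = 'Cluster'
       OR  event = 'enq: TX - row lock contention'
    GROUP  BY wait_class, event
    ORDER  BY seconds_waited DESC
""")
for wait_class, event, seconds in cur:
    print(f"{wait_class:10s} {event:40s} {seconds:12.0f} s")
conn.close()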

Observation

This graph shows two types of contention taking place on the clustered system as users were scaled from 20 to 80. The cluster wait time (indicating lock contention between the nodes) trends upward starting at 20 users, and its slope decelerates and begins to flatten at 60 users. The row lock contention, which reflects lock contention inside a single node, starts at nearly 0 and increases much faster than the cluster contention. The two curves cross at approximately 40 users.

Conclusion

The behavior of the system is determined by two types of lock contention: global contention between the two nodes and local contention inside each node. The local lock contention becomes the dominant factor as the number of users increases. This lock contention is a bottleneck that must be resolved if user activity is to be increased. In this case, the cause of the lock contention is updates made to a small shared table. Lock contention occurs when either the nodes in the cluster or two users inside one node try to update the same data. The local contention is based on row locking, while the cluster contention locks the whole data block.

The graph also shows that the typical measurement point at 40 users exhibits a certain level of cluster contention (which makes it worthwhile to consider network tuning of the interconnect). At the same point there is also contention on local locks, which keeps the workload realistic and means that the results do not depend 100% on the cluster performance.