Summary

To ensure that variations in the disk setup have a visible impact on the database transactional throughput, we verified that the system performance depended as much as possible on disk I/O performance.

This was achieved by reducing the database cache as much as possible, so that caching could not hide differences in the I/O bandwidth. An SGA size of 4 GB was found to be appropriate for our workload.
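
As an illustration only, such an SGA limit can be set with the standard initialization parameters sga_max_size and sga_target in SQL*Plus as SYSDBA; this is a minimal sketch assuming automatic shared memory management, not a reproduction of the parameter file used for the measurements:

    ALTER SYSTEM SET sga_max_size=4G SCOPE=SPFILE;
    ALTER SYSTEM SET sga_target=4G SCOPE=SPFILE;
    -- restart the instance so that the new SGA limits take effect
    SHUTDOWN IMMEDIATE
    STARTUP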

The first basic setup tests with Oracle 10g showed that placing the logs in a separate disk group on two additional disks greatly reduces the wait time for log events.
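
A possible way to realize such a separation, assuming that the disk group is an Oracle ASM disk group (disk paths, names, and sizes below are placeholders, not the values used in the tests), is to build a log-only disk group from the two additional disks and move the redo log groups there:

    -- in the ASM instance: create a dedicated disk group from the two extra disks
    CREATE DISKGROUP LOGDG EXTERNAL REDUNDANCY
      DISK '/dev/mapper/logdisk1', '/dev/mapper/logdisk2';

    -- in the database instance: add redo log groups in the new disk group,
    -- then drop the old groups once they become inactive
    ALTER DATABASE ADD LOGFILE GROUP 4 ('+LOGDG') SIZE 512M;
    ALTER DATABASE ADD LOGFILE GROUP 5 ('+LOGDG') SIZE 512M;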

When the number of users was scaled up, the transactional throughput increased much more slowly than the number of users, which confirms that the workload is waiting for disk I/O. The CPU load and the transactional throughput scaled at the same rate, indicating that there is no CPU overhead related to the workload in the range used. Disk transfer rates of 200 MiB/sec can be considered relatively high for a database workload. This clearly shows that the test system is suitable for testing variations in the storage connectivity setup, especially given the dominant role of the wait event class USER I/O. It also shows that the storage server used provides a relatively high disk I/O bandwidth and is not a bottleneck.
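
One way to see this dominance of the USER I/O wait class (not necessarily the tooling used for the report, which may have relied on AWR or Statspack reports instead) is to sum the accumulated wait times per wait class:

    -- accumulated wait time per wait class since instance startup (centiseconds)
    SELECT wait_class, SUM(time_waited) AS time_waited_cs
      FROM v$system_event
     GROUP BY wait_class
     ORDER BY time_waited_cs DESC;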

The comparison of Oracle 11g with Oracle 10g showed a very impressive improvement of approximately a factor of four for the transactional workload. The reduction in the disk I/O read rate indicates that the caching has improved: the database now finds more data in its caches and needs to read less data from disk. The increase in the write rates roughly follows the throughput increase, which confirms that the transactional throughput really did improve by that much. Additionally, the CPU effort per transaction was lower, which makes the update to Oracle 11g highly recommendable, at least for transactional workloads.

The usage of HyperPAV devices improved the transactional and the disk I/O throughput by 40%, even at the dramatically reduced disk I/O rate that came with the switch to Oracle 11g. The expectation is that the impact of HyperPAV devices increases with increasing disk I/O requirements. The number of HyperPAV devices required for this workload was, at 20 devices, relatively small. Another important finding is that too many HyperPAV devices lead to a degradation, even if a moderate one. Monitoring the response times of the storage server (for example, with the DASD statistics) can help to decide what a good number is. From the administrative perspective, the big advantage of HyperPAV over PAV is its simple handling: it only needs to be enabled in Linux, and the Linux kernel then handles the multipathing and workload balancing.
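
A minimal sketch of this enablement on Linux (the device numbers and the number of aliases are examples only; they depend on the I/O and storage server configuration): the HyperPAV alias devices only have to be set online, and the DASD statistics can then be used to watch the response times:

    # set the HyperPAV alias devices online (example range of 20 aliases)
    chccwdev -e 0.0.7500-0.0.7513
    # list the DASDs; HyperPAV aliases show up with status "alias"
    lsdasd
    # turn on the DASD statistics to monitor response times
    echo set on > /proc/dasd/statistics
    cat /proc/dasd/statistics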

When using FCP devices, the workload was able to saturate the 4 IFLs, with the effect that the throughput was limited by the CPU capacity. Adding two further IFLs provided a higher throughput, but CPU utilization was still up to 90%, which might still limit the transactional throughput. Adding even more CPUs provides only a slight improvement for this workload. With FCP disks, the multipath daemon is required to implement multipathing. A setup that uses the multibus policy and switches to the next path after every 100 requests (rr_min_io parameter) provided the highest disk I/O bandwidth in our scenario. This value for rr_min_io might be specific to this type of workload. In terms of the disk I/O pattern, this means a throughput of up to 200 MB/sec with small request sizes (8 KB Oracle block size), which is remarkable, but not extremely high. The expectation is that higher throughput rates, especially with larger block sizes, require even smaller rr_min_io values, which means sending fewer requests to the same path and switching to another path earlier.
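
For reference, such a policy can be expressed in /etc/multipath.conf roughly as follows; this is a sketch for the multipath-tools generation used at the time (newer versions use rr_min_io_rq instead of rr_min_io for request-based multipathing, and the settings may be better placed in a device section than in the defaults):

    defaults {
            path_grouping_policy    multibus
            rr_min_io               100
    }

After changing the configuration, the multipath maps have to be reloaded, for example with multipath -r or by restarting the multipathd service.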