IBM System p™ virtualization features supported by the SUSE Linux Enterprise Server (SLES) 10 operating system include virtual SCSI (VSCSI) and virtual LAN (VLAN). Both VSCSI and VLAN provide configuration and tuning parameters that can improve system performance. This article highlights some tuning recommendations and identifies measurement tool features that provide data for performance monitoring and diagnosis of virtualization performance problems. For information about initial setup and configuration see Resources.
The primary way to tune VSCSI is to select the appropriate I/O scheduler. You can choose the I/O scheduler for the VSCSI server drive, the VSCSI client drive, or both. The selected scheduler should be appropriate for the workload on the drive. The installation default is the anticipatory I/O scheduler. The best I/O scheduler for the VSCSI server drive is the noop scheduler.
The I/O scheduler choices are:
- noop - FIFO queuing
- anticipatory - anticipatory scheduling
- deadline - deadline scheduling
- cfq - completely fair queuing
To find out which I/O scheduler is being used, query the /sys file system. For example, on SLES 10, use the following command, replacing <sd*> with the drive of interest:
cat /sys/block/<sd*>/queue/scheduler
For example, using drive sda:
cat /sys/block/sda/queue/scheduler
[noop] anticipatory deadline cfq
In this example, [noop] is the scheduler being used.
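As a minimal sketch, the bracketed name can be pulled out of that line with sed. The function name active_scheduler is illustrative, and the device name sda in the commented usage is an assumption.

```shell
# Print the active I/O scheduler, which is the name enclosed in square brackets.
# Reads one line in the sysfs format from standard input.
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Example with the line shown above
echo "[noop] anticipatory deadline cfq" | active_scheduler

# On a live system (device name is an assumption):
# active_scheduler < /sys/block/sda/queue/scheduler
```

For the sample line, the function prints noop.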
You can change the I/O scheduler at run time by echoing a scheduler name into the file shown in the above example. You can also change the scheduler at boot time by putting a line into the /etc/yaboot.conf file, such as the following:
append = "elevator=noop"
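The run-time change can be sketched as a small shell function. The function name set_scheduler and its optional third argument (an alternative sysfs root, added here only so the function can be tried against a scratch directory) are illustrative assumptions. Writing to the real /sys file requires root.

```shell
# Select an I/O scheduler for one block device at run time.
# $1 = device name (for example sda)
# $2 = scheduler name (noop, anticipatory, deadline, or cfq)
# $3 = sysfs root, defaults to /sys (illustrative parameter)
set_scheduler() {
    dev=$1
    sched=$2
    root=${3:-/sys}
    file="$root/block/$dev/queue/scheduler"
    if [ ! -w "$file" ]; then
        echo "cannot write $file (run as root and check the device name)" >&2
        return 1
    fi
    echo "$sched" > "$file"
}

# Usage as root on a live system, then confirm the change:
# set_scheduler sda noop
# cat /sys/block/sda/queue/scheduler
```

A change made this way lasts only until the next reboot, which is why the yaboot.conf entry is needed for a permanent setting.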
The generally available version of SLES 10 has a VSCSI functional bug. Get the latest kernel update for SLES 10 (2.6.16.21-025-ppc or newer) from the SUSE Linux portal (see Resources). This update includes the bug fix.
The iostat command is part of the sysstat package, available through the SUSE utility tool called yast. To install it:
- Start yast.
- Select software.
- Select software management.
- Search for sysstat.
- Highlight the sysstat package.
- Choose the accept action.
- Follow the instructions for which installation disk to load.
Running iostat -x provides a great deal of information about the read and write traffic to the physical and virtual devices. See Resources for the manual pages that define the iostat columns.
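Because iostat -x prints many columns, it can help to filter the output down to the devices that are close to saturation. The following is a sketch, not part of sysstat: the function name busy_devices and the 90% threshold in the usage line are assumptions. It finds the %util column from the header row rather than relying on a fixed position.

```shell
# List devices whose %util meets or exceeds a threshold.
# Reads iostat -x output on standard input; $1 = threshold in percent.
busy_devices() {
    awk -v limit="$1" '
        NF == 0 { col = 0; next }          # a blank line ends the device table
        $1 == "Device:" {                  # header row: locate the %util column
            col = 0
            for (i = 1; i <= NF; i++) if ($i == "%util") col = i
            next
        }
        col && NF >= col && ($col + 0 >= limit + 0) { print $1, $col }
    '
}

# Usage (5-second interval, 2 reports; the first report covers time since boot):
# iostat -x 5 2 | busy_devices 90
```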
In order to diagnose a performance problem involving the virtual SCSI disk, you need to understand the VSCSI system configuration. You need to know how virtual devices map to physical hardware.
The sum of the activity of all disk partitions on a physical SCSI disk must be within the capability of the physical SCSI device. First, look at the demand on the physical device. Then, if the demand is too high, look at the demand from each of its partitions. Bringing the demand back within the capability of the physical hardware can require changing the virtual-to-physical mapping or changing or adding hardware.
Consider an example of the VSCSI server having a SCSI drive named sdc that is divided into three partitions (sdc1, sdc2, and sdc3), each of which is virtualized. The virtualized partitions each appear as a separate drive when used by a client. The iostat tool shows utilization of a drive, so on the clients it can measure the usage of partitions sdc1, sdc2, and sdc3 as drives. The best approach is to run iostat first on the VSCSI server to show the total utilization of sdc. Then run iostat on each client to get the utilization of the partitions sdc1, sdc2, and sdc3.
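To compare the demand from the virtualized partitions with the load on the physical drive, the read and write throughput of the client drives can be added up. The following sketch assumes a single iostat -x report on standard input that includes the rkB/s and wkB/s columns. The function name total_kb is illustrative.

```shell
# Sum read and write throughput (kB/s) over the named devices.
# Reads ONE iostat -x report on standard input; arguments are device names.
total_kb() {
    awk -v names="$*" '
        BEGIN { n = split(names, a, " "); for (i = 1; i <= n; i++) want[a[i]] = 1 }
        NF == 0 { r = 0; w = 0; next }     # a blank line ends the device table
        $1 == "Device:" {                  # header row: locate both columns
            r = 0; w = 0
            for (i = 1; i <= NF; i++) {
                if ($i == "rkB/s") r = i
                if ($i == "wkB/s") w = i
            }
            next
        }
        r && w && ($1 in want) { sum += $r + $w }
        END { printf "%.2f\n", sum }
    '
}

# Usage on a client, with a saved single report (file name is an assumption):
# total_kb sdb sdc sdd < client_iostat.txt
```

Running the same function on the server for the physical drive gives the figure to compare against.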
Figure 1. Virtual SCSI example
Understanding a measurement example
Take a look at a measurement example that shows why you need to understand
how the physical drives map to the virtual disks. As shown in Figure 1, the VIO
server boots from sda. The VIO server then virtualizes
sdb and the three partitions on
sdc. The VIO client boots from
sda then mounts the three virtualized partitions as
sdb, sdc, and
sdd.
The workload for this example is provided by Flexible File System Benchmark (FFSB) (see Resources for where to find the tool), an open source tool that can easily be configured to provide a variety of read, write, sequential, and random patterns with additional threading options. For this example, FFSB is configured to evaluate the performance of the client disks sdb, sdc, and sdd. Large sequential reads are done on sdb. Small random reads are done on sdc. Random writes are done on sdd. The iostat tool measures the resulting behavior, as shown in Listings 1 through 3.
The benchmarks are first run on each disk separately and then on all three drives concurrently.
Listing 1 shows the output of iostat for large
sequential reads on sdb.
Listing 1. iostat output for large sequential reads
avg-cpu:   %user   %nice %system %iowait  %steal   %idle
            0.30    0.00    6.35   93.30    0.00    0.00
Device:   rrqm/s  wrqm/s     r/s     w/s    rsec/s    wsec/s     rkB/s
sda         0.00    0.45    0.10    0.10      0.80      4.80      0.40
sdb         1.17    0.80  173.36    0.40  82818.59     11.19  41409.30
sdc         0.00    0.00    0.00    0.00      0.00      0.00      0.00
sdd         0.00    0.00    0.00    0.00      0.00      0.00      0.00
Device:      wkB/s  avgrq-sz  avgqu-sz     await   svctm   %util
sda           2.40     28.00      0.00      5.00    5.00    0.10
sdb           5.60    476.68    102.99    592.19    5.75  100.00
sdc           0.00      0.00      0.00      0.00    0.00    0.00
sdd           0.00      0.00      0.00      0.00    0.00    0.00
Listing 2 shows the output of running iostat for small
random reads on sdc.
Listing 2. iostat output for small random reads
avg-cpu:   %user   %nice %system %iowait  %steal   %idle
            0.05    0.00    0.95   99.00    0.00    0.05
Device:   rrqm/s  wrqm/s     r/s     w/s    rsec/s    wsec/s     rkB/s
sda         0.00    0.45    0.00    0.10      0.00      4.80      0.00
sdb         0.00    0.00    0.00    0.00      0.00      0.00      0.00
sdc         0.00    0.80   98.85    0.40   2390.80     11.19   1195.40
sdd         0.00    0.00    0.00    0.00      0.00      0.00      0.00
Device:      wkB/s  avgrq-sz  avgqu-sz     await   svctm   %util
sda           2.40     48.00      0.00      0.00    0.00    0.00
sdb           0.00      0.00      0.00      0.00    0.00    0.00
sdc           5.60      8.03     31.56    105.50    3.34  100.00
sdd           0.00      0.00      0.00      0.00    0.00    0.00
Finally, Listing 3 shows results from iostat for
random writes on sdd.
Listing 3. iostat output for random writes
avg-cpu:   %user   %nice %system %iowait  %steal   %idle
            0.00    0.00    1.00   98.95    0.00    0.00
Device:   rrqm/s  wrqm/s     r/s     w/s    rsec/s    wsec/s     rkB/s
sda         0.00    0.45    0.10    0.55      0.80      8.40      0.40
sdb         0.00    0.00    0.00    0.00      0.00      0.00      0.00
sdc         0.00    0.00    0.00    0.00      0.00      0.00      0.00
sdd         0.00   27.69    0.00  455.97      0.00   3866.87      0.00
Device:      wkB/s  avgrq-sz  avgqu-sz     await   svctm   %util
sda           4.20     14.15      0.01     15.38    4.62    0.30
sdb           0.00      0.00      0.00      0.00    0.00    0.00
sdc           0.00      0.00      0.00      0.00    0.00    0.00
sdd        1933.43      8.48    143.49    297.71    2.19   99.95
The client-side iostat output with all three benchmarks running concurrently shows that each drive achieves less throughput than it does when its benchmark runs alone. Listing 4 shows the concurrent iostat results for all three benchmarks.
Listing 4. Concurrent results for all three benchmarks
avg-cpu:   %user   %nice %system %iowait  %steal   %idle
            0.05    0.00    1.60   98.30    0.00    0.00
Device:   rrqm/s  wrqm/s     r/s     w/s    rsec/s    wsec/s     rkB/s
sda         0.00    0.10    0.05    0.20      0.40      2.80      0.20
sdb         1.10    0.40   14.84    0.25   7084.86      7.60   3542.43
sdc         0.00    0.75   52.22    0.45    417.79     11.19    208.90
sdd         0.00   14.89    0.00  312.29      0.00   2618.29      0.00
Device:      wkB/s  avgrq-sz  avgqu-sz     await   svctm   %util
sda           1.40     12.80      0.00      8.00    6.00    0.15
sdb           3.80    469.93     91.77   4818.38   66.26  100.00
sdc           5.60      8.14     30.72    641.45   18.98  100.00
sdd        1309.15      8.38    142.79    451.13    3.20  100.00
Now look at the utilization of the physical disk on the server. Recall from Figure 1 that the server's drive sdc has three partitions that the client uses as drives sdb, sdc, and sdd. The server-side iostat measurements show that the physical disk drive sdc is 100% utilized, as shown in Listing 5.
Listing 5. Server-side iostat measurements showing physical disk drive utilization
avg-cpu:   %user   %nice %system %iowait  %steal   %idle
            0.00    0.00    1.40    0.00    0.00   98.60
Device:   rrqm/s  wrqm/s     r/s     w/s    rsec/s    wsec/s     rkB/s
sda         0.00    0.00    0.00    0.10      0.00      0.80      0.00
sdb         0.00    0.00    0.00    0.00      0.00      0.00      0.00
sdc         0.00    0.00   61.97  322.04   6821.79   2704.65   3410.89
Device:      wkB/s  avgrq-sz  avgqu-sz     await   svctm   %util
sda           0.40      8.00      0.00     15.00   15.00    0.15
sdb           0.00      0.00      0.00      0.00    0.00    0.00
sdc        1352.32     24.81     27.50     70.54    2.60  100.00
This example demonstrates the impact of contention for a physical drive's resources on its throughput and response time. When throughput or response time degrades on VSCSI devices, look at the utilization of the physical device to see if contention might be the cause.
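As a rough worked example of the contention effect, the sketch below expresses each client drive's concurrent throughput (Listing 4) as a percentage of its standalone throughput (Listings 1 through 3). The function name pct_of_standalone is illustrative. The numbers are the rkB/s and wkB/s values copied from the listings.

```shell
# Express a concurrent throughput figure as a percentage of the standalone one.
# $1 = concurrent kB/s, $2 = standalone kB/s
pct_of_standalone() {
    awk -v c="$1" -v s="$2" 'BEGIN { printf "%.1f\n", 100 * c / s }'
}

# rkB/s for sdb and sdc, wkB/s for sdd
echo "sdb reads:  $(pct_of_standalone 3542.43 41409.30)%"
echo "sdc reads:  $(pct_of_standalone 208.90 1195.40)%"
echo "sdd writes: $(pct_of_standalone 1309.15 1933.43)%"
```

With these figures, the large sequential reads on sdb keep about 8.6% of their standalone throughput, the small random reads on sdc about 17.5%, and the writes on sdd about 67.7%.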
Virtual LAN is a function of the POWER Hypervisor™ that enables secure communication between logical partitions without the need for a physical I/O adapter. When TCP/IP communication data flows between LPARs on VLAN, the TCP/IP tuning parameters affect the performance of the data flow. You can use a set of tuning parameters that work well in this environment without a physical I/O adapter. Listing 6 shows the tuning recommendation for VLAN performance.
Listing 6. Tuning recommendation for MTU1500
/sbin/sysctl -w net.ipv4.tcp_timestamps=1
/sbin/sysctl -w net.ipv4.tcp_sack=1
/sbin/sysctl -w net.ipv4.tcp_window_scaling=1
/sbin/sysctl -w net.core.netdev_max_backlog=3000
/sbin/sysctl -w net.ipv4.tcp_wmem='4096 87380 30000000'
/sbin/sysctl -w net.ipv4.tcp_rmem='4096 87380 30000000'
/sbin/sysctl -w net.ipv4.ip_local_port_range='8096 131072'
/sbin/sysctl -w net.core.rmem_max=10485760
/sbin/sysctl -w net.core.rmem_default=10485760
/sbin/sysctl -w net.core.wmem_max=10485760
/sbin/sysctl -w net.core.wmem_default=10485760
/sbin/sysctl -w net.core.optmem_max=10000000
echo 128 > /sys/class/net/eth0/weight
echo 128 > /sys/class/net/eth1/weight
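Settings made with sysctl -w last only until the next reboot. One way to keep them, sketched here under the assumption that the system loads /etc/sysctl.conf at boot, is to convert the command lines into "key = value" entries for that file. The function name to_sysctl_conf is illustrative.

```shell
# Convert "sysctl -w key=value" command lines into sysctl.conf entries.
# Lines that are not sysctl -w commands are dropped.
to_sysctl_conf() {
    sed -n 's/^.*sysctl -w \([^=]*\)=\(.*\)$/\1 = \2/p' | tr -d "'"
}

# Example with two of the settings from Listing 6
to_sysctl_conf <<'EOF'
/sbin/sysctl -w net.ipv4.tcp_sack=1
/sbin/sysctl -w net.ipv4.tcp_wmem='4096 87380 30000000'
EOF
```

The resulting lines can be appended to /etc/sysctl.conf and loaded immediately with sysctl -p. The two echo commands that set the interface weight are not sysctl keys, so they would need to go into a boot script instead.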
The primary tool for network analysis with VLAN is the netstat tool, which can display a large amount of information about the networking system. Two of the most useful outputs are interface information and network statistics. You can display the network interface information using netstat -i and the TCP/IP protocol statistics using netstat -s.
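When scanning netstat -i output, the error and drop counters are usually the first thing to check. The following sketch assumes the net-tools column names RX-ERR, RX-DRP, TX-ERR, and TX-DRP appear in the header row. If they do not, it prints nothing. The function name iface_problems is illustrative.

```shell
# Print each interface with a nonzero total of receive/transmit errors and drops.
# Reads netstat -i output on standard input.
iface_problems() {
    awk '
        $1 == "Iface" {                    # header row: map column names to positions
            for (i = 1; i <= NF; i++) idx[$i] = i
            next
        }
        ("RX-ERR" in idx) && ("RX-DRP" in idx) && ("TX-ERR" in idx) && ("TX-DRP" in idx) {
            bad = $(idx["RX-ERR"]) + $(idx["RX-DRP"]) + $(idx["TX-ERR"]) + $(idx["TX-DRP"])
            if (bad > 0) print $1, bad
        }
    '
}

# Usage:
# netstat -i | iface_problems
```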
One tool used to measure maximum TCP bandwidth is iperf (see Resources to go to the National Laboratory for Applied Network Research Web site).
For this example, iperf was used to check VLAN bandwidth on a POWER5
system configuration, including a four-processor computer using 0.5 physical CPU
for each server and client partition. Simultaneous multithreading (SMT) was turned
on, and the system employed 2 GB of memory. The reported throughput was only about
500 Mbits/sec. It should have been around 1000 Mbits/sec for the gigabit adapter
used. Listing 7 shows iperf throughput and vmstat output. vmstat is a Linux real-time performance monitoring tool that reports the CPU idle percentage in the next-to-last column, labeled id. CPU utilization is calculated as 100% - CPU idle.
Listing 7. iperf throughput and vmstat output
[root@power] /iperf_202/iperf-2.0.2/src > ./iperf -c en0host2 -w 1024KB -N
------------------------------------------------------------
Client connecting to en0host2, TCP port 5001
TCP window size: 256 KByte (WARNING: requested 1.00 MByte)
------------------------------------------------------------
[ 3] local 192.168.1.1 port 55990 connected with 192.168.1.2 port 5001
[ 3] 0.0-10.0 sec 632 MBytes 530 Mbits/sec
vmstat output:
[root@power] /root > vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 0 360088 75136 681724 0 0 1 3 13 37 1 6 93 0
0 0 0 360088 75136 681724 0 0 0 0 6 18 0 0 100 0
0 0 0 360088 75136 681724 0 0 0 0 9 12 0 0 100 0
0 0 0 360088 75136 681724 0 0 0 0 6 10 0 0 100 0
3 0 0 359684 75136 681724 0 0 0 0 358 96 0 2 98 0
1 0 0 359808 75136 681724 0 0 0 0 14774 1464 0 63 37 0
1 0 0 359684 75136 681724 0 0 0 8 13913 1452 0 64 36 0
1 0 0 359808 75136 681724 0 0 0 0 14676 1359 1 65 35 0
1 0 0 359544 75136 681724 0 0 0 8 14260 1598 12 67 20 0
1 0 0 359668 75136 681724 0 0 0 0 12198 1882 0 62 38 0
2 0 0 359544 75136 681724 0 0 0 0 13844 1435 1 63 37 0
1 0 0 359544 75136 681724 0 0 0 0 14808 1372 0 64 37 0
1 0 0 359668 75136 681724 0 0 0 0 13934 1454 0 62 37 0
1 0 0 359700 75136 681724 0 0 0 0 11327 1886 0 64 35 0
0 0 0 359576 75136 681724 0 0 0 0 14650 1343 0 60 40 0
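The utilization calculation (100% minus the id column) can be sketched with awk. The function name cpu_util is illustrative. It finds the id column from the header row, so it does not depend on the exact column count of the vmstat version in use.

```shell
# Print CPU utilization (100 - id) for each vmstat sample on standard input.
cpu_util() {
    awk '
        $1 == "r" {                        # column-name row: locate the id column
            for (i = 1; i <= NF; i++) if ($i == "id") col = i
            next
        }
        col && $1 ~ /^[0-9]+$/ { print 100 - $col }   # data rows only
    '
}

# Usage (one-second interval, 10 samples):
# vmstat 1 10 | cpu_util
```

Applied to Listing 7, the samples taken while iperf runs mostly fall between 60% and 65% utilization.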
The partition running the test was allocated only 0.5 physical CPU, and the CPU utilization that vmstat measured was very high for that allocation. This leads to the conclusion that the iperf test is CPU bound. The system was reconfigured for one physical CPU for each server and client partition. Running the test again on the newly configured system gave the improved result shown in Listing 8.
Listing 8. Improved iperf test results
[root@power] /iperf_202/iperf-2.0.2/src > ./iperf -c en0host2 -w 1024KB -N
------------------------------------------------------------
Client connecting to en0host2, TCP port 5001
TCP window size: 256 KByte (WARNING: requested 1.00 MByte)
------------------------------------------------------------
[ 3] local 192.168.1.1 port 39856 connected with 192.168.1.2 port 5001
[ 3] 0.0-10.0 sec 1.22 GBytes 1.05 Gbits/sec
With the additional CPU power, the benchmark can drive the Ethernet link at full speed. With this knowledge, the system administrator can choose the appropriate CPU resource allocation.
Understanding the virtual SCSI and virtual LAN features of IBM System p as supported by SLES 10 can help the administrator tune the system for better performance. This article showed that resource contention of a physical disk can cause throughput or response time degradation of VSCSI devices. Similarly, CPU constraint can limit performance of VLAN. Both situations can be relieved by adding physical resources to back the virtual device.
Resources
Learn
- See the article POWER5 Virtualization: How to set up the SuSE Linux Virtual I/O Server, developerWorks, May 2005, for information about setting up SUSE servers.
- Check out some useful Linux performance tool information:
- Refer to these books about Linux performance:
  - Performance Tuning for Linux Servers, Sandra K. Johnson.
  - Linux - Debugging and Performance Tuning, Steve Best.
  - Optimizing Linux Performance - A Hands-On Guide to Linux Performance Tools, Phillip G. Ezolt.
Get products and technologies
- Get the latest kernel update for SUSE Linux from the SUSE Linux Portal.
- Go to Flexible File System Benchmark (FFSB) for an open source performance evaluation tool.
- See the National Laboratory for Applied Network Research (NLANR) Web site for the iperf tool to measure maximum TCP bandwidth.


