A little while ago, I needed to patch the operating system on an IBM® AIX® server to a more recent technology level and service pack. This server had not had any software maintenance in some time, so it was due for a tune-up. I got everything lined up for a maintenance window and started the upgrade.
As the upgrade ran, I noticed that the server had become unusually slow, even though no applications were active on the system and no users were logged in. It didn't make sense: this was a stand-alone server, not a virtualized logical partition (LPAR) contending with other systems for CPU resources or shared infrastructure.
While performing some troubleshooting, I pulled up the server's real-time data using the topas command and noticed that the Wait column had a tremendously high value. But only one disk was causing all of the overhead: hdisk0, where the root volume group resided. I looked at the disk with the lspv -p command, and suddenly everything made sense (see Listing 1 below).
Listing 1. Results of the lspv -p command
# lspv -p hdisk0
hdisk0:
PP RANGE  STATE   REGION        LV NAME             TYPE       MOUNT POINT
  1-1     used    outer edge    hd5                 boot       N/A
  2-110   free    outer edge
111-114   used    outer middle  hd6                 paging     N/A
115-116   used    outer middle  livedump            jfs2       /var/adm/ras/livedump
117-124   used    outer middle  lg_dumplv           sysdump    N/A
125-130   used    outer middle  paging00            paging     N/A
131-136   used    outer middle  hd6                 paging     N/A
137-217   free    outer middle
218-219   used    outer middle  hd2                 jfs2       /usr
220-220   used    center        hd8                 jfs2log    N/A
221-222   used    center        hd4                 jfs2       /
223-237   used    center        hd2                 jfs2       /usr
238-240   used    center        hd9var              jfs2       /var
241-241   used    center        hd3                 jfs2       /tmp
242-242   used    center        hd1                 jfs2       /home
243-245   used    center        hd10opt             jfs2       /opt
246-246   used    center        hd11admin           jfs2       /admin
247-259   used    center        hd10opt             jfs2       /opt
260-265   used    center        hd4                 jfs2       /
266-266   used    center        hd2                 jfs2       /usr
267-279   used    center        hd9var              jfs2       /var
280-284   used    center        hd2                 jfs2       /usr
285-285   used    center        hd3                 jfs2       /tmp
286-286   used    center        hd2                 jfs2       /usr
287-317   used    center        hd1                 jfs2       /home
318-328   used    center        paging00            paging     N/A
329-425   free    inner middle
426-430   used    inner middle  hd6                 paging     N/A
431-437   free    inner middle
438-535   free    inner edge
536-546   used    inner edge    paging01            paging     N/A
When I saw how this disk was laid out, it explained why there was an I/O bottleneck. The disk had a number of mistakes and some poor planning that caused even a simple upgrade to take considerably longer than it should have. It proved that even in today's world of storage area networks (SAN), Internet SCSI (iSCSI), and other top-tier storage solutions, even basic internal disks need to be optimized and set up properly so that modern, more robust servers can run well.
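To put numbers on that impression, the layout in Listing 1 can be summarized with a short script. The following is a minimal sketch rather than a standard AIX tool: the function names are made up for illustration, and the parser assumes the row format shown in Listing 1, where the last three fields of a used row are the LV name, type, and mount point.

```python
from collections import defaultdict

# Rows copied from Listing 1 (lspv -p hdisk0), header lines omitted.
LISTING_1 = """\
1-1 used outer edge hd5 boot N/A
2-110 free outer edge
111-114 used outer middle hd6 paging N/A
115-116 used outer middle livedump jfs2 /var/adm/ras/livedump
117-124 used outer middle lg_dumplv sysdump N/A
125-130 used outer middle paging00 paging N/A
131-136 used outer middle hd6 paging N/A
137-217 free outer middle
218-219 used outer middle hd2 jfs2 /usr
220-220 used center hd8 jfs2log N/A
221-222 used center hd4 jfs2 /
223-237 used center hd2 jfs2 /usr
238-240 used center hd9var jfs2 /var
241-241 used center hd3 jfs2 /tmp
242-242 used center hd1 jfs2 /home
243-245 used center hd10opt jfs2 /opt
246-246 used center hd11admin jfs2 /admin
247-259 used center hd10opt jfs2 /opt
260-265 used center hd4 jfs2 /
266-266 used center hd2 jfs2 /usr
267-279 used center hd9var jfs2 /var
280-284 used center hd2 jfs2 /usr
285-285 used center hd3 jfs2 /tmp
286-286 used center hd2 jfs2 /usr
287-317 used center hd1 jfs2 /home
318-328 used center paging00 paging N/A
329-425 free inner middle
426-430 used inner middle hd6 paging N/A
431-437 free inner middle
438-535 free inner edge
536-546 used inner edge paging01 paging N/A
"""


def parse_lspv_p(text):
    """Return (start, end, lv_name, lv_type) for every used PP range."""
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        if len(tokens) < 3 or tokens[1] != "used":
            continue  # skip blank lines and free ranges
        start, end = (int(n) for n in tokens[0].split("-"))
        # The last three tokens of a used row are LV name, type, mount point.
        rows.append((start, end, tokens[-3], tokens[-2]))
    return rows


def extents_per_lv(rows):
    """Count separate runs of PPs per LV, merging back-to-back ranges."""
    counts = defaultdict(int)
    last_end = {}
    for start, end, lv, _ in sorted(rows):
        if last_end.get(lv) != start - 1:
            counts[lv] += 1
        last_end[lv] = end
    return dict(counts)


rows = parse_lspv_p(LISTING_1)
extents = extents_per_lv(rows)
paging_lvs = sorted({lv for _, _, lv, lv_type in rows if lv_type == "paging"})

for lv, count in sorted(extents.items(), key=lambda item: -item[1]):
    print(f"{lv:10s} {count} extent(s)")
print("paging spaces on this disk:", ", ".join(paging_lvs))
```

Run against Listing 1, this reports /usr (hd2) scattered over five separate extents and three paging spaces (hd6, paging00, and paging01) sharing the one disk.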
This article provides some of the basic tools and techniques that you can use to optimize internal storage on AIX servers. Using the situation I encountered as a case study, the article walks through how disks are viewed within AIX and how they interact with the Logical Volume Manager (LVM). From there, it discusses various strategies and commands for reorganizing internal disks and making them more solid, including some lower-level commands and simple shortcuts. Then, the article offers some higher-level planning ideas to prevent these situations from happening again.
The AIX operating system divides hard disks—also known as physical volumes (PV)—into fixed areas of space called physical partitions (PP). These PPs are all uniform in size and span from the outer edge of the disk, moving inward toward the center of the spindle. As these PVs are gathered, they become known as volume groups (VG). Within these VGs, the system creates structures called logical volumes (LV) that gather PPs into usable areas on which file systems can be created.
The PP size is fixed with the creation of the VG, and further PVs added to the VG are expected to conform to the same PP size. Each PP can be assigned to only one LV at a time, and any LV must have a minimum of one PP to exist. If file systems are grown, the minimum size by which they will expand is one PP.
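The one-PP minimum is easy to see with a little arithmetic. The sketch below assumes a PP size of 128MB, which is only an illustrative value (the real size is a property of each VG), and it assumes the LV has no spare room left, so that any growth must allocate new PPs.

```python
import math


def pps_needed(size_mb, pp_size_mb):
    """Number of whole PPs required to hold size_mb of data."""
    return math.ceil(size_mb / pp_size_mb)


PP_SIZE_MB = 128  # assumed PP size for illustration; it varies per VG

for growth_mb in (1, 128, 129, 1024):
    pps = pps_needed(growth_mb, PP_SIZE_MB)
    print(f"grow by {growth_mb:>4} MB -> {pps} PP(s), {pps * PP_SIZE_MB} MB allocated")
```

Under these assumptions, even a 1MB request consumes a full PP somewhere on the disk, which matters later when thinking about contiguity.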
AIX divides PVs into five areas relative to their location on the hard disk platters themselves: outer edge, outer middle, center, inner middle, and inner edge. This division is illustrated below in Figure 1.
Figure 1. Disk layout
Several mathematical and physical properties affect disk I/O, latency, and access across these five regions. Because the platters spin at a constant angular velocity, the surface at the outer edge of the disk moves past the head faster than the surface at the inner edge, much like how kids riding a carousel at the playground whip around much more quickly on the outside than in the middle. However, because not all of the data on the disk can be placed at the outer edge, the fastest seek times for the hard disk heads will be in the center area of the disk, where the head is most likely to pass on average.
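The carousel comparison can be worked through numerically: at a fixed spindle speed, the speed of the platter surface under the head grows in proportion to the radius. In this sketch the 15,000 RPM figure matches the drive speed mentioned later in the article, while the inner and outer radii are hypothetical values chosen only to show the ratio.

```python
import math


def linear_speed_m_s(rpm, radius_mm):
    """Speed of the platter surface under the head at a given radius."""
    revolutions_per_second = rpm / 60
    circumference_m = 2 * math.pi * (radius_mm / 1000)
    return revolutions_per_second * circumference_m


RPM = 15_000
INNER_RADIUS_MM = 20  # hypothetical innermost data track
OUTER_RADIUS_MM = 45  # hypothetical outermost data track

inner = linear_speed_m_s(RPM, INNER_RADIUS_MM)
outer = linear_speed_m_s(RPM, OUTER_RADIUS_MM)
print(f"inner edge: {inner:.1f} m/s")
print(f"outer edge: {outer:.1f} m/s")
print(f"outer/inner ratio: {outer / inner:.2f}")
```

The ratio depends only on the two radii, not on the spindle speed, which is why the outer edge favors long sequential transfers on any rotating disk.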
In the case of the server I was trying to patch, the data was located across the disk in an almost chaotic or random manner. But before that data could be sorted out, more considerations had to take place for how AIX manages LVs directly.
AIX has a native tool called the Logical Volume Manager, which handles the relationships among PVs, VGs, LVs, and file systems. The LVM creates the logical structures on physical media for managing data within the operating system. Relevant to disk optimization, the LVM also provides a means of customizing availability, performance, and redundancy.
The two LVM features most widely used for optimization are mirroring and striping. LVM allows you to mirror LVs with up to three copies of the data. Here, one logical partition (LP), the unit of space an LV is built from, can point to one or more PPs. This way, should a hardware failure occur, the data is preserved on other physical devices. In smaller systems that use internal storage especially, it is crucial that all data be mirrored to prevent unplanned outages.
Striping places data across multiple hard disks so that multiple read and write operations can occur simultaneously across a wide number of disks. You enable striping by changing the inter-physical volume allocation policy property for each LV. Setting the policy to the minimum (the default) keeps the LV on as few disks as possible, which gives greater reliability because fewer disks can fail underneath it. Setting the policy to the maximum spreads the data across as many disks within the VG as possible, maximizing the number of I/O operations that can occur at once.
When looking at optimizing internal disks, the balance between performance and redundancy has to be kept in check. With smaller systems, there is typically not enough room to perform large-scale striping, let alone striping with redundancy through mirroring. With larger systems that have large quantities of internal hard disks, striping and mirroring can dramatically boost performance and minimize I/O wait times.
By combining these concepts of how AIX physically structures hard disks with how LVM can impose logical structures onto disks, I have put together several principles to help optimize internal disks:
- Make LVs contiguous as much as possible. When file systems are strewn across a disk, the hard disk head takes longer to find any wanted data. If the LV exists in one continuous area, the seek time is minimized and the files can be found more quickly.
- Place LVs with high I/O or sequential read or write operations on the outer edge. Because of the speed of the outer edge of the disk, LVs that need faster reads or writes with data in long sequences (like large static files or database contents) will benefit from the higher rotational velocity at the outer edge.
- Place LVs with high activity toward the center. If you have a file system that has a great deal of reads and writes and needs quick responsiveness, because of averages, the hard disk head will most likely be near the center at any point in time. By placing those file systems in this area, there will be a higher likelihood that the head will be in that general vicinity. This configuration reduces seek time and maintains good I/O.
- Place LVs with low usage near the inner edge. If you have any file systems that are rarely used or accessed, get them out of the way by putting them in the part of the disk with the lowest I/O speeds—near the spindle at the inner edge. For example, logical volumes that are seldom used fit well here.
- Use only one paging space LV per disk. The purpose of paging space is to serve as a temporary place to swap pages in and out of memory to an area of physical storage, thus allowing the CPU to perform more operations and catch up on things. Defining multiple paging space LVs on the same disk defeats the purpose of trying to remedy performance shortfalls by causing more I/O, as the disk head has to go to several areas of the platters instead of just one.
- Where possible, don't mirror paging space. Again, because the purpose of paging space is to use physical storage to offset resource shortfalls, it makes no sense to write the same set of data twice in a mirrored configuration. Instead, if two disks have two separate paging space LVs, each can address memory swapping and double the effectiveness.
- Keep paging space sizes uniform. One last point on paging space is to keep paging space LVs as uniform as possible. If you have two 73GB internal disks and one has a 1GB paging space LV while the other one has a 4GB paging space LV, one of these two will likely experience more wear. And, depending on how the paging space was added to the server or shrunk in size, one of the LVs may become full and adversely affect the system. Plus, keeping sizes the same makes the lsps command output look a little cleaner and more accurate.
- Keep some free space around. Although there may be some administrators out there who try to squeeze every last drop of capacity from their servers, it is always best to keep some free space on the disks. This means not just having one or two large gaps in the inner or outer middle regions, but also having space between the LVs in case anything needs to be grown, plus a little free space in the file systems themselves. If everything is pressed together, all it takes is one small chfs -a size=+1MB command to wreck LV contiguity.
- Mirror things properly. One common mistake administrators make is to create a new file system on a mirrored VG, then forget to create a copy onto the other disk. Just because the VG is mirrored does not mean that any subsequently added LVs that support the file systems will also be mirrored. Always make sure that new LVs get a copy on a separate PV.
- Distribute I/O load as much as possible for performance and redundancy. If you have a larger system with multiple drawers, leverage them by striping the data across sets of disk packs. And, if you choose to mirror your PVs, do it across different drawers so that if a backplane fails, redundancy is not limited to one drawer.
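The "mirror things properly" principle lends itself to a quick automated check: in an lsvg -l listing, a mirrored LV has at least twice as many PPs as LPs. The sketch below is a rough illustration and not an AIX utility. The sample output is hypothetical (including the newlv entry and its /data mount point), and paging LVs are skipped because of the advice above against mirroring paging space.

```python
# Hypothetical lsvg -l rootvg output; newlv stands in for a file system
# that was added after the VG was mirrored and never given a second copy.
SAMPLE_LSVG_L = """\
rootvg:
LV NAME             TYPE       LPs     PPs     PVs  LV STATE      MOUNT POINT
hd5                 boot       1       2       2    closed/syncd  N/A
hd6                 paging     20      20      1    open/syncd    N/A
hd8                 jfs2log    1       2       2    open/syncd    N/A
hd4                 jfs2       9       18      2    open/syncd    /
hd2                 jfs2       26      52      2    open/syncd    /usr
newlv               jfs2       4       4       1    open/syncd    /data
"""


def unmirrored_lvs(lsvg_l_output, skip_types=("paging",)):
    """Return names of LVs whose PP count shows only a single copy."""
    flagged = []
    for line in lsvg_l_output.splitlines()[2:]:  # skip VG name and header
        tokens = line.split()
        if len(tokens) < 7:
            continue
        name, lv_type = tokens[0], tokens[1]
        lps, pps = int(tokens[2]), int(tokens[3])
        if lv_type in skip_types:
            continue
        if pps // lps < 2:  # PPs = LPs x number of copies
            flagged.append(name)
    return flagged


print("LVs without a mirror copy:", unmirrored_lvs(SAMPLE_LSVG_L))
```

On this sample, only newlv is flagged. Other LV types that are deliberately left unmirrored could be added to skip_types.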
The next step in this process is to make these principles for internal storage optimization a reality. There are several commands I employ to straighten out wayward disks and make them conform to how they should be designed.
The lspv, lsvg, and lslv commands are the key tools for visualizing how everything is laid out. The lspv command shows how the data is dispersed on the PVs (as shown in my earlier example). The lsvg command with the -l flag shows at a glance whether the data is mirrored and what contents are within the VG. And the lslv command should be used with the -l flag to analyze the layout for any particular LV within the five areas of the PVs, as shown here:
hd4:/
PV        COPIES         IN BAND   DISTRIBUTION
hdisk0    128:000:000    100%      000:108:020:000:000
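The colon-separated fields in this output are compact but not self-explanatory. The DISTRIBUTION field lists the number of PPs in each of the five regions, in order from the outer edge to the inner edge. A small sketch (the function and key names are illustrative, not part of AIX) that unpacks the sample output above:

```python
REGIONS = ("outer edge", "outer middle", "center", "inner middle", "inner edge")

# The sample lslv output shown above.
LSLV_OUTPUT = """\
hd4:/
PV COPIES IN BAND DISTRIBUTION
hdisk0 128:000:000 100% 000:108:020:000:000
"""


def parse_lslv_distribution(text):
    """Turn each PV row into a dictionary with named fields."""
    result = []
    for line in text.splitlines()[2:]:  # skip the LV name and header lines
        tokens = line.split()
        if len(tokens) != 4:
            continue
        pv, copies, in_band, distribution = tokens
        counts = [int(n) for n in distribution.split(":")]
        result.append({
            "pv": pv,
            "copies": [int(n) for n in copies.split(":")],
            "in_band_pct": int(in_band.rstrip("%")),
            "pps_by_region": dict(zip(REGIONS, counts)),
        })
    return result


for row in parse_lslv_distribution(LSLV_OUTPUT):
    print(row["pv"])
    for region, pps in row["pps_by_region"].items():
        print(f"  {region:13s} {pps:4d} PPs")
```

For the sample, this shows 108 PPs in the outer middle and 20 in the center, so the LV straddles two regions of hdisk0.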
The chlv command is typically the first command I run that changes things to optimize disks. I use it to set the intra-PV allocation policy for where LVs should be placed on the PVs, from the outer to the inner edge. This command helps in consolidating the data and making it contiguous.
If I find that some of the LVs are not on the correct PVs, before I go into more of the reorganization, I use migratepv to shift things around between PVs by using the -l flag to move data on an LV-by-LV basis.
The reorgvg command is the true workhorse for optimizing internal disks. This command takes the intra-PV allocation policy for each LV and reorganizes the PV around these standards. It does its best to move the LVs into the specific areas of the disks and align the space properly. But it can take a considerable amount of time to run in some circumstances, and it may not be able to get everything perfect, which is why manual intervention may be needed.
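The chlv and reorgvg steps can be sketched as a dry-run plan that only prints the commands instead of running them. The placement plan here is an assumption made for illustration: it mirrors where the LVs end up in Listing 2, but the article does not show the exact commands that were run. The position codes are the values accepted by the chlv -a flag.

```python
# chlv -a position codes for the five disk regions.
POSITION_CODES = {
    "outer edge": "e",
    "outer middle": "m",
    "center": "c",
    "inner middle": "im",
    "inner edge": "ie",
}

# Hypothetical placement plan (LV name -> target region), modeled on the
# final layout in Listing 2. Boot and dump LVs are left out on purpose.
PLAN = {
    "hd6": "outer middle",
    "hd9var": "outer middle",
    "hd2": "center",
    "hd1": "center",
    "hd3": "center",
    "hd10opt": "center",
    "hd4": "inner middle",
    "livedump": "inner edge",
    "hd11admin": "inner edge",
}


def build_commands(vg, plan):
    """Return chlv commands for each LV, then one reorgvg for the VG."""
    commands = [
        f"chlv -a {POSITION_CODES[region]} {lv}" for lv, region in plan.items()
    ]
    commands.append(f"reorgvg {vg} {' '.join(plan)}")
    return commands


for command in build_commands("rootvg", PLAN):
    print(command)  # dry run: review the commands before running any of them
```

Printing the plan first makes it possible to review the intended layout during planning, well before the maintenance window.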
After LVs have been shifted around, there may be one or two pesky stuck LPs or PPs somewhere on the disk, so that a piece of /tmp winds up surrounded by most of /usr. The migratelp command can move data on an LP-by-LP basis manually to get things out of the way.
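As a rough sketch of that manual cleanup, the helper below formats migratelp commands using the LVname/LPnumber source and DestPV/PPnumber destination form of the command. The helper name, the LP numbers, and the target PP numbers are all hypothetical. On a real system they would come from the LV's LP-to-PP map and from the free ranges that lspv -p reports.

```python
def migratelp_command(lv, lp_number, dest_pv, dest_pp=None, copy=None):
    """Format a migratelp call for one LP (optionally one mirror copy)."""
    source = f"{lv}/{lp_number}"
    if copy is not None:
        source += f"/{copy}"
    dest = dest_pv if dest_pp is None else f"{dest_pv}/{dest_pp}"
    return f"migratelp {source} {dest}"


# Hypothetical example: two stray LPs of /tmp (hd3) are moved to free PPs
# chosen so that they sit next to the rest of the LV.
STRAY_LPS = [(1, 253), (2, 254)]  # (LP number, target PP number)

for lp_number, target_pp in STRAY_LPS:
    print(migratelp_command("hd3", lp_number, "hdisk0", target_pp))
```

Leaving out the PP number lets LVM choose the destination partition on the target PV, which is simpler but gives up exact control over placement.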
You use the mirrorvg and mklvcopy commands to mirror an entire VG at once (mirrorvg) or make copies of specific LVs, such as when file systems are added (mklvcopy). But there are two tricks you can use with mirrorvg to optimize internal storage. If the intra-PV allocation policies have been set with chlv, then by mirroring the volume group with the -c flag, the data will be mirrored to a fresh PV and conform to those policies along the way, laying it out cleanly. Or, if you already have a nicely optimized disk, you can use the -m flag and have the second PV laid out exactly like the first to have completely parallel reads and writes.
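The two mirroring paths can be sketched as a pair of dry-run helpers that only build command strings. The helper names and the disk and LV names are placeholders. The syncvg step is an addition not mentioned in the text above, included on the assumption that a newly added LV copy still has to be synchronized.

```python
def mirror_vg_command(vg, target_pv, copies=2, exact_map=False):
    """Build a mirrorvg call; exact_map adds -m to clone the PP layout."""
    parts = ["mirrorvg"]
    if exact_map:
        parts.append("-m")
    parts += ["-c", str(copies), vg, target_pv]
    return " ".join(parts)


def mirror_lv_commands(lv, target_pv, copies=2):
    """Build the commands to add a copy of one LV and then sync it."""
    return [
        f"mklvcopy {lv} {copies} {target_pv}",
        f"syncvg -l {lv}",  # assumed follow-up: synchronize the new copy
    ]


# Trick 1: mirror onto a fresh disk and let the intra-PV policies apply.
print(mirror_vg_command("rootvg", "hdisk1"))
# Trick 2: reproduce an already optimized disk partition for partition.
print(mirror_vg_command("rootvg", "hdisk1", exact_map=True))
# A file system added later gets its own copy.
for command in mirror_lv_commands("newlv", "hdisk1"):
    print(command)
```

The two mirrorvg variants are alternatives for the same VG rather than steps to run one after the other.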
After putting these principles in place and using the tools above, I was able to remedy my poor disk mainly through consolidating paging space, sizing file systems correctly, and putting LVs in proper places on the PV. As you can see in Listing 2 below, the disk space was used much more effectively, and I/O definitely improved.
Listing 2. Running lspv after disk optimization
# lspv -p hdisk0
hdisk0:
PP RANGE  STATE   REGION        LV NAME             TYPE       MOUNT POINT
  1-1     used    outer edge    hd5                 boot       N/A
  2-2     used    outer edge    hd8                 jfs2log    N/A
  3-110   free    outer edge
111-130   used    outer middle  hd6                 paging     N/A
131-147   used    outer middle  hd9var              jfs2       /var
148-219   free    outer middle
220-245   used    center        hd2                 jfs2       /usr
246-252   used    center        hd1                 jfs2       /home
253-255   used    center        hd3                 jfs2       /tmp
256-270   used    center        hd10opt             jfs2       /opt
271-328   free    center
329-337   used    inner middle  hd4                 jfs2       /
338-437   free    inner middle
438-440   used    inner edge    livedump            jfs2       /var/adm/ras/livedump
441-448   used    inner edge    lg_dumplv           sysdump    N/A
449-449   used    inner edge    hd11admin           jfs2       /admin
450-546   free    inner edge
With this layout, the hard disk head does not have to move too far away from the center area, I have room to grow some of the file systems, and there is less I/O wait time. By mirroring this configuration identically to a second disk, I also provided redundancy to mitigate possible failures.
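The room to grow can be read straight from the free ranges in Listing 2. This short sketch simply totals the free PPs per region. The ranges are copied from the listing, and the total of 546 PPs is taken from the last PP number shown.

```python
# Free PP ranges from Listing 2: (first PP, last PP, region).
FREE_RANGES = [
    (3, 110, "outer edge"),
    (148, 219, "outer middle"),
    (271, 328, "center"),
    (338, 437, "inner middle"),
    (450, 546, "inner edge"),
]
TOTAL_PPS = 546

free_by_region = {
    region: last - first + 1 for first, last, region in FREE_RANGES
}
total_free = sum(free_by_region.values())

for region, pps in free_by_region.items():
    print(f"{region:13s} {pps:4d} free PPs")
print(f"total free: {total_free} of {TOTAL_PPS} PPs")
```

Every region now has one unbroken free range sitting directly after its LVs, so a file system can grow into adjacent PPs instead of being scattered.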
In the old days of AIX 4.3.3 and IBM RS/6000® servers, disk optimization was necessary to make servers perform well because of the limited resources available. But even though disks today often exceed 100GB in size and spin at 15,000 RPM, and newer IBM POWER7® servers run faster and do more than any AIX servers before them, this relatively simple process for optimizing internal storage can still make all the difference.
Christian Pruett is a senior UNIX systems administrator with more than 14 years of experience with AIX, Sun Solaris, Linux, and HP/UX in a wide variety of industries, including computing, agriculture, and telecommunications. He is the co-author of two IBM Redbooks on AIX, has served as a UNIX book reviewer for O’Reilly Publishing, and has worked on several of the IBM AIX certification exams. He resides in Colorado with his wife and two children. You can reach Christian at firstname.lastname@example.org.