Avoiding the gotchas of AIX LPAR migrations
Many times in the computing world, systems that worked well a year ago outgrow their hardware and suddenly require more resources. Fortunately, with the latest AIX LPAR technology, it has become much easier to migrate a server from one piece of hardware to another with little downtime. The virtualization, portability, and manageability that IBM pSeries® and System p® hardware offer provide flexibility in systems administration and support. But there can be a few speed bumps along the way in the process of migrating to an AIX LPAR. This article focuses on avoiding the common "gotchas" that can spring up and slow down AIX LPAR migrations.
Gotcha 1: Resource shortfalls
The first and most important thing in moving to a new LPAR is to ensure that sufficient resources are available. The most common reason for moving is that the resources on the original hardware are no longer able to support the purpose of the server. Nothing is worse than putting in long hours only to find that the new hardware lacks power and functionality.
Although the purpose of this article is not to serve as a sizing document for architecting LPARs, it is critical that the new LPAR have all the necessary features. How many and what kind of processors will you need? Will the processors be dedicated, or will they come from a shared pool? Will there be enough memory? What about the I/O adapters? Is it worth setting up a virtual I/O (VIO) server to manage the resources? Are there any floor, rack space, power, or cooling constraints?
To answer these and other questions, IBM provides the Systems Workload Estimator (see Related topics). This tool can provide approximate sizing information and a general idea of the type of equipment you'll need. You can then combine the information from the tool with information from third-party vendors to properly determine the resources needed to meet LPAR requirements.
Gotcha 2: Establishing the root volume group
Moving on to the technical side of LPAR migrations, the first thing you must manage is the root volume group (rootvg). Just as a house needs a firm foundation, the rootvg must be solid for the migration to be successful. There are three main ways to establish the rootvg, each with its own pros and cons.
Strategy 1: Fresh operating system installation
Here, the operating system is installed from CD, DVD, or Network Installation Manager (NIM) on the disks to create the rootvg.
- Pros: This installation is the cleanest and most pristine way of loading the operating system. Over the years, I have seen servers that have migrated through every version and release of AIX from 4.3.2 to 5.3, and even though the AIX operating system migration process is one of the most robust upgrade paths around, fragments of software and third-party applications can sometimes sludge their way through the operating system iterations. These elements can make the server look unsightly at best or complicated and difficult to upgrade and manage at worst. But by using base media, the operating system is guaranteed to be just as it was when it rolled off the factory floor.
- Cons: Unfortunately, with a completely clean installation, making the new LPAR look like the old server can be time-consuming. You'll have to re-create user IDs, groups, file systems, environment variables, and all the features that define the server. If the server is simple, this may be the way to go. But if the server has a complicated environment with hundreds of users, it may be more worthwhile to choose another route.
Strategy 2: Physical disk move
Here, you remove the physical disks from the original server, insert them into the new hardware, and assign them to the LPAR.
- Pros: By taking the root disks out of the old server and putting them in the new, you guarantee that the new server has the same identity as the old server. Just about everything in the rootvg will be preserved, and you can count on everything being available upon first boot.
- Cons: There are four reasons why this strategy is not the wisest option. First, there is a chance that you could drop or damage the disks while physically transporting them. Second, the original server may not have all the device drivers needed to make the new hardware functional, which will then require hunting down and installing additional software. Third, you will most likely have to delete or reconfigure devices to make the new LPAR functional (I cover this in more detail later). Finally, there is a chance that the form factor of the disks will be incompatible with the new hardware, ruling out this option altogether.
Strategy 3: System backup and restore
Here, you back up the original server to an mksysb image, then lay it down onto the new hardware.
- Pros: I prefer this strategy, because it preserves the original hardware, allows the new LPAR to get device drivers automatically (when using complementary base media or a NIM server), and brings over all of the user IDs, groups, rootvg file systems, and environment variables. It is the least intrusive of all the options for both the original server and the new LPAR.
- Cons: The main drawback of this solution arises when the rootvg on the original server is constantly being modified. The mksysb image stays "fresh" only until changes on the original server open up a large delta in content. But if best practices have been followed in keeping regularly changing data out of the root volume group, this option can provide ample time for testing and deployment. You will also need to acquire temporary IP addresses or network adapters to make the build happen, then correct them later.
Also, it is important to determine the type of disks to be used for the rootvg disks. Will the server use internal SCSI disks or Storage Area Network (SAN) boot technology? If SCSI disks will be employed, patching SAN drivers is an easier task, but there can be a higher likelihood of hardware failures. If SAN boot technology is used, reboots and I/O performance will be very fast, but the disks cannot be ported from one LPAR to another easily because of how the Fibre Channel adapter worldwide name (WWN) is integrated into the disk attributes.
Gotcha 3: Managing external volume groups
When you have chosen the operating system strategy, it is time to plan how to bring over the external/non-root volume groups. As with the rootvg, there are several ways to migrate data onto the new LPAR, including performing a backup and restore operation, using SAN technology such as the FlashCopy feature on IBM SAN Volume Controllers (SVC), or just using the original cables, adapters, and disks in the new hardware. But whatever technique you leverage, there are a few things to consider.
First, save a copy or back up the /etc/filesystems file from the original server. Although importvg will detect the contents of the external volume groups, the contents may not be imported in an organized fashion, causing file systems to be mounted in an incorrect order. I have seen this typically when a customer has several external volume groups and file systems are hierarchically linked.
For example, consider an Oracle database server with three volume groups and interrelated file systems:
- datavg1: /opt/app
- datavg3: /opt/app/oracle
- datavg2: /opt/app/oracle/product
If these volume groups were imported numerically, the system would try mounting /opt/app/oracle/product, then over-mount it with /opt/app/oracle. But by having a copy of the original /etc/filesystems file, you provide a reference for organizing the same file on the new LPAR.
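As a rough illustration of what "organizing" means here, the following sketch sorts a list of mount points so that every parent directory comes before its children. It is a minimal sketch, not part of the original procedure: the function name mount_order is hypothetical, and it only sorts text. It does not edit /etc/filesystems or mount anything.

```shell
# mount_order: read mount points on stdin, print them parent-first.
# Hypothetical helper for illustration only; it sorts text and nothing else.
mount_order() {
    # Prefix each path with its number of "/"-separated fields, sort on
    # that count (then on the path itself), and strip the count again.
    awk -F/ '{ print NF "\t" $0 }' |
        LC_ALL=C sort -k1,1n -k2 |
        cut -f2-
}

# Example: the three file systems from the Oracle scenario above,
# supplied in the problematic order.
printf '%s\n' /opt/app/oracle/product /opt/app /opt/app/oracle | mount_order
```

The output lists /opt/app first, then /opt/app/oracle, then /opt/app/oracle/product, which is the order the stanzas should follow in /etc/filesystems on the new LPAR.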
Second, save a copy of disk information, especially if the disks will be moved directly from the original server to the new LPAR. Make sure to document physical volume identifiers (PVIDs), volume group names, health check intervals, and any other tunables. This way, disks can be identified, configured, and imported with no guesswork.
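One possible way to capture this information is sketched below. The function name save_disk_info and the output file names are hypothetical; the commands it calls (lspv, lsattr, lsvg) are standard AIX commands, but which attributes matter for your disks is an assumption you should check against your own storage drivers.

```shell
# save_disk_info: hypothetical helper that snapshots disk and volume group
# details into a directory before the migration. Intended to be run as
# root on the original AIX server, e.g.: save_disk_info /tmp/premigration
save_disk_info() {
    outdir=$1
    mkdir -p "$outdir" || return 1

    # Physical volumes with their PVIDs and owning volume groups.
    lspv > "$outdir/lspv.out"

    # Per-disk attributes (health check interval and other tunables).
    awk '{ print $1 }' "$outdir/lspv.out" | while read -r pv; do
        lsattr -El "$pv" > "$outdir/$pv.attr"
    done

    # Characteristics and logical volume layout of each varied-on
    # volume group.
    lsvg -o | while read -r vg; do
        lsvg "$vg" > "$outdir/$vg.vg"
        lsvg -l "$vg" > "$outdir/$vg.lvs"
    done
}
```

Keeping the output as plain files makes it easy to copy them off the server alongside the saved /etc/filesystems file.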
Third, consider other disk maintenance that you can perform during the migration. Take advantage of the server being unavailable to change the disk architecture. You can change external volume groups to big or scalable volume groups. You can replace numerous small disks with a single large logical unit number (LUN). You can reclaim unwanted file systems or disk space to save resources.
Gotcha 4: Setting up and configuring devices
The last challenge in completing the LPAR migration is to get all the other devices set up properly. Depending on the strategies you used earlier, management of the devices can be a time-consuming task or an easy endeavor.
Before moving over to the new server, take the time to get all the configuration parameters from the existing devices. Start by listing all the devices that are on the old server; you can then query each device in that list to get all the customized settings and attributes. Some of these parameters will not necessarily apply equally to the new LPAR because of the different hardware, but some settings should be carried over, like IP aliases, memory tunables, and Fibre Channel speeds.
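The inventory step described above could be scripted along the following lines. This is a sketch under assumptions: the text does not say which commands to use, so lsdev (to list devices) and lsattr (to read their attributes) are my choice of standard AIX commands for the job, and the function name save_device_attrs and the file layout are hypothetical.

```shell
# save_device_attrs: hypothetical helper that records every configured
# device and its attribute values. Intended for the original AIX server,
# e.g.: save_device_attrs /tmp/premigration/devices
save_device_attrs() {
    outdir=$1
    mkdir -p "$outdir" || return 1

    # Full device inventory with state, location, and description.
    lsdev -C > "$outdir/lsdev.out"

    # One file of effective attribute values per device. Devices that
    # have no attributes simply produce an empty file.
    lsdev -C -F name | while read -r dev; do
        lsattr -El "$dev" > "$outdir/$dev.attr" 2>/dev/null
    done
}
```

Comparing these files against the same output on the new LPAR shows which customized values still need to be carried over.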
If the root disks are imaged from original media or a NIM installation, you will need to set up and configure all the devices when the server boots up (except for the network adapter that NIM uses). This is the most time-intensive course of action for configuring devices and will require the most attention. And if you used NIM, you may need to change the IP address and/or hostname when the cutover time occurs.
If the root disks will be physically moved to the new LPAR or if you select the option in NIM to recreate devices, you will need to sort out a number of old and new devices. For example, if the original server had one Ethernet adapter (ent0) and the new server has one Ethernet adapter (ent1), the new server will have one defined adapter (ent0), one available adapter (ent1), and no active interface with a working IP address. So, you will have to delete both interfaces using the rmdev -dl <interface> -R command, re-detect the correct device using cfgmgr, and then set an IP address. The same process applies to any other adapters and disks.
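The Ethernet cleanup just described might look like the sketch below. The rmdev and cfgmgr calls follow the text; using mktcpip to set the address is my assumption, since the text only says to "set an IP address". The function name rebuild_ethernet is hypothetical, and the interface name, hostname, and addresses in the usage comment are placeholders.

```shell
# rebuild_ethernet: hypothetical helper that removes stale and current
# adapter definitions, rediscovers the hardware, and assigns an address.
# Usage (placeholder values):
#   rebuild_ethernet "ent0 ent1" en0 myhost 192.0.2.10 255.255.255.0
rebuild_ethernet() {
    adapters=$1   # space-separated adapter names to delete
    iface=$2      # interface expected after rediscovery (an assumption)
    host=$3
    addr=$4
    mask=$5

    # Delete each adapter definition and its children from the ODM.
    for dev in $adapters; do
        rmdev -dl "$dev" -R || return 1
    done

    # Re-detect the hardware that is actually present.
    cfgmgr || return 1

    # Assign the hostname, address, and netmask to the interface.
    mktcpip -h "$host" -a "$addr" -m "$mask" -i "$iface"
}
```

Run this from the console rather than over the network, because the interface carrying your session is removed in the first step. Check the adapter names with a device listing after cfgmgr before trusting the interface name you pass in.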
I have also found that the one thing that tends to come back and haunt me is device definitions that do not correlate directly to a physical piece of hardware, in particular asynchronous I/O (AIO) devices on database servers. For these devices, first make the device available, then run chdev to set the device to be available upon reboot. Otherwise, there will be some very agitated DBAs once the new LPAR is active.
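For the AIO case, the two steps could be sketched as follows. This assumes the legacy AIO device is named aio0 and exposes an autoconfig attribute, as on AIX 5.x; neither name comes from the text above, and using mkdev for the first step is likewise my assumption. The function name enable_aio is hypothetical.

```shell
# enable_aio: hypothetical helper that brings the AIO device online now
# and marks it to be configured at every reboot.
# Usage: enable_aio          (defaults to aio0)
#        enable_aio aio0
enable_aio() {
    dev=${1:-aio0}

    # Make the device available in the running system.
    mkdev -l "$dev" || return 1

    # Record in the ODM (-P) that the device should be made available
    # at boot. The attribute name is an assumption based on AIX 5.x.
    chdev -l "$dev" -P -a autoconfig=available
}
```

You can confirm the result by checking the device's state and attributes before handing the LPAR over to the database team.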
In my years of experience, I have performed dozens of successful LPAR migrations. And the one piece of advice I cannot reiterate enough is this: Plan, plan, and plan some more! Although gathering so much information about the original server may seem trivial, and going over each and every part of the hardware a waste of time, I cannot count the number of times that knowing something abstract like a PVID, Maximum Transmission Unit (MTU) size, or timeout value came in handy to save a customer account. There is no worse feeling than flipping the switch and watching a new LPAR grind to a halt, but by avoiding these common gotchas, you will have the greatest chance for success in your LPAR migration.
Related topics
- UNIX on Power systems: Learn more about AIX Power systems software.
- HMC attached system setup: Learn how to set up your system using the Hardware Management Console.
- AIX wiki: Get the technical information you need from this collaborative site.
- Systems Workload Estimator: Download the Systems Workload Estimator for use on your server hardware.