- After a reboot, exportvg worked without any errors.
- importvg then worked and made the missing file systems available.
- When the system was rebooted again, the problem reappeared.
- The VM had been rebooted recently, prior to the AIX migration, and there were no problems.
AIX Down Under
AnthonyEnglish 270000RKFN Tags:  importvg volume_group vg varyonvg exportvg migration 3 Comments 14,867 Views
A customer did a migration from AIX 5.3 to 6.1 and then called me to report a strange set of symptoms. Some file systems didn't mount following a reboot. When the system tried to mount the file systems in the volume group (let's call it datavg), it reported that there was no such device. If the customer ran an exportvg and an importvg, all the datavg file systems became available. But after another reboot, the datavg file systems again failed to mount.
Update: I have an idea of what might have happened. The customer (as I should have mentioned) did the migration to AIX 6.1 after restoring from an AIX 5.3 mksysb. I suspect the bosinst.data file had the option to Import user volume groups (such as datavg) set to No. The system had first been migrated from 5.3 to 7.1. When they ran into difficulties, they chose to restore the 5.3 mksysb and upgrade only to 6.1. Perhaps I should have mentioned that background in the original post. Pretty poor omission if this was a detective story that you had paid cold, hard cash for. But it isn't. And you didn't. So read on.
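If you want to test that theory, the setting lives in bosinst.data. Here's a minimal sketch that assumes the standard IMPORT_USER_VGS field in the control_flow stanza and a copy of the file you want to inspect (the path is up to you). It prints the value, or fails if the field isn't there. import_user_vgs_setting is a made-up helper name, not an AIX command.

```shell
#!/bin/sh
# Sketch: print the IMPORT_USER_VGS value from a bosinst.data file.
# Assumption: the usual "    KEY = value" stanza layout.
# Returns non-zero if the field is missing; an empty line of output
# means the field is present but has no value.
import_user_vgs_setting() {
    bosinst_file=$1
    awk -F= '
        $1 ~ /^[ \t]*IMPORT_USER_VGS[ \t]*$/ {
            val = $2
            gsub(/^[ \t]+|[ \t]+$/, "", val)   # trim blanks around the value
            print val
            found = 1
            exit
        }
        END { if (!found) exit 1 }
    ' "$bosinst_file"
}
```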
As I was only supporting the customer over the phone, I had to do some detective work without being able to run any commands or look at screens for myself. (Technology is still pretty primitive down here in the antipodes.)
Nested File Systems - not the culprit
I immediately thought it was a case of nested file systems that were attempting to mount before their parent file systems. This typically happens when the sub-file system (e.g. /tsm/log) appears in /etc/filesystems before the parent file system (/tsm). As /tsm hasn't been mounted yet, there is no mount point directory called /tsm/log, and so the /tsm/log file system fails to mount. This usually happens because someone has edited /etc/filesystems by hand.
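That ordering rule is easy to check mechanically. Here's a rough sketch in shell and awk. It assumes each stanza in /etc/filesystems starts with an unindented "/mount/point:" line, and it reports any file system listed before its parent. check_nested_order is a hypothetical name.

```shell
#!/bin/sh
# Sketch: flag file systems that appear in /etc/filesystems before
# their parent. Exit status is 1 if any are found.
# Assumption: stanza headers are unindented lines like "/tsm/log:".
check_nested_order() {
    fsfile=${1:-/etc/filesystems}
    awk '
        /^\/[^ \t]*:[ \t]*$/ {
            mp = $1
            sub(/:$/, "", mp)          # "/tsm/log:" -> "/tsm/log"
            order[++n] = mp            # remember the order of stanzas
        }
        END {
            bad = 0
            for (i = 1; i <= n; i++) {
                for (j = i + 1; j <= n; j++) {
                    # a later entry is the parent if the earlier path starts with it plus "/"
                    if (index(order[i], order[j] "/") == 1) {
                        print order[i] " is listed before its parent " order[j]
                        bad = 1
                    }
                }
            }
            exit bad
        }
    ' "$fsfile"
}
```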
As the customer pointed out to me, the missing device was the logical volume /dev/fslv00, not the file system's mount point directory. So I buried my nested file systems theory, ready to be dug up the next time a customer rings me about missing file systems.
Just the facts, Sir
With my nested file systems theory soundly demolished, it was time to take a good, hard look at the facts so far:
The first fact was that exportvg worked without any errors. Since exportvg won't export a volume group that is still varied on, datavg must already have been offline, with none of its file systems mounted. If they had been mounted, we would have got some warnings that the file systems were in use. So a successful export actually hinted at a problem.
The fact that importvg worked indicated that the volume group's disks were available to the OS. So there was no issue with SAN connectivity or volume group corruption.
Since the volume group didn't come up after the reboot, perhaps the problem was with the volume group itself, rather than any one of the file systems or logical volumes belonging to it.
If the volume group did have a problem, it seemed to have been introduced during the migration.
Patterns of Expertise
We looked for a pattern, and noticed that all the missing file systems belonged to the one volume group (datavg). The volume group could be imported / varied on from the command line, but didn't come up after a reboot. Then it twigged: the volume group had somehow been set not to varyon automatically following a reboot.
Sure enough, when we ran lsvg datavg, we saw that datavg was set not to vary on automatically. There was nothing wrong with the volume group itself. It's just that somewhere along the way, someone (or perhaps a bug) had set its auto varyon to no. It's easily fixed, as the example in the chvg man page explains:
chvg -a y vg03
Which is exactly what we did for datavg.
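If you have several volume groups to check, you can pick the AUTO ON field out of the lsvg output. This sketch assumes lsvg's usual two-column layout, where "AUTO ON:" is followed by yes or no. auto_on_value is a hypothetical helper, and the chvg call in the usage comment is the same fix as above.

```shell
#!/bin/sh
# Sketch: read "lsvg <vgname>" output on stdin and print the AUTO ON value.
# Assumption: the field appears as the two words "AUTO ON:" followed by yes/no.
auto_on_value() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "AUTO" && $(i + 1) == "ON:") { print $(i + 2); exit }
    }'
}

# Usage on the AIX host (not run here):
#   [ "$(lsvg datavg | auto_on_value)" = yes ] || chvg -a y datavg
```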
Getting to the root cause
We rebooted and found the problem was indeed fixed. So what caused the volume group not to vary on automatically during the reboot? That's hard to know. Perhaps someone ran importvg with the -n option, which leaves the volume group varied off once the import into the system is complete.
Maybe there was a bug in AIX whereby the migration left volume groups without auto varyon set.
We may never get to the root cause. It's not a big concern, as we've set the varyon correctly now, and everything comes up fine after a reboot.
AnthonyEnglish
This is the epic of the upgrade of the Virtual I/O Server (VIOS) from version 1.4 to version 2.2. It's not a pleasant read, so if you have a weak stomach, just skip the whole blog post, but note two things when it comes to VIOS and firmware upgrades:
The Epic Begins
Much as we'd like all our systems to be running the newest and latest of everything, most of us are not so lucky. There are still some very old environments out there, running on old hardware. I suppose it's a testament to these systems that after 8 or 10 years, businesses can still run on them. There's nothing like complete neglect to keep a reliable system running on inertia. Just hope nobody needs to change anything.
I was recently asked to look at a system running dual Virtual I/O Servers (VIOS), both at version 22.214.171.124, which dates back at least five years. The aim was to get them to a recent VIOS version (126.96.36.199) so we could connect to new V7000 storage and a new Fibre Channel switch. The ability to use Shared Storage Pools on the VIOS was also an attraction.
The site had been running stably for some years, but the firmware, VIOS and AIX were all seriously out of date. A plan had to be worked out to upgrade.
For full disclosure, I have to say that the upgrade has not yet been done. This is just a walk through of the planning for the upgrade. So, take a deep breath, and here we go:
VIOS Hardware and Firmware Prereqs
The VIOS ties in so closely with the hypervisor that it's critical to ensure your system firmware is up to date before you consider any upgrades to your VIOS. If you're using a Hardware Management Console (HMC), that should be at a compatible level too. The upgrade order is:
I might point out that in most documentation "Firmware" refers only to the System firmware. Device firmware, such as for Fibre Channel adapters, doesn't rate a mention. So my plan was to include that, since it undoubtedly would not have been updated since the system was first installed.
Firmware alert: setback #1
As the system I was working on was a Power5 server managed by a Hardware Management Console (HMC), I had to know whether the system firmware update would be Concurrent (no downtime) or Disruptive.
I was going from firmware version SF240_358 to SF240_417. In other words, the release stayed the same (SF240), so ordinarily this would be a concurrent update on a system managed by an HMC.
HMC managed systems
Non-HMC managed systems
However, this was not one of those "most cases": this installation could not be done concurrently.
In other words, complete downtime for all virtual clients and the VIO servers.
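For what it's worth, here's the by-eye check I started with, sketched in shell. It assumes firmware levels of the form RELEASE_SERVICEPACK (such as SF240_358) and only compares the release part. fw_update_kind is a made-up name. As this upgrade showed, even a same-release update can be disruptive, so treat the answer as a hint and let the firmware readme have the final say.

```shell
#!/bin/sh
# Sketch: first-pass guess at whether a system firmware change is
# concurrent, based only on the release prefix before the underscore.
# Assumption: a release change is disruptive; within a release the
# update may be concurrent, but the readme for the target level decides.
fw_update_kind() {
    cur_rel=${1%%_*}   # SF240_358 -> SF240
    new_rel=${2%%_*}
    if [ "$cur_rel" = "$new_rel" ]; then
        echo "same release ($cur_rel): possibly concurrent, check the readme"
    else
        echo "release change ($cur_rel -> $new_rel): expect disruptive"
    fi
}
```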
Firmware alert: setback #2
I also found that the adapter firmware had a prerequisite which I needed to be aware of:
Memory and Disk requirements
VIOS version 2.2 requires 4 GB of RAM and at least 30 GB of disk for rootvg. (If you're using software mirroring for rootvg, you need at least 60 GB).
updateios Won't Do Everything!
When you log into the VIOS restricted shell as the user padmin, you have a simplified command set. The commands are very easy to learn, and once you get used to them, you can almost guess your way through.
Now there's a great command for running updates called updateios. The syntax is very easy. For example, if you have your updates in a directory called /home/padmin/update, then the command you need is:
updateios -dev /home/padmin/update
Now you might think that moving from VIOS 1.4 to VIOS 2.2 only requires these steps:
Not So Fast!
It would be nice, but I wasn't so lucky.
First, since I was going from version 1.X (in my case 188.8.131.52) to version 2.X, I needed to use the migration media. As you may know, the current VIOS 2.X is running AIX 6.1 under the covers. And by “under the covers” I mean using that sneaky oem_setup_env command, which gives you full root access to the underlying AIX. When you're logged in as padmin, on the other hand, you're in a restricted shell, with a very limited command set available.
Migration to VIOS 2.X
First Stop: IBM Support Portal
Of course, the first port of call was the IBM Support Portal. You can just go to www.ibm.com and then select Support & Downloads or take the shortcut to IBM Fix Central. You'll need an IBM ID to log in. Registration is free and it only takes a minute to do.
When you select the product, in the Product Group, click on the drop down box and go to Software > Virtualization software. Here's a screenshot:
In the box Select from Virtualization software, use the drop-down to go to PowerVM Virtual I/O Server:
From there you can choose the Installed Version. You can find that out by logging onto the Virtual I/O Server as the user padmin and then running the command ioslevel.
Notice that 184.108.40.206 doesn't even make the list. The oldest version listed is 220.127.116.11, so I'd have to make do.
18.104.22.168 is old. My beloved Virtual Media Repository was introduced with version 1.5. That alone was reason for me to upgrade.
I was aware that moving from version 1.X to version 2.X required booting off a Migration DVD (or NIM resource). That made sense, because it's a similar procedure when you're migrating from AIX 5.3 to AIX 6.1 or 7.1. This jump wasn't just a set of patches (like running the AIX update_all command). It was a migration. So I thought the best approach would be to use the latest available Migration DVD. The Support Portal pointed me to
Migration DVD 22.214.171.124
However, using that DVD had a prerequisite: the last 126.96.36.199 Service Pack. Could I update from 188.8.131.52 to 184.108.40.206 (latest Service Pack)?
I went to the readme and found that (alas):
VIOS 220.127.116.11-FP 11.1 SP-02 should be applied to systems that are currently at IOS Level 18.104.22.168-FP-11.1. If your IOS level is lower than 22.214.171.124-FP11.1, you must install Fix Pack 11.1 before you can install the Service Pack.
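When you're checking your level against a prerequisite like the one in that readme, a plain string comparison gets levels like 2.10 and 2.9 the wrong way round. Here's a minimal sketch that compares field by field. It assumes purely numeric dot-separated levels and ignores any suffix after the first hyphen (such as -FP-11.1). version_ge is a hypothetical helper. On the VIOS you'd feed it the output of ioslevel.

```shell
#!/bin/sh
# Sketch: succeed if dotted level $1 >= $2, comparing numerically field
# by field. Missing fields count as 0. Anything from the first "-"
# onwards (e.g. "-FP-11.1") is ignored.
version_ge() {
    a=${1%%-*}
    b=${2%%-*}
    while [ -n "$a" ] || [ -n "$b" ]; do
        x=${a%%.*}; y=${b%%.*}
        x=${x:-0}; y=${y:-0}
        [ "$x" -gt "$y" ] && return 0
        [ "$x" -lt "$y" ] && return 1
        # drop the field just compared
        case $a in *.*) a=${a#*.} ;; *) a= ;; esac
        case $b in *.*) b=${b#*.} ;; *) b= ;; esac
    done
    return 0   # all fields equal
}

# Usage as padmin (MIN_LEVEL is a placeholder for the readme's prerequisite):
#   version_ge "$(ioslevel)" "$MIN_LEVEL" || echo "install the fix pack first"
```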
So my upgrade path was now looking like this:
I haven't mentioned the time needed to download, back up, reboot and so on, all of which is essential to each of the above steps.
A Small Breakthrough
On the IBM Support Portal I found another, older VIOS Migration DVD, which would take me from 1.X to 2.1.
Now my upgrade path looked like this:
That's much simpler.
I went to the IBM Support Portal, selected Downloads and then downloaded the latest VIOS packs. At last, my plan was complete, and I should be able to run updateios, which is all I ever wanted to run in the first place.
An Alternate Approach
If you think this was jumping through too many hoops, consider another option for your own environment: doing a fresh install of the VIOS. Sure, you'd need to recreate and remap Virtual Target Devices and Shared Ethernet Adapters, but if you're organised and have all of that well documented, it may be a better option, especially if you only have a small number of virtual disks that get mapped through to clients.
AnthonyEnglish
It's pretty easy to move a volume group from one AIX system to another. You unmount all the file systems from the source volume group, varyoff the VG, export the volume group (exportvg), and then remove the disks from the source system (rmdev -dl hdiskN). Then you assign the LUNs to the target host, import the volume group, mount the file systems, and check permissions.
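Here are those steps as a dry-run sketch. The VG, file system and disk names are placeholders you supply, and run is a made-up wrapper that only prints each command unless DRY_RUN=0. List nested file systems child-first for the unmount and parent-first for the mount. Because importvg varies the volume group on by default, there's no separate varyonvg step on the target.

```shell
#!/bin/sh
# Sketch of moving a VG between hosts. DRY_RUN=1 (the default here)
# only prints each command; DRY_RUN=0 runs it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Source host: unmount (children first), vary off, export, remove disks.
vg_move_out() {
    vg=$1; fslist=$2; disks=$3
    for fs in $fslist; do run umount "$fs"; done
    run varyoffvg "$vg"
    run exportvg "$vg"
    for d in $disks; do run rmdev -dl "$d"; done
}

# Target host: discover the LUNs, import via any one disk, mount (parents first).
vg_move_in() {
    vg=$1; disk=$2; fslist=$3
    run cfgmgr
    run importvg -y "$vg" "$disk"
    for fs in $fslist; do run mount "$fs"; done
}
```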
But what if you want to copy a volume group? You might want to replicate a volume group, by doing a flash copy across the SAN. Then on the remote site, you'd present the SAN LUNs to the target host, run cfgmgr to get the host to see the new disks. The disks on the source host may be named differently on the target host, because the target will just assign the next available hdisk number when you run the cfgmgr. The hdisk numbers may be different between the source and target hosts, but the Physical Volume IDs are the same. After all, the target LUN is a replica of the source LUN.
The problem: duplicate PVIDs and LVs
But that brings up a problem: duplicate PVIDs.
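If the source and copied LUNs end up on the same host, you can spot the clash in lspv output. This is a sketch that assumes lspv's usual columns (hdisk, PVID, volume group, state), with "none" for disks that have no PVID. dup_pvids is a hypothetical name.

```shell
#!/bin/sh
# Sketch: read "lspv" output on stdin and print each PVID that appears
# on more than one hdisk, followed by the disks that share it.
dup_pvids() {
    awk '
        NF >= 2 && $2 != "none" {
            count[$2]++
            disks[$2] = disks[$2] " " $1
        }
        END {
            for (p in count)
                if (count[p] > 1) print p ":" disks[p]
        }
    '
}

# Usage on the AIX host (not run here):
#   lspv | dup_pvids
```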
Enter the recreatevg command. Just like the move you can do with importvg and varyonvg,
That overcomes the issue of duplicate PVIDs and Logical Volume IDs.
Now, when you run the importvg command yourself, you only need to specify one physical volume. For example, if the volume group consists of hdisk1, hdisk2 and hdisk3, then the command
importvg -y datavg hdisk2
will import the entire volume group, since the Volume Group Descriptor Area (VGDA) is on all three disks, and all the disks in the volume group know the PVIDs of all the other disks in that volume group.
When you run recreatevg,
The recreatevg command removes all logical volumes that fully or partially exist on the physical volumes that are not specified on the command line. Mirrored logical volumes can be an exception (see the -f flag).
So if you're wondering why the logical volumes on the disks you forgot to mention didn't get included, the LVM will reply: "You never asked." I expect that's because recreatevg, unlike importvg, isn't relying on PVIDs, since it's creating new ones.
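One way to avoid the "you never asked" problem is to build the recreatevg command from the source disks' PVIDs, noted before the copy. This sketch assumes lspv's usual columns and that the copied disks show None in the volume group column. recreatevg_cmd is hypothetical. It only prints the command for you to check and run, and it refuses if it can't find a copy of every disk.

```shell
#!/bin/sh
# Sketch: reads "lspv" output on stdin. $1 is the new VG name; the rest
# are the source VG's PVIDs. Prints a recreatevg command naming every
# matching disk that is not yet in a VG ("None"), or fails if any is missing.
recreatevg_cmd() {
    newvg=$1; shift
    npvids=$#
    [ "$npvids" -gt 0 ] || { echo "usage: recreatevg_cmd NEWVG PVID..." >&2; return 1; }
    disks=$(awk -v want="$*" '
        BEGIN { n = split(want, w, " "); for (i = 1; i <= n; i++) need[w[i]] = 1 }
        NF >= 3 && ($2 in need) && $3 == "None" { printf "%s ", $1 }
    ')
    set -- $disks
    if [ "$#" -ne "$npvids" ]; then
        echo "found $# of $npvids disks; not building the command" >&2
        return 1
    fi
    echo "recreatevg -y $newvg $*"
}
```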
Update: Hoarders and chuckers
There are two kinds of people in the world. Some like to keep their old junk, just in case they'll need it some day (they won't!). Others like to toss it out, just in case they don't need it (they will!). The first are the hoarders, and the second are the chuckers (with apologies to our cultured readership for using such an expression). Well, importvg is a hoarder: you nominate one disk and it assumes you want the others in the volume group. recreatevg, on the other hand, is a chucker.
Perhaps people with more experience using recreatevg will have comments about how this all works in the real world (the recreatevg command, not how to keep the peace between hoarders and chuckers).
AnthonyEnglish
What do all those numbers mean on oslevel -s?
If you're looking for some guidance on understanding exactly what level of AIX you're running, you may like to read my article in the IBM Systems Magazine on Understanding AIX versions. It answers questions such as:
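As a quick illustration of the format, here's a sketch that splits an oslevel -s string into its parts. It assumes the four-field release-TL-SP-build layout, with the last field read as the year and week of the build (YYWW). decode_oslevel is a made-up helper.

```shell
#!/bin/sh
# Sketch: decode an "oslevel -s" string such as 6100-06-05-1115.
# Assumption: RRRR-TL-SP-YYWW, where 6100 means AIX 6.1.
decode_oslevel() {
    s=$1
    old_ifs=$IFS
    IFS=-
    set -- $s
    IFS=$old_ifs
    [ "$#" -eq 4 ] || return 1
    rel=$1 tl=$2 sp=$3 build=$4
    major=${rel%???}      # 6100 -> 6
    minor=${rel#?}        # 6100 -> 100
    minor=${minor%??}     # 100 -> 1
    echo "AIX $major.$minor TL $tl SP $sp (built 20${build%??}, week ${build#??})"
}

# Usage on the AIX host (not run here):
#   decode_oslevel "$(oslevel -s)"
```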
AnthonyEnglish
In September 2009 Rob McNelly wrote on his AIXChange blog about Migrating from the IVM to the HMC. I have documented my own experience of this procedure. You can download it from here, at a very affordable price of USD 0.00 (no refunds).
The IVM, or Integrated Virtualization Manager, is a browser interface to the VIO server on smaller systems. It has HMC-like functionality, such as Dynamic LPAR and the ability to configure LPARs, stop and start them, and so on.
The HMC (Hardware Management Console, as you know) is able to manage several physical servers and is mandatory for larger systems. It can also be used for smaller systems, and is a worthwhile investment, in my view, once you get beyond a single small server.
Two servers, two IVMs
I had a client who had bought a production Power6 550 and a P6-520 for Dev and Test. After some months of discussion, their Business Partner convinced them of the benefit of investing in an HMC to manage these two systems with their growing number of LPARs. The challenge was migrating each of the servers from being IVM-managed to the HMC. I have put together a document of my own experience of the migration. It doesn't attempt to be a step-by-step guide. It's more of a diary for my own benefit, but you may find it useful.
Forward planning brings us unstuck
We thought we were being safe by getting some work done ahead of the outage time. We racked and cabled the HMC and put it on the network, in preparation for the scheduled outage two weeks hence. Problem was, no one told the HMC the planned go-live date. To our surprise, it immediately discovered the two servers. At the same time, the HMC reported that the two servers were in "Recovery" state, but it wouldn't take further control of the systems or their LPARs until the outage, which was scheduled for after a huge month end. The IVM had been effectively disabled, so any IVM-specific commands were out of bounds. No profile backups, no DLPAR, no shutdown and activation of LPARs was permitted, either from the IVM or from the HMC. Nothing would undo it, not even powering off the HMC and disconnecting it from the network.
We had a VIO server, but no IVM and no HMC that we could do anything useful with. It was the technological equivalent of a hung parliament.
All's well that ends well
In the end, it all worked, and the customer has been running happily on the HMC for many months now. Still, it was a challenge. You can find my comments about the migration in IVM to HMC Migration - A Customer's Experience.
Looking back, it was quite funny, I suppose. As long as you weren't me.
AnthonyEnglish
Fun out of the Sun
In the last few days I've had lots of fun with migrations to AIX 6 and 7. And when I say "fun", I don't mean it in its usual technical sense within IT (insurmountable problems, unrelenting stress, obscure workarounds and chronic sleep deprivation).
I mean "fun" according to its (almost obsolete) usage:
Over the last few days, I have
All of the upgrades went smoothly, with none of the challenges IT people mean when they say "fun".
Making a fresh start
My last feather in the AIX 7 cap was to do a "New and complete overwrite" installation of AIX 7.1. You might do this instead of migrating your system from an earlier release. If your LPARs are full of inconsistencies and undocumented workarounds, maybe it's better to build a system from scratch.
If you want to (or need to) make a fresh start, so that you can cleanse the soul of your system of the sins of the past, it's good to know that AIX 7.1 is fully binary compatible with AIX 6 and AIX 5. Building a shiny new LPAR is especially useful if you decide to create an AIX Standard Operating Environment (SOE) LPAR.
Half an hour to AIX 7.1
The fresh install of AIX 7.1 took around 25 minutes, using the VIO Server Virtual Media Library, which is how I did the other AIX migrations. The new and complete overwrite installed 591 filesets as the main course, and then another 4 for dessert.
Here you see the creation of the boot image at the end:
The reboot took around 3 minutes. I've yet to install the mandatory AIX 7.1 service pack, but I expect it will be all over red rover in about 5 minutes. It can be downloaded from the IBM Fix Central web site.
As with the other installations, the VM Library was running off internal SAS disk allocated to the VIO Server. The target LPAR boots from SAN, using vscsi disk which is passing through two VIO servers using MPIO.
AIX 7.1 is fun
I'm enjoying these AIX installations and migrations but they were so straightforward that it's getting a little quiet around here.
I wonder when AIX 8 will be ready.
AnthonyEnglish
Quick migration to AIX 6.1
Yesterday I used the Virtual Media Library to do a migration of an LPAR from AIX 5.3 TL 11 to AIX 6.1. The migration took under 25 minutes. This included:
I then installed the latest service pack, once again via the VM Library. This took just over 2 minutes. I had downloaded the service pack from the IBM Support Portal.
Fast and painless migrations
All up, the migration was about 25 minutes, including rebooting time. Of course, this doesn't include the preparation time downloading the software or time to take a backup.
This is just a few minutes faster than the migration to AIX 7.1 which I did on a different LPAR last week, but both of them were really very fast and painless. That's how it should be.
AnthonyEnglish
I just did my first migration of an LPAR to AIX 7.1. I chose to migrate my AIX 6.1 NIM server.
I first downloaded AIX 7.1 from the Entitled Software Support web site.
I copied the two ISO images to the Virtual Media Library on the VIO server, and I gave them shorter names when I copied them into /var/vio/VMLibrary. You can also rename them using the VIO server mkvopt or chvopt commands.
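If you'd rather let the VIO server do the copying and naming, the padmin commands look roughly like this. This is a dry-run sketch that assumes the media repository already exists (created with mkrep). The media names, ISO path, vhost0 and vtopt0 are placeholders, so use whatever device name mkvdev reports. run only prints each command unless DRY_RUN=0.

```shell
#!/bin/sh
# Sketch of padmin commands for the Virtual Media Library.
# DRY_RUN=1 (the default here) only prints each command.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Copy an ISO into the repository under a short name, create a
# file-backed optical device on the client's vhost adapter, and load it.
present_iso() {
    media=$1; iso=$2; vhost=$3; vtd=$4
    run mkvopt -name "$media" -file "$iso" -ro
    run mkvdev -fbo -vadapter "$vhost"
    run loadopt -disk "$media" -vtd "$vtd"
}

# Swap to the next ISO (e.g. the second AIX DVD) on the same device.
swap_iso() {
    vtd=$1; media=$2
    run unloadopt -vtd "$vtd"
    run loadopt -disk "$media" -vtd "$vtd"
}
```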
AnthonyEnglish
High Level high[-er] availability options
IBM HACMP (High Availability Cluster Multiprocessing) has been available since 1991. It was renamed IBM PowerHA and, more recently, PowerHA SystemMirror for AIX. While studying up on this recently, I came across some excellent comparisons between PowerHA, Live Partition Mobility and migration of Workload Partitions (WPARs). This is a high-level comparison of how the three might help with managing outages and make for higher availability, even if they aren't all PowerHA.
LPM and HA
First, Live Partition Mobility (LPM). This is the facility which allows you to migrate a running partition with its applications from one physical server to another without disrupting services. The Redbook for IBM PowerVM Live Partition Mobility explains:
"Live Partition Mobility increases global availability, but it is not a high availability solution. It requires that both source and destination systems be operational and that the partition is not in a failed state. In addition, it does not monitor operating system and application state and it is, by default, a user-initiated action."
So LPM is good if the outage is planned.
"Live" for a reason
The difference between HA and LPM came home to me recently when I was watching a presentation by an IBMer on PowerHA. Shawn Bodily has worked on HACMP/PowerHA for over ten years. He was on the team that wrote the excellent PowerHA for AIX Cookbook, and he presented webinars (see below) over three days in July 2009 on PowerHA (as it was then called). On the second of those days, Shawn answers a question about LPM as an alternative to PowerHA. Here is my transcript of the relevant section:
"When we talk about clustering we talk about reducing planned outages and unplanned outages. Most people associate it [HACMP] with unplanned outages but by far, most downtime today is still planned maintenance and Live Partition Mobility is great for planned maintenance if it's hardware related. If I'm upgrading a server and people can do firmware updates dynamically [i.e. concurrently] and some people choose not to. If I'm doing something like that, Live Partition Mobility is great. Even most of the maintenance is software maintenance. If you have to upgrade your application, update AIX, Live Partition Mobility does nothing for you. Here's why: the fact you're running the exact same rootvg and application on another frame."
WPARs for AIX update