AIX Down Under
A customer did a migration from AIX 5.3 to 6.1 and then called me to report a strange set of symptoms. Some file systems didn't mount following a reboot. When the file systems in the volume group (let's call it datavg) went to mount, they returned the error that there was no such device. If the customer ran an exportvg and an importvg, all the datavg file systems became available. But then another reboot was done and the datavg file systems didn't mount.
Update: I have an idea of what might have happened. The customer (as I should have mentioned) did the migration to AIX 6.1 after restoring from an AIX 5.3 mksysb. I suspect the bosinst.data file had the option to Import user volume groups (such as datavg) set to No. The system had been migrated from 5.3 to 7.1, then when they ran into difficulties, they chose to restore the 5.3 mksysb and then upgrade just to 6.1. Perhaps I should have mentioned that background in the original post. Pretty poor omission if this was a detective story that you had paid cold hard cash for. But it isn't. And you didn't. So read on.
As I only had support over the phone, I had to do some detective work without being able to run any commands or look at screens for myself. (Technology is still pretty primitive down here in the antipodes).
Nested File Systems - not the culprit
I immediately thought it was a case of nested file systems that were attempting to mount before their parent file systems. This typically happens when the sub-file system (e.g. /tsm/log) appears in the file /etc/filesystems before the parent file system (/tsm). As /tsm hasn't been mounted, there is no mount point directory called /tsm/log, and so the /tsm/log file system fails to mount. This is usually as a result of someone manually editing /etc/filesystems.
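To make the ordering issue concrete, here's a minimal, made-up /etc/filesystems fragment (the logical volume names and attributes are illustrative only). Because the /tsm/log stanza comes before /tsm, the boot-time mount of /tsm/log fails, since its mount point directory doesn't exist yet:
* /tsm/log is listed before its parent /tsm - this is the problem
/tsm/log:
        dev   = /dev/tsmloglv
        vfs   = jfs2
        log   = /dev/loglv01
        mount = true

/tsm:
        dev   = /dev/tsmlv
        vfs   = jfs2
        log   = /dev/loglv01
        mount = true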
As the customer pointed out to me, the missing device was the logical volume /dev/fslv00, not the file system's mount point directory. So I buried my nested file systems theory to be pulled out for next time a customer rings me with missing file systems.
Just the facts, Sir
With my nested file systems theory soundly demolished, it was time to take a good, hard look at the facts so far:
- After a reboot, exportvg worked without any errors
- importvg then worked and made the missing file systems available
- When rebooting, the problem reappeared
- The VM had been rebooted recently, prior to the AIX migration, and there were no problems.
The fact (fact #1) that the exportvg worked without any errors indicated that the volume group's file systems were not mounted before running the exportvg. If they had been mounted, we would have got some warnings that the file systems were in use. So a successful exportvg actually hinted at a problem.
The fact that the importvg worked indicated that the volume group's disks were available to the OS. So there was no issue with SAN connectivity or some volume group corruption.
Since the volume group didn't come up after the reboot, perhaps the problem was with the volume group itself, rather than any one of the file systems or logical volumes belonging to it.
If the volume group had a problem, it seems to have occurred during the migration.
Patterns of Expertise
We looked for a pattern, and noticed that all the missing file systems belonged to the one volume group (datavg). The volume group could be imported / varied on from the command line, but didn't come up after a reboot. Then it twigged: the volume group had somehow been set not to varyon automatically following a reboot.
Sure enough, when we ran lsvg datavg we saw that datavg was set not to auto varyon. Nothing wrong with the volume group. It's just that somewhere along the way, someone, or perhaps a bug, had set auto varyon to no. Easily fixed, as the example in the chvg documentation shows:
chvg -a y vg03
Which is exactly what we did for datavg.
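To double-check the change, you can look for the auto-varyon attribute in the lsvg output (the grep is just a quick way to pick out the line; the AUTO ON field should now read yes):
lsvg datavg | grep -i "auto on"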
Getting to the root cause
We rebooted and found the problem was certainly fixed. What caused the volume group not to varyon automatically as part of a reboot? A bit hard to know. Perhaps someone ran the importvg with the -n option, which causes the volume group not to be varied on when the import completes.
Maybe there was a bug in AIX, where the migration didn't auto-varyon volume groups.
We may never get to the root cause. It's not a big concern, as we've set the varyon correctly now, and everything comes up fine after a reboot.
One of the great strengths of AIX is the Logical Volume Manager (LVM). It may not be completely foolproof, but it does provide quite a lot of protection from administrator errors.
Here are a few common LVM tasks that I've deliberately attempted to make a mess of. See how the LVM reacts.
(Don't Try This at Home)
Let's start with three Physical Volumes: rootvg has one PV (booting from SAN) and datavg has two PVs. Here's the lspv output:
hdisk0 00c5a47ecf9edd3e rootvg active
hdisk1 00c5a47ed021ea7d datavg active
hdisk2 00c5a47ef172e84d datavg active
So, let's try to do some damage:
import a volume group that is already active:
# importvg -y datavg hdisk1
0516-360 getvgname: The device name is already used; choose a
0516-776 importvg: Cannot import hdisk1 as datavg.
"Just a moment, please!"
If you try to remove a disk from the operating system Object Data Manager (ODM) while it's still in an active volume group, you'll get this warning:
rmdev -dl hdisk1
Method error (/etc/methods/ucfgdevice):
0514-062 Cannot perform the requested function because the
specified device is busy.
What about trying to remove an active disk from a volume group?
reducevg datavg hdisk2
0516-016 ldeletepv: Cannot delete physical volume with allocated
partitions. Use either migratepv to move the partitions or
reducevg with the -d option to delete the partitions.
0516-884 reducevg: Unable to remove physical volume hdisk2.
"Sorry, I'm busy!"
Not getting very far, are we? Alright, let's try to export a volume group that's in use:
exportvg datavg
0516-764 exportvg: The volume group must be varied off
OK, I'll vary it off then:
varyoffvg datavg
0516-012 lvaryoffvg: Logical volume must be closed. If the logical volume contains a filesystem, the umount command will close the LV device.
So I have to unmount the file systems in the volume group first. But if they're in use, I'm also protected from doing damage:
umount: 0506-349 Cannot unmount /dev/lvmksysb: The requested resource is busy.
The order of exporting a volume group is: unmount its file systems, vary it off (varyoffvg), then export it (exportvg).
I can think of many other instances where the LVM protects me from my own stupidity. I don't have to remember to increase a logical volume first before expanding a file system. If I do try to overallocate beyond the MAX Logical Partitions (shown as MAX LPs in the output of the lslv command), I get an error.
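If you want to see that ceiling for yourself, it's in the lslv output (fslv00 is just a placeholder logical volume name here):
lslv fslv00 | grep -i "max lps"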
Relax - this is AIX!
Good software should protect you (at least a little) from glaringly dumb mistakes. The AIX Logical Volume Manager does that. This can give you a little more confidence when you're breaking the mirror of a logical volume or adding the wrong PV to a volume group. You can still do damage (like running rm /dev/* - but don't say you saw it here), and there are in some cases flags which allow you to force the action, but the very existence of those flags is part of the checks and balances. That means you can be a bit more relaxed than you might have to be on other platforms.
The LVM is another reason why AIX is a true enterprise operating system.
It's pretty easy to move a volume group from one AIX system to another. You unmount all the file systems from the source volume group, varyoff the VG, export the volume group (exportvg), and then remove the disks from the source system (rmdev -dl hdiskN). Then you assign the LUNs to the target host, import the volume group, mount the file systems, and check permissions.
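Spelled out as commands, the move looks something like this. It's only a sketch: datavg, the hdisk numbers and the /data mount point are placeholders, and you'd repeat the umount and mount for every file system in the volume group (lsvgfs datavg lists them):
# On the source host
umount /data
varyoffvg datavg
exportvg datavg
rmdev -dl hdisk4

# On the target host, once the LUNs have been mapped to it
cfgmgr
importvg -y datavg hdisk7
mount /data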
But what if you want to copy a volume group? You might want to replicate a volume group, by doing a flash copy across the SAN. Then on the remote site, you'd present the SAN LUNs to the target host, run cfgmgr to get the host to see the new disks. The disks on the source host may be named differently on the target host, because the target will just assign the next available hdisk number when you run the cfgmgr. The hdisk numbers may be different between the source and target hosts, but the Physical Volume IDs are the same. After all, the target LUN is a replica of the source LUN.
The problem: duplicate PVIDs and LVs
But that brings up a problem: duplicate PVIDs.
Enter the recreatevg command. Just like the move you can do with importvg and varyonvg, it lets you bring the copied disks in as a volume group on the target host, but it recreates the volume group from scratch, generating new PVIDs and new Logical Volume IDs as it goes. That overcomes the issue of duplicate PVIDs and Logical Volume IDs.
Now when you run the importvg command yourself, you only specify one physical volume. For example, if the volume group consists of hdisk1, hdisk2 and hdisk3, then the command
importvg -y datavg hdisk2
will import the entire volume group, since the Volume Group Descriptor Area (VGDA) is on all three disks, and all the disks in the volume group know the PVIDs of all the other disks in that volume group.
When you run recreatevg, on the other hand, you have to specify every physical volume you want included. As the documentation explains:
The recreatevg command removes all logical volumes that fully or partially exist on the physical volumes that are not specified on the command line. Mirrored logical volumes can be an exception (see the -f flag).
So if you're wondering why the logical volumes on the disks you forgot to mention didn't get included, the LVM will reply: "you never asked." I expect that's because recreatevg (unlike importvg), isn't relying on PVIDs, since it's creating new ones.
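For what it's worth, here's roughly what a recreatevg call might look like once the flash-copied LUNs have been discovered. Treat it as a sketch: the volume group name, the -Y and -L prefixes and the hdisk numbers are all invented for illustration, so check the recreatevg documentation on your level of AIX first:
recreatevg -y copyvg -Y cp -L /copy hdisk4 hdisk5
The idea is that -Y prefixes the logical volume names, -L prefixes the mount points, and new PVIDs and LVIDs are generated, so nothing clashes with the source volume group.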
Update: Hoarders and chuckers
There are two kinds of people in the world. Some like to keep their old junk, just in case they'll need it some day (they won't!). Others like to toss it out, just in case they don't need it (they will!). The first are the hoarders, the second are those who are chuckers (with apologies to our cultured readership for using such an expression). Well, importvg is a hoarder: you nominate one disk and it assumes you want the others in the volume group. recreatevg, on the other hand, is a chucker.
Perhaps people with more experience using recreatevg will have comments about how this all works in the real world (the recreatevg command, not how to keep the peace between hoarders and chuckers).
An important warning about importing a VG
I've been playing with LVM tunables, specifically to do with pbufs, to see if changes to the parameters stay with a volume group when it gets moved to a new LPAR.
First, some background
A pbuf is a pinned memory buffer, which the LVM uses to hold a pending disk I/O request. (There's a developerWorks article that explains pbufs in more depth.) When a volume group runs out of pbufs, I/Os are blocked until one is freed.
The lvmo command is used to manage pbuf tuning parameters. It allows you to view or set the pbuf count by volume group rather than doing it globally. You can see the number of blocked I/Os for a volume group using the lvmo command. You can also see the global blocked count (total for all volume groups) using vmstat -v, identified as pending disk I/Os blocked with no pbuf.
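A quick way to check that global counter (the grep just picks out the pbuf line from the rest of the vmstat -v output):
vmstat -v | grep pbuf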
For now, the question is about setting the pv_pbuf_count.
I was curious. If the pv_pbuf_count (a pbuf count for each physical volume) is set on a volume group basis, where does it get stored? Is it in /etc/tunables/nextboot, or does it travel with the volume group itself?
Setting the pbuf count
First, I'll change a volume group's pv_pbuf_count from the default value of 512 to 2048 using lvmo:
lvmo -v datavg -o pv_pbuf_count=2048
Now, display the current settings and statistics:
lvmo -v datavg -a
vgname = datavg
pv_pbuf_count = 2048
total_vg_pbufs = 2048
pervg_blocked_io_count = 0
global_blocked_io_count = 59
The new pv_pbuf_count is set to 2048. We're allowed 2048 pbufs for each PV in the volume group. The total pbufs for the volume group (total_vg_pbufs) are also 2048. This is because the volume group only has one PV in it. The global_blocked_io_count is set to 59, but that's not from this vg, as the pervg_blocked_io_count (blocked I/Os for this volume group) is set to 0.
This tunable parameter (pv_pbuf_count) survives a reboot, so where is this parameter change recorded? In /etc/tunables/nextboot? No, that file was unchanged. (If we'd used the old, system-wide way of changing the pbufs using ioo, we'd see it in nextboot, but that would change it for all volume groups, not just for the one we want).
So is the setting in the VGDA? I'll export the volume group using exportvg and import it and see what happens.
Ordinarily, you'd be doing the export from one LPAR, map the LUN to another LPAR and then import the volume group there, but doing the export and import on the same LPAR will prove the point for this exercise.
Before exporting the volume group, I need to unmount any file systems in it. You can list the file systems in a volume group using the lsvgfs command. Having done that, you can deactivate and export the volume group, and then import it again to see what happens to our beloved pv_pbuf_count parameter.
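Spelled out, that sequence looks something like this (the file system and hdisk names are placeholders for whatever is actually in your datavg):
lsvgfs datavg
umount /data
varyoffvg datavg
exportvg datavg
importvg -y datavg hdisk1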
Now, let's see what happened to the pv_pbuf_count:
lvmo -v datavg -a
Aha! The export and import have reset the pv_pbuf_count back to the default of 512.
When you do a volume group export and import - a great way of moving all of a volume group's data to a new location, rather than copying it or restoring it - the logical volumes and file systems get moved across to the target system, but tuning parameters don't come for the ride.
"Go live or go home"
If you need to get a lot of data from one AIX host to another, there's more than one way to do it quickly. During a migration to a new system, you may do a database dump or archive and want to get the data available to the new LPAR within a short time frame. You can always fall back on the usual network copy tools, but they may not suit your configuration.
Come to think of it, the ones I'm about to suggest may not be suitable for your config, either, but there may be some occasions when these methods will save you big time. That can be the difference between going live and going home defeated.
Getting the files there faster
Here are three ways of speeding up one-off data transfers. Which one you use will depend on your needs (e.g. do you need to move or copy the data? are the two LPARs on the same server?).
Whether you use one of these, or something else such as rsync, will depend on your configuration and your needs.
The fine print
Here's a look at each of the methods in a little more detail.
There are other methods you could use. It all depends on your needs and the system configuration.
Diamond in the SAN
I recently cloned an LPAR from a P6 to a less-busy P5 server via a mksysb backup. Unfortunately, after starting up the new LPAR I found that there was an essential directory missing because it had been excluded via the /etc/exclude.rootvg file (-e option on mksysb). Let's call the directory /diamond (its name has been changed to protect the guilty who didn't back it up).
I considered my options but each of them had a drawback.
Option 1: restore directory contents from TSM
I went with option 4, which allowed me to dig out the treasured directory's contents from the SAN. It did have the drawback that it would rename logical volumes on the original SAN LUN. I was willing to take the risk. It wasn't likely I would need to boot off that LUN again. Even if I did have to, I could always rename them via SMS in single user mode before mounting file systems.
I had to get the old rootvg onto a live AIX LPAR. Here's how I did it.
An import-ant command
After mapping the old SAN LUN to the new AIX LPAR, I used importvg to import the old LPAR's rootvg under another name: oldrootvg.
newhost:/ # importvg -y oldrootvg hdisk3
I got some warnings which I had to deal with before getting access to the contents of the directory I needed. The first was to do with LV names, the second with mount points. Here are the details.
LVM's good manners
When you're importing a rootvg (under a different name) onto a running LPAR with its own active rootvg, there are bound to be duplicate file systems and logical volumes. Fortunately, the AIX Logical Volume Manager (LVM) shows a bit of respect - the active LPAR's file systems and LVs don't get overwritten. The new guest LUN isn't allowed to push in. Instead, if there are any conflicts, the imported VG gets new LV names. So when I ran importvg, hd4 was renamed to fslv02:
0516-530 synclvodm: Logical volume name hd4 changed to fslv02.
0516-530 synclvodm: Logical volume name hd2 changed to fslv03.
0516-530 synclvodm: Logical volume name hd9var changed to fslv04.
0516-530 synclvodm: Logical volume name hd3 changed to fslv05.
0516-530 synclvodm: Logical volume name hd1 changed to fslv06.
0516-530 synclvodm: Logical volume name hd10opt changed to fslv07.
Here's where the LVM showed itself to be a perfect gentleman. Good guests don't push people out of their own homes.
As for file systems, I got a warning about duplicate mount points that needed to be fixed before mounting the old rootvg's file systems:
imfs: Warning: mount point / already exists in /etc/filesystems.
imfs: Warning: mount point /usr already exists in /etc/filesystems.
imfs: Warning: mount point /var already exists in /etc/filesystems.
imfs: Warning: mount point /tmp already exists in /etc/filesystems.
imfs: Warning: mount point /home already exists in /etc/filesystems.
imfs: Warning: mount point /opt already exists in /etc/filesystems.
The logical volume for the file system I needed had been renamed from hd4 to fslv02, but the mount point was still set to /, so I had to change that for the newly named logical volume:
chlv -L'/mnt/oldroot' fslv02
At this point I created a new jfs2 log in smit lv (type jfs2log). I then set the jfs2log for my "new" root file system:
chfs -a log=/dev/loglv02 /mnt/oldroot
Finding that mount
The mount point /mnt/oldroot didn't exist, so I created the directory.
mkdir -p /mnt/oldroot
I could then mount the file system /mnt/oldroot, and recover the contents of /mnt/oldroot/diamond.
I then cleaned up the oldrootvg using umount, varyoffvg and exportvg, and removed the disk with rmdev. I then unmapped the SAN LUN, which was no longer needed on the new host, and was able to start the database.
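In command terms, that clean-up was along these lines (hdisk3 being the disk the old rootvg was imported from):
umount /mnt/oldroot
varyoffvg oldrootvg
exportvg oldrootvg
rmdev -dl hdisk3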
I fixed the /etc/exclude.rootvg (and the TSM inclexcl file, just to be safe) and gave a quiet thanks that when importing the rootvg into an existing running LPAR, the LVM had acted as a perfect gentleman.