- After a reboot, exportvg worked without any errors
- importvg then worked and made the missing file systems available
- When the system was rebooted again, the problem reappeared
- The VM had been rebooted recently, before the AIX migration, with no problems.
AIX Down Under
AnthonyEnglish 270000RKFN Tags:  importvg volume_group vg varyonvg exportvg migration 3 Comments 15,244 Views
A customer did a migration from AIX 5.3 to 6.1 and then called me to report a strange set of symptoms. Some file systems didn't mount following a reboot. When the file systems in the volume group (let's call it datavg) went to mount, they returned the error that there was no such device. If the customer ran an exportvg and an importvg, all the datavg file systems became available. But then another reboot was done and the datavg file systems didn't mount.
Update: I have an idea of what might have happened. The customer (as I should have mentioned) did the migration to AIX 6.1 after restoring from an AIX 5.3 mksysb. I suspect the bosinst.data file had the option to Import user volume groups (such as datavg) set to No. The system had been migrated from 5.3 to 7.1, then when they ran into difficulties, they chose to restore the 5.3 mksysb and then upgrade just to 6.1. Perhaps I should have mentioned that background in the original post. Pretty poor omission if this was a detective story that you had paid cold hard cash for. But it isn't. And you didn't. So read on.
As I only had support over the phone, I had to do some detective work without being able to run any commands or look at screens for myself. (Technology is still pretty primitive down here in the antipodes).
Nested File System - not the culprit
I immediately thought it was a case of nested file systems attempting to mount before their parent file systems. This typically happens when the sub-file system (e.g. /tsm/log) appears in the file /etc/filesystems before the parent file system (/tsm). As /tsm hasn't been mounted, there is no mount point directory called /tsm/log, and so the /tsm/log file system fails to mount. This is usually the result of someone manually editing /etc/filesystems.
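When the nested-file-system theory does apply, the order of stanzas in /etc/filesystems is easy to check with a script. This is a minimal sketch, not an AIX tool: `check_fs_order` is a hypothetical function name, and it assumes the usual stanza layout where each file system starts with its mount point at column 1 followed by a colon, with indented attribute lines below it. It reads the file twice: once to learn every mount point, and again to flag any mount point whose parent is only defined later in the file.

```shell
#!/bin/sh
# Hypothetical helper: report stanzas in an /etc/filesystems-style file
# that appear before the stanza of a parent mount point.
# Usage: check_fs_order [FILE]   (defaults to /etc/filesystems)
check_fs_order() {
    file=${1:-/etc/filesystems}
    awk '
        # Only stanza headers matter: "/mount/point:" starting at column 1
        !/^\/[^ \t]*:[ \t]*$/ { next }
        { mp = $0; sub(/:[ \t]*$/, "", mp) }   # strip the trailing colon
        # First pass: remember every mount point defined in the file
        NR == FNR { all[mp] = 1; next }
        # Second pass: walk up the path (/tsm/log -> /tsm) and flag any
        # ancestor that is defined in the file but has not been seen yet
        {
            parent = mp
            while (sub(/\/[^\/]*$/, "", parent) && parent != "") {
                if ((parent in all) && !(parent in seen))
                    printf "%s appears before its parent %s\n", mp, parent
            }
            seen[mp] = 1
        }
    ' "$file" "$file"
}
```

No output means the stanzas are in a safe order; each line of output names a file system that would try to mount before its parent.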
As the customer pointed out to me, the missing device was the logical volume /dev/fslv00, not the file system's mount point directory. So I buried my nested file systems theory, to be dug up the next time a customer rings me about missing file systems.
Just the facts, Sir
With my nested file systems theory soundly demolished, it was time to take a good, hard look at the facts so far:
The fact that the exportvg worked without any errors indicated that the volume group's file systems were not mounted before running the exportvg. If they had been mounted, we would have got warnings that the file systems were in use. So a successful export actually hinted at a problem.
The fact that the importvg worked indicated that the volume group's disks were available to the OS. So there was no issue with SAN connectivity or some volume group corruption.
Since the volume group didn't come up after the reboot, perhaps the problem was with the volume group itself, rather than any one of the file systems or logical volumes belonging to it.
If the volume group did have a problem, it seemed to have been introduced during the migration.
Patterns of Expertise
We looked for a pattern, and noticed that all the missing file systems belonged to the one volume group (datavg). The volume group could be imported / varied on from the command line, but didn't come up after a reboot. Then it twigged: the volume group had somehow been set not to varyon automatically following a reboot.
Sure enough, when we ran lsvg datavg we saw that datavg was set not to auto varyon. Nothing wrong with the volume group. It's just that somewhere along the way, someone, or perhaps a bug, had set it to varyon=no. Easily fixed, as the chvg command explains:
chvg -a y vg03
Which is exactly what we did for datavg.
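If you'd rather not eyeball the lsvg output on every box after a migration, a small check can spot the setting. This is a sketch that assumes lsvg reports the setting in an `AUTO ON:` field; `auto_on` and `ensure_auto_on` are hypothetical helper names, and the second one really does run chvg, so try it on a test LPAR first.

```shell
#!/bin/sh
# Print the value of the "AUTO ON:" field from `lsvg VGNAME` output on stdin.
auto_on() {
    sed -n 's/.*AUTO ON: *\([a-z]*\).*/\1/p'
}

# Hypothetical helper: turn automatic varyon back on if lsvg reports "no".
ensure_auto_on() {
    vg=$1
    if [ "$(lsvg "$vg" | auto_on)" = "no" ]; then
        echo "Setting $vg to vary on automatically"
        chvg -a y "$vg"
    fi
}
```

For example, `ensure_auto_on datavg` would have applied the same chvg fix we ran by hand.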
Getting to the root cause
We rebooted and found the problem was certainly fixed. What caused the volume group not to varyon automatically as part of a reboot? It's hard to know. Perhaps someone ran importvg with the -n option, which "causes the volume group not to be varied on at the completion of the volume group import into the system."
Maybe there was a bug in AIX, where the migration didn't auto-varyon volume groups.
We may never get to the root cause. It's not a big concern, as we've set the varyon correctly now, and everything comes up fine after a reboot.
It's pretty easy to move a volume group from one AIX system to another. You unmount all the file systems from the source volume group, varyoff the VG, export the volume group (exportvg), and then remove the disks from the source system (rmdev -dl hdiskN). Then you assign the LUNs to the target host, import the volume group, mount the file systems, and check permissions.
But what if you want to copy a volume group? You might want to replicate a volume group by doing a flash copy across the SAN. Then, at the remote site, you'd present the SAN LUNs to the target host and run cfgmgr so the host sees the new disks. The disks may be named differently on the target host, because cfgmgr just assigns the next available hdisk number. But while the hdisk names may differ between the source and target hosts, the Physical Volume IDs are the same. After all, the target LUN is a replica of the source LUN.
The problem: duplicate PVIDs and LVs
But that brings up a problem: duplicate PVIDs.
Enter the recreatevg command. Just like the move you can do with importvg and varyonvg,
That overcomes the issue of duplicate PVIDs and Logical Volume IDs.
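To see whether a host is looking at duplicate PVIDs, you can group the lspv output by PVID. This sketch assumes the usual lspv columns (hdisk name, PVID, volume group, state) and skips disks whose PVID is shown as `none`; `dup_pvids` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: read `lspv` output on stdin and print every PVID
# that is carried by more than one hdisk, followed by those hdisks.
dup_pvids() {
    awk '
        NF >= 2 && $2 != "none" { disks[$2] = disks[$2] " " $1; count[$2]++ }
        END {
            for (p in count)
                if (count[p] > 1) print p ":" disks[p]
        }
    '
}
```

Run it as `lspv | dup_pvids`; empty output means no two disks share a PVID.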
Now when you run the importvg command yourself, you only specify one physical volume. For example, if the volume group consists of hdisk1, hdisk2 and hdisk3, then the command
importvg -y datavg hdisk2
will import the entire volume group, since the Volume Group Descriptor Area (VGDA) is on all three disks, and all the disks in the volume group know the PVIDs of all the other disks in that volume group.
When you run recreatevg,
The recreatevg command removes all logical volumes that fully or partially exist on the physical volumes that are not specified on the command line. Mirrored logical volumes can be an exception (see the -f flag).
So if you're wondering why the logical volumes on the disks you forgot to mention didn't get included, the LVM will reply: "You never asked." I expect that's because recreatevg, unlike importvg, isn't relying on PVIDs, since it's creating new ones.
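Since recreatevg only takes the disks you name, one way to avoid forgetting any is to build the disk list from the PVIDs. This sketch rests on a few assumptions: that you saved the source volume group's PVIDs to a file beforehand (one per line), that the copied disks show `None` in the volume group column of lspv on the host where you run recreatevg, and that `clone_disks` is a hypothetical helper, not an AIX command.

```shell
#!/bin/sh
# Hypothetical helper: list the hdisks on this host that carry any of the
# saved PVIDs and do not yet belong to a volume group.
# $1 = file with one source PVID per line; stdin = `lspv` output.
clone_disks() {
    awk 'NR == FNR { want[$1] = 1; next }
         ($2 in want) && $3 == "None" { list = list (list == "" ? "" : " ") $1 }
         END { print list }' "$1" -
}
```

Usage might look like `recreatevg -y datavg2 $(lspv | clone_disks /tmp/datavg.pvids)`, where the new volume group name and the PVID file path are made up for the example.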
Update: Hoarders and chuckers
There are two kinds of people in the world. Some like to keep their old junk, just in case they'll need it some day (they won't!). Others like to toss it out, just in case they don't need it (they will!). The first are the hoarders; the second are the chuckers (with apologies to our cultured readership for using such an expression). Well, importvg is a hoarder: you nominate one disk and it assumes you want the others in the volume group. recreatevg, on the other hand, is a chucker.
Perhaps people with more experience using recreatevg will have comments about how this all works in the real world (the recreatevg command, not how to keep the peace between hoarders and chuckers).
An important warning about importing a VG
I've been playing with LVM tunables, specifically to do with pbufs, to see if changes to the parameters stay with a volume group when it gets moved to a new LPAR.
First, some background
A pbuf is a pinned memory buffer. As this developerWorks article explains, "T
The lvmo command is used to manage pbuf tuning parameters. It allows you to view or set the pbuf count by volume group rather than doing it globally. You can see the number of blocked I/Os for a volume group using the lvmo command. You can also see the global blocked count (total for all volume groups) using vmstat -v, identified as pending disk I/Os blocked with no pbuf.
For now, the question is about setting the pv_pbuf_count.
I was curious. If the pv_pbuf_count (a pbuf count for each physical volume) is set on a volume group basis, where does it get stored? Is it in
Setting the pbuf count
First, I'll change a volume group's pv_pbuf_count from the default value of 512 to 2048 using lvmo:
lvmo -v datavg -o pv_pbuf_count=2048
Now display the current settings and statistics using
lvmo -v datavg -a
vgname = datavg
The new pv_pbuf_count is set to 2048. We're allowed 2048 pbufs for each PV in the volume group. The total pbufs for the volume group (total_vg_pbufs) are also 2048. This is because the volume group only has one PV in it. The global_blocked_io_count is set to 59, but that's not from this vg, as the pervg_blocked_io_count (blocked I/Os for this volume group) is set to 0.
This tunable parameter (pv_pbuf_count) survives a reboot, so where is this parameter change recorded? In /etc/tunables/nextboot? No, that file was unchanged. (If we'd used the old, system-wide way of changing the pbufs using ioo, we'd see it in nextboot, but that would change it for all volume groups, not just for the one we want).
So is the setting in the VGDA? I'll export the volume group using exportvg and import it and see what happens.
Ordinarily, you'd be doing the export from one LPAR, map the LUN to another LPAR and then import the volume group there, but doing the export and import on the same LPAR will prove the point for this exercise.
Before exporting the volume group, I need to unmount any file systems in it. You can list file systems in a volume group using the lsvgfs command. Having done that, you can deactivate and export the volume group:
varyoffvg datavg
exportvg datavg
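The unmount, varyoff and export steps can be laid out as a dry run before you commit to them. In this sketch, `offline_plan` (a hypothetical name) reads lsvgfs output, prints an umount for each mount point with the deepest paths first, so nested file systems such as /tsm/log come off before /tsm, and then prints the varyoffvg and exportvg commands. It only prints the commands; you review them and run them yourself.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print (not run) the commands to take a VG offline.
# stdin = `lsvgfs VGNAME` output (one mount point per line); $1 = VG name.
offline_plan() {
    vg=$1
    # Prefix each mount point with its number of "/" characters, sort the
    # deepest paths first, then drop the prefix and turn each into an umount.
    awk 'NF > 0 { n = gsub(/\//, "/"); print n, $0 }' |
        sort -k1,1nr -k2 |
        cut -d' ' -f2- |
        sed 's/^/umount /'
    echo "varyoffvg $vg"
    echo "exportvg $vg"
}
```

For example, `lsvgfs datavg | offline_plan datavg` prints the full sequence for datavg.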
And then import it again, to see what happens to our beloved pv_pbuf_count parameter.
Now, let's see what happened to the pv_pbuf_count:
lvmo -v datavg -a
Aha! The export and import have reset the pv_pbuf_count to the default of 512.
When you do a volume group export and import - a great way of moving all of a volume group's data to a new location, rather than copying it or restoring it - the logical volumes and file systems get moved across to the target system, but tuning parameters don't come for the ride.
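Since the tunable doesn't travel with the volume group, it's worth recording it before the export and putting it back after the import. This sketch assumes `lvmo -v VGNAME -a` prints `name = value` lines, like the `vgname = datavg` output shown earlier; `restore_pbuf_cmd` is a hypothetical helper that only prints the lvmo command to run later.

```shell
#!/bin/sh
# Hypothetical helper: read `lvmo -v VGNAME -a` output on stdin and print the
# lvmo command that would set pv_pbuf_count back to its current value.
restore_pbuf_cmd() {
    vg=$1
    awk -v vg="$vg" '$1 == "pv_pbuf_count" && $2 == "=" {
        printf "lvmo -v %s -o pv_pbuf_count=%s\n", vg, $3 }'
}
```

On the source LPAR, something like `lvmo -v datavg -a | restore_pbuf_cmd datavg > /tmp/datavg.lvmo` saves the setting (the file name is just an example); copy that file across, and after the importvg on the target run `sh /tmp/datavg.lvmo`.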
SERVING UP LVs, PVs and PAVs
Since we've got redundant arrays on SANs these days, it may seem almost quaint to speak about software mirroring using the AIX Logical Volume Manager. Even so, LVM is very useful when you want to move data around. If you need to move to a new storage subsystem or just to a new LUN, and you're not able to do it on the backend, the LVM may be just the ticket.
For example, suppose you are using a LUN that's a whole lot bigger than you need. There are many ways it could have come to that, but the most common is that you slightly overestimated the amount of disk you needed when you first went with your begging bowl to the storage team. Admit it. You asked for thirty times more disk than you needed, just in case. And the reason you did that is because you never listened to your mum when you couldn't finish your pavlova ("pav" in Aussie-land). Don't you remember her telling you:
"Your eyes are bigger than your stomach"
Well now's your chance to
Make IT history!
and hand back some storage you don't need.
LV to new PV in same VG
First you allocate a new, leaner (smaller!) LUN to AIX then add it to the volume group using extendvg. (You may need to change its queue depth, preferred path, health check interval etc). Once it's a member of the VG, you can mirror at the logical volume level using mklvcopy.
You can mirror a whole volume group (mirrorvg), and that's really the best way to do it with rootvg, because it has boot disks and dump devices which need special treatment. For other volume groups I often use mklvcopy because it allows me to mirror one logical volume at a time. You don't need to synchronise the two copies immediately (that's what the mklvcopy -k flag does), but until you do, the lslv command (and the lsvg command) will show some partitions as stale. You can create the copies and wait for a quieter time to run the synchronisation. It's a lot faster if the disks aren't busy, but it's perfectly legitimate to synchronise them while they're in use.
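With several logical volumes in the volume group, the mklvcopy commands can be generated from `lsvg -l` rather than typed one by one. This is a sketch, assuming each LV currently has a single copy and that `lsvg -l` prints the volume group name line and a column-header line before the LV rows; `mirror_plan` is a hypothetical name, and it prints commands without running them. Because it leaves out -k, the new copies stay stale until you run syncvg.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print one mklvcopy per logical volume,
# asking for 2 copies with the new copy placed on the given hdisk.
# stdin = `lsvg -l VGNAME` output; $1 = the new (smaller) hdisk.
mirror_plan() {
    newpv=$1
    # Skip the "VGNAME:" line and the column-header line.
    awk -v pv="$newpv" 'NR > 2 && NF > 0 { printf "mklvcopy %s 2 %s\n", $1, pv }'
}
```

For example, `lsvg -l datavg | mirror_plan hdisk9` lists the commands for datavg, with hdisk9 standing in for your new LUN.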
If you want to synchronise the two copies, use syncvg. You can synchronise the whole volume group using
syncvg -v VGNAME
The varyonvg command (which activates a volume group) will do the same thing, and you can run that command - varyonvg - even if the volume group is already active. With both varyonvg and syncvg, if there are no stale partitions to be synchronised, the shell prompt will come back in a jif.
If you want to synchronise a single logical volume, use
syncvg -l LVNAME
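Before breaking the mirror, it's worth confirming that nothing is still stale. This is a sketch assuming lsvg reports a `STALE PPs:` count; `stale_pps` and `vg_in_sync` are hypothetical helper names.

```shell
#!/bin/sh
# Print the "STALE PPs:" count from `lsvg VGNAME` output on stdin.
stale_pps() {
    sed -n 's/.*STALE PPs: *\([0-9]*\).*/\1/p'
}

# Hypothetical guard: succeed only when the VG reports no stale partitions.
vg_in_sync() {
    [ "$(lsvg "$1" | stale_pps)" = "0" ]
}
```

For example, `vg_in_sync datavg || syncvg -v datavg` only runs the synchronisation when lsvg still shows stale partitions.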
Not seven years' bad luck
Once you've synchronised the mirror to the new LUN, you can break the mirror to the old one, by using
rmlvcopy LVNAME 1 hdiskN
You'll be stopped from running rmlvcopy if only stale partitions would be left afterwards; you're not allowed to remove the last good copy of a physical partition. That's nice, isn't it? You also get a big warning if you try to remove a PV from a volume group while it still has partitions on it.
The oft-quoted Chris Gibson has an article showing how he migrated to a new SAN using LVM. The same principle applies to a single LUN. There is also the migratepv command, which is a simple way of moving everything off one PV to another. As with mirrorvg and mklvcopy, the target PV has to be a member of the volume group first.
These commands are so much fun that it's a shame to use SMIT, but you can do it that way if you want to.
Shock the SAN team
Once you've run rmlvcopy (or unmirrorvg which will remove your seven years of bad luck), you can remove the PV from the volume group (reducevg) before taking it out of the ODM using
rmdev -dl hdiskN
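The tidy-up at the end can also be printed as a dry run. In this sketch, `unmirror_plan` (a hypothetical name) reads `lsvg -l` output and prints an rmlvcopy for each LV, leaving one copy and naming the old hdisk as the one to drop, followed by reducevg and rmdev for that disk. It assumes every LV was mirrored onto the new disk and synchronised first; rmlvcopy will refuse to leave only stale partitions in any case.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the commands to drop each LV's copy on
# the old disk, then remove that disk from the VG and from the ODM.
# stdin = `lsvg -l VGNAME` output; $1 = VG name; $2 = the old (giant) hdisk.
unmirror_plan() {
    vg=$1 oldpv=$2
    # Skip the "VGNAME:" line and the column-header line.
    awk -v pv="$oldpv" 'NR > 2 && NF > 0 { printf "rmlvcopy %s 1 %s\n", $1, pv }'
    echo "reducevg $vg $oldpv"
    echo "rmdev -dl $oldpv"
}
```

For example, `lsvg -l datavg | unmirror_plan datavg hdisk2`, with hdisk2 standing in for the old LUN.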
If you've removed the old giant LUN from the configuration all along the way (VG, ODM, VIO server), you can hand it back to your SAN team for them to recycle. Once they realise you're not playing some sort of practical joke, they'll be grateful. Shocked, but grateful. They may need the disk for someone else who didn't listen to his mum when she served up the pav.
"Go live or go home"
If you need to get a lot of data from one AIX host to another, there's more than one way to do it quickly. During a migration to a new system, you may do a database dump or archive and want to get the data available to the new LPAR within a short time frame. You can
Come to think of it, the ones I'm about to suggest may not be suitable for your config, either, but there may be some occasions when these methods will save you big time. That can be the difference between going live and going home defeated.
Getting the files there faster
Here are three ways of speeding up one-off data transfers. Which one you use will depend on your needs (e.g. do you need to move or copy the data? Are both LPARs on the same server?).
Whether you use one of these, or something else such as rsync, will depend on your configuration and your needs.
The fine print
Here's a look at each of the methods in a little more detail.
There are other methods you could use. It all depends on your needs and the system configuration.
We're now on Twitter!
If you want to follow AIXDownUnder on Twitter, click on the Twitter widget on the right of this blog.
"Clean up the system"
You have suddenly been appointed as AIX administrator after the previous admin disappeared in mysterious circumstances. You've logged onto the AIX system once before.
rm -rf *
crontab -e
Maybe you will restore a mksysb from tape or from a DVD-RAM. You could use mkdvd to convert a mksysb file to ISO. If the mksysb was created in ISO format, you could install the ISO image via the VIO server.
Time to exhale
Soon enough you'll have a rootvg restored, a login prompt and a system ready to configure network communications if you need to. There'll be files which didn't get restored in the mksysb, such as those that are excluded via the -e flag in your mksysb / mkcd / mkdvd. You might have some data volume groups to resurrect using importvg and varyonvg, and then some files and databases to restore. But all in all, you've replaced a nice "clean" system with one that's going to work again.
You have learned that prevention is better than cure.