Brian Smith's AIX / UNIX / Linux / Open Source blog
AIX/VIO: Tracing Virtual SCSI / Shared Storage Pool Disks on AIX to VIO resources (and a script to automate this)
Here is a method you can use to reset a lost VIO padmin password from the HMC with zero downtime on the VIO server. It is a somewhat involved process, but much easier than having to take a downtime on the VIO server to change the password. The tricky part is that the viosvrcmd HMC command doesn't allow the command run on the VIO server to contain a pipe ("|") or any redirection ("<", ">"), and it doesn't allow interactive input. So this rules out using something like "chpasswd" to change the password.
Step 1: Find the current padmin password hash from the HMC (change "-m p520 -p vio1" below to your managed system / VIO server names).
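Since /etc/security/passwd is only readable by root, one way to dump it through viosvrcmd is the same printf/oem_setup_env trick that shows up again in Steps 3 and 4; something along these lines should work:
command=`printf "oem_setup_env\ncat /etc/security/passwd"`; viosvrcmd -m p520 -p vio1 -c "$command"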
Look for the padmin stanza and its password hash:
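The relevant part of the stanza looks roughly like this (the hash value is whatever is currently set; I'll refer to it as <OLD_HASH> in Step 3):
padmin:
        password = <OLD_HASH>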
Step 2: Generate a new password hash. From a different AIX server that has OpenSSH/OpenSSL installed, run "openssl passwd" and type in the new password that you want to assign to the padmin account. OpenSSL will generate the password hash and display it on the screen.
# openssl passwd
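Interactively it looks something like this; the hash printed on the last line is what I'll refer to as <NEW_HASH> in Step 3:
Password:
Verifying - Password:
<NEW_HASH>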
Step 3: Replace the VIO padmin's password hash with the new password hash from the HMC using viosvrcmd/perl. Use a command similar to this from the HMC:
command=`printf "oem_setup_env\nperl -pi -e 's/<OLD_HASH>/<NEW_HASH>/' /etc/security/passwd"`; viosvrcmd -m p520 -p vio1 -c "$command"
In our example, it would be the same command with the actual hash values from Steps 1 and 2 substituted for <OLD_HASH> and <NEW_HASH> (make sure to change "-m p520 -p vio1" to your managed system / VIO names).
Step 4: Optionally reset the padmin failed login count. If you need to reset the failed login count, run this command from the HMC (make sure to change "-m p520 -p vio1" to your managed system / VIO names):
command=`printf "oem_setup_env\nchsec -f /etc/security/lastlog -a unsuccessful_login_count=0 -s padmin"`; viosvrcmd -m p520 -p vio1 -c "$command"
Update 3/23/13 - If the old or new password hash has a slash ("/") in it, then the perl line above needs to be changed. Instead, use a different delimiter such as a comma:
command=`printf "oem_setup_env\nperl -pi -e 's,<OLD_HASH>,<NEW_HASH>,' /etc/security/passwd"`; viosvrcmd -m p520 -p vio1 -c "$command"
Over my career as an AIX administrator I have run into problems with the "Maximum Virtual Adapters" LPAR profile setting many times. This setting controls the highest virtual slot number that you can use on a VIO or AIX LPAR. It is not dynamically changeable and requires a reboot to modify.
By default, when you create a VIO server the Maximum Virtual Adapters setting is only 20. Any time you create a VIO server I recommend setting this much, much higher. The higher you set it, the more memory overhead there is, so don't go crazy. Determine how many slots it will take to support the number of LPAR's you plan on having on the system, and then at least double that number to come up with what you want to set this to. Also, @nixysug pointed out that if you have the maximum virtual adapters set above 1,000 you might have issues with Live Partition Mobility (for more info see http://nixys.fr/blog/?p=214)
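For reference, the profile attribute behind this setting is max_virtual_slots. Here is a hedged sketch of checking and raising it from the HMC command line, assuming a managed system named p520, a VIO LPAR named vio1, and a profile named default; remember the change only takes effect after the VIO is shut down all the way and re-activated from the profile:
lssyscfg -r prof -m p520 --filter "lpar_names=vio1" -F name,max_virtual_slots
chsyscfg -r prof -m p520 -i "name=default,lpar_name=vio1,max_virtual_slots=200"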
Let's suppose you have a dual VIO system that was set up with the default Maximum Virtual Adapters of 20. You are using either Virtual SCSI or Virtual Fibre Channel (it doesn't really matter which one, as the slots work basically the same way). You have come up with a slot numbering convention where slots 10 and under are used for Virtual Ethernet. For the VSCSI/VFC slots, your convention is to start at slot 11 and use 2 slots per AIX LPAR. One slot will go to VIO1 and the other slot to VIO2. You decide to have the odd number slots go to VIO1 and the even number slots go to VIO2. This is a very common slot numbering convention and is often recommended/used in IBM documentation like Redbooks.
This works fine until you get an urgent request to add an additional LPAR to the system. You go to add the "p6_aix5" LPAR and decide to use slot 19 to VIO1 and slot 20 to VIO2. When you go to add slot 20 to VIO2 you'll get a message like "The maximum number of virtual slots must be greater than the highest slot number".
The system will not allow you to create any slots with a number equal to or greater than the maximum virtual adapters setting. So if you have the maximum virtual adapters set to the default of 20, then you will only be able to use slots numbered 19 or below.
As previously mentioned, the only way to change the maximum virtual adapters setting is to change the VIO server profile, shut the VIO server all the way down, and then re-activate it. Depending on your environment, taking a downtime on the VIO server might be a big deal, even if you have dual VIO servers.
If you have an urgent request to build a new LPAR, you still have options at this point, but first it helps to clear up how virtual slot numbers actually work.
I have found that there is a lot of confusion out there about virtual slot numbers on POWER servers. For the first several years I worked on these servers I had a lot of incorrect assumptions about how the slots worked.
A lot of people are under the incorrect impression that the client and server adapter numbers need to match. For example, some people think that if you create an adapter with slot 11 on the LPAR that the slot on the VIO server also needs to be slot 11. This is totally incorrect, and you can in fact have VSCSI client slot 11 connect to VSCSI VIO server slot 16 with no problems at all.
Another common misconception is that the slot numbers are "global" across the entire system and must be unique across the entire system. In fact, it is possible to setup multiple LPAR's that all use the same virtual slot number. For example, you could have 100 LPAR's and have every one of them use virtual slot numbers "4" and "5" for their virtual SCSI client adapters. Here is an example of a slot numbering convention where this is done (notice each LPAR uses slot 4 and 5 for the client adapters):
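Something along these lines, where the LPAR names and VIO server slot numbers are just illustrative:
lpar01:  client slot 4 -> VIO1 server slot 101    client slot 5 -> VIO2 server slot 101
lpar02:  client slot 4 -> VIO1 server slot 102    client slot 5 -> VIO2 server slot 102
lpar03:  client slot 4 -> VIO1 server slot 103    client slot 5 -> VIO2 server slot 103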
Back to our original issue: we are trying to create a new LPAR and are getting the "The maximum number of virtual slots must be greater than the highest slot number" error message.
Let's suppose we want to build 2 additional LPAR's and not have a VIO downtime.
One method we could use to build these 2 new LPAR's right away and avoid a VIO downtime is to break our original slot numbering convention. For the other LPAR's we had used slots "11" and "13" on VIO1 and slots "12" and "14" on VIO2. Because these slot numbers do not need to be globally unique across the system, there is nothing stopping us from using slots "11" and "13" on VIO2 and slots "12" and "14" on VIO1 (flip-flopped from how the other LPAR's had been set up).
So the solution is to build the 2 new LPAR's like this:
p6_aix5 client slot 11 mapped to VIO2 slot 11.
p6_aix5 client slot 12 mapped to VIO1 slot 12.
p6_aix6 client slot 13 mapped to VIO2 slot 13.
p6_aix6 client slot 14 mapped to VIO1 slot 14.
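Since the two new LPAR's aren't running yet, their client adapters can simply be defined in their profiles; the server adapters just need to be DLPAR'ed in to the running VIO servers (and also saved to the VIO profiles). As a hedged sketch of the DLPAR side, assuming the managed system is named p520, something like this should do it for p6_aix5 (p6_aix6 follows the same pattern with slots 13 and 14):
chhwres -m p520 -r virtualio --rsubtype scsi -o a -p p6vio2 -s 11 -a "adapter_type=server,remote_lpar_name=p6_aix5,remote_slot_num=11"
chhwres -m p520 -r virtualio --rsubtype scsi -o a -p p6vio1 -s 12 -a "adapter_type=server,remote_lpar_name=p6_aix5,remote_slot_num=12"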
Below is a diagram of how this setup would look. Note that slot 11 is used by p6_aix5 and p6vio2. Slot 11 is also used by p6_aix1 and p6vio1, but as discussed this isn't a problem.
So if you ever run into this error message relating to the maximum virtual adapter setting, don't panic. If you can't reboot the VIO server to change it, you still have options. You might just need to get creative, think outside the box, and be willing to break/change your slot numbering convention.
A tool that might help you out with understanding and validating your Virtual Slot configuration is the "pslot" tool. This is a Perl program I wrote that will visualize and validate your virtual slots. It was used to create the diagrams in this article. It is free and open source, and can be downloaded from http://pslot.sourceforge.net/
The HMC provides the "viosvrcmd" command so you can run VIO commands directly from the HMC.
Here are a couple of reasons you might want to do this.
Here is an example:
hscroot@hmc1:~> viosvrcmd -m p520 -p vio1 -c "ioslevel"
The man page points out that the "command cannot contain the semicolon (;), greater than (>), or vertical bar (|) characters."
However, one thing you might frequently need to do is run a command and grep for a certain string. If you try this directly with the viosvrcmd command line it won't work due to the pipe character:
hscroot@hmc1:~> viosvrcmd -m p520 -p vio1 -c "lsmap -all | grep vhost3"
However, an easy workaround is to put the pipe and grep after the viosvrcmd line so that the pipe runs on the HMC and not on the VIO. The end result is the same, but you avoid putting a pipe in the command to be run on the VIO server, so it works:
hscroot@hmc1:~> viosvrcmd -m p520 -p vio1 -c "lsmap -all" | grep vhost3
Notice the difference in where the closing quotation mark is; this makes all the difference. In the first example the pipe was inside the quotes, so it would have been sent to the VIO server, which isn't allowed. In the second example the pipe is after the closing quote, so it is run on the HMC.
Running oem_setup_env commands through viosvrcmd
Since semicolons aren't allowed, you can't simply chain "oem_setup_env; somecommand". One way around this is to embed a literal newline inside the quotes, so the second line is fed to the root shell that oem_setup_env starts:
hscroot@hmc1:~> viosvrcmd -m p520 -p vio1 -c "oem_setup_env
> whoami"
I did some experimenting with this and also found you can do something like the following, which might be a little easier if you are scripting and want a simple way to get the newline into the command:
hscroot@hmc1:~> command=`printf "oem_setup_env\nwhoami"`
hscroot@hmc1:~> viosvrcmd -m p520 -p vio1 -c "$command"
My new custom homemade AIX drink coaster:
My kids made cool Halloween characters, and of course I tried to think of something geeky to make. Here is what I came up with:
One of the drawbacks of using VIO VSCSI to map SAN LUN's to LPAR's is the time it takes to map the disks through on the VIO servers. The LUN's must be created on the SAN and allocated to the VIO server. On the VIO server you must then map each hdisk to the LPAR. If you are using dual VIO servers it is even worse: not only must you map the LUN through on both VIO servers, but there is a good chance that the hdisk and vhost devices are numbered differently on each VIO server, so the mapping commands could be different on each one.
If you have a bunch of servers and LUN's to map, it can really be time consuming and a huge headache to map all of these correctly. The mappings are critically important to get right.
Here is a diagram that shows an example of the mapping commands needed to map a single LUN in a dual VIO server environment:
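In text form, the commands for one LUN look something like this (the hdisk, vhost, and device names here are just illustrative; they will almost certainly differ between your two VIO servers):
VIO1:  mkvdev -vdev hdisk12 -vadapter vhost3 -dev lpar05_lun1
VIO2:  mkvdev -vdev hdisk17 -vadapter vhost1 -dev lpar05_lun1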
Below is a Perl script that will make this task considerably easier by generating these mapping commands automatically for you. It generates the commands to map the LUN's from the VIO servers to the LPAR's. You must already have the VSCSI server/client adapters set up for the script to work.
The script is run from a server that has SSH keys set up to one of the HMC's that manages the system. When you run the script, you provide 3 arguments: the HMC hostname, the managed system name, and the name of an input file with the LUN mapping information.
For example, you could run it as: ./vio_vscsi_map.pl hmc01 p520 mappings.txt
The input file (in this example mappings.txt) has 3 space separated columns that contain LUN information (1 LUN per line):
Here is an example input file (mappings.txt) showing I want to map 4 new LUN's to lpar05 and 3 new LUN's to lpar06:
That is all the information you need to provide to the script. When the script runs it will find the VIO servers, find the correct vhost adapter that is associated with the LPAR on each VIO server, and find the correct hdisk for the LUN on each VIO server. The script itself doesn't make any changes on the system; it simply displays on the screen the commands that need to be run on each of the VIO servers. You can then simply copy and paste the command lines into the VIO servers.
It is possible (but uncommon) to have multiple vhost/vscsi mappings between a VIO server and the same LPAR client. In this case the script doesn't know which vhost adapter to use, so it will print a message that it detected multiple vhost/vscsi mappings for the server and display the mapping commands for each vhost adapter. You can then copy/paste the line for the one vhost adapter you would actually like to use.
When the script runs, it can take several minutes or longer depending on how many hdisks your VIO server has. This is because it is looking through each hdisk on each VIO server trying to find the disk that matches the LUN serial number from the input file.
If the script is unable to find an hdisk on the VIO server with a matching LUN serial number, it goes on to the next line of the input file and starts looking for the next one.
Here is the output of the script:
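It will look roughly like this (the VIO, hdisk, and vhost names here are hypothetical):
Commands to run on vio1:
mkvdev -vdev hdisk12 -vadapter vhost3   # VIO Slot: 11 Client Slot: 4 Client Name: lpar05
mkvdev -vdev hdisk13 -vadapter vhost3   # VIO Slot: 11 Client Slot: 4 Client Name: lpar05
Commands to run on vio2:
mkvdev -vdev hdisk17 -vadapter vhost1   # VIO Slot: 12 Client Slot: 5 Client Name: lpar05
mkvdev -vdev hdisk18 -vadapter vhost1   # VIO Slot: 12 Client Slot: 5 Client Name: lpar05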
Note that the output has some extra information displayed (VIO Slot, Client Slot, Client Name). This is for informational purposes only, and since it comes after a pound symbol (#), this part of the line is treated as a comment.
Again, the script itself doesn't make any changes, it simply gathers all of the information needed to produce the command lines that you would need to run.
Shutting down a frame with a lot of servers on it can be scary. Oftentimes changes have been "DLPAR'ed" into LPAR's/VIO's but the LPAR profile wasn't also updated, so the next time the LPAR is shut down those changes are lost. This can be especially bad if you are talking about virtual adapters that have been DLPAR'ed into VIO servers or clients. You really don't want to lose your NPIV virtual FCS adapters and their WWPN's!
Here is a command line you can run from the HMC that, for any given managed system, will save every LPAR's running configuration to its current profile. You might want to consider running something like this before you shut down a frame if you are not sure whether the running configurations are out of sync with the profiles. Just update the system="p520" part with the name of one of your managed systems.
system="p520"; for lpar in `lssyscfg -m $system -r lpar -F "name,state" | grep ",Running$" | cut -d, -f 1`; do echo Saving running config to profile for $lpar; mksyscfg -r prof -m $system -o save -p $lpar -n `lssyscfg -r lpar -m $system --filter lpar_names=$lpar -F curr_profile` --force; done
If you are anything like me, there have been several times over the years when you have said "Argh! If only I could make a VIO server a VSCSI client of another VIO server!"
There are a couple of scenarios where this could be particularly useful.
But as you probably know, the HMC GUI won't allow you to create a "client" adapter in a VIO partition, and it won't allow you to set up a "server" adapter with a VIO partition as the client.
I discovered that if you use the HMC command line interface to add the VSCSI adapters, it will actually let you set up a VIO server as a VSCSI client of another VIO server, and even more surprisingly it actually seems to work!
This VIO server has no physical resources and is booting over a virtual scsi disk provided by another VIO server. It also has a virtual optical CD drive served by the other VIO server.
I did run into issues when trying to virtualize already-virtualized resources. For example, it doesn't work to have VIO1 (real resources) serving VSCSI disk to VIO2 (all virtual), and then have VIO2 share its already-virtual disk resources with an AIX partition (basically disk assigned through VIO1->VIO2->AIX). The VIO2 errpt logged disk/LVM errors and the AIX server had issues with the disk.
This was tested with an older version of the HMC; I'm not sure if it will still work with newer versions. Obviously this isn't going to be supported by IBM, so only play around with this if you know what you are doing and you are in a TEST environment!
Both the "chsyscfg" (for the profile) and "chhwres" (to DLPAR) worked from this version of the HMC to add a client VSCSI adapter to a VIO server and to add a server VSCSI adapter with a VIO as a client.
If you try it out, post a comment on the blog and let me know what your experience is with it. I just discovered this tonight and haven't done very much testing at this point, but I can already see several scenarios where this could be very useful.
I recently got a question about how to script disabling paths for a Virtual SCSI adapter so you can prepare to take a VIO server down for maintenance.
Of course you can just take down the VIO server and the Virtual SCSI will fail over to the other path; however, it is generally more graceful and less impactful to use the chpath command to disable the path by lowering its priority.
Let's suppose we want to take down the "VIO2" server, which on the AIX client maps back to vscsi1. If you are not sure which vscsi adapters map to which VIO servers, then check out my previous posting/script: AIX/VIO: Tracing Virtual SCSI / Shared Storage Pool Disks on AIX to VIO resources, which will produce a detailed report of how everything related to vscsi maps out.
So in this scenario we want to disable all the paths on "vscsi1" on an AIX client (you would need to do this same procedure on each AIX client of the VIO server). But first, we want to record what the current settings are so we can restore them back after the maintenance.
If we run this one liner, it will show the CURRENT path priority settings for everything on "vscsi1" on our AIX client in the form of command lines to set the priorities. We can save this off and simply run these commands after the VIO maintenance to restore everything like it was originally.
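Here is a sketch of such a one-liner (it assumes the vscsi1 adapter name from above; adjust as needed):
for d in `lspath -p vscsi1 -F name | sort -u`; do p=`lspath -AE -l $d -p vscsi1 -a priority | awk '{print $2}'`; echo "chpath -l $d -p vscsi1 -a priority=$p"; done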
The output will look something like this. Again, note that this command doesn't actually change anything - it just shows you the command lines needed to get back to your original settings.
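With hypothetical hdisk names and priority values, the saved output would look something like:
chpath -l hdisk0 -p vscsi1 -a priority=2
chpath -l hdisk1 -p vscsi1 -a priority=2
chpath -l hdisk2 -p vscsi1 -a priority=2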
Now, we can run this command line to set the priority to 255 for all vscsi1 paths, which should cause the traffic to switch over to the other vscsi path.
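Again, a sketch along the same lines, looping over the same path list and raising the priority value:
for d in `lspath -p vscsi1 -F name | sort -u`; do chpath -l $d -p vscsi1 -a priority=255; done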
The output will look something like this:
After the VIO maintenance is complete, you can simply run the commands you had previously recorded and it will restore your original VSCSI path priority settings.
I've updated the pslot script to support validating and visualizing Virtual Fibre Channel / NPIV slots in addition to VSCSI slots. For more info and to download the updated script go to the project homepage at: http://pslot.sourceforge.net/
Here is a screenshot of the diagram generated in VFC mode:
Here is a quick HMC one-line script to run a command on every VIO server attached to the HMC. It uses "viosvrcmd", which uses RMC, so there are no SSH keys or anything to set up; it just works as long as RMC is working in your environment.
In this example, it runs the "errlog" (aka errpt) command on every VIO server. You can change this to whatever command you would like to run. The script looks through all managed systems for VIO partitions, and attempts to use viosvrcmd to run the specified command on each VIO server.
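Here is a rough sketch of such a one-liner (swap "errlog" for whatever command you want to run):
for sys in `lssyscfg -r sys -F name`; do for vio in `lssyscfg -m $sys -r lpar -F name,lpar_env | grep ",vioserver$" | cut -d, -f1`; do echo "=== $sys / $vio ==="; viosvrcmd -m $sys -p $vio -c "errlog"; done; done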