Brian Smith's AIX / UNIX / Linux / Open Source blog
AIX/VIO: Tracing Virtual SCSI / Shared Storage Pool Disks on AIX to VIO resources (and a script to automate this)
Here is a method you can use to reset a lost VIO padmin password from the HMC with zero downtime on the VIO server. This is a somewhat involved process, but much easier than having to take a downtime on the VIO server to change the password. The task is challenging because the viosvrcmd HMC command doesn't allow the command it runs on the VIO server to contain a pipe ("|") or any redirection ("<", ">"), and it doesn't allow interactive input. This rules out using something like "chpasswd" to change the password.
Step 1: Find the current padmin password hash. From the HMC, type (change "-m p520 -p vio1" to your managed system / VIO server names)
Look for the padmin stanza and its password hash:
Step 2: Generate a new password hash. From a different AIX server that has openssh/openssl installed, run "openssl passwd" and enter the new password that you want to assign to the padmin account. Openssl will generate the password hash and display it on the screen.
# openssl passwd
Step 3: Replace the VIO padmin's password hash with the new password hash from the HMC using viosvrcmd/perl. Use a command similar to this from the HMC:
command=`printf "oem_setup_env\nperl -pi -e 's/<OLD_HASH>/<NEW_HASH>/' /etc/security/passwd"`; viosvrcmd -m p520 -p vio1 -c "$command"
In our example, it would be (make sure to change "-m p520 -p vio1" to your managed system / VIO names)
Step 4: Optionally reset padmin failed login count. If you need to reset the failed login count, run this command from the HMC: (make sure to change "-m p520 -p vio1" to your managed system / VIO names)
command=`printf "oem_setup_env\nchsec -f /etc/security/lastlog -a unsuccessful_login_count=0 -s padmin"`; viosvrcmd -m p520 -p vio1 -c "$command"
Update 3/23/13 - If the old or new password hash has a slash ("/") in it, the perl line above needs to be changed. Instead, use a different delimiter such as a comma: command=`printf "oem_setup_env\nperl -pi -e 's,<OLD_HASH>,<NEW_HASH>,' /etc/security/passwd"`; viosvrcmd -m p520 -p vio1 -c "$command"
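If you want to sanity-check the substitution technique locally before pointing it at a real VIO server, here is a minimal sketch using a throwaway file and made-up hashes (nothing here touches a real /etc/security/passwd):

```shell
# Build a throwaway stanza file with a fake "old" hash:
printf 'padmin:\n\tpassword = OLDHASHabc123\n' > /tmp/passwd.demo

# Same technique as the viosvrcmd payload above: replace one hash with
# another in place, using commas as the delimiter so a "/" in a hash is safe:
perl -pi -e 's,OLDHASHabc123,NEWHASHxyz789,' /tmp/passwd.demo

cat /tmp/passwd.demo
# the file now contains "password = NEWHASHxyz789"
```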
Over my career as an AIX administrator I have run into problems with the "Maximum Virtual Adapters" LPAR profile setting many times. This setting controls the highest virtual slot number that you can use on a VIO or AIX LPAR. It is not dynamically changeable and requires a reboot to modify.
By default, when you create a VIO server, the Maximum Virtual Adapters setting is only 20. Any time you create a VIO server I recommend setting this value much, much higher. The higher you set it, the more memory overhead it incurs, so don't go overboard. Determine how many slots it will take to support the number of LPARs you plan on having on the system, and then at least double that number to come up with the value to use. Also, @nixysug pointed out that if you have the maximum virtual adapters set above 1,000 you might have issues with live partition mobility (for more info see http://nixys.fr/blog/?p=214)
Let's suppose you have a dual VIO system that was set up with the default maximum of 20 adapters. You are using either Virtual SCSI or Virtual Fibre Channel (it doesn't really matter which, as the slots work basically the same way). You have come up with a slot numbering convention where slots 10 and under are used for Virtual Ethernet. For the VSCSI/VFC slots, your convention is to start at slot 11 and use 2 slots per AIX LPAR: one slot goes to VIO1 and the other slot to VIO2. You decide to have the odd numbered slots go to VIO1 and the even numbered slots go to VIO2. This is a very common slot numbering convention and is often recommended/used in IBM documentation like Redbooks.
This works fine until you get an urgent request to add an additional LPAR to the system. You go to add the "p6_aix5" LPAR and decide to use slot 19 to VIO1 and slot 20 to VIO2. When you go to add slot 20 to VIO2 you'll get a message like this:
The system will not allow you to create any slot whose number is equal to or greater than the maximum virtual adapters setting. So if you have the maximum virtual adapters at the default of 20, you will only be able to use slots numbered 19 or below.
As previously mentioned, the only way to change the maximum virtual adapters setting is to change the VIO server profile, shut the VIO server all the way down, and then re-activate it. Depending on your environment, taking a downtime on the VIO server might be a big deal, even if you have dual VIO servers.
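For reference, you can check and change the setting from the HMC command line; this is a sketch with assumed names (managed system "p520", VIO LPAR "vio1", profile "normal") - substitute your own:

```shell
# Check the current value of the profile attribute:
lssyscfg -r prof -m p520 --filter "lpar_names=vio1,profile_names=normal" \
    -F max_virtual_slots

# Raise it in the profile (the VIO server must still be shut down fully
# and re-activated from the profile before the change takes effect):
chsyscfg -r prof -m p520 -i "name=normal,lpar_name=vio1,max_virtual_slots=200"
```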
If you have an urgent request to build a new LPAR, here are your options at this point:
I have found that there is a lot of confusion out there about virtual slot numbers on POWER servers. For the first several years I worked on these servers I had a lot of incorrect assumptions about how the slots worked.
A lot of people are under the incorrect impression that the client and server adapter numbers need to match. For example, some people think that if you create an adapter with slot 11 on the LPAR that the slot on the VIO server also needs to be slot 11. This is totally incorrect, and you can in fact have VSCSI client slot 11 connect to VSCSI VIO server slot 16 with no problems at all.
Another common misconception is that slot numbers are "global" and must be unique across the entire system. In fact, it is possible to set up multiple LPARs that all use the same virtual slot number. For example, you could have 100 LPARs and have every one of them use virtual slot numbers "4" and "5" for their virtual SCSI client adapters. Here is an example of a slot numbering convention where this is done (notice each LPAR uses slots 4 and 5 for the client adapters):
Back to our original issue where we are trying to create a new LPAR and are getting "The maximum number of virtual slots must be greater than the highest slot number" error message.
Let's suppose we want to build 2 additional LPAR's and not have a VIO downtime.
One method we could use to build these 2 new LPARs right away and avoid a VIO downtime is to break our original slot numbering convention. For the other LPARs we had used slots "11" and "13" on VIO1 and slots "12" and "14" on VIO2. Because these slot numbers do not need to be globally unique across the system, there is nothing stopping us from using slots "11" and "13" on VIO2 and slots "12" and "14" on VIO1 (flip-flopped from how the other LPARs had been set up).
So the solution is to build the 2 new LPAR's like this:
p6_aix5 client slot 11 mapped to VIO2 slot 11.
p6_aix5 client slot 12 mapped to VIO1 slot 12.
p6_aix6 client slot 13 mapped to VIO2 slot 13.
p6_aix6 client slot 14 mapped to VIO1 slot 14.
Below is a diagram of how this setup would look. Note that slot 11 is used by p6_aix5 and p6vio2. Slot 11 is also used by p6_aix1 and p6vio1, but as discussed this isn't a problem.
So if you ever run into this error message relating to the maximum virtual adapter setting, don't panic. If you can't reboot the VIO server to change it, you still have options. You might just need to get creative, think outside the box, and be willing to break or change your slot numbering convention.
A tool that might help you out with understanding and validating your Virtual Slot configuration is the "pslot" tool. This is a Perl program I wrote that will visualize and validate your virtual slots. It was used to create the diagrams in this article. It is free and open source, and can be downloaded from http://pslot.sourceforge.net/
The HMC provides the "viosvrcmd" command to run VIO commands from the HMC.
Here are a couple of reasons you might want to do this:
Here is an example:
hscroot@hmc1:~> viosvrcmd -m p520 -p vio1 -c "ioslevel"
The man page points out that the "command cannot contain the semicolon (;), greater than (>), or vertical bar (|) characters."
However, one thing you might frequently need to do is run a command and grep for a certain string. If you try this directly with the viosvrcmd command line it won't work due to the pipe character:
hscroot@hmc1:~> viosvrcmd -m p520 -p vio1 -c "lsmap -all | grep vhost3"
However, an easy workaround is to put the pipe and grep after the viosvrcmd command line so that the pipe runs on the HMC and not on the VIO. The end result is the same, but because the command sent to the VIO server no longer contains a pipe, it works:
hscroot@hmc1:~> viosvrcmd -m p520 -p vio1 -c "lsmap -all" | grep vhost3
Notice the difference in where the closing quotation mark is; this makes all the difference. In the first example, the pipe was inside the quotes, so it would have been sent to the VIO server, which isn't allowed. In the second example, the pipe is after the closing quote, so it runs on the HMC.
Running oem_setup_env commands through viosvrcmd
hscroot@hmc1:~> viosvrcmd -m p520 -p vio1 -c "oem_setup_env
> whoami"
While experimenting with this, I also found you can do something like the following, which might be a little easier if you are scripting, since it simplifies putting the newline in the command:
hscroot@hmc1:~> command=`printf "oem_setup_env\nwhoami"`
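The printf trick can be checked locally; the variable ends up holding two lines, which viosvrcmd then delivers as "oem_setup_env" followed by the command to run (the -m/-p names in the comment are the same example names used above):

```shell
# Build a two-line command: "oem_setup_env", then the command to run as root.
command=`printf "oem_setup_env\nwhoami"`

# The variable really does contain an embedded newline:
echo "$command"
# prints:
#   oem_setup_env
#   whoami

# On the HMC you would then run it with:
#   viosvrcmd -m p520 -p vio1 -c "$command"
```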
My new custom homemade AIX drink coaster:
My kids made cool Halloween characters, and of course I tried to think of something geeky to make. Here is what I came up with:
One of the drawbacks of using VIO VSCSI to map SAN LUNs to LPARs is the time it takes to map the disks through the VIO servers. The LUNs must be created on the SAN and allocated to the VIO server. On the VIO server you must then map each hdisk to the LPAR. If you are using dual VIO servers it is even worse: not only must you map each LUN through both VIO servers, but there is a good chance that the hdisk and vhost devices are numbered differently on each VIO server, so the mapping commands may differ between them.
If you have a bunch of servers and LUNs to map, it can be really time-consuming and a huge headache to map all of them correctly. And the mappings are critically important to get right.
Here is a diagram that shows an example of the mapping commands needed to map a single LUN in a dual VIO server environment:
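To make this concrete, the padmin mkvdev commands for one LUN might look like the following; all device names here are hypothetical, and note how the hdisk and vhost numbers differ between the two VIO servers:

```shell
# On VIO1 (as padmin): the LUN happens to show up as hdisk9, and vhost2 is
# the server adapter that goes to the client LPAR:
mkvdev -vdev hdisk9 -vadapter vhost2 -dev lpar05_lun1

# On VIO2: the same LUN shows up as hdisk12, and the client's adapter is vhost5:
mkvdev -vdev hdisk12 -vadapter vhost5 -dev lpar05_lun1
```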
Below is a Perl script that will make this task considerably easier by generating these mapping commands for you automatically. It generates the commands to map the LUNs from the VIO servers to the LPARs. You must already have the VSCSI server/client adapters set up for the script to work.
The script is run from a server that has SSH keys setup to one of the HMC's that manages the system. When you run the script, you provide 3 arguments:
For example, you could run it as: ./vio_vscsi_map.pl hmc01 p520 mappings.txt
The input file (in this example mappings.txt) has 3 space separated columns that contain LUN information (1 LUN per line):
Here is an example input file (mappings.txt) showing I want to map 4 new LUN's to lpar05 and 3 new LUN's to lpar06:
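For illustration, a made-up input file matching that description might look like this (columns are the LPAR name, a label used for the mapping, and the LUN serial number; all values are hypothetical):

```
lpar05 lpar05_rootvg 75AKX11A23B
lpar05 lpar05_data1 75AKX11A24C
lpar05 lpar05_data2 75AKX11A25D
lpar05 lpar05_data3 75AKX11A26E
lpar06 lpar06_rootvg 75AKX11A27F
lpar06 lpar06_data1 75AKX11A30A
lpar06 lpar06_data2 75AKX11A31B
```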
That is all the information you need to provide. When the script runs it will find the VIO servers, find the correct vhost adapter associated with the LPAR on each VIO server, and find the correct hdisk for the LUN on each VIO server. The script itself doesn't make any changes on the system; it simply displays on the screen the commands that need to be run on each of the VIO servers. You can then copy and paste those command lines into the VIO servers.
It is possible (but uncommon) to have multiple vhost/vscsi mappings between a VIO server and the same client LPAR. In this case, the script doesn't know which vhost adapter to use, so it prints a message that it detected multiple vhost/vscsi mappings for the server and displays the mapping commands for each vhost adapter. You can then copy/paste the line for the vhost adapter you actually want to use.
When the script runs, it can take several minutes or longer depending on how many hdisks your VIO server has. This is because it is looking through each hdisk on each VIO server trying to find the disk that matches the LUN serial number from the input file.
If the script is unable to find an hdisk on the VIO server with a matching LUN serial number, it moves on to the next line of the input file and starts looking for the next one.
Here is the output of the script:
Note that the output has some extra information displayed (VIO Slot, Client Slot, Client Name). This is for informational purposes only and since it is after a pound symbol (#) this part of the line is treated as a comment.
Again, the script itself doesn't make any changes, it simply gathers all of the information needed to produce the command lines that you would need to run.
Shutting down a frame with a lot of servers on it can be scary. Oftentimes changes have been "DLPAR'ed" into LPARs/VIOs but the LPAR profile wasn't also updated, so the next time the LPAR is shut down those changes are lost. This can be especially bad when it comes to virtual adapters that have been DLPAR'ed into VIO servers or clients. You really don't want to lose your NPIV virtual FCS adapters and their WWPNs!
Here is a command line you can run from the HMC that for any given managed system will save every LPAR's running configuration to its current profile. You might want to consider running something like this before you shutdown a frame if you are not sure if the running configurations are out of sync with the profiles. Just update the system="p520" part with the name of one of your managed systems.
system="p520"; for lpar in `lssyscfg -m $system -r lpar -F "name,state" | grep ",Running$" | cut -d, -f 1`; do echo Saving running config to profile for $lpar; mksyscfg -r prof -m $system -o save -p $lpar -n `lssyscfg -r lpar -m $system --filter lpar_names=$lpar -F curr_profile` --force; done
If you are anything like me, there have been several times over the years when you have said "Argh! If only I could make a VIO server a VSCSI client of another VIO server!"
There are a couple of scenarios where this could be particularly useful:
But as you probably know, the HMC GUI won't allow you to create a "client" adapter in a VIO partition, and it won't allow you to set up a "server" adapter with a VIO partition as the client.
I discovered that if you use the HMC command line interface to add the VSCSI adapters, it will actually let you set up a VIO server as a VSCSI client of another VIO server, and even more surprisingly it actually seems to work!
This VIO server has no physical resources and is booting over a virtual scsi disk provided by another VIO server. It also has a virtual optical CD drive served by the other VIO server.
I did run into issues when trying to virtualize already-virtualized resources. For example, it doesn't work to have VIO1 (real resources) serving VSCSI disk to VIO2 (all virtual), and then have VIO2 share its already-virtual disk resources with an AIX partition (basically disk assigned through VIO1->VIO2->AIX). The VIO2 errpt logged disk/LVM errors, and the AIX server had issues with the disk.
This was tested with an older version of the HMC. I'm not sure if it will still work with newer versions. Obviously this isn't going to be supported by IBM, so only play around with this if you know what you are doing and you are in a TEST environment!
Both the "chsyscfg" (for the profile) and "chhwres" (to DLPAR) worked from this version of the HMC to add a client VSCSI adapter to a VIO server and to add a server VSCSI adapter with a VIO as a client.
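As a sketch of what the HMC CLI side of this looks like, here are the two kinds of commands involved. All names and slot numbers are hypothetical (managed system "p520", profile "normal", vio1 as partition ID 1), so adjust for your environment:

```shell
# DLPAR a *client* VSCSI adapter (slot 21) into vio2, pointing at vio1's
# server adapter in slot 21:
chhwres -r virtualio -m p520 -o a -p vio2 --rsubtype scsi -s 21 \
    -a "adapter_type=client,remote_lpar_name=vio1,remote_slot_num=21"

# Make the same adapter permanent in vio2's profile; the attribute format is
# slot/client_or_server/remote_lpar_id/remote_lpar_name/remote_slot/is_required:
chsyscfg -r prof -m p520 \
    -i 'name=normal,lpar_name=vio2,"virtual_scsi_adapters+=21/client/1/vio1/21/0"'
```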
If you try it out, post a comment on the blog and let me know what your experience is with it. I just discovered this tonight and haven't done very much testing at this point, but I can already see several scenarios where this could be very useful.
I recently got a question about how to script disabling paths for a Virtual SCSI adapter so you can prepare to take a VIO server down for maintenance.
Of course you can just take down the VIO server and the Virtual SCSI will fail over to the other path; however, it is generally more graceful and less impactful to use the chpath command to disable the path by lowering its priority.
Let's suppose we want to take down the "VIO2" server, which on the AIX client maps back to vscsi1. If you are not sure which vscsi adapters map to which VIO servers, then check out my previous posting/script: AIX/VIO: Tracing Virtual SCSI / Shared Storage Pool Disks on AIX to VIO resources, which will produce a detailed report of how everything related to vscsi maps out.
So in this scenario we want to disable all the paths on "vscsi1" on an AIX client (you would need to do this same procedure on each AIX client of the VIO server). But first, we want to record the current settings so we can restore them after the maintenance.
If we run this one-liner, it will show the CURRENT path priority settings for everything on "vscsi1" on our AIX client, in the form of command lines that set those priorities. We can save this output and simply run these commands after the VIO maintenance to restore everything to how it was originally.
The output will look something like this. Again, note that this command doesn't actually change anything - it just shows you the command lines needed to get back to your original settings.
Now we can run this command line to set the priority to 255 for all vscsi1 paths, which should cause the traffic to switch over to the other vscsi path.
The output will look something like this:
After the VIO maintenance is complete, you can simply run the commands you had previously recorded and it will restore your original VSCSI path priority settings.
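The save/restore approach above can be sketched as a small helper that turns "disk priority" pairs into chpath command lines. This is only a sketch, assuming AIX's lspath/chpath commands, with made-up hdisk names and priorities:

```shell
# generate_restore_cmds reads "disk priority" pairs on stdin and emits the
# chpath commands that would restore those priorities on vscsi1.
generate_restore_cmds() {
    while read disk prio; do
        echo "chpath -l $disk -p vscsi1 -a priority=$prio"
    done
}

# On a real AIX client you would feed it live data, for example:
#   lspath -p vscsi1 -F name | while read d; do
#     echo "$d $(lspath -AEl $d -p vscsi1 -a priority | awk '{print $2}')"
#   done | generate_restore_cmds > restore_paths.sh
#
# Before maintenance, push traffic off vscsi1 by de-prioritizing each path:
#   lspath -p vscsi1 -F name | while read d; do
#     chpath -l $d -p vscsi1 -a priority=255
#   done

# Demo with canned data (hdisk names and priorities are made up):
printf 'hdisk0 1\nhdisk1 2\n' | generate_restore_cmds
# prints:
#   chpath -l hdisk0 -p vscsi1 -a priority=1
#   chpath -l hdisk1 -p vscsi1 -a priority=2
```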
I've updated the pslot script to support validating and visualizing Virtual Fibre Channel / NPIV slots in addition to VSCSI slots. For more info and to download the updated script go to the project homepage at: http://pslot.sourceforge.net/
Here is a screenshot of the diagram generated in VFC mode:
Here is a quick HMC one-line script to run a command on every VIO server attached to the HMC. It uses "viosvrcmd", which uses RMC, so there are no SSH keys or anything to set up; it just works as long as RMC is working in your environment.
In this example, it runs the "errlog" (aka errpt) command on every VIO server. You can change this to whatever command you would like to run. The script looks through all managed systems for VIO partitions, and attempts to use viosvrcmd to run the specified command on each VIO server.
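A sketch of such a loop, shown expanded for readability (on the HMC it can be collapsed to one line); an lpar_env of "vioserver" is what identifies VIO partitions:

```shell
for sys in `lssyscfg -r sys -F name`; do
    for vio in `lssyscfg -r lpar -m $sys -F name,lpar_env | grep ',vioserver$' | cut -d, -f1`; do
        echo "===== $sys / $vio ====="
        viosvrcmd -m $sys -p $vio -c "errlog"
    done
done
```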
Many people are not aware that IBM included SMIT functionality in the VIO system. The VIO command is named "cfgassist" and it is designed to be used from the padmin restricted shell. When you run "cfgassist", the VIO server runs the AIX command "smitty vios_top" under the covers, so you will see a very easy to use menu just like smit in AIX.
If you haven't already tried the VIO "cfgassist" I would highly recommend giving it a try. It can be a big time saver.
Here are a few screenshots of some of the cfgassist functionality:
Let me start by saying I am a huge fan of Power Systems, and this posting is just meant as constructive criticism to make a great platform even better. Here are 6 things in my opinion IBM could do to improve AIX and Power Systems.
But first, let me mention a few things that in my opinion they have recently done right. #1 - Including OpenSSH in the AIX 7.1 base media. #2 - Including nmon by default in all recent versions of AIX. Thank you for including both of these; they are great tools that every AIX machine should have, and it makes everyone's job easier to have them bundled with AIX and easily updated when AIX is updated. And #3 - Releasing NMON for Linux under the GPL; very cool!
On to the list of things that in my opinion IBM could do to improve AIX and Power Systems:
#1 Ditch the HMC restricted shell
The HMC has so much potential to enable administrators to automate tasks and improve systems. However, in its current state it is purposely crippled by IBM, which severely limits its usefulness. There are so many tasks that could be scripted and automated from the HMC if it weren't locked down. I am aware that you can set up SSH keys and script things from another machine, but this has several issues: it is extremely slow and inefficient if you write a script that needs to make many queries, it adds complexity by requiring another server and the network to be in the mix, and some things are just not practical or possible to script over an SSH connection (for example, look at the HMC lpar_netboot and vtmenu scripts). It isn't even possible to add a command alias to the .bashrc for something like this. IBM should unlock the HMC and just tell people that if they install 3rd party software on it they are on their own. Part of enabling a "Smarter Planet" should be delivering products that allow us to get the most out of the system, not purposely limiting what we can do.
#2 Update and maintain the IBM AIX Toolbox for Linux Applications
The IBM AIX Toolbox for Linux Applications was such a great idea. There are so many great open source applications out there, and having them easily installable on AIX is awesome. However, most of the applications in the Toolbox haven't been updated in many years, so in its current state most of them are old and riddled with bugs and known security vulnerabilities. Michael Perzl has done a great job of compiling open source applications for AIX and making them available at http://www.perzl.org/aix/ However, in a corporate environment it is much easier to get approval to install a utility that was packaged by IBM and delivered on a DVD with the system. Please, IBM - update and maintain the IBM AIX Toolbox for Linux Applications.
#3 Add tab filename completion and arrow key history to the AIX Korn Shell
The title says it all. This functionality would make the AIX Korn Shell easier to use and make new users feel more comfortable. I know you can install bash to get this functionality, but installing another 3rd party shell isn't always an option depending on the environment. Other UNIX-like systems such as OpenBSD use the Korn Shell as their default shell and support tab filename completion and arrow key history.
#4 Drop the VIO command name / syntax differences
VIO servers have different command names and different syntax for many commands. This causes confusion and makes using VIO frustrating and difficult. For example, the AIX "cfgmgr" command is "cfgdev" in VIO. From what I have heard, this was done so that IBM could more easily swap out the underlying OS of VIO from AIX to something like Linux. But whether the command is "cfgmgr" or "cfgdev", you could still swap out the underlying OS for Linux. Please end the insanity and make VIO commands and syntax as similar to AIX as possible.
#5 Support VIO servers as Virtual SCSI clients
Currently the HMC prevents users from setting up VIO servers as Virtual SCSI clients of other VIO servers. This functionality would be extremely useful for providing Virtual Optical access between VIO servers. I recently discovered that you can get this to work by adding the adapters from the HMC command line. It would be great if IBM supported Virtual Optical between VIO servers and supported this from the HMC GUI interface.
#6 Open source NIMOL
NIMOL (NIM on Linux) was a great idea and a cool utility. However, it was integrated very closely with older versions of Linux, and it is no longer possible to get NIMOL to work on a modern version of Linux such as RHEL 6. NIMOL is just several shell scripts, and it would be very helpful if IBM licensed these scripts under the GPL so that the community could help improve them and modify them to work with newer versions of Linux. This would also let people package and bundle NIMOL with a version of Linux, making it easier to distribute, set up, and use.
Agree with me on any of these? Disagree with me? Have other ideas? Post a comment and let me know.
This is a small update to my previous posting on easily mapping Virtual SCSI disks via a script. As input, you give the script the LPAR name, the label for the mapping, and the LUN serial number; the script will find the VIO servers, figure out which vhost adapter on each VIO matches up to the LPAR, and find the correct hdisk to map based on the serial number on each VIO server. The script just prints out the command that needs to be run to map each LUN (it doesn't map anything itself); you can then copy and paste the commands yourself to actually do the mapping.
To use the script, you need to have an SSH key to your HMC set up and already have the vhost client/server adapters defined for your LPARs.
For full details on the script, see my previous posting at: https://www.ibm.com/developerworks/mydeveloperworks/blogs/brian/entry/vio_vscsi_mapping24?lang=en
This small update has a couple of improvements:
The input file format has changed with this update. The input file (in this example mappings.txt) has 3 space separated columns that contain LUN information (1 LUN per line):
Here is an example input file: the first 4 lines use the serial only, and the last 3 lines use the serial and Z1 (separated by a period). For example, line 5 would be for a LUN with serial number "8024 4215" and a Z1 of "55 43".
The script is called with 3 parameters: The HMC name, the managed system name, and the input filename.
Here is example output from the script:
$ ./vio_vscsi_map.pl testhmc01 lab_520_01 mappings.txt
Here is the updated script:
Thanks to Sebastian Thomas for the ideas to better support serial numbers with spaces and the Z1 field, and for testing the updated script!
I've been working on a script to visualize shared processor pool allocations on Power Systems. The script is called "spp2gnu.pl". It is in the very early stages of development: just a proof of concept, and alpha quality at this point. So if you try it out, please let me know what issues you run into so I can make improvements.
The script connects to the HMC over SSH; you must have SSH keys to your HMC set up for this to work. The script gathers information from the HMC about the shared processor pools, and then generates a gnuplot script to create an image like the one below. For those of you unfamiliar with gnuplot, it is a powerful multiplatform graphing utility that can be scripted.
Here is a sample image file produced by the spp2gnu.pl script:
The script generates a gnuplot script file, so you need to have gnuplot installed somewhere to produce the graph. I have done all my testing with gnuplot on Linux, although you should be able to install gnuplot on AIX or Windows and get similar results.
You specify the HMC username@hostname and the managed system name on the command line like in the following example, and then pipe the output to gnuplot:
Or you can run it on a server that doesn't have gnuplot and then transfer the file to another computer that has gnuplot:
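Putting both options together, usage might look like this (the hostnames, system name, and output filename are examples):

```shell
# Generate the gnuplot script and render it in one step:
./spp2gnu.pl hscroot@hmc01 p520 | gnuplot

# Or save the gnuplot script and render it later on a machine that has gnuplot:
./spp2gnu.pl hscroot@hmc01 p520 > spp.gnu
gnuplot spp.gnu
```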
Please let me know if you have any suggestions, feedback, or issues running the script. Again, this is in the early stages of development and is just an alpha-quality proof of concept.
Here is the download link: http://www.ixbrian.com/aix_projects/spp2gnu.tgz
I've been doing some work on a script to validate and visualize virtual SCSI slots in a PowerVM environment.
In order to get a single VSCSI server/client adapter pair to work, 8 items must be set up correctly:
It can be easy to make mistakes when setting these up.
I've written a script that validates that all VSCSI slots on a server are set up correctly and are paired with a corresponding adapter. The script can run in text mode or optionally visualize the slot layout using Graphviz.
For more details and to download the script, go to: http://pslot.sourceforge.net/