Brian Smith's AIX / UNIX / Linux / Open Source blog
|Modified on by brian_s|
Often you'll find a command line that works perfectly when you run it locally on a server but fails when you run it remotely over SSH. Usually the problem is related to double quotes or backticks in the command. In this post we will go over problems with double quotes, but the same issue applies to command lines containing backticks. In this example, we are running a command locally on an HMC:
If I decide to run this command over SSH (perhaps through a script), it won't work:
What's going on here? Well, the problem is the way the quote marks are processed by the local shell that runs the SSH command. We can see what is happening by changing the "ssh" part of the command to "echo". This shows what the shell does to the quote marks:
So what we need to do is tweak our "echo" command line until what is echoed back to the screen matches the original command that worked when run locally:
Now that the command echoes back the exact command that works locally, it should also work over SSH:
Another option would have been to use single quotes. However, shell variables are not expanded inside single quotes, so you'll have problems if you are trying to use variables within the command line, which is very common when scripting something like this. This is why I prefer to use double quotes and just escape them. Without variables, this command with single quotes will work as well:
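The same behavior can be reproduced with a generic command. This is a minimal sketch and not the HMC command from this post: the grep command, the pattern variable, and the file path are hypothetical, and echo stands in for ssh so that nothing runs remotely.

```shell
# Hypothetical stand-in command; echo takes the place of "ssh <host>"
# so we can see exactly what the local shell would hand to SSH.
pattern="two words"

# Unescaped double quotes are consumed by the local shell.
stripped=$(echo grep "$pattern" /tmp/example.txt)

# Escaped double quotes survive, and $pattern is still expanded locally.
escaped=$(echo "grep \"$pattern\" /tmp/example.txt")

# Single quotes keep the inner double quotes but block variable expansion.
single=$(echo 'grep "$pattern" /tmp/example.txt')

printf '%s\n' "$stripped" "$escaped" "$single"
```

Only the escaped form prints a command that both keeps its quotes and has the variable filled in, which is what the remote shell needs to receive.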
I recently received an email from someone who needed to change the CPU pool on hundreds of LPARs and asked if I had any suggestions to make the process easier.
There are a couple of ways this could be automated. One option might be to create a script that would generate the commands needed to make the change. But what I would probably do in this instance is just use a spreadsheet and a specially crafted formula to generate the command needed to make the change.
Basically, you create a spreadsheet with 4 columns:
Column A: LPAR Name
Column B: Frame Name
Column C: Desired CPU Pool
Column D: Our special formula to generate the commands
The formula in Column D needs to be something like this for Row 2 of the spreadsheet:
This formula builds the command line by pulling the LPAR Name, Frame Name, and CPU Pool name from columns A, B, and C in Row 2 (Row 1 being a header).
You then move to the bottom right corner of the formula cell until your mouse pointer turns into a cross, then click and drag the formula down into all of the cells below it in column D. Now you have your formula set up in Column D all the way down the spreadsheet, and you just need to fill in columns A, B, and C with your LPAR, Frame, and CPU Pool details. Then simply copy and paste the generated commands into your HMC to make the changes.
The commands generated by the formulas look like the lines below. Basically, each row produces the command to change the CPU pool through DLPAR, followed by a command that force-overwrites the current profile with the running configuration so that the CPU pool change is reflected in the profile as well:
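For the scripted alternative mentioned earlier, a minimal sketch might look like the following. It assumes a hypothetical CSV layout of LPAR name, frame name, and desired pool (the same three input columns as the spreadsheet). The chhwres invocation is an assumed form of the pool-change syntax, not the formula from this post, so check it against the chhwres man page on your HMC before use. The profile-save step is left out, and gen_pool_cmds is a made-up helper name.

```shell
# gen_pool_cmds reads "lpar,frame,pool" lines on stdin (header included)
# and prints one command per LPAR. Nothing is executed.
gen_pool_cmds() {
    awk -F, 'NR > 1 && NF == 3 {
        # NR > 1 skips the header row; NF == 3 skips malformed rows.
        # ASSUMPTION: chhwres syntax for moving an LPAR to another pool.
        printf "chhwres -r procpool -m %s -o s -p %s -a \"shared_proc_pool_name=%s\"\n", $2, $1, $3
    }'
}

# Hypothetical sample data.
gen_pool_cmds <<'EOF'
LPAR Name,Frame Name,Desired CPU Pool
aix1,frame1,pool2
aix2,frame1,pool3
EOF
```

Because the function only prints, the output can be reviewed first and then pasted into the HMC, just like the spreadsheet-generated commands.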
IBM recently released a draft Redbook covering the upcoming HMC Version 8 Release 8.8.1.
I've read through the Redbook, and here are the no-nonsense highlights I noticed:
POWER5 servers won't be supported in HMC Version 8
Only POWER6, POWER7, and POWER8 servers will be supported. The lack of POWER5 support caught me by surprise, and I am hoping that IBM will change this and end up supporting POWER5 on HMC Version 8 at some point in the future. If you still have POWER5 servers in your environment, make sure you let IBM know that you want POWER5 support on HMC Version 8.
Your old HMCs might not be compatible with HMC Version 8
You need a rack-mounted CR5 or later HMC, or a desktop C08 or later HMC, with at least 2 GB of memory to run HMC Version 8. This means people with 7042-CR4 or older HMCs will not be able to upgrade to HMC Version 8.
Running HMC Version 8 as a Virtual Machine still not supported
Totally absent from the Redbook draft is any mention of running HMC Version 8 as a virtual machine (under VMware, for example). This is disappointing because IBM supported running the short-lived SDMC in a virtual environment. Hopefully this will change and IBM will one day support running the HMC as a virtual machine.
New Performance and Capacity Monitor
A very cool new feature in HMC Version 8 is an integrated performance and capacity monitor. It will graph information about CPU usage, memory usage, network throughput, and storage throughput, and it will support POWER6 and later servers. In previous HMC versions we had to use 3rd party software like LPAR2RRD for this kind of functionality. I'm looking forward to trying it out.
Further SR-IOV Support
HMC Version 8 will add further support for virtualizing adapters with SR-IOV. This is similar in concept to the old IVE (Integrated Virtual Ethernet) adapters in that it lets you take a single physical port and assign logical ports to multiple LPARs. SR-IOV works independently of the VIO server and doesn't require a VIO server at all. You can create up to 48 logical ports per physical adapter. It is very fast and also supports QoS (quality of service). However, the big drawback to SR-IOV is that it doesn't support Live Partition Mobility (LPM), suspend/resume, or remote restart. One possible way to get around this limitation is to assign an SR-IOV logical port to a VIO server and create an SEA adapter out of it, but I'm not sure of a practical scenario in which someone would do that.
You might have a multiple step upgrade process to get to HMC 8
You can only upgrade to HMC Version 8 from 7.7.8 (with MH01402) or from 7.7.9. So if you are running a version older than this you'll need to do a multi-step upgrade and upgrade to one of these levels first, and then to HMC Version 8. Not a big deal, but something people need to be aware of so that they can get all the correct media needed for the upgrade and allocate enough time to do a multi-step upgrade.
Partition Remote Restart enhancement
Remote Restart is a very cool feature that allows LPARs to automatically come back up on another frame in the event of an outage on the frame they were originally running on. This is super handy since you can't use LPM if the source server is down. Previously you could only enable Remote Restart on an LPAR at the time the LPAR was created. With HMC Version 8 this limitation has been removed, and it can now be enabled without having to re-create the LPAR. Awesome!
Other Miscellaneous Improvements
Here are some other improvements
Post a comment if I missed any other big new features in HMC Version 8.
This post is about a script I wrote for building filesystems on AIX. It automates the process of creating logical volumes, filesystems, mounting them, setting user/group owners, and setting permissions. It can be used to create large numbers of filesystems quickly, and it is also handy if you need to create the same filesystems across multiple different servers.
Start by creating a CSV file based on this example/template (the first line is the header line). Simply copy and paste this in to a new file and name it with a .csv extension:
Open up this CSV file in your favorite spreadsheet application (I'm using LibreOffice in this example, but Excel should work as well). Once in the spreadsheet, edit the CSV file to specify which filesystems you want to create:
The columns are pretty self-explanatory. The "Mount Options" column is optional (if you specify multiple mount options, separate them with a period, e.g. rbrw.cio.dio). The "Log" column is also optional (if you don't specify it, it will default to an existing log in the volume group).
Once you are done editing the file in the spreadsheet save it in CSV format. It MUST be CSV to work. To make sure, transfer the file to your AIX server and "cat" the file, and you should see something similar to this:
Now run the script and specify the CSV as a parameter. By default, the script doesn't make any changes or actually do anything at all other than show the commands that need to be run to create the filesystems:
Review the output to make sure everything looks good. If you want to actually run the commands generated, you can either redirect the output to a file and run that file as a script, or you can just run the scriptfs script and pipe it to "ksh" which will cause it to run the commands and actually create the filesystems:
When this ran, it created the logical volume and the filesystem, mounted it, changed the user/group owners, and set the permissions.
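The print-by-default, pipe-to-run pattern is easy to reproduce in any generator script. The sketch below is not the scriptfs script: the two-column CSV layout and the gen_cmds and to_mount_opts functions are hypothetical, and it only emits mkdir and chmod commands so it can be tried safely anywhere. It also shows one way period-separated options such as rbrw.cio.dio could be turned back into the comma-separated form that mount expects (an assumption about why periods are used in the CSV).

```shell
# gen_cmds reads "directory,mode" lines on stdin and prints the commands
# that would create each directory. It never runs them itself.
gen_cmds() {
    while IFS=, read -r dir mode; do
        [ -n "$dir" ] || continue          # skip blank lines
        printf 'mkdir -p %s\n' "$dir"
        printf 'chmod %s %s\n' "$mode" "$dir"
    done
}

# Periods stand in for commas inside a CSV field; convert them back.
to_mount_opts() {
    printf '%s\n' "$1" | tr '.' ','
}

workdir=$(mktemp -d)

# Dry run: only prints the commands for review.
printf '%s/data,750\n' "$workdir" | gen_cmds

# Real run: the same output piped to a shell, which executes it.
printf '%s/data,750\n' "$workdir" | gen_cmds | sh

to_mount_opts rbrw.cio.dio
```

Keeping the generator side-effect free means the review step and the execution step always see exactly the same commands.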
Here is the script:
I was asked how to connect to a POWER5 server through a serial cable, so I made a quick video that shows what you need and how to connect to the server, access the text-based ASMI, boot into SMS, and then boot from a CD, all through a serial cable using PuTTY.
This isn't the best quality video, but I thought I would put it out here in case it can help someone else who is having problems accessing their server through a serial cable.
There are several storage-related settings in AIX that cannot be changed while the device is active. These include "fast_fail", dynamic tracking (dyntrk), and "num_cmd_elems" for HBAs, and the queue depth for hdisks.
Your options for setting these are either to make the device inactive (usually by taking redundant paths offline) and then make the change, or to use the "-P" flag on chdev and then reboot the server so the change takes effect at the next boot.
The "-P" option on chdev has one major drawback, however. As soon as you make the change with chdev "-P", it appears that the setting is active right away, even before the reboot. If you check with "lsattr" it will look as if the setting has taken effect, but it actually won't take effect until the next reboot. What has essentially happened is that the running configuration is out of sync with the ODM. The ODM reflects the updated settings, but they can't be changed in the running configuration of the AIX kernel until the next reboot.
Until recently, the only way to really verify if these kinds of attributes were actually active was to check in KDB. Last year I even wrote a script that would check this in KDB and report differences (see post: Script to show if your AIX HBA / hdisk settings are actually in effect )
Very recent versions of AIX have made a major improvement to "lsattr" where you now have a "-P" flag to show what is actually currently active. Chris Gibson did a very good write up on this on his blog (see his post: Thanks kdb but lsattr's got me covered! )
I wrote a script that will go through every device on your AIX server and compare the "lsattr -Pl" (running config) versus "lsattr -El" (ODM config) and show you all devices that have differences. If the script finds any differences it will show you which attributes are different between the ODM and the running configuration. If everything is in sync then there isn't any output.
Since the script relies on "lsattr -Pl" you must be at AIX 6.1 TL9 or AIX 7.1 TL3 or later for this script to work! If you are running an older AIX version check out my previous script that uses KDB (Script to show if your AIX HBA / hdisk settings are actually in effect)
Here is some example output from the script that shows fcs0's num_cmd_elems and hdisk0's queue_depth attributes are changed in the ODM but not on the running/active system. The ODM configuration for num_cmd_elems is 199 but the running configuration is 200. Likewise, the queue depth ODM configuration is 19 but the running configuration is 20.
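The comparison itself boils down to diffing two attribute listings. The sketch below is not the script from this post. It is a simplified illustration that works on saved listings of "attribute value" pairs, with the num_cmd_elems values taken from the example above. On AIX the listings would presumably come from "lsattr -El" and "lsattr -Pl" with a suitable output format, which is an assumption here. compare_attrs and the file names are hypothetical.

```shell
# compare_attrs ODM_FILE RUNNING_FILE
# Each file holds one "attribute value" pair per line.
compare_attrs() {
    awk '
        # First file: remember the ODM value of every attribute.
        FNR == NR { odm[$1] = $2; next }
        # Second file: report attributes whose running value differs.
        ($1 in odm) && odm[$1] != $2 {
            printf "%s: ODM=%s running=%s\n", $1, odm[$1], $2
        }
    ' "$1" "$2"
}

# Sample data standing in for one device.
tmp=$(mktemp -d)
printf 'num_cmd_elems 199\nmax_xfer_size 0x100000\n' > "$tmp/odm.txt"
printf 'num_cmd_elems 200\nmax_xfer_size 0x100000\n' > "$tmp/running.txt"

compare_attrs "$tmp/odm.txt" "$tmp/running.txt"
```

As with the real script, identical listings produce no output at all.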
Here is the script (again, note that it needs AIX 6.1 TL9 or AIX 7.1 TL3 or later to work):
I recently got an email from Dan Aldridge with some information about a very handy AIX command, "chdef". I wasn't familiar with this command before, and it is super useful, so I thought I would write a quick post about it.
A recent HMC release introduced a new "Current Config Sync" feature that is very, very useful.
Anyone who has been working with Power Systems for long has been bitten by the fact that an LPAR's saved "Profile" and its "Running Config" can get out of sync. For example, if you DLPAR some new Virtual Adapters to your VIO server and forget to add them to the profile as well, you might have serious issues the next time you shut down and reactivate your VIO server. This is because anything in the "Running Config" is lost when the LPAR is shut down if it wasn't also added to the persistent profile. For several years, anytime you did a DLPAR change (such as adding or removing memory, CPUs, virtual CPUs, or virtual adapters) you also had to make the same change in the profile, or else you would lose the change at the next shutdown and re-activation. This was such a problem that I created an extensive script that reports differences between the running config and the profile (see the prdiff website for more info: http://prdiff.sourceforge.net/ )
A few years ago IBM introduced a very cool HMC feature that allowed you to "Save the Current Configuration". This allowed you to take the "Running Config" and save it to the profile overwriting whatever was there. This allowed you to make your DLPAR changes, then just "Save Current Configuration" to write the changes you made to the profile. This was a good step in the right direction, but still left room for error. If you forgot to "Save Current Configuration" after making a DLPAR change you would still have problems the next time you shutdown/re-activated the LPAR.
There is now a new "Current Config Sync" feature in the HMC. This new feature is awesome and is what everyone has been asking IBM for since DLPAR was first introduced.
If enabled, the "Current Config Sync" feature will automatically "Save the Current Configuration" anytime you make a DLPAR change. You don't have to do anything extra. You just make your DLPAR change and it will automatically be synced with the profile. Awesome!
The first thing you need to do is enable the "Current Config Sync" option. This feature is disabled by default (more on this later in the post). Start by clicking on the LPAR's name in the HMC window to bring up the Partition Properties. Right at the bottom of the general tab you will have the following "Sync current config" options:
By default, it is set to "Sync turned OFF". Change this to "Sync turned ON". Once you do this and click OK, the HMC will go ahead and "Save the Current Configuration" of the LPAR to the profile.
Next, let's make a DLPAR addition of a Virtual SCSI adapter:
OK, so we've DLPAR'd in Virtual SCSI adapter #9 with a Client Adapter #9 as well. We have "Sync Turned ON", so when we check the profile we can see that this new VSCSI adapter was automatically added to the profile as well, without any further effort on our part:
The Third Option - Sync Suspended until next Activation
You can either have Sync turned "On", "Off", or "Suspended until next Activation". You might be wondering when you would choose this third option to suspend the sync? This option is useful if you need to make changes to the LPAR that are not possible to make through DLPAR. For example, if you needed to change the "Max CPU" setting, you can't change it through DLPAR. This is where the "Sync suspended until next activation" comes in handy!
Let's suppose we want to change the "Max CPU" setting for the "aix1" LPAR, which currently has the Sync feature turned "On". We can't DLPAR this change, so we pull up the partition profile and change the Max CPU to 0.8:
When you click "OK" you will get this message:
This message is letting you know that by changing the LPAR's Profile like this you are going to cause the profile and running configuration to be out of sync, and that this will cause the Sync setting to be changed to "Sync suspended until next activation".
Here is a screenshot that shows they are now out of Sync:
So from this point the "Sync" is disabled, since you have made a change that caused the profile and running configuration to be out of sync. The next time you shut down and re-activate the LPAR to activate your new Max CPU setting, the "Sync" setting will automatically get changed back to "ON" and everything will be back in sync again:
One-Liner to Mass-Enable this Feature
The only downside to this feature is that it is disabled by default. Using the GUI to enable it requires clicking on each LPAR one at a time and setting the Sync option to "On". If you have a bunch of LPARs this would take forever!
Here is a one-liner command to enable the Sync option on ALL LPARs on the HMC:
Here is a command to check the status of the Sync option for all LPARs:
It will show a "0" for "OFF", "1" for "ON", and "2" for "Sync suspended until next activation".
All I can say is "Well done IBM!" This is one of the most useful features added to the HMC in a very long time! If you haven't already, upgrade to an HMC level that includes this feature and give it a try.
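When scanning the status output for many LPARs, the numeric codes can be translated back into their labels. This is a minimal sketch that assumes the status arrives as "name,value" lines, which is a hypothetical layout; sync_label and the sample LPAR names are made up for illustration.

```shell
# sync_label CODE
# Maps the numeric sync status to the label used in the HMC GUI.
sync_label() {
    case "$1" in
        0) echo "OFF" ;;
        1) echo "ON" ;;
        2) echo "Suspended until next activation" ;;
        *) echo "unknown ($1)" ;;
    esac
}

# Hypothetical sample status lines, one LPAR per line.
while IFS=, read -r lpar code; do
    printf '%-10s %s\n' "$lpar" "$(sync_label "$code")"
done <<'EOF'
aix1,1
aix2,0
vio1,2
EOF
```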
I highly recommend only using scalable volume groups if at all possible. But there are a lot of "Original" and "Big" AIX volume groups in existence out there, and you need to understand the relationship between Factor Size, Volume Group Type, and PP Sizes in order to support these volume groups.
Here is a table that shows for both Original and Big volume groups how changing the "Factor Size" changes the ratio between how many disks (PV's) the volume group can contain and how big they are. Basically you can either have lots of smaller disks, or fewer larger disks depending on how you set the Factor Size. The PP Size comes in to play because the larger the PP size the larger a disk the volume group will be able to support.
Original volume groups support a Factor Size from 1 to 16, and Big volume groups support a Factor Size from 1 to 64 (Scalable volume groups don't support or need a Factor Size).
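The arithmetic behind this trade-off can be sketched in a few lines. This is not the script that generated the tables. It assumes the commonly documented limits: 1016 PPs per PV multiplied by the factor, and 32 PVs (Original) or 128 PVs (Big) divided by the factor and rounded down. vg_limits is a hypothetical helper.

```shell
# vg_limits TYPE FACTOR PP_SIZE_MB
# Prints: max PPs per PV, max PVs, max disk size in MB.
vg_limits() {
    case "$1" in
        original) base_pvs=32;  max_factor=16 ;;
        big)      base_pvs=128; max_factor=64 ;;
        *) echo "unknown volume group type: $1" >&2; return 1 ;;
    esac
    if [ "$2" -lt 1 ] || [ "$2" -gt "$max_factor" ]; then
        echo "factor must be between 1 and $max_factor" >&2
        return 1
    fi
    max_pps=$((1016 * $2))
    max_pvs=$((base_pvs / $2))          # integer division rounds down
    echo "$max_pps $max_pvs $((max_pps * $3))"
}

vg_limits original 2 16
vg_limits big 64 16
```

The first call corresponds to an Original volume group with a factor of 2 and 16 MB PPs: 2032 PPs per PV, 16 PVs, and disks of up to 32,512 MB.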
For a script to show volume group details such as your current Factor Size and volume group type, see my previous post Deciphering AIX Volume Group limitations and types
Here are the tables that show all this in detail:
Original Volume Groups
Big Volume Groups
Mainly for my future reference, here is the script that generated these tables (written and run on Linux with Bash; it probably won't run on AIX without installing some GNU tools):
The man page for the AIX mkvg command states this about how Physical Partition sizes are determined for volume groups when they are created:
The default value for PP size is kind of hard to understand based on the man page description. Below are a couple of tables that show, for a given hdisk size, what the default PP size would be if a volume group were created on it without specifying a PP size.
AIX basically defaults to the smallest PP size possible for the given disk, which is almost certainly not what you want. Especially with original and big volume groups, a small PP size puts severe limitations on the total capacity of the PV and volume group.
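As a rough illustration of "smallest PP size possible", the sketch below doubles the PP size until the disk fits within a per-PV partition limit. The limits used (1016 for Original and Big volume groups, 2040 for Scalable) are assumptions based on my reading of the mkvg documentation, the rounding is simplified, and default_pp_size is a hypothetical helper. The result is not guaranteed to match what mkvg picks in every case.

```shell
# default_pp_size DISK_MB PP_LIMIT
# Prints the smallest power-of-two PP size (in MB) for which the disk
# needs no more than PP_LIMIT physical partitions.
default_pp_size() {
    pp=1
    while [ $(( $1 / pp )) -gt "$2" ]; do
        pp=$((pp * 2))
    done
    echo "$pp"
}

# A 20 GB disk under the assumed Original/Big and Scalable limits.
default_pp_size 20480 1016
default_pp_size 20480 2040
```

Under these assumptions a 20 GB disk ends up with a 32 MB PP size in an Original or Big volume group and 16 MB in a Scalable one.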
I recommend using scalable volume groups and specifying a PP size rather than just taking the default value. Think about what the requirements will be in the future, and pick a PP size for the volume group based on that. Remember that the PP size determines the increments in which you will be able to add space to filesystems, so don't pick too large a PP size. For example, if you pick a 1 GB (1024 MB) PP size and have a 4 GB filesystem that needs another 0.5 GB, you will be forced to add a full 1 GB, since you have to add an entire PP. If the PP size had been 512 MB, you would have been able to add just 0.5 GB rather than being forced to add a full GB.
To allow the volume group to have the maximum overall capacity for future growth, pick the largest PP size that would work for your current and future requirements. 128 MB, 256 MB, and 512 MB PP sizes are good middle-ground sizes that will work in most circumstances (again, pick the largest that will work for your environment). Remember that you cannot change a volume group's PP size after it is created without backing the data up and recreating it.
Here are the PP sizes that AIX will default to if you don't specify one yourself:
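The rounding in that example is simple integer arithmetic. A minimal sketch follows; round_to_pp is a hypothetical helper and all sizes are in MB.

```shell
# round_to_pp REQUEST_MB PP_SIZE_MB
# Rounds a requested size increase up to a whole number of PPs.
round_to_pp() {
    echo $(( ($1 + $2 - 1) / $2 * $2 ))
}

round_to_pp 512 1024    # 0.5 GB requested with a 1 GB PP size
round_to_pp 512 512     # 0.5 GB requested with a 512 MB PP size
```

The first call prints 1024 (a full PP is added) and the second prints 512.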
If you work with AIX systems that have original or big volume groups, you are going to have limitations on the size of LUNs that can be supported by the volume group.
To find the maximum disk size allowed in the volume group, run "lsvg <vgname>". Find the "PP Size" and multiply it by the "MAX PPs Per PV". For example, if your PP size is 16 MB and your Max PPs Per PV is 2032 the maximum PV/disk size would be 32,512 MB (about 32 GB).
But what happens in this scenario if you take a 20 GB LUN and increase its size on the SAN to 300 GB? (way past the 32 GB limit of this particular volume group). AIX will not allow the LUN to increase in size when you run the "chvg -g" command, and the extra space will not be available to you. The extra space is allocated on the SAN but not usable on AIX so the space is essentially wasted.
Now, to fix this you have two options. Depending on the situation you might be able to change the volume group's "Factor Size" (see my previous post on Deciphering AIX Volume Group limitations and types), or you might have to convert to either a Big volume group (usually no downtime) or a Scalable volume group (requires the volume group to be varied off). To convert to either Big or Scalable you must have free PPs on every PV/disk (see my previous post on a Script to free up PP's across all hdisks in an AIX volume group to make this process MUCH easier).
How can you tell if you have LUNs that are bigger than what the volume group supports? One of the easiest ways is to compare what "lspv <hdisk>" reports for the total size of the disk against what "bootinfo -s" reports as the size of the disk. Normally there is a small discrepancy between these sizes because of the overhead that the volume group takes, but a big discrepancy means either the disk is too large for the volume group to support, or you haven't yet run "chvg -g" for the volume group to recognize the increased size of the hdisk.
Here is a one liner script that will check all of the hdisks on a server. The first column of the output is the size of the disk that "lspv" reports. The second column is the size that "bootinfo -s" reports. The third column is the difference between these two numbers.
As you can see in this example, someone tried to resize hdisk5 to 300 GB, however this particular volume group only supports a max PV/hdisk size of 32 GB so the extra space cannot be utilized unless the volume group factor size is changed (which may or may not be possible) or unless the volume group is converted to a big or scalable volume group.
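The comparison is just a subtraction per disk plus a threshold. The sketch below is not the one-liner from this post. It works on hypothetical sample lines of "hdisk lspv_MB bootinfo_MB" so it can be run anywhere, and the 1024 MB threshold for flagging a disk is an arbitrary assumption. On AIX the two sizes would come from "lspv <hdisk>" and "bootinfo -s <hdisk>". check_sizes is a made-up helper name.

```shell
# check_sizes [THRESHOLD_MB]
# Reads "name lspv_mb bootinfo_mb" lines on stdin and prints the name,
# both sizes, and their difference, flagging large discrepancies.
check_sizes() {
    threshold=${1:-1024}
    while read -r disk lspv_mb boot_mb; do
        diff_mb=$((boot_mb - lspv_mb))
        flag=""
        if [ "$diff_mb" -gt "$threshold" ]; then
            flag=" <-- check volume group limits or run chvg -g"
        fi
        printf '%s %s %s %s%s\n' "$disk" "$lspv_mb" "$boot_mb" "$diff_mb" "$flag"
    done
}

# Hypothetical sample sizes in MB.
check_sizes <<'EOF'
hdisk4 20448 20480
hdisk5 32496 307200
EOF
```

With this sample data only hdisk5 is flagged, because its usable size is far below the size of the underlying LUN.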
System administrators often need to copy a file to remote servers with root authority. One example might be pushing out an updated resolv.conf file to all servers. However, remote root logins are often disabled, which makes it difficult to copy files with root authority.
Here is a quick and easy method to copy files using root authority even if remote root logins are disabled. For this method to work, you will need the following setup:
Once this is done, you can use a combination of "cat", "sudo", and "ssh" to easily push out a file to multiple servers with root authority.
Here is an example of pushing out a new resolv.conf file to all servers listed in the "serverlist" file:
It works by catting the file to be transferred and piping it to the SSH connection. Within the SSH connection we sudo to root and then use cat to write the standard input to the file.
This will also work with binary files, not just text files.
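Put together, the technique might look like the following sketch. It is an illustration and not the exact command from this post. It assumes that SSH logins to each server work non-interactively (for example with keys) and that the remote account can run sudo without a password prompt. push_file is a hypothetical helper.

```shell
# push_file SOURCE DEST SERVERLIST
# Copies SOURCE to DEST as root on every server named in SERVERLIST
# (one host name per line).
push_file() {
    src=$1
    dest=$2
    list=$3
    while read -r server; do
        [ -n "$server" ] || continue       # skip blank lines
        # cat feeds the file to ssh on stdin; on the remote side sudo
        # starts a root shell whose cat writes that stdin to DEST.
        cat "$src" | ssh "$server" "sudo sh -c 'cat > $dest'"
    done < "$list"
}

# Example invocation (not run here):
#   push_file ./resolv.conf /etc/resolv.conf serverlist
```

The redirection has to happen inside the shell started by sudo. Writing "sudo cat > file" instead would perform the redirection as the unprivileged login user and fail on root-owned paths.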