Brian Smith's AIX / UNIX / Linux / Open Source blog
Oftentimes you'll find a command line that works perfectly when you run it locally on a server, but doesn't work when you run it remotely over SSH. Usually the problem is related to double quotes or backticks in the command. In this post we will go over problems with double quotes, but the same issue applies to command lines with backticks in them. In this example, we are running a command locally on an HMC:
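As a hypothetical example (Server1, lpar1, lpar2, and hmc1 are placeholder names; the example relies on the HMC convention that a filter carrying multiple values must itself be wrapped in literal double quote marks):

    $ lssyscfg -r lpar -m Server1 --filter "\"lpar_names=lpar1,lpar2\"" -F name,state
    lpar1,Running
    lpar2,Running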
If I decide to run this command over SSH (perhaps through a script), it won't work:
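Continuing the hypothetical example, wrapping the same command line in double quotes and running it through SSH:

    $ ssh hscroot@hmc1 "lssyscfg -r lpar -m Server1 --filter "\"lpar_names=lpar1,lpar2\"" -F name,state"

The local shell consumes one level of quoting before SSH ever sends the command, so the HMC rejects the filter.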
What's going on here? Well, the problem is the way the quote marks are processed by the shell running the SSH command. We can see what is happening by changing the "ssh" part of the command to "echo". This will show what the shell is doing to the quote marks:
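Continuing the hypothetical example:

    $ echo "lssyscfg -r lpar -m Server1 --filter "\"lpar_names=lpar1,lpar2\"" -F name,state"
    lssyscfg -r lpar -m Server1 --filter "lpar_names=lpar1,lpar2" -F name,state

Notice the echoed command is not the command that worked locally: the inner escaped quote marks are gone.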
So what we need to do is tweak our "echo" command line until what is echoed back to the screen matches the original command that worked when run locally:
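Continuing the hypothetical, adding escapes until the echoed text matches:

    $ echo "lssyscfg -r lpar -m Server1 --filter \"\\\"lpar_names=lpar1,lpar2\\\"\" -F name,state"
    lssyscfg -r lpar -m Server1 --filter "\"lpar_names=lpar1,lpar2\"" -F name,state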
Now that the command echoes back the exact command line that works locally, it should also work over SSH:
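Swapping "echo" back to "ssh" in the hypothetical:

    $ ssh hscroot@hmc1 "lssyscfg -r lpar -m Server1 --filter \"\\\"lpar_names=lpar1,lpar2\\\"\" -F name,state"
    lpar1,Running
    lpar2,Running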
Another option would have been to use single quotes. However, with single quotes you'll have problems if you are trying to use shell variables within the command line, which is very common when scripting something like this. This is why I prefer to use double quotes and just escape them. Without variables, this command with single quotes will work as well:
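In the same hypothetical, the single quote version would be:

    $ ssh hscroot@hmc1 'lssyscfg -r lpar -m Server1 --filter "\"lpar_names=lpar1,lpar2\"" -F name,state'
    lpar1,Running
    lpar2,Running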
This post is about a script I wrote for building filesystems on AIX. It automates the process of creating logical volumes, filesystems, mounting them, setting user/group owners, and setting permissions. It can be used to create large numbers of filesystems quickly, and it is also handy if you need to create the same filesystems across multiple different servers.
Start by creating a CSV file based on this example/template (the first line is the header line). Simply copy and paste this into a new file and name it with a .csv extension:
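As a rough stand-in for the template (the column names here are a guess based on the description below, not the original), a header along these lines covers the fields the script needs:

    VG,LV Name,Size,Mount Point,Owner,Group,Permissions,Mount Options,Log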
Open up this CSV file in your favorite spreadsheet application (I'm using LibreOffice in this example, but Excel should work as well). Once in the spreadsheet, make changes to your CSV file to specify what filesystems you want to create:
The columns are pretty self-explanatory. The "Mount Options" column is optional (if you specify multiple mount options, separate them with a period, e.g. rbrw.cio.dio). The "Log" column is also optional (if you don't specify it, it will default to an existing log in the volume group).
Once you are done editing the file in the spreadsheet, save it in CSV format. It MUST be CSV to work. To make sure, transfer the file to your AIX server and "cat" the file; you should see something similar to this:
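For example, with the hypothetical column layout above and one data row (the filename filesystems.csv is arbitrary):

    $ cat filesystems.csv
    VG,LV Name,Size,Mount Point,Owner,Group,Permissions,Mount Options,Log
    datavg,applv,40,/app,appuser,staff,755,rbrw.cio,INLINE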
Now run the script and specify the CSV as a parameter. By default, the script doesn't make any changes or actually do anything at all other than show the commands that need to be run to create the filesystems:
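With the hypothetical CSV above, the dry run output might look something like this (the exact commands the real script generates may differ):

    $ ./scriptfs filesystems.csv
    mklv -t jfs2 -y applv datavg 40
    crfs -v jfs2 -d applv -m /app -A yes -a logname=INLINE -a options=rbrw,cio
    mount /app
    chown appuser:staff /app
    chmod 755 /app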
Review the output to make sure everything looks good. If you want to actually run the generated commands, you can either redirect the output to a file and run that file as a script, or you can just run the scriptfs script and pipe it to "ksh", which will run the commands and actually create the filesystems:
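Either way works, for example:

    $ ./scriptfs filesystems.csv > makefs.sh && ksh makefs.sh    # review first, then run
    $ ./scriptfs filesystems.csv | ksh                           # or run directly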
When this ran, it created the logical volumes and filesystems, mounted them, changed the user/group owners, and set the permissions.
Here is the script:
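If it helps to see the general shape, here is a minimal sketch of the idea, assuming the hypothetical CSV layout above with sizes given as logical partition counts (this is NOT the original scriptfs script):

    #!/usr/bin/ksh
    # Minimal sketch only - assumes the hypothetical CSV layout shown earlier.
    # It only prints the commands; pipe the output to ksh to execute them.
    [ $# -eq 1 ] || { echo "Usage: $0 file.csv"; exit 1; }
    awk 'NR > 1' "$1" | while IFS=, read vg lv size mp owner group perms opts log; do
        echo "mklv -t jfs2 -y $lv $vg $size"
        logopt=""; [ -n "$log" ] && logopt=" -a logname=$log"
        mntopt=""; [ -n "$opts" ] && mntopt=" -a options=$(echo $opts | tr '.' ',')"
        echo "crfs -v jfs2 -d $lv -m $mp -A yes$logopt$mntopt"
        echo "mount $mp"
        echo "chown $owner:$group $mp"
        echo "chmod $perms $mp"
    done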
I highly recommend only using Scalable volume groups if at all possible. But there are a lot of "Original" and "Big" AIX volume groups in existence out there, and you need to understand the relationship between Factor Size, Volume Group Type, and PP Size in order to support these volume groups.
Here is a table that shows, for both Original and Big volume groups, how changing the "Factor Size" changes the trade-off between how many disks (PV's) the volume group can contain and how big they can be. Basically, you can have either lots of smaller disks or fewer larger disks, depending on how you set the Factor Size. The PP Size comes into play because the larger the PP Size, the larger the disks the volume group will be able to support.
Original volume groups support a Factor Size from 1 to 16, and Big volume groups support a Factor Size from 1 to 64 (Scalable volume groups don't support/need a Factor Size).
For a script to show volume group details such as your current Factor Size and volume group type, see my previous post, Deciphering AIX Volume Group limitations and types.
Here are the tables that show all this in detail:
Original Volume Groups
Big Volume Groups
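The tables boil down to one documented relationship (see the mkvg/chvg "-t factor" documentation): MAX PPs per PV = 1016 x factor, the maximum number of PVs = 32/factor for Original and 128/factor for Big volume groups, and the largest disk a volume group can hold is PP Size x MAX PPs per PV. In summary form:

    Factor   Max PPs per PV   Max PVs (Original)   Max PVs (Big)
    1        1016             32                   128
    2        2032             16                   64
    4        4064             8                    32
    8        8128             4                    16
    16       16256            2                    8
    32       32512            n/a                  4
    64       65024            n/a                  2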
Mainly for my future reference, here is the script that generated these tables (written/run on Linux with Bash; it probably won't run on AIX without installing some GNU tools):
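A sketch of such a generator, built from the formula above (not necessarily the author's original script):

    #!/bin/bash
    # Print the Factor Size table using the 1016-x-factor relationship.
    printf "%-8s %-16s %-20s %s\n" "Factor" "Max PPs per PV" "Max PVs (Original)" "Max PVs (Big)"
    for t in 1 2 4 8 16 32 64; do
        if (( t <= 16 )); then orig=$((32 / t)); else orig="n/a"; fi
        printf "%-8s %-16s %-20s %s\n" "$t" "$((1016 * t))" "$orig" "$((128 / t))"
    done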
If you work with AIX systems that have original or big volume groups you are going to have limitations on the size of LUN's that can be supported by the volume group.
To find the maximum disk size allowed in the volume group, run "lsvg <vgname>". Find the "PP Size" and multiply it by the "MAX PPs Per PV". For example, if your PP size is 16 MB and your Max PPs Per PV is 2032 the maximum PV/disk size would be 32,512 MB (about 32 GB).
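For example (the volume group name and numbers are illustrative):

    $ lsvg datavg | egrep "PP SIZE|MAX PPs per PV"
    VG STATE:       active         PP SIZE:        16 megabyte(s)
    MAX PPs per PV: 2032           MAX PVs:        16

    16 MB x 2032 = 32,512 MB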
But what happens in this scenario if you take a 20 GB LUN and increase its size on the SAN to 300 GB, way past the 32 GB limit of this particular volume group? AIX will not allow the LUN to increase in size when you run the "chvg -g" command, and the extra space will not be available to you. The extra space is allocated on the SAN but not usable on AIX, so the space is essentially wasted.
Now, to fix this you have two options. Depending on the situation, you might be able to change the volume group's "Factor Size" (see my previous post on Deciphering AIX Volume Group limitations and types), or you might have to convert to either a Big (usually no downtime) or Scalable volume group (requires the volume group to be varied off). To convert to either Big or Scalable you must have free PP's on every PV/disk (see my previous post on a Script to free up PP's across all hdisks in an AIX volume group to make this process MUCH easier).
How can you tell if you have LUN's that are bigger than what the volume group supports? One of the easiest ways is to compare the total disk size that "lspv <hdisk>" reports against the size that "bootinfo -s" reports. Normally there is a small discrepancy between these sizes because of the overhead that the volume group takes, but if there is a big discrepancy it means either the disk is too large for the volume group to support, or you haven't run "chvg -g" yet for the volume group to recognize the increased size of the hdisk.
Here is a one liner script that will check all of the hdisks on a server. The first column of the output is the size of the disk that "lspv" reports. The second column is the size that "bootinfo -s" reports. The third column is the difference between these two numbers.
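A hedged reconstruction of such a check (not necessarily the author's exact one liner; it skips hdisks that aren't in a volume group):

    lspv | awk '$3 != "None" {print $1}' | while read d; do
        vgmb=$(lspv $d | grep "TOTAL PPs:" | sed 's/.*(\([0-9]*\) megabytes).*/\1/')
        bimb=$(bootinfo -s $d)
        printf "%-10s %10s %10s %10s\n" "$d" "$vgmb" "$bimb" "$((bimb - vgmb))"
    done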
As you can see in this example, someone tried to resize hdisk5 to 300 GB, however this particular volume group only supports a max PV/hdisk size of 32 GB so the extra space cannot be utilized unless the volume group factor size is changed (which may or may not be possible) or unless the volume group is converted to a big or scalable volume group.
Here is a quick one liner script you can copy/paste into your HMC SSH terminal that will generate an HTML report showing all of the managed systems and LPAR's attached to the HMC and their current state.
You could easily change/extend the script to show other information like LPAR CPU settings, memory settings, etc. in the report as well.
This kind of script can be very handy for generating quick reports to send to people who might not have direct access to the HMC. You can also run the one liner on multiple HMC's and combine the output into a single HTML file to see information from all your HMC's in a single HTML report.
Here is the one liner script to run on the HMC to generate the report:
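A hedged sketch of such a one liner, shown wrapped here for readability (collapse it to a single line to paste; it assumes system and LPAR names contain no commas):

    ( echo "<html><body><table border=1>"
      echo "<tr><th>Managed System</th><th>LPAR</th><th>State</th></tr>"
      lssyscfg -r sys -F name | while read sys; do
          lssyscfg -r lpar -m "$sys" -F name,state | while IFS=, read lpar state; do
              echo "<tr><td>$sys</td><td>$lpar</td><td>$state</td></tr>"
          done
      done
      echo "</table></body></html>" ) > lparreport.html

You can then scp lparreport.html off the HMC and open it in a browser.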
Here is sample output of what the generated HTML report looks like:
Here is a quick HMC one line script to run a command on every VIO server attached to the HMC. It uses "viosvrcmd", which uses RMC, so there are no SSH keys or anything to set up; it just works as long as RMC is working in your environment.
In this example, it runs the "errlog" (aka errpt) command on every VIO server. You can change this to whatever command you would like to run. The script looks through all managed systems for VIO partitions, and attempts to use viosvrcmd to run the specified command on each VIO server.
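A hedged sketch of the approach (not necessarily the author's exact one liner):

    lssyscfg -r sys -F name | while read sys; do
        lssyscfg -r lpar -m "$sys" -F name,lpar_env | awk -F, '$2 == "vioserver" {print $1}' |
        while read vio; do
            echo "===== $sys / $vio ====="
            viosvrcmd -m "$sys" -p "$vio" -c "errlog"
        done
    done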
AIX stores the last time a user changed their password as an "epoch" timestamp, or in other words as the number of seconds since 1970.
For example, if you want to see when the last time root changed their password you can type:
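One way is with lssec:

    $ lssec -f /etc/security/passwd -s root -a lastupdate
    root lastupdate=1391663150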
This shows that root changed their password 1,391,663,150 seconds after 1970. This really isn't very helpful unless you take this epoch number and convert it to a real date.
Here is a one liner function that will give you a "lastpwchg" command that shows a normal date/time for a user's last password change. Just type "lastpwchg" followed by the username you would like to check:
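A sketch of such a function (perl ships with AIX; paste this into your shell or profile):

    lastpwchg () { perl -le "print scalar localtime($(lssec -f /etc/security/passwd -s $1 -a lastupdate | awk -F= '{print $2}'))"; }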
The results look like this when you run it to check a user:
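Using the epoch value from the lssec example above (the exact date/time shown will reflect your timezone):

    $ lastpwchg root
    Thu Feb  6 05:05:50 2014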
If you found this useful, you might also want to check out this post: Don't let your AIX passwords expire
Version 1.2 of EZH (Easy HMC Command Line Interface) has been released.
EZH is a script for the IBM HMC console that provides an alternate, easier to use command line interface for many common commands. The goal of the project is to make the HMC command line interface easier to use for day to day administration tasks. It also includes an optional interactive menu to make it even easier to use.
EZH is Open Source and 100% free, and is very easy to install. For more information and to download, see the project page at http://ezh.sourceforge.net/
Last month an article I wrote on Tracing IBM AIX hdisks back to IBM System Storage SAN Volume Controller (SVC) volumes was published on IBM developerWorks.
The article included a script designed to automate the process of tracing AIX hdisks back to SVC Volumes so that you could easily see all the SVC related information about the AIX hdisk, including the SVC volume name.
Dan Aldridge modified the script in the article with a few very nice improvements. It no longer depends on pcmpath, supports more versions of the SVC, and even works with SVC Volumes presented through Virtual SCSI (* Requires AIX 6.1 TL7 or later). For more information on what the script does and how to use it, see the original article linked above.
Here is the modified script from Dan:
Check out my latest article on Power IT Pro, "Boost Your Productivity with Single-Line AIX Shell Scripts": http://poweritpro.com/aix/boost-your-productivity-single-line-aix-shell-scripts
Being careful not to shoot yourself in the foot when cleaning up users and home directories on AIX and Linux
It is possible on Linux or AIX to have users that share a home directory. For example, you might have user1, user2, and user3 all have their home directories set to /sharedhome.
Anytime you are deleting users and home directories you need to keep this in mind. You might need to delete "user1" but want to leave "user2" and "user3" unaffected. But if you delete user1 and its home directory, you might end up deleting the shared home directory, which would have a big impact on "user2" and "user3".
Let's look at this situation:
AIX and Red Hat both have a "userdel" command that will optionally erase the user's home directory as well (the "-r" flag).
On most versions of AIX, if you did a "userdel -r" on any one of the 3 users, it would delete the /sharedhome directory, which would impact the remaining 2 users that you didn't want to affect.
On Red Hat Linux, "userdel" tries to be smarter, and verifies that the owner of the home directory matches the user that is being deleted. Thus, if you did a "userdel -r user1" it would happily wipe out /sharedhome, but if you did a "userdel -r user2" or "userdel -r user3" it wouldn't delete /sharedhome because the owner of the directory doesn't match the user being deleted.
Here is a one-liner that will do a better job checking to see if a given directory is a shared home directory between multiple users:
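A hedged sketch of such a check against /etc/passwd (not necessarily the author's exact one liner):

    awk -F: -v dir="/sharedhome" '$6 == dir { n++; users = users " " $1 }
        END { if (n > 1) print dir " is a SHARED home directory for:" users;
              else print dir " is not a shared home directory" }' /etc/passwd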
Change "/sharedhome" to whatever directory you would like to check. If this comes back and says it is a shared home directory, then you need to do more research before attempting to delete the home directory.
So anytime you are cleaning up users and home directories, always keep in mind that users might have shared home directories, and that under some circumstances AIX and Linux will wipe out these shared home directories, which might affect other users on the system.
Over the years I've noticed that a lot of the core utilities on AIX are actually shell scripts.
Here are some examples of these utilities on AIX that are either shell scripts (ksh/csh) or in some cases Perl scripts:
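A few well-known examples can be spot-checked with the "file" command (the output wording varies by AIX level):

    $ file /usr/bin/mksysb /usr/bin/savevg /usr/sbin/snap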
As you can see, there are some pretty important commands in this list. And this is just a small sample of them. On my AIX server I found that there are over 400 scripts included as part of base AIX! You can see a full list of all the scripts that make up your system by running a command like this:
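One way to build such a list (not necessarily the author's exact command; append "| wc -l" to just count them):

    find /usr/bin /usr/sbin -type f 2>/dev/null | xargs file 2>/dev/null | egrep -i "shell script|perl"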
It is pretty cool that so many of the core commands/utilities on AIX are made up of shell scripts. For one, it shows that shell scripts can take on very important and critical tasks. It can also be extremely helpful to be able to review these scripts if you are having issues with any of these commands. And these scripts can be an excellent learning tool: they are extremely well written and robust scripts, many of which have been used for decades on thousands and thousands of servers.
This is an update to my previous post on a Script to show recent Error Report (errpt) entries on AIX
Anthony and Dan had some good suggestions, such as being able to specify the interval to go back in days instead of just minutes, and also having an option to have the script show only the error report entries logged since the last time the script was run.
So below is version two of this script. The changes are: the interval can now be specified in days as well as minutes, and the script can optionally show only the error report entries that have appeared since the last time it was run.
Here is the updated script:
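A hedged sketch of the version two behavior described above (not the author's original script; the flag names and the .errpt_lastrun state file are my own invention for illustration):

    #!/usr/bin/ksh
    # Show recent errpt entries: -m minutes back, -d days back, or -l since last run.
    state=$HOME/.errpt_lastrun
    case "$1" in
        -m) mins=$2 ;;
        -d) mins=$(($2 * 1440)) ;;
        -l) ;;
        *)  echo "Usage: $0 -m minutes | -d days | -l"; exit 1 ;;
    esac
    if [ "$1" = "-l" ]; then
        start=$(cat "$state" 2>/dev/null)
        [ -z "$start" ] && start=0101000070     # first run: show everything since 1970
    else
        start=$(perl -le '@t = localtime(time - 60 * $ARGV[0]);
            printf "%02d%02d%02d%02d%02d", $t[4]+1, $t[3], $t[2], $t[1], ($t[5]+1900) % 100' "$mins")
    fi
    date +%m%d%H%M%y > "$state"
    errpt -s "$start"

The -s flag of errpt takes a start timestamp in mmddHHMMyy format, which is what the perl one liner builds.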
Update 10/24/13: See also Version 2 of script to show recent Error Report entries on AIX
Here is a script that will show you recent Error Report (errpt) entries on AIX. As an argument to the script you specify the number of minutes you want to go back, and the script will only show errpt entries that have occurred within that many minutes from now.
This can be helpful as a standalone utility, or as part of a monitoring script that would automatically notify you if a new errpt entry came up within the last few minutes.
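A hedged sketch of the idea (not the author's original script; the filename recent_errpt.sh is arbitrary):

    #!/usr/bin/ksh
    # Show errpt entries newer than N minutes. errpt -s takes mmddHHMMyy.
    minutes=${1:?Usage: recent_errpt.sh <minutes>}
    start=$(perl -le '@t = localtime(time - 60 * $ARGV[0]);
        printf "%02d%02d%02d%02d%02d", $t[4]+1, $t[3], $t[2], $t[1], ($t[5]+1900) % 100' "$minutes")
    errpt -s "$start"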
For example, you could show only errors that have occurred within the last 15 minutes, the last hour, or the last day:
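Using the hypothetical script name from the sketch above:

    ./recent_errpt.sh 15        # last 15 minutes
    ./recent_errpt.sh 60        # last hour
    ./recent_errpt.sh 1440      # last day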
Here is a screenshot:
Every filesystem in AIX has two sets of permissions: the permissions on the mount point directory, and the permissions on the mounted filesystem.
Here is an example:
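As a hypothetical illustration (/test1 is a placeholder filesystem):

    # ls -ld /test1                 <- while mounted: the mounted filesystem's permissions
    drwxrwxrwx    4 root     system          256 Jan 01 12:00 /test1
    # umount /test1
    # ls -ld /test1                 <- while unmounted: the mount point directory's permissions
    drwx------    2 root     system          256 Jan 01 12:00 /test1
    # mount /test1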
Normally the mount point permissions don't come into play once the filesystem is mounted (however, here is a post that shows what I recommend for them).
However, if a user doesn't have read/execute permissions on the mount point, you will see weird behavior and frequently have application issues as well.
Here is an example showing this:
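Continuing the hypothetical /test1 example as a non-root user:

    $ cd /test1
    $ ls -al
    ./..: Permission denied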
As a non-root user, we do an "ls -al" in the directory and get a weird "./..: Permission denied" error. This happens because the underlying mount point permissions are restricted (700) and the user doesn't have read/execute permissions on the underlying mount point (even though the mounted filesystem has 777 permissions).
Now, there are two different ways to check what the permissions are on the underlying mount points of your filesystems. You can unmount the filesystem and do an "ls -ald" on the mount point (but that will probably require application downtime to unmount the filesystem), or you can use this handy script that will show you the underlying mount point permissions while the filesystem is online and mounted.
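A hedged illustration of one technique for peeking underneath a mounted filesystem (a read-only loopback NFS mount; this is not necessarily how the author's script works, it needs NFS services running, and it only reveals mount points that live directly on the root filesystem):

    exportfs -i -o ro /              # temporarily export / read-only, ignoring /etc/exports
    mkdir -p /tmp/peek
    mount localhost:/ /tmp/peek      # the NFS server does not cross local mount points
    ls -ld /tmp/peek/test1           # shows the underlying mount point permissions
    umount /tmp/peek
    exportfs -u /                    # unexport when done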
Just a quick disclaimer, however: these scripts have worked in my limited testing, but use them at your own risk. The IBM documentation always recommends unmounting the filesystem to check or change mount point permissions, and that is the safest and best way to do it. These scripts do everything with the filesystem mounted and online.
When run, it will show you the underlying mount point permissions for all mounted JFS/JFS2 filesystems:
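With the loopback mount from the previous sketch in place, a loop like this approximates that behavior:

    lsfs -c 2>/dev/null | awk -F: 'NR > 1 && $3 ~ /jfs/ && $1 != "/" {print $1}' | while read mp; do
        ls -ld "/tmp/peek$mp"
    done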
If you have filesystems with too restrictive mount points that are causing you issues (like /test1 and /app2 in the example above), then you can either unmount the filesystem and change the mount point permissions, or use this script to add read/execute permissions to user/group/others on the underlying mount point directory while it is still mounted and online:
With the script you specify a filesystem and it will add the read/execute permissions on the underlying mount point:
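Continuing the hedged loopback illustration, this time exported read-write with root access so a chmod on the NFS side changes the underlying directory:

    exportfs -i -o rw,root=localhost /
    mount localhost:/ /tmp/peek
    chmod a+rx /tmp/peek/test1       # add read/execute on the underlying mount point
    umount /tmp/peek
    exportfs -u /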
To determine the oslevel on AIX, you can run the "oslevel -s" command. However, what "oslevel -s" reports doesn't always show the entire picture. The OS level reported will be the lowest level of any installed AIX fileset on your server.
For example, if all the filesets on your AIX server are upgraded to AIX TL8 SP3 except for one fileset at a lower level, then the oslevel reported will reflect the lower level of that single fileset, which might be something like TL4 SP2. So even though your server is 99.9% AIX TL8 SP3, oslevel would report the lowest level of any installed fileset.
The "oslevel -sq" command will show all of service packs that your AIX server is aware of. If you compare the top line in "oslevel -sq" versus "oslevel -s" they should normally match. If they don't, then you probably have an issue.
If you have a downlevel OS you can figure out which filesets are causing the issue and then fix them.
The first step in figuring out which filesets are causing the problem is to determine whether your TL (Technology Level) is incorrect or just your SP (Service Pack) level is incorrect. To do this, compare the highest "oslevel -sq" line with your current "oslevel -s".
If the first 7 characters (####-##) match but the rest are different, then your TL level is correct but your SP level is not. For example, if the top line of your "oslevel -sq" output was 6100-07-02-1150 and your "oslevel -s" output was 6100-07-01-1141, then you would know your TL level was correct at TL7, but your SP level was not (oslevel -sq reported SP2, oslevel -s reported SP1). To determine which filesets are the problem when the TL level is correct but the SP level is wrong, run:
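Using the example levels above:

    oslevel -s -l 6100-07-02-1150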
This command will show you all the filesets that are below the SP level of the highest known SP level on the system.
If the TL level doesn't match, for example if the top line of your "oslevel -sq" output was 6100-07-02-1150 and your "oslevel -s" output was 6100-04-11-1140, then you would know your TL level is incorrect (oslevel -sq reported TL7, oslevel -s reported TL4). To determine which filesets are the problem if the TL level is not correct, run:
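Using the example levels above (oslevel -r takes the TL in ####-## form):

    oslevel -r -l 6100-07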
This command will show you all the filesets that are below the TL level of the highest known TL level on the system.
Here is a script that automates this process (note that this script doesn't work with AIX 5.2 or older). It will check the state of your system and let you know if you have downlevel filesets:
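A hedged sketch of such a check (not the author's original script):

    #!/usr/bin/ksh
    # Compare the current oslevel against the highest level the system knows about.
    current=$(oslevel -s)
    highest=$(oslevel -sq 2>/dev/null | grep "^[0-9]" | sort -r | head -1)
    if [ "$current" = "$highest" ]; then
        echo "OK: oslevel $current matches the highest known level."
    elif [ "${current%-*-*}" = "${highest%-*-*}" ]; then
        echo "SP is downlevel (current $current, highest known $highest). Problem filesets:"
        oslevel -s -l "$highest"
    else
        echo "TL is downlevel (current $current, highest known $highest). Problem filesets:"
        oslevel -r -l "${highest%-*-*}"
    fi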
Here is a screenshot of the output:
You might be wondering how you can avoid getting into a downlevel OS situation in the first place. Usually this issue happens when you use the base media from an older level to install a fileset. For example, say you are at 6.1 TL8 SP3 and a user requests that you install a new fileset. You only have the 6.1 TL7 SP2 base media, so you use it to install the requested fileset. If you just do this, your OS level will more than likely be downlevel and report an incorrect version. What you need to do after installing a fileset from older media is reinstall the TL8 SP3 update filesets to bring what you just installed up to the correct level. Remember - ALWAYS check the oslevel before and after you do any work related to filesets to make sure what you just did didn't downlevel the OS.
I have always really liked the AIX SDDPCM multipathing software. It is easy to use, and easy to gather information from. However, one thing I have wanted to do in the past is run a command that shows just the hdisk device and the corresponding SAN serial number.
The closest thing SDDPCM has to this is the "pcmpath query device" command, which shows a bunch of information for each SAN LUN:
If you want to filter this down to show only the hdisk device and serial number, it can be a little tricky. Normally something like this would be as simple as a grep and an awk to print the fields you want; however, in this case the information we want to pull out is on two different lines.
We can do a "egrep" and get all the lines that start with "DEV" or "SERIAL":
This is closer to what we want, but still has a lot of extra information and doesn't have the hdisk name and serial number on the same line.
The trick to fix this is the "paste" command. If we add another pipe on the end of the command, and pipe to "paste - -" it will merge every other line together:
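Continuing the illustrative output:

    $ pcmpath query device | egrep "^DEV#|^SERIAL" | paste - -
    DEV#:   2  DEVICE NAME: hdisk2  TYPE: 2145  ALGORITHM:  Load Balance   SERIAL: 60050768018100000000000000000123
    DEV#:   3  DEVICE NAME: hdisk3  TYPE: 2145  ALGORITHM:  Load Balance   SERIAL: 60050768018100000000000000000124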
This is very close to what I originally wanted. Now all we need to do is add an awk command at the end to print only the fields we want:
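Assuming the field layout shown above ($5 is the hdisk name and the serial is the last field):

    $ pcmpath query device | egrep "^DEV#|^SERIAL" | paste - - | awk '{print $5, $NF}'
    hdisk2 60050768018100000000000000000123
    hdisk3 60050768018100000000000000000124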
By far my favorite shell is the Bash shell. It has all kinds of awesome features, including tab filename completion and tab command completion. Unfortunately, Bash is not included with AIX by default, so a lot of the time we have to make the best of the Korn shell.
With AIX's Korn shell you can do a "set -o vi" and then hit "ESC \" to get filename completion. But this doesn't work for command name completion. For example, if you type "hostna" and then hit "ESC \" it won't autocomplete to "hostname".
I really like command name completion in Bash. It makes it really easy to find command names that you can't quite seem to remember. For example, if you were trying to remember the names of the commands to vary volume groups on and off, you could just type "vary", hit tab, and see all the commands that start with the word "vary".
Here is a little alias for the Korn shell that will make it easier to find command names:
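One way to build such an alias (not necessarily the author's exact version):

    alias lscmd='for d in $(echo "$PATH" | tr ":" " "); do for f in "$d"/*; do [ -x "$f" ] && echo "${f##*/}"; done; done | sort -u'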
Basically this alias will look through every file in each directory in your $PATH. If the file is executable, it will be displayed. Thus, if you type "lscmd" you will see a list of all executable commands.
You can also run something like "lscmd | grep vary" to see commands that contain the word "vary" in them:
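Output will vary by system, but on AIX you would expect something like:

    $ lscmd | grep vary
    varyoffvg
    varyonvg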
Or you could look for commands that contain "lv" in the name:
Here is a short script to show a "tree" view of EtherChannel devices on AIX. It shows the devices that make up the EtherChannel, including the backup device if there is one:
Here is the script:
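A hedged sketch of the idea (not the author's original script; it assumes EtherChannel pseudo-devices are ODM type ibm_ech, with adapter_names and backup_adapter attributes):

    #!/usr/bin/ksh
    # Tree view of EtherChannel devices and their member adapters.
    lsdev -Cc adapter -t ibm_ech -F name 2>/dev/null | while read ec; do
        echo "$ec"
        for a in $(lsattr -El "$ec" -a adapter_names -F value | tr ',' ' '); do
            echo "|-- $a"
        done
        b=$(lsattr -El "$ec" -a backup_adapter -F value)
        [ -n "$b" ] && [ "$b" != "NONE" ] && echo "|-- $b (backup)"
    done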
If you like this script, you might also like these posts: Show tree view of AIX device classes and subclasses and also Show AIX device dependency tree