If you haven't seen it yet, check out my latest article on IBM developerWorks: Getting more done with less: Using scripts and utilities to quickly run commands across all your IBM AIX servers
Brian Smith's AIX / UNIX / Linux / Open Source blog
As an AIX administrator you've probably run into problems such as these:
There are several AIX utilities that can help in these kinds of scenarios: truss, trace, and kdb.
At first glance, these utilities might seem overwhelming and too complicated to use. But if you try them out, you'll quickly realize that you don't need to understand everything in the output to pick up useful information that will help you narrow down issues.
Truss is my favorite because it is the easiest to use. Truss shows all of the system calls a process is making. For example, if a process needs to open a file, it creates a system call that is passed up to the kernel. By looking at a process with truss you can get an idea of what the process is doing and why it might be failing. Useful flags are "-f", which also shows information on any child processes that are created, and "-p", which lets you specify the process ID of a running process to trace.
One time I had a database client on a server stop working. It gave a very non-descriptive error about why it was failing. I researched the error online and couldn't find any info. I checked with the DBA and he insisted they hadn't changed anything. So I ran the failing database client command with truss, and it showed that the client was trying to access a file in a non-obvious place and couldn't due to file permissions. It turned out the DBAs had installed an update that had changed some file permissions :) With the help of truss we were able to determine what the process was doing and why it failed.
To learn more about truss, look through the man page, and then try running truss on an "ls" command for a file/directory your user doesn't have access to. Look through the output and you should find something like this:
kopen("/testdir", O_RDONLY) Err#13 EACCES
This shows that "ls" tried to open /testdir but got an "Err#13 EACCES" error, which means the user didn't have access to this directory.
The trace utility records system events. You run it for a set amount of time (starting with the "trace" command, followed by a "trcstop" command), and then use the "trcrpt" command to create a report. You can use the "-p" flag on trcrpt to specify a PID that trcrpt should report on. The trace utility generates a huge amount of output in a short time, so you probably want to run the trace for only a few seconds. This article over at developerWorks has good example command lines for using trace: http://www.ibm.com/developerworks/aix/library/au-aix-jfs2-inode/
I once had a process that could not be killed. I tried "kill -9" several times with no success. If a process is hung in kernel mode it will not respond to signals like kill. I was able to run a trace on the system and find that the process was repeatedly trying to open a log file from another process. So I stopped and started the other process and then the original process that I couldn't kill died.
The kdb command starts the kernel debugger. You want to be very careful within kdb, as you can potentially crash the system if you do the wrong thing in it. I am not very familiar with kdb, as there isn't much information publicly available. The same developerWorks article previously mentioned has an example of using kdb to debug an issue: http://www.ibm.com/developerworks/aix/library/au-aix-jfs2-inode/
Other examples of using kdb:
Give these utilities a try and get familiar with what their output looks like. Then, the next time you have a problem you can't figure out, run them and see if the output points you in the right direction. Again, don't get overwhelmed or intimidated by the output. You don't need to understand all of it - just look for patterns or things that seem unusual.
By far my favorite shell is the Bash shell. It has all kinds of awesome features including tab filename completion and tab command completion. Unfortunately Bash is not included with AIX by default, so a lot of the time we have to make the best of the Korn shell.
With AIX's Korn shell you can do a "set -o vi" and then hit "ESC \" to get filename completion. But this doesn't work for command name completion. For example, if you type "hostna" and then hit "ESC \" it won't autocomplete to "hostname".
I really like command name completion with bash. It makes it really easy to find command names that you can't quite seem to remember. For example if you were trying to remember the names of the commands to vary on and off volume groups you could just type "vary" and hit tab and see all the commands that start with the word "vary".
Here is a little alias for the Korn shell that will make it easier to find command names:
Basically this alias will look through every file in each directory in your $PATH. If the file is executable, it will be displayed. Thus, if you type "lscmd" you will see a list of all executable commands.
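The alias itself isn't reproduced here, but based on the description above, a minimal version might look like the following. It is written as a function rather than an alias so it can hold a loop, and the name "lscmd" comes from the text; the exact original implementation may differ:

```shell
# Sketch of an lscmd helper based on the description above (not the
# original alias): list every executable regular file in every $PATH
# directory, sorted and de-duplicated.
lscmd() {
    echo "$PATH" | tr ':' '\n' | while read -r dir; do
        for f in "$dir"/*; do
            # Print only regular files that are executable
            [ -f "$f" ] && [ -x "$f" ] && basename "$f"
        done
    done | sort -u
}

lscmd | head -5
```

From there, "lscmd | grep vary" works exactly as described in the text.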
You can also run something like "lscmd | grep vary" to see commands that contain the word "vary" in them:
Or you could look for commands that contain "lv" in the name:
Many people are not aware that you can specify multiple file names on the command line for most commands that manipulate files.
For example, if there were several files I wanted to change permissions on, I could list all the file names on a single "chmod" command line:
$ chmod 400 /tmp/file1 /home/file5 /etc/testfile
This is more efficient than running 3 separate commands:
$ chmod 400 /tmp/file1
$ chmod 400 /home/file5
$ chmod 400 /etc/testfile
This same technique works for almost any command that deals with files, such as: ls, chown, chgrp, grep, cat, which, tail, etc.
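To try this safely, the example can be reproduced against scratch files in a temporary directory (the paths here stand in for the ones above):

```shell
# Create three scratch files, then change their permissions with one chmod
d=$(mktemp -d)
touch "$d/file1" "$d/file5" "$d/testfile"
chmod 400 "$d/file1" "$d/file5" "$d/testfile"
ls -l "$d"
```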
Understanding the concept through wildcards
Most UNIX/Linux admins know that you can run a command such as chmod 400 * with the * wildcard. When you do this, the shell changes the * wildcard into a list of the file names in the directory before it even runs the chmod command. When the shell runs chmod, it provides that list of file names as arguments to chmod. So when you do "chmod 400 *", the chmod command doesn't know that you specified a * wildcard - all it sees is the list of file names the shell has provided as arguments. This can be illustrated by running this command:
$ echo ls -al *
You can see that when I ran the command echo ls -al *, the shell translated the * into the list of file names. So when you run ls -al *, the command that is actually being run in the end is ls -al file1 file2 file3 file4 file5 file6
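You can reproduce that experiment yourself in an empty scratch directory:

```shell
# The shell expands * into the file names before echo ever runs
cd "$(mktemp -d)"
touch file1 file2 file3
echo ls -al *
# -> ls -al file1 file2 file3
```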
Moving and copying files
If you need to move or copy multiple files into a directory you can easily do it in a single command. For example, if you wanted to back up several files into the /tmp directory you could run a command such as cp /etc/passwd /etc/group /etc/resolv.conf /tmp. This will copy these 3 files into the /tmp directory.
When you stop and think about it, this makes perfect sense. If you were to run a command such as mv * /tmp then, as we have already covered, the shell changes the * wildcard into the list of file names in the directory before ever calling mv. Here is another example, using the echo command to show what is really being run:
$ echo mv * /tmp
Killing processes and editing files with vi
The kill command supports specifying multiple PIDs on a single command line, i.e.: kill 3419 456 532
Even the vi command supports specifying multiple files to edit on a single command line: vi file1 file2 file3. Once in vi you can run ":n" to move to the next file.
But Don't Go Overboard
One thing to keep in mind is that there is a limit to how long your command line can be. If you start running into this limit, you need to look at utilities such as xargs, which can help.
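For example, instead of passing an enormous wildcard expansion directly to chmod, you can stream the file names through xargs, which runs chmod in batches that fit within the command-line length limit. A small sketch on scratch files:

```shell
# xargs splits a long list of file names into command lines of a safe
# length, invoking chmod as many times as needed.
d=$(mktemp -d)
touch "$d/a" "$d/b" "$d/c"
find "$d" -type f | xargs chmod 400
ls -l "$d"
```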
Many people are not aware that IBM included SMIT functionality in the VIO system. The VIO command is named "cfgassist" and it is designed to be used from the padmin restricted shell. When you run "cfgassist" the VIO server is running the AIX command "smitty vios_top" under the covers. So when you run cfgassist you will see a very easy to use menu just like smit in AIX.
If you haven't already tried the VIO "cfgassist" I would highly recommend giving it a try. It can be a big time saver.
Here are a few screenshots of some of the cfgassist functionality:
SMIT is one of the nicest features of AIX. It provides an extremely easy to use interface to manage almost every aspect of an AIX system.
One of the only problems with SMIT is that it is so inclusive and complete that the menu system is huge, and it can be hard to find things in it. There have been times when I was in SMIT and found a really useful menu, but the next time I went to use that same menu I couldn't find it again!
I started playing around with the idea of writing a script to read the ODM and display a "tree-view" of the entire SMIT menu/sub-menus. When I got it to work, I was shocked at the size of the SMIT menu system and everything it includes. SMIT really is an incredible tool and includes a TON of useful stuff in it.
This script will show every menu/sub-menu in a tree view. It also shows the fast-path to get directly to each menu. The fast-path is a SMIT "shortcut". For example, to get to the SMIT filesystems menu directly from the command line you can use the "fs" fast-path by typing "smit fs" on the command line.
Here is a screenshot of the first few lines of the output of the script. On my system the total output of the script showing the complete SMIT menu is 4,805 lines!
The text in the brackets "[ ]" is the fast-path to get directly to that menu.
You can either "grep" the output of the script to search for something in the SMIT menu system, or you could redirect the output to a file, and then open the file with your favorite text editor and search through it to find parts of SMIT you are looking for.
It is also very educational to just look through the output. Just by glancing through it I've found several things SMIT can do that I wasn't aware of.
Here is the script. It can be run as a regular user and the only external command it runs is "ODMDIR=/usr/lib/objrepos; odmget sm_menu_opt"
Here is a short script to show a "tree" view of Etherchannel devices on AIX. It shows the devices that make up the Etherchannel, including the backup device if there is one:
Here is the script:
If you like this script, you might also like these posts: Show tree view of AIX device classes and subclasses and also Show AIX device dependency tree
Happy April Fools' Day, everyone...
Warning: Do *NOT* try this; this WILL get you fired or worse! I was playing around (on my home server) with the "rendev" AIX command that allows you to rename devices. This got me thinking that you could pull a pretty good April Fools' Day prank by renaming a bunch of devices and then calling over a veteran AIX admin to watch them try to figure out what's going on.... How long would it take you to figure out a system in this state?:
Hopefully this made someone smile :)
Hope everyone has a great day.
Normally when you do a POWER system firmware update you have the following options for where to install the updates from:
I've always been interested in the "Hard drive" repository option. However, there isn't very much information out there on how to use a "Hard drive repository".
The main question I had when I started looking into the hard drive repository was: how do I get firmware into the hard drive repository in the first place? The answer is kind of surprising... To get firmware into the hard drive repository, you need to first update the firmware on a server using one of the other options (FTP, removable media, IBM service web site). When you do the update using any of these methods, the firmware is also placed into the hard drive repository.
This creates a bit of a chicken-and-egg situation... If you are interested in the "hard drive" repository option, it is likely because the FTP, removable media, and IBM service web site options are not good options for you. So if the only way to use the hard drive option is to first use one of those other options, you might have a problem.
Because of this, I never really used the "hard drive repository" option and other people I have talked to didn't really use it either.
However, I recently found an IBM Technote titled Updating Server Firmware Using a "SSH Repository" that explained how to copy firmware onto the HMC hard drive using SSH/SCP, and then update the firmware from that hard drive location using the HMC command line interface. The only thing you need for this is SSH access to the HMC. You don't need any other external connectivity to or from the HMC, or removable media. This is an excellent method to update firmware on a server, and the Technote has good information on how to do it.
Here is an overview of the steps:
You first download the firmware updates from IBM Fix central.
You then copy them to the HMC hard drive. When I did this, I created a new directory in the HMC user's home directory and then copied the files in using "scp". Note that the HMC doesn't support SFTP, and some scp clients try to use SFTP, which won't work. For example, if using PuTTY's pscp, use the "-scp" option to force scp mode. I recommend creating a new directory for each version of firmware that you will store on your HMC. The end result should be a directory that looks something like this, with the firmware you downloaded and transferred over:
You can then use the "lslic" command and point to the directory to verify the HMC recognizes the firmware. In this example the "418" level of firmware is confirmed to be available in this directory and is listed as a disruptive update. Note that in this example I am updating within the same release, so "retrievable_release" shows "none".
You can then use the "updlic" command line to actually do the update which will also reboot the server in this case since it is a disruptive update. This will take quite some time and there is no status/progress shown on the screen, so be patient!
The "-o a" means we are doing an update within the same release level.
You would use "-o u" if you were going to do an upgrade to a new release level.
The "-m" specifies the name of the managed system to update ("p520" in this example). The "-t sys" means update the system firmware. The "-l latest" tells it to update to the latest version in the specified directory (again, I recommend keeping only 1 firmware level per directory, so the latest version should be the only version in the directory). "-r mountpoint -d /home/hscroot/01SF240_418_382" specifies the directory where the firmware can be found.
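Putting those flags together, the full command line from this example (managed system "p520", firmware in /home/hscroot/01SF240_418_382) looks like this. Since updlic only exists on the HMC, the command is assembled as a string here so it can be shown and checked without actually being run:

```shell
# The updlic command line assembled from the flags described above.
# updlic runs only on the HMC - this is for illustration, not execution.
UPDLIC_CMD='updlic -m p520 -o a -t sys -l latest -r mountpoint -d /home/hscroot/01SF240_418_382'
echo "$UPDLIC_CMD"
```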
In summary, this is an excellent way to do firmware upgrades on servers without requiring any other connectivity to the HMC other than SSH.
If you've ever tried to "grep" through a Korn shell history file (.sh_history), you've noticed that it doesn't work. Here is an example showing what happens when you try to do this. I am trying to grep for "rm" in the shell history file. You can "cat" the file and see the content, but when you grep it directly, or pipe cat to grep, nothing but blank lines appear:
The reason this doesn't work is that the .sh_history file is not just a text file; it has some extra non-printable characters in it. You can see some evidence of this in the weird characters shown on the first line of the file when it was displayed with cat. Also, if you do a "cat -v .sh_history" you can see these unprintable characters that are getting in the way of grep:
Also, if you run "file .sh_history" it will report the file as an "Ultrix-11 Stand-alone or boot executable". So the .sh_history file clearly has extra non-printable characters in it that are messing up our attempt to grep the file.
The solution to this is to use the "strings" utility. strings will look through a binary file and display sections of printable characters out of it. So we can run "strings .sh_history | grep rm" if we want to search through the .sh_history file for "rm"
However, there is one small problem with this... By default "strings" will only print sequences of 4 or more printable characters. So if you ran the command "ls" and then ran "strings .sh_history" it wouldn't show the "ls" line because it is less than 4 characters. The solution to this is to run "strings -n 1 .sh_history | grep rm" which instructs "strings" to display all printable characters and thus every line in your .sh_history file will be searched by grep, regardless of length. In our example it doesn't make a difference since we don't have any commands less than 4 characters, but I just wanted to mention it in case anyone ever wonders why really short commands didn't show up with just the regular "strings .sh_history" command.
Linux has a very handy utility named "watch". It is a very useful command that will repeatedly run the same command over and over at a specified interval (by default, 2 seconds) and clear the screen in between each run. This is very useful when you want to "watch" something like what is in a directory (to watch files growing or changing), or to "watch" processes running (like "ps -ef | grep someuser").
For example, on Linux you can run "watch 'ps -ef | grep someuser'" and your screen will update every 2 seconds with a list of processes that contain "someuser".
AIX includes a command named "watch", however this command is completely unrelated to the Linux watch functionality. On AIX, watch is a command related to the auditing subsystem.
Fortunately, it is easy to write a one-line script that provides the basic functionality of the Linux "watch" on AIX:
This simple one-liner creates an infinite loop (while true); displays the date/time, runs whatever command you want to repeat (in this example, ps -ef | grep someuser), then sleeps for 1 second (you can change this to whatever amount you would like), and then clears the screen and repeats. If you want to exit the loop, just hit CTRL-C. This one line implements the basic functionality of the Linux watch command.
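Based on that description, the one-liner looks something like the following (the exact original wasn't reproduced above, so treat this as a reconstruction). The version in the fence is bounded to three iterations so it terminates when run non-interactively; for real use you would replace the counter with "while true" and exit with CTRL-C:

```shell
# Reconstruction of the watch-style loop described above. Interactively
# you would use an infinite loop:
#   while true; do date; ps -ef | grep someuser; sleep 1; clear; done
# Here the loop is bounded so the example terminates on its own.
i=0
while [ "$i" -lt 3 ]; do
    date
    ps -ef | grep someuser
    sleep 1
    clear 2>/dev/null
    i=$((i + 1))
done
```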
The native Linux watch command provides some additional functionality and optional fancy features like highlighting the text that changes between screen refreshes. If you want a fully functional watch command on AIX, check out Michael Perzl's awesome website of open source packages for AIX, which includes "watch": http://www.perzl.org/aix/index.php?n=Main.Watch
If you are not familiar with "expect", it is a script/programming language that is designed to automate interactive processes. For example, suppose you need to install a piece of software on many UNIX/Linux servers. The installation program runs from the command line, but must be run interactively. When run, it interactively asks the user for several pieces of information, and then installs. With expect, you could write a script to automate this task so that when the installation program prompts for information expect supplies the information, essentially simulating a user and automating the task.
Expect allows you to turn a normally interactive-only process into a completely non-interactive, automated task.
The author of the expect language, Don Libes, wrote the definitive book on expect. The book is called "Exploring Expect" by O'Reilly. I would highly recommend the book to anyone wanting to learn expect.
Here is a great picture from the "Exploring Expect" book that sums up the power of Expect:
In this posting I will cover when you should use expect and when you should avoid it.
When approaching a problem, my first rule of thumb when it comes to expect is to avoid using it unless it is the only solution. Why? Expect is an amazing tool, but it can be complicated to use and very fragile. The basic premise of expect is to set it up to look for certain strings of text ("expect"ing them), and when it sees that text, respond in a certain way. For example, you could write a script that expects the text "Specify directory to install application to: " and, when it sees this, types back "/opt/software/". However, if in the next version of the software the text of the installer changes slightly (i.e. "Specify directory location to install application to: "), your expect program will no longer see what it is looking for and will fail to work. If there is a way to get something done other than an expect script, it is usually the better option in my experience. It would be possible to write an expect program to automate opening vi, editing a file, and then saving and exiting vi. But it would be much easier and more reliable to use a tool such as sed, awk, perl, etc. to edit the file. Make sure you are using the right tool for the job.
My second rule of thumb is to not use expect to automate tasks that deal with passwords. When you look around for expect information and examples, a lot of them deal with automating things like SSH, SFTP, SCP, etc. For example, the Wikipedia page on Expect lists 4 example scripts, and they all deal with automating logins to a service with password authentication (telnet, ftp, sftp, and ssh). I would not recommend using expect to automate anything like this. There is a very good reason why tools like SFTP and SSH don't let you script a password without a tool like expect: it is a very bad idea! The first problem is that you generally need to include the cleartext password in your expect script, so if anyone gets access to the script, they have the password as well. The second big problem is the risk that expect will "type" the password at the wrong time. If you have an expect script set up to expect certain things and then send the password, and something goes wrong, the script might send the password too early or too late. The password might then end up somewhere inappropriate, visible to other users, or in a shell history file.

The bottom line is, if you need to automate running remote commands or copying files around, use SSH keys. SSH keys are much safer than passwords for a variety of reasons, and there are several things you can do to make them a good option for automated tasks. If you MUST use expect to automate a password-related task, one way to improve the situation is to have the expect script prompt for the password when the task begins and have the user type it in each time. That way the password is not stored in the script file.
Another password-related example of what NOT to do with expect: automating password changes. Expect even ships with an example script named "passmass" that will change your password on multiple servers. From a security perspective I think this is a really, really bad idea for the reasons I gave in the previous paragraph. The right tool for this kind of job is the "chpasswd" command. The chpasswd utility even allows you to specify a password hash ("encrypted" password) when setting a user's password, which makes it more secure to script. chpasswd isn't perfect, but in my opinion it is a much better option than expect when it comes to automating password changes.
So when should you consider using expect? You should think about expect anytime you have a manual task that needs to be repeated and that only provides an interactive interface to the user. We already covered the example of an interactive software installation program. Another example is any proprietary software that forces you to go through a text-based menu to do something. Using expect, you could write a script to navigate the menu and automate the task.
Another extremely good way to use expect is when you need to automate an appliance or other closed system that doesn't have the ability to be scripted. To do this, you use expect on a Linux/UNIX machine to connect to the appliance or closed system, and then complete a task. For example, you could write an expect script that would connect to a Cisco switch and run a series of commands on the switch.
Expect is also a good option when creating test cases. If you need to routinely test software functionality then expect might make your life easier.
You can also use expect not to fully automate tasks, but just to assist with manual ones. Expect allows you to partially automate a task while still letting parts of it be manually completed by a real person. An example of this is the AIX command "mkdvd", which burns mksysb images onto DVD. When you run this command it writes the first DVD, and then, if needed, it prompts you to insert additional DVDs. With expect, you could write a script that emails or pages you whenever it is time to put in the next DVD or when the mkdvd command has completed. The script just needs to be customized with the 2 command lines that email or page you.
This mkdvd expect script helps with a manual process and this is not something you could do without a tool like expect.
Please post comments with some creative examples of when you have used expect, or horror stories of other people using expect when they shouldn't have :)
AIX controls cron access with the cron.allow and cron.deny files. This posting will take a look into how these files work and some potential issues you might run into.
The cron.allow and cron.deny files are located under the /var/adm/cron directory. In each file you list one user name per line.
There are two main approaches that you can take to restricting access to cron. You can use the cron.allow file, which works as a whitelist of users who are allowed to use cron. Anyone listed in cron.allow is permitted to use cron, and anyone not explicitly listed in cron.allow is not permitted to use cron. If the cron.allow file exists then it will enable this whitelist mode.
The other option is using the cron.deny file which works as a blacklist of users who are not allowed to use cron. All users that are not listed in cron.deny are permitted to use cron (unless a cron.allow file exists, in which case only users listed in cron.allow will be able to use cron.)
By default when you install AIX there is just an empty cron.deny file (no cron.allow file exists by default). This essentially allows all users access to cron by default.
This all seems pretty straightforward, but there are a couple of potential issues or unexpected results that you might run into when working with cron access control:
If neither the cron.allow nor cron.deny file exists: Then only the root user will have access to crontab. Other users will not be able to run crontab; however, any of their existing cron jobs will still be run by cron (not what I expected).
If a user is listed in both cron.allow and cron.deny: The user will still have access to crontab and their cron jobs will continue to run (not what I would have expected). This also goes against what is in the AIX documentation.
If a user is added to cron.deny or removed from cron.allow after they already had cron jobs setup: Their existing cron jobs will no longer be run by cron until cron.allow/cron.deny are changed to permit them access, at which time their cron jobs will start running again.
If cron.allow exists and root is not listed in it: You can and will lock root out of cron. Always list root in the cron.allow file if it exists.
If a user account is locked (account_locked=true): This has no effect on cron. The user's cron jobs will continue to run.
If cron.allow/cron.deny are correctly set up to give the user cron access but they are still denied: Check the user's "daemon" attribute (lsuser -a daemon <user>). If daemon is set to false, the user will not be able to use cron. This setting can be changed by running "chuser daemon=true <user>"
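The rules above (including the both-files behavior observed in testing) can be captured in a small shell function. This is a sketch for illustration against scratch files, not an actual AIX internal:

```shell
# Sketch of cron's access decision as described above; the allow/deny
# paths are parameters so the logic can be exercised safely.
cron_allowed() {
    user=$1 allow=$2 deny=$3
    if [ -f "$allow" ]; then
        grep -qx "$user" "$allow"       # whitelist mode: must be listed
    elif [ -f "$deny" ]; then
        ! grep -qx "$user" "$deny"      # blacklist mode: must not be listed
    else
        [ "$user" = root ]              # neither file exists: root only
    fi
}

d=$(mktemp -d)
printf 'alice\n' > "$d/cron.deny"
cron_allowed bob   "$d/cron.allow" "$d/cron.deny" && echo "bob: allowed"
cron_allowed alice "$d/cron.allow" "$d/cron.deny" || echo "alice: denied"
```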
Obviously some of this behavior might differ between different AIX releases/updates, so you will want to do your own testing on your system to verify.
Please post a comment if you have run into other unexpected or weird issues with cron access control.
AIX stores password hashes under /etc/security/passwd. Each user with a password defined will have a stanza in this file that specifies what the hashed password is.
Here is an example for the root user:
If you would like to transfer a user's password from one server to another, you can simply copy the user's stanza out of /etc/security/passwd and put it in the same file on the other server (replacing their existing stanza). The user will then be able to log in to the other server with whatever their password was set to on the original server.
However, editing /etc/security/passwd directly should make you nervous. If you make a mistake with this file you might prevent anyone from being able to login to the server.
Another option is to take the user's password hash from /etc/security/passwd and then use the "chpasswd" command to change the password on the other server. The chpasswd command has a "-e" option that specifies the password is hashed/encrypted rather than cleartext. So with the example given above for the root user, if we were to run this command on the other server it would update the root account's password to be the same as on the original server:
The "-c" option on chpasswd clears any password flags and prevents the user from being forced to change their password at next login.
WARNING: Don't run the chpasswd command line above (or the others in this article) on your server or you'll change your root password. The password hashes used in this posting are just examples and the password for them all is just the letter "a".
To make this process easier, here is a short script to automate this process:
The script will generate the "chpasswd" command line needed to duplicate the user's password on other servers. The script doesn't do anything other than generate the chpasswd command line - you must then take that command line and run it on whatever server(s) you want to copy the user's password hash to. If you run the script with a specific user as an argument, only that user will have a command line generated. If you don't specify a user, it will generate command lines for all users on the server that have a password stanza.
Here is an example of running it and specifying a user (root in this case). As you can see, it just generates a command line - you must then copy and paste this onto each server you want to duplicate the user's password on. When you run the generated command on another server, it will change the user's password to match whatever password was set on the original server.
If you run the script without any arguments, chpasswd command lines are generated for all users that have a password stanza:
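The script itself isn't reproduced in this posting, but its core can be sketched with awk. The sample stanza file and the hash "abJnggxhB/yWI" below are made up for illustration; on a real system you would read /etc/security/passwd instead:

```shell
# Sketch of the generator described above. The stanza file and the hash
# are made-up sample data, not real password entries.
cat > /tmp/passwd.sample <<'EOF'
root:
        password = abJnggxhB/yWI
        lastupdate = 1300000000

guest:
        password = *
EOF

# For each stanza with a real password hash, emit the chpasswd command line
CMDS=$(awk -F'[ =]+' '
    /^[^ \t].*:$/ { user = $1; sub(/:$/, "", user) }   # stanza header
    $2 == "password" && $3 != "*" {
        printf "echo \"%s:%s\" | chpasswd -e -c\n", user, $3
    }
' /tmp/passwd.sample)
echo "$CMDS"
```

Accounts whose password field is "*" (no usable hash, like "guest" in the sample) are skipped.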
If you would like to learn more about password hashes and how they work, check out this article over at IBM System Magazine: Improve AIX Security With Password Hashes