Script to show if your AIX HBA / hdisk settings are actually in effect
brian_s
Several storage-related settings in AIX cannot be changed while the device is active. These include "fast_fail", dynamic tracking (dyntrk), and "num_cmd_elems" for HBAs, and the queue depth for hdisks.
You have two options for setting them: make the device inactive (usually by taking redundant paths offline) and then make the change, or use the "-P" flag on chdev and reboot the server so that the change takes effect at the next boot.
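As a rough illustration of the second option, here is a small dry-run helper that only prints the deferred chdev command so it can be reviewed before anyone runs it. The helper name `deferred_chdev`, the device names (fscsi0, hdisk4), and the values are made up for the example; the chdev flags (-l, -a, -P) and the attribute names are the standard AIX ones.

```shell
# Hypothetical dry-run helper: prints a deferred chdev command, never runs it.
# Usage: deferred_chdev <device> <attr=value> [<attr=value> ...]
deferred_chdev() {
    dev=$1
    shift
    cmd="chdev -l $dev"
    for pair in "$@"; do
        cmd="$cmd -a $pair"
    done
    # -P records the change in the ODM only; it applies at the next boot
    echo "$cmd -P"
}

# Example device names and values are assumptions, not recommendations
deferred_chdev fscsi0 fc_err_recov=fast_fail dyntrk=yes
deferred_chdev hdisk4 queue_depth=32
```

Printing rather than executing keeps the sketch safe to try anywhere. On a real server you would run the printed command yourself and then schedule the reboot.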
The "-P" option on chdev has one major drawback, however. As soon as you make the change with chdev "-P", "lsattr" reports the new value, so it looks as if the setting is already active. It is not: the change won't take effect until the next reboot. What has happened is that the running configuration is now out of sync with the ODM. The ODM reflects the updated settings, but they can't be applied to the running configuration of the AIX kernel until the next reboot.
I've actually had discussions with people who insisted that changing something like fast_fail with chdev "-P" didn't require a reboot, because they checked "lsattr" and it showed the new value.
Needless to say this can cause some serious confusion and other issues. The fact is that if you don't know the history of a server and who's worked on it you really can't trust the output of "lsattr" when looking at things like fast_fail, dyntrk, num_cmd_elems, and queue depth.
Chris Gibson has written some excellent posts on how to use kdb manually to see whether these types of settings have been changed with chdev "-P" without the server having been rebooted for them to take effect:
These posts explain how to check this manually, but it is not an easy task: you have to go into kdb, run a couple of commands, decode the cryptic output, and do some hex-to-decimal conversions.
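The hex-to-decimal step, at least, is easy to script. Below is a minimal sketch, assuming the value has already been picked out of the kdb output as a bare hex string; the function name `hex_to_dec` is hypothetical, and the sample values are plain arithmetic rather than real kdb output.

```shell
# Hypothetical helper: convert a hex string (with or without 0x) to decimal.
hex_to_dec() {
    h=${1#0x}    # drop an optional lowercase 0x prefix
    h=${h#0X}    # drop an optional uppercase 0X prefix
    printf '%d\n' "0x$h"
}

hex_to_dec 0x200   # prints 512
hex_to_dec C8      # prints 200
```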
I took the process that Chris Gibson blogged about and automated it through a couple of scripts.
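Conceptually, the check is a comparison between the value stored in the ODM (what "lsattr" reports) and the value the running kernel is actually using (what kdb shows). The sketch below shows only that comparison. It is not one of my scripts, the function name `check_sync` and the message format are made up for illustration, and both values are passed in as arguments, so nothing here touches kdb.

```shell
# Hypothetical helper: compare an ODM value with the running value.
# Usage: check_sync <setting> <odm_value> <running_value>
# Returns 0 when the two match, 1 when they differ.
check_sync() {
    name=$1
    odm=$2
    running=$3
    if [ "$odm" = "$running" ]; then
        echo "$name: in sync ($odm)"
        return 0
    fi
    echo "$name: OUT OF SYNC (ODM $odm, running $running)"
    return 1
}

# Illustrative values only
check_sync queue_depth 32 32
check_sync queue_depth 32 20 || echo "change is pending; not yet in effect"
```

The non-zero return code on a mismatch makes it easy to use the check from a larger health-check script.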
The first script checks all the HBAs on your system and shows whether the fast_fail, dynamic tracking (dyntrk), or num_cmd_elems setting in the ODM is out of sync with the running configuration:
Here is the output when everything is in sync between the ODM and the running configuration:
Here is the output when one of the settings is out of sync (in this example the fastfail setting). This shows that the setting was changed with chdev "-P" but the server was never rebooted:
The second script checks the Queue Depth settings for each hdisk. Here is the output when everything is in sync between the ODM and the running configuration:
Here it is when the settings are out of sync. This shows that the queue depths were changed with chdev "-P" but the server was never rebooted:
Here is the script to check the HBA settings:
Here is the script to check the hdisk queue depths:
Note that these scripts use "kdb", which always makes me a little nervous. Please try them on a test server in your environment first. Also note that they will most likely not work on AIX 5.3 or older.
If you liked these scripts, you might also like "prdiff". It will show the differences between your LPAR's saved profile and its running configuration. For more information on it, see the project website at: http