IBM Cluster Health Check
Table of Contents
- How to get CHC
- Using CHC
The IBM Cluster Health Check (CHC) toolset is an extensible framework and collection of tools for checking and verifying the health of an IBM cluster.
Currently, the toolset is made available on a best-can-do support basis, with the intention of gathering input from users about its usefulness.
The framework has the following characteristics:
- Display available tools in an organized fashion
- Organize and display results of the tools with as much or as little information as the user requests
- Configuration file driven to allow for customization and extension
- Customize tool execution order through the use of configuration files
- Extensible - you can add your own tools and make them available under CHC by updating configuration files and using key environment variables from CHC
The tools cover the following broad areas:
- Node health and configuration consistency, and test tools
- InfiniBand fabric health and configuration consistency, and test tools to verify health of the fabric
- Basic configuration checking for consistency across a group of nodes and devices, as well as against a baseline
While many of the initial tools work on both x86 and POWER solutions, the initial toolset concentrates on x86 solutions. Nothing in the initial framework precludes integrating tools geared toward POWER solutions.
The latest Users Manual is provided as an attachment here. For other versions, see the Documentation section.
Upcoming: Nothing new, yet.
|6/11/2014||IBMCHC 18.104.22.168.0 is available and has been distributed to those who previously requested the package. A new manual is also attached to this page.|
|1/15/2014||Added links referencing cluster health check topics.|
xCAT must be installed and operational for CHC to work. CHC relies on xdcp and xdsh and other commands having access to nodes and devices.
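Because CHC depends on the xCAT `xdsh` and `xdcp` commands, a quick pre-flight check can confirm that they are at least reachable before running any tools. The following is a minimal, hypothetical sketch in Python, not part of the CHC package. It assumes only that the commands are expected on the management node's `PATH`. It does not verify that xCAT is operational or that nodes and devices are reachable.

```python
import shutil

# Hypothetical list: the xCAT commands this page names as CHC prerequisites.
REQUIRED_XCAT_COMMANDS = ("xdsh", "xdcp")


def missing_xcat_commands(commands=REQUIRED_XCAT_COMMANDS):
    """Return the commands that cannot be found on PATH (presence check only)."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]


if __name__ == "__main__":
    missing = missing_xcat_commands()
    if missing:
        print("Missing xCAT commands: " + ", ".join(missing))
    else:
        print("xdsh and xdcp were found on PATH")
```

An empty result only means the binaries were located. A real readiness check would also need to confirm node access through xCAT itself.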
At this point, the CHC package is supported on a best-can-do basis.
Before downloading, read the License.
Currently, the CHC package is available only upon request to the team. This allows the team to track who has the package and to more easily establish a dialog with the first users.
The following subsections address user needs once the CHC package has been obtained. For the most part, they point you to documents or other pages that address your concerns.
Version 22.214.171.124.0 of the Users Manual (circa June 11, 2014)
Version 126.96.36.199.0 of the Users Manual (circa January 3, 2014)
Because the IBM CHC team recognizes that health checks and customer needs are constantly evolving, feedback is very important to the team. Every effort will be made to address concerns and suggestions in a timely fashion.
To this end, the following links are provided, organized by area of feedback. Make your best guess as to which area your comments apply. If the team believes they should be addressed elsewhere, guidance will be given.
- CHC Framework Feedback
- Node Health Feedback
- Node Test Feedback
- Fabric Health Feedback
- Fabric Tests Feedback
- Configuration Checking Feedback
- Utilities Feedback
- Developing new tools
- For feedback on this page and subpages, please use the comment facility.
The following are links to other cluster health check topics and documents:
Draft with a final version forthcoming
A Redbook covering HPC cluster health check topics. IBM CHC is used as a companion to the discussion of various topics and as an example.
|Dino Quintero's Redbooks Blog||
Dino is an IBM Senior Certified IT Specialist with the ITSO. His blog announces Redbooks and other related topics.
|IBM HPC Cluster Health Check - an IBM Redbooks Solution Guide||Introduction to cluster health checking; includes overview topics from the Redbook draft.|
If you know of other useful links, please add a comment to this page with the link and description so that the team may review and add it here. Thank you!