IBM Cluster Health Check

Updated 6/12/14, 12:52 AM by MarkAtkins


Overview

The IBM Cluster Health Check (CHC) toolset is an extensible framework and collection of tools for checking and verifying the health of an IBM cluster.

Currently, the toolset is provided on a best-can-do support basis, with the intention of gathering feedback from users on its usefulness.

The framework has the following characteristics:

  • Displays available tools in an organized fashion
  • Organizes and displays tool results with as much or as little information as the user requests
  • Is configuration-file driven, allowing customization and extension
  • Allows the tool execution order to be customized through configuration files
  • Is extensible: you can add your own tools and make them available under CHC by updating configuration files and using key environment variables from CHC
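To illustrate what "configuration-file driven" tool ordering and extension through environment variables can look like, here is a minimal sketch. It is hypothetical throughout: the INI layout, the `command` and `order` keys, and the `EXAMPLE_NODERANGE` variable are assumptions made for this example and are not CHC's actual configuration format or variable names, which are described in the Users Manual.

```python
"""Hypothetical sketch of configuration-driven tool ordering.

The section layout, the "command"/"order" keys, and the EXAMPLE_NODERANGE
variable are illustrative assumptions, not CHC's real configuration format.
"""
import configparser
import os
import subprocess


def load_tools(config_text):
    """Return (name, command) pairs sorted by each tool's numeric 'order' key."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    tools = []
    for name in parser.sections():
        section = parser[name]
        tools.append((section.getint("order"), name, section["command"]))
    # Sort on the numeric order; ties fall back to the tool name.
    tools.sort()
    return [(name, command) for _, name, command in tools]


def run_tools(tools, noderange):
    """Run each tool in order, exposing the node range through an environment variable."""
    env = dict(os.environ, EXAMPLE_NODERANGE=noderange)
    results = {}
    for name, command in tools:
        completed = subprocess.run(command, shell=True, env=env,
                                   capture_output=True, text=True)
        results[name] = completed.returncode
    return results


EXAMPLE_CONFIG = """
[cpu_check]
command = echo checking cpus on $EXAMPLE_NODERANGE
order = 2

[memory_check]
command = echo checking memory on $EXAMPLE_NODERANGE
order = 1
"""

if __name__ == "__main__":
    tools = load_tools(EXAMPLE_CONFIG)
    print([name for name, _ in tools])  # ['memory_check', 'cpu_check']
    print(run_tools(tools, "compute"))
```

Because the order lives in the configuration rather than in the runner, adding or reordering a tool in this sketch means editing the file, not the code. A user-supplied tool only needs to read the environment variable the runner exports.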

The tools cover the following broad areas:

  • Node health and configuration consistency checks, plus node test tools
  • InfiniBand fabric health and configuration consistency checks, plus test tools to verify fabric health
  • Basic configuration checking for consistency across a group of nodes and devices, as well as against a baseline
  • Utilities
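One common way to check configuration consistency across a group of nodes is to run the same command everywhere and group the nodes by identical output. The minimal sketch below assumes input in xdsh's `node: output` line format, for example captured from `xdsh compute 'uname -r'`, where the node range `compute` is hypothetical. The baseline comparison is an illustrative assumption, not CHC's implementation. xCAT's own `xcoll` command performs a similar consolidation of xdsh output.

```python
"""Minimal sketch of a configuration-consistency check over xdsh output.

Assumes lines of the form "node: output". The baseline comparison is an
illustrative assumption, not CHC's actual implementation.
"""
from collections import defaultdict


def group_by_output(xdsh_text):
    """Map each distinct output (all lines one node printed) to the nodes that printed it."""
    per_node = defaultdict(list)
    for line in xdsh_text.splitlines():
        if ": " not in line:
            continue  # skip lines that are not in "node: output" form
        node, output = line.split(": ", 1)
        per_node[node].append(output)
    groups = defaultdict(list)
    for node, lines in per_node.items():
        groups["\n".join(lines)].append(node)
    return {output: sorted(nodes) for output, nodes in groups.items()}


def nodes_off_baseline(xdsh_text, baseline):
    """Return the nodes whose output differs from the expected baseline output."""
    groups = group_by_output(xdsh_text)
    return sorted(node for output, nodes in groups.items()
                  if output != baseline for node in nodes)


SAMPLE = """node01: 3.10.0-123.el7.x86_64
node02: 3.10.0-123.el7.x86_64
node03: 3.10.0-229.el7.x86_64"""

if __name__ == "__main__":
    for output, nodes in group_by_output(SAMPLE).items():
        print(f"{len(nodes)} node(s) {nodes}: {output}")
    print(nodes_off_baseline(SAMPLE, "3.10.0-123.el7.x86_64"))  # ['node03']
```

Grouping answers "are these nodes consistent with each other?", while the baseline comparison answers "are they consistent with what we expect?". Those are the two forms of checking the list above describes.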

Although many of the initial tools work on both x86 and POWER solutions, the initial toolset concentrates on x86. Nothing in the framework precludes integrating tools geared toward POWER.

Below you will find information on How to get CHC and Using CHC.

The latest Users Manual is provided as an attachment here. For other versions, see the Documentation section.

News

Upcoming: Nothing new, yet.

Date        News
6/11/2014   IBM CHC 1.1.1.0.0 is available and has been distributed to those who previously requested the package. A new manual is also attached to this page.
1/15/2014   Added links referencing cluster health check topics.
12/20/2013
  • Announcing initial availability of IBM CHC
  • Given the date, responses to initial requests may take longer until the new year.

 

 


How to get CHC

Pre-requisites

xCAT must be installed and operational for CHC to work. CHC relies on xdsh, xdcp, and other xCAT commands to access nodes and devices.
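Before installing CHC, it can help to confirm that the xCAT commands it depends on are present. The minimal pre-flight sketch below assumes the xCAT client commands are expected on PATH. It also treats a zero exit status from `xdsh <noderange> true` as a sign that node access works, which is an assumption; check xdsh's documented return codes for your xCAT level. The node range `compute` is hypothetical.

```python
"""Minimal pre-flight sketch: confirm the xCAT commands CHC relies on are available.

Assumes xCAT client commands are on PATH. The "compute" node range used
below is a hypothetical example.
"""
import shutil
import subprocess

XCAT_COMMANDS = ("xdsh", "xdcp")


def missing_commands(commands=XCAT_COMMANDS):
    """Return the commands that cannot be found on PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]


def nodes_reachable(noderange):
    """Run a trivial command through xdsh; assumes exit status 0 means node access works."""
    result = subprocess.run(["xdsh", noderange, "true"],
                            capture_output=True, text=True)
    return result.returncode == 0


if __name__ == "__main__":
    missing = missing_commands()
    if missing:
        print("xCAT commands not found:", ", ".join(missing))
    else:
        print("xdsh node access OK:", nodes_reachable("compute"))
```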

 

License

 

Review the license linked here before sending a request for the package.

Currently, the CHC package is supported on a best-can-do basis.

 

CHC package

Before downloading, read the License.

Currently, the CHC package is available only upon request to the team. This allows the team to track who has the package and to more easily establish a dialog with the first users.

Click here to send an email to request CHC.

 

 


Using CHC

The following subsections address user needs after you have obtained the CHC package. For the most part, they point you to documents or other pages that address your concerns.

 

Documentation

 

Version 1.1.1.0.0 of the Users Manual (circa June 11, 2014)

Version 1.1.0.0.0 of the Users Manual (circa January 3, 2014)

 

Feedback

Because the IBM CHC team recognizes that health checks and customer needs are constantly evolving, feedback is very important to the team. Every effort will be made to address concerns and suggestions in a timely fashion.

To this end, the following links are provided, organized by area of feedback. Choose the area that best fits your comments; if the team believes they should be addressed elsewhere, it will provide guidance.

 


References

The following are links to other cluster health check topics and documents:

 

Health Check References

  • IBM HPC Cluster Health Check Redbook Draft: A Redbook covering HPC cluster health check topics, currently a draft with a final version forthcoming. IBM CHC is used as a companion to the discussion of various topics and as an example.
  • Dino Quintero's Redbook Blog: Dino is an IBM Senior Certified IT Specialist with the ITSO. His blog announces Redbooks and other related topics.
  • IBM HPC Cluster Health Check - an IBM Redbooks Solution Guide: An introduction to cluster health checking that includes overview topics from the Redbook draft.

If you know of other useful links, please add a comment to this page with the link and description so that the team may review and add it here. Thank you!

 
