Common cluster startup problems

This section discusses some common cluster startup problems.

  1. If a resource group will not stabilize on its primary node and site, the PPRC replicated resource and the resource group are probably defined to be primary on different sites. Refer to Planning Primary and Secondary Site Layout for DSCLI-managed PPRC Replicated Resources in PowerHA® SystemMirror® Resource Groups. Make sure that all the appropriate entries in the PPRC replicated resource are aligned in the correct direction, as described in that section and in the Sample Configuration section.
  2. A resource group may also fail to come ONLINE, be unstable, or go into the ERROR state if there is a problem with the PPRC instance or path required by the volume group and the PPRC replicated resource being managed. See the descriptions below for the commands that display PPRC instance states and for a description of normal working states.
  3. Keep in mind the limitation that a volume group must have the same major number on all cluster nodes. A mismatch has been known to cause instability in a resource group when it initially comes online.
  4. If the Pri-Sec and Sec-Pri port pairs are not declared correctly, creation of the initial PPRC paths fails, which keeps the cluster from coming up correctly. In this case, the /tmp/hacmp.out file contains an error message returned from the mkpprcpath call for the PPRC replicated resource associated with the resource group that will not come ONLINE.
  5. In certain instances, disk reserves that DSCLI management is not able to break can be placed on vpaths or hdisks. Such a reserve keeps a resource group from coming ONLINE, because the reserved disk never becomes write accessible. See Other Handy AIX® Commands below for ideas on how to break disk reserves.
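
For the port pair problem in item 4, one way to locate the relevant messages is to search /tmp/hacmp.out for lines that mention mkpprcpath. The following is a minimal sketch, not a PowerHA SystemMirror utility. The function names find_mkpprcpath_lines and scan_log and the context parameter are hypothetical, and the sketch assumes only that the error message appears on or shortly after a line containing the string mkpprcpath; the exact message format is not assumed.

```python
import os
from typing import Iterable, List, Tuple


def find_mkpprcpath_lines(lines: Iterable[str], context: int = 2) -> List[Tuple[int, str]]:
    """Return (line_number, text) pairs for every line that mentions
    mkpprcpath, plus up to `context` lines that follow each match.

    Line numbers are 1-based. Overlapping matches are reported once.
    """
    buffered = list(lines)
    wanted = set()
    for idx, text in enumerate(buffered):
        if "mkpprcpath" in text:
            # Keep the matching line and the lines immediately after it,
            # without running past the end of the log.
            last = min(idx + context, len(buffered) - 1)
            wanted.update(range(idx, last + 1))
    return [(i + 1, buffered[i].rstrip("\n")) for i in sorted(wanted)]


def scan_log(path: str = "/tmp/hacmp.out", context: int = 2) -> List[Tuple[int, str]]:
    """Scan the log file at `path` (hypothetical helper)."""
    # errors="replace" avoids a failure on unexpected bytes in the log.
    with open(path, errors="replace") as handle:
        return find_mkpprcpath_lines(handle, context)


if __name__ == "__main__":
    log_path = "/tmp/hacmp.out"
    if os.path.exists(log_path):
        for number, text in scan_log(log_path):
            print(f"{number}: {text}")
    else:
        print(f"{log_path} not found on this node")
```

Run the script on the node where the resource group failed to come ONLINE, then compare the port pairs named in the reported lines with the Pri-Sec and Sec-Pri declarations in the PPRC replicated resource.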