Using IBM Control Center to identify a failed component
You can use IBM® Control Center to isolate a component causing a data center failure.
Procedure
To understand the cause of your data center failure by using IBM Control Center, complete the following steps:
- Access the IBM Control Center web console dashboard.
- View the status of the Global Mailbox data centers in the Environmental health widget. Data centers display a status of Warning, Active, or Down.
- Select Servers > Individual servers.
- In the navigation pane, click Global Mailbox to view an overview of all the Global Mailbox data centers, or click one of the data centers.
- Click the Data center view icon in the Individual servers banner.
- Click the line that connects a server and its service to view details about their connection. When you click a line that shows an error state, you can view the details about the error to troubleshoot issues.
- Use the following table to determine which kind of failure you are experiencing:
Table 1. Identifying a failure in IBM Control Center

| What you see in IBM Control Center | Failure | What it means |
| --- | --- | --- |
| A red circle representing one of the data centers in the Environmental health widget | One of the data centers is experiencing a total outage. | Partners cannot use the failed data center to upload or download files. The surviving data center can be used to upload and download files, but no files can be processed. Traffic must be rerouted to the surviving data center until the problem is corrected. See Complete data center loss for more information about this scenario. |
| A red line connecting a Global Mailbox server to the data center within the Data center view | A split-brain scenario occurred. | File uploads are successful, but no files are being processed. The two data centers must be configured to upload and process files individually until the problem is resolved. See Troubleshooting split-brain after a network failure for more information about this scenario. |
| A notification with the error code GMST00001E within IBM Control Center at the time of the failure; a red line connecting a Global Mailbox server to the storage service within the Data center view; a status of Down within the Server component - storage page | A storage failure occurred. | File uploads to one data center are successful, but files are not processing in either data center. Traffic must be rerouted to the surviving data center until the problem is corrected. See Payload storage causing a data center failure for more information about this scenario. |
| A red circle on the Global Mailbox server within the Data center view | Global Mailbox admin nodes are experiencing an outage in one of the data centers. | Files can be uploaded to both data centers, but no files are being processed. Traffic must be rerouted to the surviving data center until the problem is corrected. See Global Mailbox administrator nodes causing a data center failure for more information about this scenario. |
| A red circle on Global Mailbox present in Sterling B2B Integrator within the Data center view | Sterling B2B Integrator nodes are experiencing an outage in one of the data centers. | File uploads to one data center stop while uploads to the other data center continue. File processing continues in both data centers. See Sterling B2B Integrator nodes causing a data center failure for more information about this scenario. |
| A notification with the error code GMCAS0001E within IBM Control Center at the time of the failure; a red line connecting a Global Mailbox server to the Cassandra service within the Data center view; a status of Down within the Server component - Cassandra page | Fewer than two Cassandra nodes are operating within one of the data centers. | No files are being processed. Traffic must be rerouted to the surviving data center until the problem is corrected. See Cassandra quorum causing a data center failure for more information about this scenario. |
| A notification with the error code GMZOO0001E within IBM Control Center at the time of the failure; a red line connecting a Global Mailbox server to the ZooKeeper service within the Data center view; a status of Down within the Server component - ZooKeeper page | At least two ZooKeeper nodes are experiencing outages and only one server is functioning. | File uploads to both data centers pause for several minutes but resume after the Watchdog configuration is complete. See ZooKeeper nodes causing a data center failure for more information about this scenario. |
| A notification with the error code GMWMQ0001E within IBM Control Center at the time of the failure; a red line connecting a Global Mailbox server to the WebSphere® MQ service within the Data center view; a status of Down within the Server component - mq page | Both the active and standby WebSphere MQ managers are experiencing an outage. | Files continue to upload to both data centers, but events can only be processed in the surviving data center. See WebSphere MQ managers causing a data center failure for more information about this scenario. |