Network outages in Cloud Pak System multi-system deployments

Review the effects that network outages can have on your Cloud Pak System multi-system deployment.

What happens if network connectivity is lost in your multi-system deployment? The answer depends on which type of network communication is affected.

There are four different network endpoints involved in a multi-system deployment on Cloud Pak System:
  • A. The virtual machine data addresses (on network interface en1/eth1)
  • B. The virtual machine management addresses (on network interface en0/eth0)
  • C. Management addresses on the Cloud Pak System
  • D. The iSCSI tiebreaker address
Between these addresses, there are five different network interactions that take place. Connectivity failures in or between these networks result in different outcomes:
Communication between all of the virtual machines on the deployment over their data addresses, [A to A]
The effect of losing this communication depends on the deployed application. The traffic might be application-to-deployment manager traffic or application-to-database traffic, and the application might become unavailable if that traffic is interrupted. For example, if you deployed GPFS mirrors across two sites and the data communication between them is severed, GPFS remains available at one site, provided that the site can still access its GPFS tiebreaker. If you deployed a WAS cluster that uses this GPFS mirror, the WAS custom nodes that can connect to the surviving GPFS mirror continue to function, provided that they can access their database.
Management communications between the virtual machines, [B to B]
For the effect of an outage on communications between virtual machines, see "Management communications between the virtual machines and the system, [B to C]".
Management communications between the virtual machines and the system, [B to C]
These communications keep the Cloud Pak System UI up to date with the status of the system. If these communications are broken, the application is not affected, but some of the virtual machines might show an unknown status in the UI. Scaling the deployment is not possible if [B to C] communications are broken on both racks.
Communication between the systems, [C to C]
If communication between the systems is broken, the outcome depends on which systems can still reach the iSCSI tiebreaker:

If neither system can communicate with the iSCSI tiebreaker [C to D], then externally managed deployments on both systems are frozen (no deploys, no deletes, no scaling).

If only one system can communicate with the iSCSI tiebreaker [C to D], then externally managed deployments are not frozen on that system but are frozen on the other system.

If both systems can communicate with the iSCSI tiebreaker [C to D], then externally managed deployments are not frozen on one system but are frozen on the other system. Which system remains unfrozen is unpredictable.

Communication between the systems and the tiebreaker, [C to D]
If the systems can communicate with each other [C to C], then the tiebreaker communication is only a failsafe mechanism, and an outage of it is harmless. However, if there is a double failure, in which communication is lost both between the systems [C to C] and to the tiebreaker [C to D], then externally managed deployments on both systems are frozen (no deploys, no deletes, no scaling), as described for [C to C].
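
The [C to C] and [C to D] outcomes described above can be summarized as a small decision function. The following Python sketch is illustrative only: the names `DeploymentState` and `externally_managed_states` and the boolean parameters are hypothetical and are not part of any Cloud Pak System API. It simply encodes the rules stated in this topic, and it assumes a deployment that spans exactly two systems, labeled A and B here.

```python
from enum import Enum


class DeploymentState(Enum):
    """Hypothetical labels for the state of externally managed deployments."""
    ACTIVE = "not frozen"
    FROZEN = "frozen"
    # Exactly one of the two systems stays active, but which one is unpredictable.
    UNDETERMINED = "frozen or not frozen (unpredictable)"


def externally_managed_states(systems_connected,
                              a_reaches_tiebreaker,
                              b_reaches_tiebreaker):
    """Return (state on system A, state on system B).

    systems_connected    -- True if [C to C] communication works
    a_reaches_tiebreaker -- True if system A has [C to D] communication
    b_reaches_tiebreaker -- True if system B has [C to D] communication
    """
    if systems_connected:
        # The tiebreaker is only a failsafe, so its reachability does not matter.
        return (DeploymentState.ACTIVE, DeploymentState.ACTIVE)
    if a_reaches_tiebreaker and b_reaches_tiebreaker:
        # One system stays active and the other freezes; which is unpredictable.
        return (DeploymentState.UNDETERMINED, DeploymentState.UNDETERMINED)
    if a_reaches_tiebreaker:
        return (DeploymentState.ACTIVE, DeploymentState.FROZEN)
    if b_reaches_tiebreaker:
        return (DeploymentState.FROZEN, DeploymentState.ACTIVE)
    # Double failure: no [C to C] and no [C to D] on either system.
    return (DeploymentState.FROZEN, DeploymentState.FROZEN)


# Example: the systems cannot reach each other, and only system A reaches the tiebreaker.
state_a, state_b = externally_managed_states(False, True, False)
print("System A:", state_a.value)
print("System B:", state_b.value)
```

The function checks [C to C] first because, as noted above, the tiebreaker only matters after communication between the systems has been lost.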