How does IBM Multi-site Workload Lifeline (Lifeline) enable continuous availability?

Lifeline monitors the workload applications and the systems where these applications reside, across the two sysplexes, or sites, where these systems are running. Lifeline controls the routing of connections and MQ messages that are targeting these workload applications, ensuring the connections and MQ messages are sent to the optimal workload applications in the active site(s).
If Lifeline detects a workload failure in the active site, it can automatically switch the workload, in seconds, to the workload applications in the alternate site. Alternatively, Lifeline can generate alert messages that automation products can capture to perform their own workload switching.
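
The two failure responses described above (automatic switch versus alert only) can be sketched as follows. This is a minimal illustration, not Lifeline's implementation or configuration syntax: the `FailurePolicy` enum, the `WorkloadState` class, the `handle_health_check` function, and the workload and site names are all hypothetical, and the sketch assumes the alternate site is healthy when a switch occurs.

```python
from dataclasses import dataclass
from enum import Enum


class FailurePolicy(Enum):
    AUTO_SWITCH = "auto_switch"  # switch the workload without operator action
    ALERT_ONLY = "alert_only"    # emit an alert and let external automation act


@dataclass
class WorkloadState:
    name: str
    active_site: str
    alternate_site: str
    healthy_in_active_site: bool


def handle_health_check(state: WorkloadState, policy: FailurePolicy) -> str:
    """Return a description of the action taken for one monitoring cycle."""
    if state.healthy_in_active_site:
        return f"{state.name}: no action, active in {state.active_site}"
    if policy is FailurePolicy.AUTO_SWITCH:
        # Swap roles so that new connections go to the former alternate site.
        state.active_site, state.alternate_site = (
            state.alternate_site,
            state.active_site,
        )
        # Assumption for this sketch: the alternate site is healthy.
        state.healthy_in_active_site = True
        return f"{state.name}: switched, now active in {state.active_site}"
    # ALERT_ONLY: leave routing untouched and report the failure.
    return f"{state.name}: ALERT, failure detected in {state.active_site}"


if __name__ == "__main__":
    payments = WorkloadState("PAYMENTS", "SITE1", "SITE2", False)
    print(handle_health_check(payments, FailurePolicy.AUTO_SWITCH))
```

With the alert-only policy the state is left unchanged, which mirrors the case where a separate automation product decides when and how to switch.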

Does my business need continuous availability for workloads?

If any of the following situations applies to your business, you need continuous availability for your workloads.

  • Your business must run 24x7 due to industry regulations.
  • Other businesses depend on your business being always available, for example, if your business is in the financial or insurance industry.
  • Your business has no recovery procedures in place, for example, in non-sysplex environments with no disk replication capabilities.

How is continuous availability different from disaster recovery?

Existing disaster recovery solutions utilize disk-based replication to make mirror copies to a remote site of all disks used by the systems in the local site. These disk copies cannot be used while the disk replication is occurring. In the event of a failure in the local site, the systems and workload applications need to be restarted in the remote site before access to the workloads is reestablished. Typically, this can take an hour or longer to accomplish.
With Lifeline-enabled continuous availability solutions, software data replication, such as InfoSphere Data Replication for Db2, is used to keep data in sync between the local and remote sites. The key difference is that systems in both sites are active, and Lifeline monitors the workloads across both sites. In the event of a failure in the local site, Lifeline detects the workload failure and routes all new workload connections to the alternate site. As a result, access to the workloads is reestablished in seconds, rather than the hour or more needed with disaster recovery solutions.

How does Lifeline act as an integral part of the GDPS® Continuous Availability solution?

Lifeline, through its monitoring and workload routing, plays an integral role in the GDPS Continuous Availability solution and provides the following benefits:

  • Improved performance: New workload connections are routed to the applications, servers, and systems most capable of processing them, so transaction response time is reduced and system resources are used more efficiently.
  • Improved availability: New workload connections can be routed to available applications and systems when some of them are down. Outages for maintenance updates or other planned events can be minimized.
  • Reduced recovery time: The Recovery Time Objective is reduced from hours to minutes. Traditional disaster recovery solutions based on disk replication recover on the standby site by restarting systems or applications. That normally takes hours, and IT services are unavailable for this period. With Lifeline working within the GDPS Continuous Availability solution, the workload can be switched to the standby site in minutes.

Is Lifeline only available as part of the GDPS Continuous Availability solution?

No. Although typically used as an integral part of the GDPS Continuous Availability solution, Lifeline can also be deployed outside the solution.
If your business has its own automation capabilities, Lifeline can be used along with a software data replication product that keeps data in both sites in sync.
In other cases, if your business has workload applications that are not sysplex-enabled, you cannot use the GDPS Continuous Availability solution. Using Lifeline, along with a software data replication product to keep data in both sites in sync, provides “sysplex-like” recovery for these workload types.

How does Lifeline reduce the maintenance window for planned outages?

Lifeline can perform a graceful switch of the applications and their data sources, which Lifeline calls workloads, during planned outages. By using simple Lifeline commands, a workload can be migrated from one site to the other, minimizing the downtime for planned events such as scheduled maintenance activities.

How does Lifeline provide near continuous availability for critical workloads during unplanned outages?

Lifeline increases availability because new connections and messages can be routed away from failing workload applications and systems. Lifeline reduces response times by routing connections and messages to workload applications and systems that have capacity for additional work, and it reduces recovery time from hours to minutes.

Do all workloads running in a site need to be configured to Lifeline initially?

No. One of the many benefits of Lifeline is that it is not an all-or-nothing solution, as disaster recovery solutions tend to be. You can configure only the most critical workloads to Lifeline for continuous availability, while all other workloads, including batch, are recovered using existing disaster recovery procedures. Additional workloads can be added to Lifeline at any time.

What are the characteristics of a workload, when defining it to Lifeline?

A workload’s characteristics depend on the workload type. For TCP-based workloads, they are the IP addresses and port numbers of the TCP applications. For SNA-based workloads, they are the SNA appl names of the SNA applications. For MQ-based workloads, they are the MQ cluster queues and MQ queue managers where MQ messages for the workloads are sent. For Db2 DRDA-based workloads, they are the IP addresses and port numbers of the Db2 aliases and Db2 subsystems. For Linux on Z workloads, they are the Linux on Z guests running on z/VM.
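
As a rough illustration of the mapping above, the following sketch models a workload definition as a type plus a list of identifiers. It is not Lifeline's configuration format: the `WorkloadDefinition` class, the `IDENTIFIER_KINDS` table, the type keys, and the example workload names and addresses are hypothetical and exist only to make the per-type characteristics concrete.

```python
from dataclasses import dataclass, field
from typing import List

# What identifies a workload of each type, taken from the answer above.
# The dictionary keys are hypothetical labels chosen for this sketch.
IDENTIFIER_KINDS = {
    "TCP": "IP addresses and port numbers of the TCP applications",
    "SNA": "SNA appl names of the SNA applications",
    "MQ": "MQ cluster queues and MQ queue managers",
    "DB2_DRDA": "IP addresses and port numbers of the Db2 aliases and subsystems",
    "LINUX_ON_Z": "Linux on Z guests running on z/VM",
}


@dataclass
class WorkloadDefinition:
    name: str
    workload_type: str
    identifiers: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject types that the answer above does not describe.
        if self.workload_type not in IDENTIFIER_KINDS:
            raise ValueError(f"unknown workload type: {self.workload_type}")
        # A workload with no identifiers cannot be monitored or routed.
        if not self.identifiers:
            expected = IDENTIFIER_KINDS[self.workload_type]
            raise ValueError(f"{self.name}: expected {expected}")


if __name__ == "__main__":
    orders = WorkloadDefinition("ORDERS", "TCP", ["192.0.2.10:8080"])
    print(orders.name, "is identified by", IDENTIFIER_KINDS[orders.workload_type])
```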

How does Lifeline control routing of connections to workload applications?

Lifeline relies on a load balancer that supports the Server/Application State Protocol (SASP), which is documented in RFC 4678. The protocol allows Lifeline to periodically send routing recommendations to a SASP-enabled load balancer, directing the load balancer on how to route workload connections across a set of workload applications that can span both sites. The F5 Big-IP Switch Local Traffic Manager is the recommended load balancer for use with Lifeline.
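
The sketch below shows one way a load balancer could act on such routing recommendations, treating each recommendation as a relative weight per workload application. It does not implement the SASP message format, and it is not how any particular load balancer behaves: the `choose_target` function, the target names, and the weight values are assumptions made for illustration.

```python
import random
from typing import Dict, Optional


def choose_target(weights: Dict[str, int], rng: random.Random) -> Optional[str]:
    """Pick a target for one new connection in proportion to its weight.

    Targets with weight 0 never receive new connections.
    """
    eligible = {name: w for name, w in weights.items() if w > 0}
    if not eligible:
        return None  # no target can accept new connections
    names = list(eligible)
    return rng.choices(names, weights=[eligible[n] for n in names], k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)

    # Hypothetical recommendation while site 1 is the active site.
    before = {"site1-app-a": 8, "site1-app-b": 2, "site2-app-a": 0}
    picks = [choose_target(before, rng) for _ in range(1000)]
    print({name: picks.count(name) for name in before})

    # Hypothetical recommendation after a switch to site 2.
    after = {"site1-app-a": 0, "site1-app-b": 0, "site2-app-a": 10}
    picks = [choose_target(after, rng) for _ in range(1000)]
    print({name: picks.count(name) for name in after})
```

Setting a weight to zero is how this sketch keeps new connections away from a site. Existing connections are outside its scope.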

How does Lifeline control routing of MQ messages for workloads?

Lifeline communicates with the MQ queue managers that manage the queues used by the workloads and directs the MQ cluster on which MQ queue managers are eligible to receive MQ messages. Following a workload failure in a site, Lifeline also ensures that any stranded MQ messages are transferred to MQ queue managers in the alternate site during a workload switch.