IBM Multi-site Workload Lifeline (Lifeline) enables intelligent load balancing of critical workload transactions by influencing routing of connections for TCP/IP workloads and messages for IBM MQ cluster workloads. Routing is done across two sites to provide near-continuous availability.
When an outage occurs, IBM Multi-site Workload Lifeline helps reduce recovery time for critical workloads from the hours typical of traditional disaster recovery to minutes. Recovery time for unplanned outages is reduced by detecting workload failures and rerouting to another site. The impact of planned outages is mitigated by switching workloads to another site with minimal disruption.
New connections of workloads are routed to the applications, servers, and systems most capable of processing so that transaction response time is reduced. System resources are used more efficiently.
Route new workload connections to other available applications in the event of application, system or site outages. Outages for maintenance updates or other planned events can be minimized.
Add application instances on-demand. Automatically monitor and include added instances in workload routing decisions.
Reduce response time by aligning new workload connections with the most capable applications and systems. Recovery time after a workload failure can be reduced from hours to minutes.
Route workloads from one site to another with minimal disruption. Connections for query workloads can be distributed to both sites simultaneously.
Add simpler, non-disruptive testing of disaster recovery procedures by validating that workloads remain accessible on the recovery site – without requiring an outage of the production site.
Lifeline uses two tiers of load balancing for workloads targeting TCP/IP applications. Lifeline directs first-tier load balancers to route workload connections to second-tier load balancers in the selected site, which then route the connections to applications in the site. For workloads that use messaging, Lifeline relies on IBM MQ clusters. Lifeline directs the cluster to route workload messages to IBM MQ queue managers in the selected site, which then make the messages available to applications.
For workloads that use two tiers of load balancers, Lifeline provides first-tier load balancers with site connection routing recommendations based on the availability and health of the workload applications, the z/OS systems and (if applicable) Linux® on IBM Z® systems across both sites. For workloads that use IBM MQ clusters, Lifeline provides the cluster with site message routing recommendations based on the availability and health of the IBM MQ queue managers and the z/OS systems across both sites.
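The two-tier flow described above can be illustrated with a small Python sketch. It is not Lifeline's actual algorithm: the AppInstance type, the 0-100 health score, and the "highest total health wins" rule are all assumptions made for illustration. The first function stands in for the site recommendation given to a first-tier load balancer, and the second for the choice a second-tier load balancer makes inside the selected site.

```python
from dataclasses import dataclass


@dataclass
class AppInstance:
    name: str
    site: str
    available: bool
    health: int  # hypothetical score, 0-100; higher means more capacity


def recommend_site(instances, sites=("site1", "site2")):
    """Tier 1 (illustrative): recommend the site whose available
    instances have the highest combined health, or None if no
    instance is available in either site."""
    totals = {site: 0 for site in sites}
    for inst in instances:
        if inst.available:
            totals[inst.site] += inst.health
    best = max(sites, key=lambda site: totals[site])
    return best if totals[best] > 0 else None


def pick_instance(instances, site):
    """Tier 2 (illustrative): pick the healthiest available
    instance within the recommended site."""
    candidates = [i for i in instances if i.site == site and i.available]
    return max(candidates, key=lambda i: i.health, default=None)
```

With one unavailable instance in site1 and two available instances in site2, recommend_site returns "site2" and pick_instance returns the healthier of the two site2 instances.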
A Lifeline Agent is started on each z/OS system and Linux on Z Management Guest where the workloads are present across both sites. The Agent is responsible for monitoring the workload applications that reside on its system and reporting this information back to a Lifeline Advisor. The Agent on z/OS is also responsible for communicating with an MQ queue manager to monitor and influence MQ message routing within an MQ cluster.
A Lifeline Advisor is started on a z/OS system and can be started as the primary or secondary Advisor. A primary Advisor communicates with all Lifeline Agents to determine workload availability. The Advisor provides MQ message distribution rules to the Agents for the MQ clusters and routing recommendations to load balancers for TCP connections for these workloads. A secondary Advisor monitors the availability of the primary Advisor and will take over primary Advisor responsibility in the event of a primary Advisor failure.
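The takeover behavior of a secondary Advisor can be pictured with the following Python sketch. How the secondary Advisor actually monitors the primary is not described here, so the heartbeat-with-timeout mechanism, the class name, and the timeout value are assumptions chosen only to show the idea.

```python
class SecondaryAdvisor:
    """Illustrative model: promotes itself to primary when the
    primary has not been heard from within timeout_s seconds."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_heartbeat = None  # time of the last sign of life
        self.role = "secondary"

    def record_heartbeat(self, now):
        # Called whenever the primary Advisor is observed to be alive.
        self.last_heartbeat = now

    def check(self, now):
        # Take over only if a heartbeat was seen before and it is stale.
        if (
            self.role == "secondary"
            and self.last_heartbeat is not None
            and now - self.last_heartbeat > self.timeout_s
        ):
            self.role = "primary"
        return self.role
```

In this model, a secondary with a 10-second timeout that last heard from the primary at time 100 stays secondary at time 105 and takes over at time 111.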
Each workload that is configured to Multi-site Workload Lifeline is classified as an "active/standby" or "active/query" workload.
Lifeline can support many types of workloads that reside on z/OS or Linux on Z:
Software requirements
Hardware requirements
See how IBM Multi-site Workload Lifeline plays a key role in solving major problems in the enterprise.
See how you can have intelligent load balancing of TCP/IP workloads and still have nearly continuous availability.
Read use cases that describe the integration of IBM Multi-site Workload Lifeline with F5 BIG-IP.
Learn to convert an existing IBM MQ environment with shared channels to a cluster and how to configure Lifeline to support a workload that uses an MQ cluster.
See how Lifeline helps your business to save costs and be competitive 24x7.
See how Lifeline plays an integral role in the GDPS Continuous Availability solution.
Lifeline monitors the workload applications and the systems where these applications reside, across the two sysplexes, or sites, where these systems are running. Lifeline controls the routing of connections and MQ messages that are targeting these workload applications, ensuring the connections and MQ messages are sent to the optimal workload applications in the active site(s).
If Lifeline detects a workload failure in the active site, it can automatically perform a workload switch, in seconds, to the workload applications in the alternate site. Alternatively, Lifeline can generate alert messages that automation products can capture to perform their own workload switching.
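The two responses to a detected failure, an automatic switch or an alert for external automation, can be sketched in Python as follows. The Workload type, the auto_switch flag, and the message strings are hypothetical; they are not Lifeline configuration options or real message formats.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    active_site: str
    standby_site: str


def on_workload_failure(workload, auto_switch):
    """Illustrative handler: either swap the active and standby
    sites, or leave routing alone and emit an alert that an
    automation product could act on."""
    if auto_switch:
        workload.active_site, workload.standby_site = (
            workload.standby_site,
            workload.active_site,
        )
        return f"SWITCHED {workload.name} to {workload.active_site}"
    return f"ALERT {workload.name} failed in {workload.active_site}"
```

With auto_switch set to False the workload stays where it is and only the alert text is produced, which leaves the switching decision to whatever automation consumes the alert.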
If your business meets one of the following situations, continuous availability for your workloads is needed.
Existing disaster recovery solutions utilize disk-based replication to make mirror copies to a remote site of all disks used by the systems in the local site. These disk copies cannot be used while the disk replication is occurring. In the event of a failure in the local site, the systems and workload applications need to be restarted in the remote site before access to the workloads is reestablished. Typically, this can take an hour or longer to accomplish.
With Lifeline-enabled continuous availability solutions, software data replication, such as InfoSphere Data Replication for Db2, is used to keep data in sync between the local and remote sites. The key difference is that systems in both sites are active, and Lifeline is used to monitor the workloads across both sites. In the event of a failure in the local site, Lifeline detects the workload failure and routes all new workload connections to the alternate site. As a result, access to the workloads is reestablished in seconds, versus the hour or more with disaster recovery solutions.
Lifeline, through its monitoring and workload routing, plays an integral role in the GDPS Continuous Availability solution and provides the following benefits:
No. Although typically used as an integral part of the GDPS Continuous Availability solution, Lifeline can also be deployed outside the solution.
If your business has your own automation capabilities, Lifeline, along with a software data replication product to keep data in both sites in sync, can be used.
In other cases, if your business has workload applications that are not sysplex-enabled, you cannot use the GDPS Continuous Availability solution. Using Lifeline, along with a software data replication product to keep data in both sites in sync, will provide “sysplex-like” recovery for these workload types.
Lifeline provides the ability to perform a graceful switch of the applications and their data sources, called workloads by Lifeline, during planned outages. By using simple Lifeline commands, workload migration from one site to another can be easily performed, minimizing the downtime for planned events such as scheduled maintenance activities.
Lifeline increases availability because new connections and messages can be routed away from failing workload applications and systems. Lifeline reduces response times by routing connections and messages to workload applications and systems with capacity for additional work, and it reduces recovery time from hours to minutes.
No. One of the many benefits of Lifeline is that it is not an all-or-nothing solution, as disaster recovery solutions tend to be. Only the most critical workloads would be configured to Lifeline to provide continuous availability, while all other workloads, including batch, would be recovered using existing disaster recovery procedures. Additional workloads can be added to Lifeline at any time.
A workload’s characteristics depend on the workload type. For TCP-based workloads, it’s the IP addresses and port numbers of the TCP applications. For SNA-based workloads, it’s the SNA appl names of the SNA applications. For MQ-based workloads, it’s the MQ cluster queues and MQ queue managers where MQ messages for the workloads are sent. For Db2 DRDA-based workloads, it’s the IP addresses and port numbers of the Db2 aliases and Db2 subsystems. For Linux on Z workloads, it’s the Linux on Z guests running on z/VM.
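The per-type characteristics listed above can be summarized as a small lookup table. The following Python sketch is only a way to picture that mapping; the type keys, field names, and the missing_identifiers helper are hypothetical and do not correspond to Lifeline configuration statements.

```python
# Identifying fields per workload type, as described in the text.
# All keys and field names are illustrative, not Lifeline syntax.
WORKLOAD_IDENTIFIERS = {
    "tcp": ("ip_address", "port"),
    "sna": ("appl_name",),
    "mq": ("cluster_queue", "queue_manager"),
    "db2_drda": ("ip_address", "port"),
    "linux_on_z": ("guest_name",),
}


def missing_identifiers(workload_type, member):
    """Return the identifying fields that a member definition
    (a plain dict) lacks for the given workload type."""
    required = WORKLOAD_IDENTIFIERS[workload_type]
    return [field for field in required if field not in member]
```

For example, a TCP member defined with only an IP address would be reported as missing its port, while an SNA member with an appl name would be complete.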
Lifeline relies on a load balancer that supports the Server/Application State Protocol (SASP), which is documented in RFC 4678. The protocol allows Lifeline to periodically send routing recommendations to a SASP-enabled load balancer, directing the load balancer on how to route workload connections across a set of workload applications that can span both sites. The F5 BIG-IP Switch Local Traffic Manager is the recommended load balancer for use with Lifeline.
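To show how a load balancer might act on periodic routing recommendations, the Python sketch below turns a set of per-member weights into the share of new connections each member would receive. It does not implement the SASP wire format; the member names, the weight values, and the convention that a weight of 0 means "send no new connections" are assumptions of the sketch.

```python
def connection_shares(weights):
    """Convert relative weights into the fraction of new
    connections each member should receive. A member with
    weight 0 receives no new connections (sketch convention)."""
    total = sum(weights.values())
    if total == 0:
        # No member is recommended; nothing should be routed.
        return {member: 0.0 for member in weights}
    return {member: weight / total for member, weight in weights.items()}
```

Given weights of 3, 1, and 0 for three members, the first would receive three quarters of new connections, the second one quarter, and the third none. Each time a new set of recommendations arrives, the shares would simply be recomputed.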
Lifeline communicates with the MQ queue managers that manage the queues used by the workloads and directs the MQ cluster on which MQ queue managers are eligible to receive MQ messages. Following a workload failure in a site, Lifeline also ensures any stranded MQ messages are transferred to MQ queue managers in the alternate site during a workload switch.