IBM Multi-site Workload Lifeline

Enable continuous availability during an outage for business-critical workloads running on z/OS®

Multi-site Workload Lifeline, also known as "Lifeline," is software for monitoring and routing workloads. It balances critical workload transactions by distributing connections for TCP/IP workloads and messages for IBM® MQ cluster workloads across two sites, which helps ensure near-continuous availability.

This product is available both as an independent offering and as part of the GDPS® Continuous Availability solution.

When an outage occurs, IBM Multi-site Workload Lifeline helps reduce recovery time for critical workloads from the hours typical of traditional disaster recovery to minutes. For unplanned outages, it reduces recovery time by detecting workload failures and rerouting work to another site. For planned outages, it mitigates the impact by switching workloads to another site with minimal disruption.

Lifeline supports these workload types:

  • TCP/IP based workloads

  • Linux on z Systems® workloads

  • SNA workloads

  • IBM MQ cluster workloads

  • Db2® sysplex routed workloads

What's new

Summary of recent updates for Lifeline V2.5

Support of remote command API and IBM MQ multicluster workloads

Benefits

Improve performance

Route new connections of workloads to the applications, servers and systems that are most capable of processing to reduce transaction response time. System resources are used more efficiently.

Achieve higher availability

Route new workload connections to other available applications during an application, system or site outage. Outages for maintenance updates or other planned events can be minimized.

Reduce recovery time

Reduce response time by aligning new workload connections with the most capable applications and systems. Minimize recovery time from hours to minutes after a workload failure.

Increase scalability

Add application instances on-demand. Automatically monitor and include added instances in workload routing decisions.

Improve workload migration, usage

Route workloads from one site to another with minimal disruption. Connections for query workloads can be distributed to both sites simultaneously.

Simplify disaster recovery procedures

Add simpler, non-disruptive testing of disaster recovery procedures by validating that workloads remain accessible on the recovery site without requiring an outage of the production site.

"IBM Multi-site Workload Lifeline can help us understand whether a site is normal and whether the data is synchronized. Only when IBM Multi-site Workload Lifeline is deployed, IBM GDPS Continuous Availability (GDPS CA) can complete the workload switching to achieve continuous availability."

Senior manager of data center, a large Asian bank

Features
Load balancing workloads

Lifeline uses two tiers of load balancing for workloads targeting TCP/IP applications. Lifeline directs first-tier load balancers to route workload connections to second-tier load balancers in the selected site, which then route the connections to applications in that site. For workloads that use messaging, Lifeline relies on IBM MQ clusters: it directs the cluster to route workload messages to IBM MQ queue managers in the selected site, which then make the messages available to applications.
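The two-tier flow for TCP/IP workloads can be pictured with a minimal Python sketch. It is an illustration only, not Lifeline code: the class names, the `apply_recommendation` method, and the round-robin choice inside a site are all assumptions made for this example.

```python
from itertools import cycle


class SecondTierBalancer:
    """Hypothetical second-tier balancer that fronts one site's applications."""

    def __init__(self, site, applications):
        self.site = site
        # Assumption: plain round-robin inside the site, for illustration only.
        self._applications = cycle(applications)

    def route(self):
        return next(self._applications)


class FirstTierBalancer:
    """Hypothetical first-tier balancer that follows a site recommendation."""

    def __init__(self, second_tier):
        self.second_tier = second_tier  # maps site name -> SecondTierBalancer
        self.selected_site = None

    def apply_recommendation(self, site):
        # Stands in for the direction that Lifeline gives the first tier.
        if site not in self.second_tier:
            raise ValueError(f"unknown site: {site}")
        self.selected_site = site

    def route(self):
        if self.selected_site is None:
            raise RuntimeError("no site recommendation received yet")
        balancer = self.second_tier[self.selected_site]
        return balancer.site, balancer.route()


first_tier = FirstTierBalancer({
    "site1": SecondTierBalancer("site1", ["app-a", "app-b"]),
    "site2": SecondTierBalancer("site2", ["app-c"]),
})
first_tier.apply_recommendation("site1")
print(first_tier.route())
```

Changing the recommendation to "site2" sends every later connection through that site's second-tier balancer, which is the effect of a workload switch in this simplified picture.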

Site routing recommendations

For workloads that use two tiers of load balancers, Lifeline provides first-tier load balancers with site connection routing recommendations based on the availability and health of the workload applications, the z/OS systems and (if applicable) Linux® on IBM Z® systems across both sites. For workloads that use IBM MQ clusters, Lifeline provides the cluster with site message routing recommendations based on the availability and health of the IBM MQ queue managers and the z/OS systems across both sites.
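As a rough illustration of a recommendation driven by availability and health, the Python sketch below scores each site from its members. The `Member` fields, the 0 to 100 health scale, and the "stay on the preferred site while it is usable" rule are assumptions of this example, not documented Lifeline behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    """One workload application instance on one system (hypothetical model)."""
    site: str
    system_up: bool
    application_up: bool
    health: int  # assumed scale: 0 (worst) to 100 (best)


def site_weights(members):
    """Sum the health of members whose system and application are both up."""
    weights = {}
    for member in members:
        usable = member.system_up and member.application_up
        weights[member.site] = weights.get(member.site, 0) + (member.health if usable else 0)
    return weights


def recommend_site(members, preferred):
    """Keep the preferred site while it is usable, else pick the healthiest site."""
    weights = site_weights(members)
    if weights.get(preferred, 0) > 0:
        return preferred
    candidates = {site: weight for site, weight in weights.items() if weight > 0}
    return max(candidates, key=candidates.get) if candidates else None


members = [
    Member("site1", system_up=True, application_up=False, health=90),
    Member("site1", system_up=False, application_up=True, health=80),
    Member("site2", system_up=True, application_up=True, health=60),
]
print(recommend_site(members, preferred="site1"))
```

In this sample, neither member in site1 is usable, so the recommendation moves to site2.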

Lifeline Agents

A Lifeline Agent is started on each z/OS system and Linux on Z Management Guest where the workloads are present across both sites. The Agent is responsible for monitoring the workload applications that reside on its system and reporting this information back to a Lifeline Advisor. The Agent on z/OS is also responsible for communicating with an IBM MQ queue manager to monitor and influence IBM MQ message routing within an IBM MQ cluster.

Lifeline Advisors

A Lifeline Advisor is started on a z/OS system and can be started as the primary or secondary Advisor. A primary Advisor communicates with all Lifeline Agents to determine workload availability. The Advisor provides IBM MQ message distribution rules to the Agents for the IBM MQ clusters and routing recommendations to load balancers for TCP connections for these workloads. A secondary Advisor monitors the availability of the primary Advisor and will take over primary Advisor responsibility in the event of a primary Advisor failure.
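The primary and secondary roles can be sketched as a simple heartbeat check in Python. This is a conceptual model only: the source does not describe how the secondary Advisor detects a failure, so the heartbeat timestamps and the `takeover_after_seconds` threshold are assumptions of this example.

```python
class SecondaryAdvisor:
    """Hypothetical model of a secondary Advisor watching the primary."""

    def __init__(self, takeover_after_seconds):
        # Assumed threshold: how long the primary may stay silent.
        self.takeover_after_seconds = takeover_after_seconds
        self.last_heartbeat = None
        self.role = "secondary"

    def record_heartbeat(self, now):
        """Note the time at which the primary was last seen alive."""
        self.last_heartbeat = now

    def check(self, now):
        """Take over as primary once the primary has been silent for too long."""
        if self.role == "secondary" and self.last_heartbeat is not None:
            if now - self.last_heartbeat > self.takeover_after_seconds:
                self.role = "primary"
        return self.role


advisor = SecondaryAdvisor(takeover_after_seconds=10.0)
advisor.record_heartbeat(now=100.0)
print(advisor.check(now=105.0))
print(advisor.check(now=111.0))
```

Times are passed in explicitly so that the takeover rule can be exercised without waiting on a real clock.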

Workload configurations

Each workload that is configured to Multi-site Workload Lifeline is classified as an Active/Standby or Active/Query workload.

Active/Standby workload

An active/standby workload is active in one site. Lifeline directs load balancers and IBM MQ queue managers to route incoming connections and messages to the active site. When database updates are made, database replication software transmits those changes asynchronously from the active instance of the workload to its standby instance. At the standby site, the standby instance of the workload is active and ready to receive work. The updated data from the active site is applied to the database subsystem running in the standby site in near real time.

Active/Query workload

An active/query workload can be active in one or both sites. Lifeline provides routing recommendations to the load balancers to intelligently balance connections across both sites. Workloads using IBM MQ messages cannot be classified as active/query workloads. When database updates are made by the associated active/standby workload, Lifeline monitors database replication latency to ensure that connections are not routed to a site whose replicated database has fallen too far behind the database on the active site.
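The latency check for active/query workloads amounts to a threshold test, sketched below in Python. The function name, the latency units, and the `max_latency_seconds` parameter are assumptions for illustration; the source does not say how the acceptable latency is configured.

```python
def eligible_query_sites(active_site, replication_latency_seconds, max_latency_seconds):
    """Return the sites that may receive query connections.

    The active site is always eligible. Another site is eligible only while
    its replicated data lags the active site by no more than the threshold.
    """
    eligible = [active_site]
    for site, latency in sorted(replication_latency_seconds.items()):
        if site != active_site and latency <= max_latency_seconds:
            eligible.append(site)
    return eligible


# Replica in site2 is 3 seconds behind, within a 5 second limit.
print(eligible_query_sites("site1", {"site2": 3.0}, max_latency_seconds=5.0))
# Replica in site2 is 12 seconds behind, so queries stay on site1.
print(eligible_query_sites("site1", {"site2": 12.0}, max_latency_seconds=5.0))
```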

Technical details

In preparation for using Multi-site Workload Lifeline, you need to identify the workloads for which you want to provide continuous availability and evaluate how these workloads’ applications are accessed.

Requirements
  • All z/OS systems should be at z/OS 2.5 or higher.
  • All Linux on z Systems guests must be at SUSE Linux Enterprise Server (SLES) 12 SP4 or higher or Red Hat® Enterprise Linux Server 7.9 or higher.
  • z/VM® hosting the Linux on z Systems guests must be at z/VM 6.4 or higher.
Resources

Converting to an IBM MQ Cluster

Learn to convert an existing MQ environment with shared channels to a cluster and how to configure Lifeline to support a workload that uses an MQ cluster.

Integrating IBM Multi-site Workload Lifeline with F5 BIG-IP®

Read use cases that describe the integration of Lifeline with F5’s BIG-IP Local Traffic Manager.

Related products

IBM z/OS Communications Server

Secure platform for developing and sharing mainframe workloads.

IBM GDPS

Automate mainframe tasks and disaster recovery to achieve resiliency goals.

IBM z/OS Parallel Sysplex®

Boost server communications with clustering technology that allows a set of up to 32 IBM z/OS systems to be connected and to behave as a single, logical computing platform.

IBM Data Replication

Support data integration and consolidation, warehousing and analytics initiatives at scale with log-based change data capture with transactional integrity.

FAQs

How does IBM Multi-site Workload Lifeline enable continuous availability?

Lifeline monitors the workload applications and the systems where these applications reside, across the two sysplexes, or sites, where these systems are running. Lifeline controls the routing of connections and IBM MQ messages that target these workload applications, ensuring that they are sent to the optimal workload applications in the active site(s).
If Lifeline detects a workload failure in the active site, it can automatically perform a workload switch, in seconds, to the workload applications in the alternate site. Alternatively, Lifeline can generate alert messages that automation products can capture to perform their own workload switching.

Does my business need continuous availability for workloads?

If any of the following situations applies to your business, you need continuous availability for your workloads.

  • Your business must run 24x7 due to industry regulations.
  • Other businesses depend on your business's always-on availability, for example, if your business is in the financial or insurance industry.
  • Your business has no recovery procedures in place, for example, with non-sysplex environments and no disk replication capabilities.
How is continuous availability different from disaster recovery?

Existing disaster recovery solutions utilize disk-based replication to make mirror copies to a remote site of all disks used by the systems in the local site. These disk copies cannot be used while the disk replication is occurring. In the event of a failure in the local site, the systems and workload applications need to be restarted in the remote site before access to the workloads is reestablished. Typically, this can take an hour or longer to accomplish.
With Lifeline-enabled continuous availability solutions, software data replication, such as InfoSphere Data Replication for Db2, is used to keep data in sync between the local and remote sites. The key difference is that systems in both sites are active, and Lifeline is used to monitor the workloads across both sites. In the event of a failure in the local site, Lifeline detects the workload failure and routes all new workload connections to the alternate site. As a result, access to the workloads is reestablished in seconds, versus the hour or more with disaster recovery solutions.

How does Lifeline act as an integral part of the GDPS® Continuous Availability solution?

Lifeline, through its monitoring and workload routing, plays an integral role in the GDPS Continuous Availability solution and provides the following benefits:

  • Improved performance: New connections of workloads are routed to the applications, servers, and systems most capable of processing them so that transaction response time is reduced. System resources are used more efficiently.
  • Improved availability: New workload connections can be routed to available applications and systems when some of them are down. Outages for maintenance updates or other planned events can be minimized.
  • Reduced recovery time: Reduce Recovery Time Objective from hours to minutes. With disk replication, traditional DR solutions recover on the standby site by restarting systems or applications. Normally that takes hours, and IT services are out for this period. With Lifeline working within the GDPS Continuous Availability solution, workloads can be switched to the standby site in minutes.
Is Lifeline only available as part of the GDPS Continuous Availability solution?

No. Although typically used as an integral part of the GDPS Continuous Availability solution, Lifeline can also be deployed outside the solution.
If your business has its own automation capabilities, Lifeline can be used along with a software data replication product that keeps data in both sites in sync.
In other cases, if your business has workload applications that are not sysplex-enabled, you cannot use the GDPS Continuous Availability solution. Using Lifeline, along with a software data replication product to keep data in both sites in sync, will provide “sysplex-like” recovery for these workload types.

How does Lifeline reduce the maintenance window for planned outages?

Lifeline can perform a graceful switch of applications and their data sources, which Lifeline calls workloads, during planned outages. Simple Lifeline commands migrate a workload from one site to another, minimizing downtime for planned events such as scheduled maintenance activities.

How does Lifeline provide near continuous availability for critical workloads during unplanned outages?

Lifeline increases availability because new connections and messages can be routed away from failing workload applications and systems. Lifeline reduces response times by routing connections and messages to workload applications and systems with capacity for additional work, and it reduces recovery time from hours to minutes.

Do all workloads running in a site need to be configured to Lifeline initially?

No. One of the many benefits of Lifeline is that it is not an all-or-nothing solution, as disaster recovery solutions tend to be. Only the most critical workloads need to be configured to Lifeline to provide continuous availability, while all other workloads, including batch, can be recovered using existing disaster recovery procedures. Additional workloads can be added to Lifeline at any time.

What are the characteristics of a workload when defining it to Lifeline?

A workload’s characteristics depend on the workload type. For TCP-based workloads, they are the IP addresses and port numbers of the TCP applications. For SNA-based workloads, they are the SNA appl names of the SNA applications. For IBM MQ-based workloads, they are the MQ cluster queues and MQ queue managers where IBM MQ messages for the workloads are sent. For Db2 DRDA-based workloads, they are the IP addresses and port numbers of the Db2 aliases and Db2 subsystems. For Linux on Z workloads, they are the Linux on Z guests running on z/VM.

How does Lifeline control routing of connections to workload applications?

Lifeline relies on a load balancer that supports the Server/Application State Protocol (SASP), which is documented in RFC 4678. The protocol allows Lifeline to periodically send routing recommendations to a SASP-enabled load balancer, directing the load balancer on how to route workload connections across a set of workload applications that can span both sites. The F5 BIG-IP Local Traffic Manager is the recommended load balancer for use with Lifeline.
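How a load balancer might act on such recommendations can be sketched in Python. This example does not implement the SASP wire protocol; it only shows one way to turn per-application weights into a connection order, using a smooth weighted round-robin. The function name, the sample weights, and the treatment of a zero weight as "send no new connections" are assumptions of this sketch.

```python
def weighted_round_robin(weights, connection_count):
    """Assign connections to applications in proportion to their weights."""
    # Assumption: a weight of zero means the application gets no new connections.
    members = {name: weight for name, weight in weights.items() if weight > 0}
    if not members:
        return []
    total = sum(members.values())
    current = dict.fromkeys(members, 0)
    picks = []
    for _ in range(connection_count):
        # Every member earns credit equal to its weight on each round.
        for name, weight in members.items():
            current[name] += weight
        # The member with the most credit takes the connection and pays for it.
        chosen = max(current, key=current.get)
        current[chosen] -= total
        picks.append(chosen)
    return picks


recommendation = {"app-a": 3, "app-b": 1, "app-c": 0}
picks = weighted_round_robin(recommendation, connection_count=4)
print(picks.count("app-a"), picks.count("app-b"), picks.count("app-c"))
```

When a fresh recommendation arrives, the load balancer would replace the weights and continue, so a failing application whose weight drops to zero stops receiving new connections.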

How does Lifeline control routing of MQ messages for workloads?

Lifeline communicates with the IBM MQ queue managers that manage the queues used by the workloads and tells the IBM MQ cluster which queue managers are eligible to receive IBM MQ messages. Following a workload failure in a site, Lifeline also ensures that any stranded IBM MQ messages are transferred to IBM MQ queue managers in the alternate site during a workload switch.

Next steps

Discover how Multi-site Workload Lifeline helps reduce critical workload recovery time when an outage occurs. Schedule a no-cost 30-minute meeting with an IBM Z representative.
