Introducing high availability and automation for SAP

Business continuity of an SAP production system is a critical business factor and requires the highest possible level of system availability. This document describes a high availability solution that is designed to meet this requirement.

This high availability solution for SAP offers multiple advantages.

  • It combines high availability techniques with the automation technologies of the IBM Z System Automation product.
  • It helps to avoid unplanned outages by eliminating single points of failure.
  • It helps to avoid planned outages, such as those for administrative or maintenance work.
  • It provides business continuity for an SAP production system that is as close as possible to 24 hours a day, 365 days a year.

IBM® Systems products incorporate various advanced autonomic computing capabilities that are based on the four characteristics of self-managing systems:

Self-configuring
The seamless integration of new hardware resources and the cooperative yielding of resources by the operating system are important elements of self-configuring systems. Hardware subsystems and resources can configure and reconfigure autonomously both at startup time and during run time. Such reconfiguration can be initiated by the need to adjust the allocation of resources, either based on the current optimization criteria or in response to hardware or firmware faults. Self-configuring also includes the ability to concurrently add or remove hardware resources in response to commands from administrators, service personnel, or hardware resource management software.
Self-healing
With self-healing capabilities, operating systems can detect hardware and firmware faults instantly and then limit the effects of the faults within defined boundaries. These capabilities allow systems to recover from the negative effects of such faults with minimal or no impact on the execution of operating system and user-level workloads.
Self-optimizing
Self-optimizing capabilities allow computing systems to autonomously measure the performance or usage of resources and then tune the configuration of hardware resources to deliver improved performance.
Self-protecting
Self-protecting capabilities allow computing systems to protect against internal and external threats to the integrity and privacy of applications and data.

Since the announcement of SAP on Db2® for z/OS®, Db2 Parallel Sysplex® data sharing combined with Db2 connection failover has been used to remove the database server as a single point of failure. These features can help you to avoid planned and unplanned outages of the database server.

The high availability solution, which is presented in this document, further enhances business continuity by removing the SAP central instance as a single point of failure. This solution also provides a method to automate the management of all SAP components for planned and unplanned outages. Thus, high availability is achieved by combining the concepts of system automation and transparent failover in a Parallel Sysplex. Based on the IBM Z System Automation product, together with a redesign of the SAP central instance concept, this high availability solution exploits the SAP stand-alone enqueue server, the enqueue replication server, dynamic virtual IP addresses (VIPA), shared file system, and Db2 data sharing to aim for a minimum of SAP system outages together with maximum automation.

The implementation and customization of the complete high availability solution depends heavily on the customer configuration and requires IBM Z System Automation skills. It is recommended that customers request support from IBM Global Services. Before customers go into production with their implementation of the solution, they should also contact SAP for a final check of the setup.

The high availability solution for SAP provides the means for fully automating the management of all SAP components and related products that are running on z/OS, AIX®, Linux®, or Windows. The automation software monitors all resources and controls the restart, the takeover, or both, of failing components, thus ensuring almost continuous availability of the SAP system.

The availability of the enqueue server is critical for an SAP system. If it fails, most SAP transactions also fail. To address this single point of failure, SAP, in cooperation with IBM, changed the architecture of the enqueue server. It is no longer part of the so-called central instance. That is, it no longer runs inside a work process, but is now a stand-alone process, which is called the stand-alone enqueue server. It operates under the designation SAP Central Services, or SCS. The enqueue server transmits its replication data to an enqueue replication server, which runs on a different system. The enqueue replication server stores the replication data in a shadow enqueue table in shared memory.

The SAP Community provides more information about this topic and about high availability. For more information about the SAP enqueue server and replication server, see SAP Central Services. Also, refer to the description of the SAP high availability architecture and the SAP Lock Concept, which can be found in the SAP NetWeaver documentation: SAP Lock Concept.

If the enqueue server fails, IBM Z System Automation quickly restarts it on the system where the replication server was running. The restarted enqueue server uses the replicated data in the shadow enqueue table to rebuild its tables and data structures. Thus, a failure of the enqueue server is not visible to the user and the SAP application. For a detailed description of this process, see Concepts for a high availability SAP solution.
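
The replication and restart behavior that is described above can be illustrated with a small conceptual model. The following Python sketch is not SAP or IBM code and does not use any SAP or IBM Z System Automation interface. All class names, function names, and lock names are hypothetical, and the model assumes a simple one-owner-per-lock table. It only shows the idea that every lock change is mirrored to a shadow table on another system, and that a restarted enqueue server can rebuild its lock table from that shadow copy. Seeding a fresh replication server after the restart is also an assumption of this sketch.

```python
class ReplicationServer:
    """Toy stand-in for the enqueue replication server (hypothetical)."""

    def __init__(self):
        # Plays the role of the shadow enqueue table.
        self.shadow_table = {}

    def apply(self, op, lock_name, owner):
        # Mirror a single lock change that the enqueue server sends.
        if op == "acquire":
            self.shadow_table[lock_name] = owner
        elif op == "release":
            self.shadow_table.pop(lock_name, None)


class EnqueueServer:
    """Toy stand-in for the stand-alone enqueue server (hypothetical)."""

    def __init__(self, replica, initial_table=None):
        self.replica = replica
        # Copy the initial table so that the server owns its own state.
        self.lock_table = dict(initial_table or {})

    def acquire(self, lock_name, owner):
        if lock_name in self.lock_table:
            return False  # lock is already held
        self.lock_table[lock_name] = owner
        self.replica.apply("acquire", lock_name, owner)
        return True

    def release(self, lock_name, owner):
        if self.lock_table.get(lock_name) != owner:
            return False  # only the current owner can release
        del self.lock_table[lock_name]
        self.replica.apply("release", lock_name, owner)
        return True


def fail_over(old_replica):
    """Model a restart of the enqueue server where the replica was running.

    The new server rebuilds its lock table from the shadow table.
    A fresh replica is then seeded with the same state (assumption).
    """
    new_replica = ReplicationServer()
    new_server = EnqueueServer(new_replica, initial_table=old_replica.shadow_table)
    new_replica.shadow_table = dict(new_server.lock_table)
    return new_server, new_replica


replica = ReplicationServer()
server = EnqueueServer(replica)
server.acquire("MATERIAL/4711", "user_a")
server.acquire("ORDER/0042", "user_b")
server.release("ORDER/0042", "user_b")

del server  # simulate the failure of the enqueue server
new_server, new_replica = fail_over(replica)

print(new_server.lock_table)                          # {'MATERIAL/4711': 'user_a'}
print(new_server.acquire("MATERIAL/4711", "user_c"))  # False
```

In this model, the lock that user_a held before the failure is still held after the restart, so a competing request from user_c is rejected. This mirrors the statement that the failure is not visible to the user and the SAP application.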

The business continuity solution that is described in this document is derived from an SAP test environment that serves as the blueprint for implementing an almost continuously available SAP system on Db2 for z/OS.

The IBM Z System Automation product was chosen for the business continuity solution because it not only provides the means for the implementation of a high availability system, but also includes all features that are needed to streamline daily operations. For example, it includes features for automated startup, shutdown, and monitoring of the components of an SAP system and its dependent products. Because of these capabilities, System Automation is also a prerequisite for Geographically Dispersed Parallel Sysplex (GDPS®). See GDPS infrastructure for disaster recovery.