Replicate IBM Z data pattern

Replicate data in real time by capturing change log activity to drive changes in the target. Enable newer applications to access a broader range of data stores and access methods through replication.

Overview


The replication of system-of-record (SOR) data by software processes to create an alternative, typically remote, copy of that data is a specialization of a broader copy use case. That broader use case includes extract, transform, and load (ETL) processes, unload and load utilities, and hardware (disk) replication. The specialization that software replication provides is twofold: near real-time replication with recovery point objectives (RPOs) near zero, and continuous availability with recovery time objectives (RTOs) near zero. While this pattern applies to SOR data on any platform, its use with SOR data on IBM® z/OS® is especially relevant and valuable because many large enterprises use z/OS as an SOR.

Many business drivers exist for replicating SOR data in real time. It’s important to note that replication is often deployed as a component of a larger solution: most customers build their own supporting infrastructure around it, and a subset deploy packaged solutions. These use cases are common:

  • The need for mission-critical, workload-level continuous availability at distance to support both planned and unplanned outages

  • The need for a real-time data warehouse to provide accelerated inquiry processing, offloading the SOR

  • The need for advanced analytics processing on real-time data by using various modern analytics platforms such as Hadoop and Apache Spark

  • The need to support various application modernization use cases both in the cloud and on premises

  • The need to support distributed query use cases such as command query responsibility segregation (CQRS), offloading inquiry from the SOR

  • The need to support the aggregation of data from multiple sources into an operational data store (ODS) or a data lake

  • The need to support the offload of the SOR from directly supporting multiple consumers, evolving consumers, or both by replicating into a data lake

You can implement these drivers by using either homogeneous replication (like models) or heterogeneous replication (unlike models). The following diagram is a simplified depiction of a replication setup from a source DBMS to a target DBMS, a publish/subscribe system such as Apache Kafka, or a file system:
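To make the shape of that setup concrete, the following minimal Python sketch routes captured changes to any of the three kinds of target. The ChangeRecord fields and the target classes are illustrative stand-ins for product components, not any product’s API:

```python
import json
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class ChangeRecord:
    """One captured source change, as read from the DBMS recovery log."""
    table: str
    operation: str                      # "INSERT" | "UPDATE" | "DELETE"
    key: dict
    after_image: Optional[dict] = None


class ReplicationTarget(Protocol):
    def apply(self, record: ChangeRecord) -> None: ...


class DbmsTarget:
    """Homogeneous path: replay the change through a target DBMS."""
    def apply(self, record: ChangeRecord) -> None:
        # A real product would drive SQL (or IMS/VSAM calls) here.
        print(f"DBMS replay: {record.operation} {record.table} {record.key}")


class PubSubTarget:
    """Heterogeneous path: publish the change to a topic (for example, Kafka)."""
    def apply(self, record: ChangeRecord) -> None:
        payload = json.dumps({"op": record.operation, "key": record.key,
                              "after": record.after_image})
        print(f"publish to topic {record.table}: {payload}")


class FileTarget:
    """Heterogeneous path: append the change to a file for batch consumers."""
    def __init__(self, path: str) -> None:
        self.path = path

    def apply(self, record: ChangeRecord) -> None:
        with open(self.path, "a") as f:
            f.write(json.dumps({"op": record.operation, "key": record.key,
                                "after": record.after_image}) + "\n")


def replicate(stream, target: ReplicationTarget) -> None:
    """One capture loop drives whichever target is configured."""
    for record in stream:
        target.apply(record)
```

The same capture loop drives whichever target is configured, which is what makes the homogeneous and heterogeneous cases variations of one pattern.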

Solution and pattern for IBM Z®

Continuous availability is usually a homogeneous replication use case. When the SOR is IBM® Db2® for z/OS®, IMS, or VSAM, the target system is Db2 for z/OS, IMS, or VSAM. Optionally, you can use IBM Z® GDPS Continuous Availability, which augments replication with a centralized control plane, intelligent routing, automation, and monitoring. You can also build these higher-level constructs over replication by using alternative, often home-grown, methods.

The other business drivers are heterogeneous in nature. For those drivers, you can usually deploy the target systems and their DBMS or file systems in Linux® on IBM Z® or in z/OS Container Extensions, keeping everything on IBM Z as an alternative to running those components on distributed systems. An “all on IBM Z” approach provides better high availability characteristics, reliability, security, and manageability. Performance is also better because fewer availability zones and network devices need to be traversed.

IBM has a family of replication products that is referred to as IBM® InfoSphere® Data Replication and IBM® Data Replication. The family spans multiple technologies, including Q Replication (QRep), Change Data Capture (CDC), and the Classic products. For more information, see the InfoSphere Data Replication documentation.

Advantages

The use of software replication to create one or more copies of SOR data provides several key advantages:

  • Software replication is a near real-time synchronization method. Average latency for online (sometimes called transactional) processing is usually measured in low seconds, if not subseconds. Average latencies for batch processes are also low, but higher than for online processing; batch latencies are usually measured in minutes. Those latencies meet many companies’ SLAs for RPO. Other techniques either pull all data periodically, such as once a day, or replicate to auxiliary storage devices that must be varied online before use.

  • Software replication captures changes to source data as they occur and replicates them to target agents that replay the changes directly through the DBMS or file system stack. Because the replayed copy is kept live and immediately usable, the RPO effectively becomes the RTO: you can achieve RTOs of low seconds to subseconds from any change.

  • Software replication can run at any granularity and tends to be focused on the sets of data for an application or a workload. You can implement custom configurations and processing at the workload level. In contrast, other solutions such as disk replication tend to be site-level in nature and constrained to operate at the disk volume level.

  • Software replication processes data at transaction boundaries and can maintain target copies that are transactionally consistent with the source, as the sketch after this list shows. This advantage isn’t possible with alternatives such as ETL or disk replication.

  • Costs are optimized by offloading queries and analytics processing to the target systems. This behavior reduces the strain on the SOR while still meeting the goals of CPU-intensive work.

  • You can handle large volumes in terms of throughput of processing and still maintain low latency.

  • The impact of spiky and unpredictable workloads on core SORs is mitigated.
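As a sketch of the transaction-boundary behavior that the list describes, the following Python loop buffers operations per transaction, replays them only at commit, and measures end-to-end latency against the source commit timestamp. The record shape and the replay callback are assumptions for illustration, not a product interface:

```python
import time
from collections import defaultdict


def apply_stream(records, replay):
    """Buffer operations per transaction and replay them only at commit, so the
    target stays transactionally consistent with the source. Each record is a
    (txn_id, op, commit_ts) tuple; commit_ts is the source commit timestamp in
    epoch seconds and is meaningful only on COMMIT records."""
    open_txns = defaultdict(list)
    for txn_id, op, commit_ts in records:
        if op == "COMMIT":
            for buffered in open_txns.pop(txn_id, []):
                replay(txn_id, buffered)
            # End-to-end latency: time from source commit to target apply,
            # which is the quantity that RPO service levels are measured on.
            print(f"txn {txn_id} applied, latency {time.time() - commit_ts:.3f}s")
        elif op == "ROLLBACK":
            open_txns.pop(txn_id, None)     # discard uncommitted work entirely
        else:
            open_txns[txn_id].append(op)
```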

Considerations

When you implement software replication from SOR data, you need to consider a few factors. First, real-time replication requires supplemental logging. Replication products read change records from DBMS recovery logs. As their name implies, those logs are designed for use by DBMS recovery processes, and their content is tailored to those purposes. Replication, unlike recovery, requires more attributes and transactional semantics: commits and rollbacks, operations identified by their transaction ID, before images for updates, open and close events, and load utility events.
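The following dataclass sketches, in illustrative Python rather than any product’s record format, the supplemental attributes that a replication-ready log record carries. On Db2 for z/OS, for example, supplemental logging for a table is enabled with the DATA CAPTURE CHANGES attribute:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReplicationLogRecord:
    """Illustrative (not product-specific) view of the attributes that
    replication needs from a recovery-log record, beyond what recovery
    itself consumes."""
    txn_id: str                           # ties each operation to its transaction
    op: str                               # INSERT/UPDATE/DELETE, COMMIT, ROLLBACK
    table: str
    before_image: Optional[dict]          # required for updates; recovery can omit it
    after_image: Optional[dict]
    utility_event: Optional[str] = None   # e.g., open/close or load utility events
```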

Augmenting recovery logs with those attributes increases the volume of changed data that is written to the logs, which often affects log switch procedures, and adds some impact to source application response time. You can mitigate some of the response-time impact with good buffering and offloading designs. In addition, log retention periods are often increased for replication use cases so that replication can restart from an earlier point. In short, adding supplemental logging might require more capacity and can affect log switch times.

Real-time replication can be CPU intensive. Reading logs, merging change records, buffering transactions, refreshing targets (select *), transmitting data, and replaying changes in parallel all consume CPU. Software replication is often compared unfavorably with disk replication, its older sibling, because disk replication is often done at the control unit, off-platform. Adding replication might require more capacity to achieve the best performance as measured by latency.
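The “parallel replay” element is worth a sketch. A common approach, shown here as a minimal Python example under assumed interfaces, is to hash each change on its row key so that changes to the same row always land on the same worker and keep their order; production replay engines add cross-transaction dependency analysis on top of this:

```python
import queue
import threading


def start_parallel_replay(num_workers, replay):
    """Fan changes out to worker threads, hashing on the row key so that all
    changes to the same row are applied in order by the same worker."""
    queues = [queue.Queue() for _ in range(num_workers)]

    def worker(q):
        while True:
            change = q.get()
            if change is None:            # shutdown sentinel
                return
            replay(change)

    for q in queues:
        threading.Thread(target=worker, args=(q,), daemon=True).start()

    def dispatch(key, change):
        """Route a change by its row key; same key -> same worker -> same order."""
        queues[hash(key) % num_workers].put(change)

    return dispatch
```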

Real-time replication provides tight synchronization between source and target data. Effectively, the target copy is an extension of the source copy and must be considered in any source-side change management, batch processing, and utility processing such as reorgs. You might be tempted to treat the copies that replication maintains as loosely coupled. However, in most use cases, many data-centric source procedures must be extended to account for their effect on the copy.
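For example, a source-side reorg procedure might be extended with a catch-up check like the following Python sketch, where target_applied_point stands in for whatever monitoring interface a given replication product exposes:

```python
import time


def wait_for_catch_up(target_applied_point, source_log_point,
                      timeout=300.0, poll=2.0):
    """Block until the target has applied all changes up to source_log_point,
    for example before a source-side reorg or schema change. The
    target_applied_point callable is a placeholder for a product-specific
    monitoring interface."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if target_applied_point() >= source_log_point:
            return True
        time.sleep(poll)
    return False
```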


Contributors

Paul Cadarette
STSM, Data Replication, IBM Master Inventor, IBM

Greg Vance
STSM, IMS Development, IBM