Announcing IBM Big Replicate 2.1.2 - Hadoop Dev

Technical Blog Post


We are excited to announce the general availability (GA) of IBM Big Replicate 2.1.2.

What is IBM Big Replicate?

IBM Big Replicate ensures business continuity by providing active-active data replication capability for your Big Data environment. Big Replicate enables data replication for Hadoop and cloud object stores with data consistency.

What’s new with this release?

1. New Platforms Supported

Big Replicate 2.1.2 adds support for the following platforms, which were not supported in Big Replicate 2.1.1:

    -CDH 5.12
    -CDH 5.11
    -HDP 2.6.2

2. Notable New Features

This release includes the following major new features.

Big Replicate Kernel and Performance

Extensive improvements have been made to the core engine, which is now referred to as the Fusion Kernel.
Testing across a variety of load types shows improved throughput and reduced memory requirements. Users can expect improvements ranging from 40% to 75% compared to previous releases.

Replication Memberships

Replication rule creation is simplified by the removal of the membership concept.

Memberships, which were used in previous versions of IBM Big Replicate, have been replaced by simpler priority selection among zones and the ability to control specific Big Replicate server roles in each zone. Memberships no longer need to be created, and there is no need to remove memberships that are no longer in use by replication rules.

Non-Blocking Consistency Check

Consistency checks provide a mechanism to determine whether there are any differences in the state of content within the scope of a replication rule. In earlier versions, no changes could be made via Big Replicate to the content being checked while a consistency check was running, so that the results of the check remained valid.
This release introduces an alternative, non-blocking consistency check that determines consistency state without blocking other activity while the check is underway. It tracks changes to the content under check during execution and reports one of three states for each item checked: consistent, not-consistent, or potentially inconsistent.
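
As a rough illustration of how tracking changes during a check can lead to three outcomes, the sketch below classifies a single item. It is not Big Replicate's implementation: the class, enum, and method names are hypothetical, and comparing per-zone checksums and treating any item modified during the check as potentially inconsistent are assumptions made for the example.

```java
import java.util.Objects;

/** Hypothetical sketch; not part of the Big Replicate API. */
class NonBlockingCheckSketch {

    /** The three states named in this announcement. */
    enum State { CONSISTENT, NOT_CONSISTENT, POTENTIALLY_INCONSISTENT }

    /**
     * Classifies one item.
     *
     * @param checksumZoneA       checksum observed in one zone (assumed input)
     * @param checksumZoneB       checksum observed in another zone (assumed input)
     * @param modifiedDuringCheck true if a change to the item was tracked while the check ran
     */
    static State classify(String checksumZoneA, String checksumZoneB, boolean modifiedDuringCheck) {
        if (modifiedDuringCheck) {
            // The observations may be stale, so no definite verdict is given.
            return State.POTENTIALLY_INCONSISTENT;
        }
        return Objects.equals(checksumZoneA, checksumZoneB)
                ? State.CONSISTENT
                : State.NOT_CONSISTENT;
    }

    public static void main(String[] args) {
        System.out.println(classify("abc", "abc", false));
        System.out.println(classify("abc", "xyz", false));
        System.out.println(classify("abc", "xyz", true));
    }
}
```

The point of the third state is that writes are never blocked: an item that changes mid-check is flagged for a later look rather than reported with a verdict that may already be out of date.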

Bulk Replication Rules

Multiple replication rules can be created at the same time when they share attributes other than file system location.
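
Conceptually, bulk creation amounts to applying one set of shared attributes to a list of file system locations. The sketch below models that idea only. The `Rule` class, its fields, and `createRules` are hypothetical and do not reflect the Big Replicate API or UI; zones and a priority zone are used as the shared attributes purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of a replication rule; not the Big Replicate API. */
class BulkRuleSketch {

    static final class Rule {
        final String path;          // file system location, unique per rule
        final List<String> zones;   // zones the rule replicates between (shared)
        final String priorityZone;  // zone given priority (shared)

        Rule(String path, List<String> zones, String priorityZone) {
            this.path = path;
            this.zones = zones;
            this.priorityZone = priorityZone;
        }
    }

    /** Builds one rule per path; every other attribute is shared. */
    static List<Rule> createRules(List<String> paths, List<String> zones, String priorityZone) {
        List<Rule> rules = new ArrayList<>();
        for (String path : paths) {
            rules.add(new Rule(path, zones, priorityZone));
        }
        return rules;
    }

    public static void main(String[] args) {
        List<Rule> rules = createRules(
                Arrays.asList("/data/sales", "/data/logs"),
                Arrays.asList("zone-a", "zone-b"),
                "zone-a");
        for (Rule r : rules) {
            System.out.println(r.path + " -> " + r.zones + " (priority: " + r.priorityZone + ")");
        }
    }
}
```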

Sidelining

IBM Big Replicate versions before 2.1.2 included a feature called “sidelining”. This allowed Big Replicate nodes that had fallen behind the network's agreement processing by a configurable degree to be sidelined, so that they no longer participated in agreement processing. This approach protected the overall health of a network under memory-constrained conditions by preventing the slow processing speed of an individual node from halting the progress of the entire network.

A sidelined node required an intrusive process (“unsidelining”) to bring it back into the network to continue processing agreements.

IBM Big Replicate 2.1.2 supports operation in a manner that eliminates the potential for sidelining when nodes exceed memory constraints for agreement processing.

Logging

IBM Big Replicate logging has been changed. Previously, the Big Replicate server logged information to a set of rolling log files in /var/log/fusion/server named fusion-dcone.log. That information is now logged to files in the same location that are timestamped on creation, e.g. fusion-server.log.2017-10-06T12:22:53.
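
For scripts that need to sort or prune these files, the creation time can be recovered from the file name. The sketch below is a minimal example that assumes every server log name follows the pattern shown above, with the prefix `fusion-server.log.` followed by a local date-time and no time zone. The class and method names are hypothetical.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper; assumes the naming pattern shown in the example above. */
class FusionLogNameSketch {

    static final String PREFIX = "fusion-server.log.";

    /** Extracts the creation timestamp embedded in a server log file name. */
    static LocalDateTime creationTime(String fileName) {
        if (!fileName.startsWith(PREFIX)) {
            throw new IllegalArgumentException("Not a timestamped server log: " + fileName);
        }
        // The suffix is parsed as an ISO local date-time, e.g. 2017-10-06T12:22:53.
        return LocalDateTime.parse(
                fileName.substring(PREFIX.length()),
                DateTimeFormatter.ISO_LOCAL_DATE_TIME);
    }

    public static void main(String[] args) {
        System.out.println(creationTime("fusion-server.log.2017-10-06T12:22:53"));
    }
}
```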

Big Replicate client logging is disabled by default and can be re-enabled through the Settings > Log Settings view.

Non-Coordinated Notification of File Content

This version of IBM Big Replicate does not use coordinated activities to communicate information among zones about the availability of new file content. This removes a significant portion of communication through the Big Replicate Kernel related to progress in writing file content and reduces the overall load on the coordination engine as a result.

Broader HDFS API Support

Big Replicate now coordinates the following HDFS API methods, which were not coordinated in previous releases:

public void concat(Path trg, Path[] psrcs)

public boolean mkdir(Path f, FsPermission permission)

public FSDataOutputStream append(Path f, final EnumSet flag, final int bufferSize, final Progressable progress)

public FSDataOutputStream create(Path f, FsPermission permission, EnumSet flags, int bufferSize, short replication, long blockSize, Progressable progress)

public FSDataOutputStream create(Path f, FsPermission permission, EnumSet flags, int bufferSize, short replication, long blockSize, Progressable progress, final Options.ChecksumOpt checksumOpt)

public HdfsDataOutputStream create(final Path f, final FsPermission permission, final boolean overwrite, final int bufferSize, final short replication, final long blockSize, final Progressable progress, final InetSocketAddress[] favoredNodes)

public void rename(Path src, Path dst, Rename... options)

Technical documentation can be found in IBM Knowledge Center.

Big Replicate 2.1.2 is available for download from the Passport Advantage and Passport Advantage Express websites.
