1. Introduction
1.2. What is IBM Big Replicate?
IBM Big Replicate is a software application that allows Hadoop deployments to replicate HDFS data between Hadoop clusters that are running different, even incompatible versions of Hadoop. It is even possible to replicate between different vendor distributions and versions of Hadoop.
1.2.1. Benefits
-
Virtual File System for Hadoop, compatible with all Hadoop applications.
-
Single, virtual Namespace that integrates storage from different types of Hadoop, including CDH, HDP, EMC Isilon, Amazon S3/EMRFS and MapR.
-
Storage can be globally distributed.
-
WAN replication using the IBM Big Replicate LIVE DATA platform, delivering single-copy consistent HDFS data, replicated between far-flung data centers.
1.3. Using this guide
This guide describes how to install and administer IBM Big Replicate as part of a multi-data-center Hadoop deployment, using either on-premises or cloud-based clusters. The guide is divided into the following three sections:
- Deployment Guide
-
Covers the various requirements for running IBM Big Replicate, in terms of hardware, software and environment. Reading and understanding these requirements helps you to avoid deployment problems. Additionally, if you need to make changes on your platform, we strongly recommend that you re-check the Deployment Checklist.
Working in the Hadoop ecosystem covers any special requirements or limitations imposed when running IBM Big Replicate along with various Hadoop applications.
The Installation section covers on-premises deployments into data centers. See Cloud Installation for cloud or hybrid installations.
- Administration Guide
-
This section describes all the common actions and procedures that are required as part of managing IBM Big Replicate in a deployment. It covers how to work with the UI’s monitoring and management tools. Use the Administration Guide if you need to know how to do something.
- Reference Guide
-
This section describes the UI, systematically covering all screens and providing an explanation for what everything does. Use the Reference Guide if you need to check what something does on the UI, or gain a better understanding of IBM Big Replicate’s underlying architecture.
1.4. Admonitions
In the guide we highlight types of information using the following callouts:
The alert symbol highlights important information.
The STOP symbol cautions you against doing something.
Tips are principles or practices that you’ll benefit from knowing or using.
The KB symbol shows where you can find more information, such as in our online Knowledge Center.
1.5. Get support
See our online Knowledge Center which contains updates and more information.
We use terms that relate to the Hadoop ecosystem, IBM Big Replicate and WANdisco’s DConE replication technology. If you encounter any unfamiliar terms, check out the Glossary.
2. Release Notes
2.1. Release 2.1.2.2
16 April 2018
Big Replicate 2.1.2.2 includes new features, issue resolutions, platform support and other improvements. These release notes include details on the specific improvements and enhancements to the product, and should be read in conjunction with the product documentation.
2.1.1. Installation
The release can be installed with updates of the IHC server RPM, the Big Replicate server RPM and the client stack or package. For example, the following packages should be updated for HDP 2.6.0:
fusion-hcfs-hdp-2.6.0-ihc-server-2.11.2.3.el6-xxxx.noarch.rpm
fusion-hcfs-hdp-2.6.0-server-2.11.2.3.el6-xxxx.noarch.rpm
fusion-hcfs-hdp-2.6.0-2.11.2.3.stack.tar.gz
Please contact IBM support for help with this process, and find details in the Knowledge Center, which contains updates and more information.
2.1.3. Highlighted Improvements
2.1.4. New Platform Support
Big Replicate has added support for the following new platforms since version 2.11.1.5:
-
CDH 5.14
Platform support for IBM Big Insights 4.0 has been removed from this release.
2.1.5. Available Packages
This release of Big Replicate supports the following versions of Hadoop:
-
ASF Apache hadoop 2.5.0 - 2.7.0
-
CDH 5.4.0 - CDH 5.14.0
-
HDP 2.1.0 - HDP 2.6.4
-
MapR 5.0.0 - MapR 5.2.0
-
IOP (IBM BigInsights) 4.2.5
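The version ranges listed above can be checked programmatically before an upgrade. The following is a minimal sketch, assuming the ranges are inclusive; the `SUPPORTED` table layout and the function names are illustrative and are not part of Big Replicate.

```python
# Supported ranges copied from the "Available Packages" list above.
# The dictionary layout and function names are hypothetical, for illustration only.
SUPPORTED = {
    "ASF": ("2.5.0", "2.7.0"),
    "CDH": ("5.4.0", "5.14.0"),
    "HDP": ("2.1.0", "2.6.4"),
    "MapR": ("5.0.0", "5.2.0"),
    "IOP": ("4.2.5", "4.2.5"),
}


def parse_version(text, width=3):
    """Turn '5.14' into (5, 14, 0) so that comparison is numeric, not lexical."""
    parts = [int(part) for part in text.split(".")]
    parts += [0] * (width - len(parts))  # pad short versions with zeros
    return tuple(parts)


def is_supported(distribution, version):
    """Return True if the version lies inside the listed range (assumed inclusive)."""
    if distribution not in SUPPORTED:
        return False
    low, high = SUPPORTED[distribution]
    return parse_version(low) <= parse_version(version) <= parse_version(high)
```

Comparing integer tuples rather than strings matters here: as plain strings, "5.14.0" sorts before "5.4.0", which would wrongly reject CDH 5.14.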
2.1.6. System Requirements
Before installing, ensure that your systems, software and hardware meet the requirements found in the Checklist in our online user guide, which contains updates and more information. Please contact IBM support for help with this process.
Certified Third-Party Components
IBM certifies the interoperability of Big Replicate with a wide variety of systems, including Hadoop distributions, object storage platforms, cloud environments, and applications. See the Support matrix for additional information.
Client Applications Supported
Big Replicate is architected for maximum compatibility and interoperability with applications that use standard Hadoop File System APIs. All applications that use the standard Hadoop Distributed File System API or any Hadoop-Compatible File System API should be interoperable with Big Replicate, and will be treated as supported applications. Additionally, Big Replicate supports the replication of content with Amazon S3 and S3-compatible object stores, locally-mounted file systems, and NetApp NFS devices, but does not require or provide application compatibility libraries for these storage services.
2.1.7. Known Issues
Big Replicate 2.1.2.2 includes a small set of known issues with workarounds. In each case, resolution for the known issues is underway.
-
Big Replicate does not support truncate command -
WD-FUS-3022
The public boolean truncate(Path f, long newLength) operation in org.apache.hadoop.fs.FileSystem (> 2.7.0) is not yet supported. Files will be truncated only in the cluster where the operation is initiated. Consistency check and repair can be used to both detect and resolve any resulting inconsistencies.
-
Recursive parent directory creation with exclusions -
WD-FUS-4847
When an exclusion rule prevents the replication of specific files, applications that perform a mkdir() operation that includes the creation of parent directories will not create those parent directories. This may be an unexpected outcome from the definition of that exclusion rule.
-
Failed deployment of DSM will block removal -
WD-FUS-3781
A DSM that fails to deploy properly on a node is not capable of participating in its own removal, and thus blocks removal.
-
Non-recursive OnTap repair repairs recursively -
WD-FUS-3932, WD-FUS-3640
All subdirectories for an OnTap snapdiff repair are repaired when recursive is set to false.
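The workaround for the truncate issue (WD-FUS-3022) above can be illustrated with a toy model. This sketch represents each cluster as a plain dictionary of path to content; it does not call any Big Replicate or Hadoop API, and all function names are hypothetical.

```python
def truncate_local(cluster, path, new_length):
    """Truncate a file in one cluster only, as happens when truncate is not replicated."""
    cluster[path] = cluster[path][:new_length]


def consistency_check(source, target):
    """Return the sorted paths whose content differs between the two clusters."""
    return sorted(p for p in source.keys() | target.keys()
                  if source.get(p) != target.get(p))


def repair(source, target, paths):
    """Make the target match the source of truth for each inconsistent path."""
    for p in paths:
        if p in source:
            target[p] = source[p]
        else:
            target.pop(p, None)


cluster_a = {"/data/events.log": b"0123456789"}
cluster_b = dict(cluster_a)  # the replica starts out identical

truncate_local(cluster_a, "/data/events.log", 4)    # only cluster_a is truncated
inconsistent = consistency_check(cluster_a, cluster_b)
repair(cluster_a, cluster_b, inconsistent)          # cluster_a is the source of truth
```

In this model the check reports the truncated file as inconsistent, and the repair brings the second cluster back in line with the cluster where the truncate was initiated.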
2.1.8. Other Improvements
In addition to the highlighted features listed above, Big Replicate 2.1.2.2 includes the following improvements in general operation.
-
ADLS support for Azure -
WD-FUI-5610
-
Big Replicate installer client detection -
WD-FUI-5734
-
UI validation of secure transfer for Azure storage -
WD-FUI-5897
-
Client installation step improvement -
WD-FUI-5685
-
Ability to change memberships for rules -
WD-FUI-5823
-
Optimized file transfer query from UI -
WD-FUI-6001
-
Correct time shown for consistency status -
WD-FUI-6009
-
Support for CDH 5.14 -
WD-FUI-6011, WD-FUS-5243
-
Hive CLI startup with fs.hdfs.impl -
WD-FUS-4876
-
Bypass and healthy directories omitted from cleanup -
WD-FUS-5172
-
CompatibilityAdatpor useApiCompatibility classloader on proxy creation -
WD-FUS-5173
-
Knox audit logging failure correction with Ranger plugin -
WD-FUS-3373
-
Classpaths added for Atlas -
WD-FUS-3845
-
Distribution upgrade compatibility with yum repository logic -
WD-FUS-4186
-
Client upgrade or removal symlink correction -
WD-FUS-4722, WD-FUS-5130
-
Correct FileNotFound exceptions when FS cache is disabled -
WD-FUS-4579
-
Improved GSN synchronization following extended target zone downtime -
WD-FUS-4936
-
Fixed behavior of consistency check task data availability -
WD-FUS-5036
-
Talkback collects rpm_info for Debian installs -
WD-FUS-5041
-
ListObjectV1 compatibility with ListObjectV2 -
WD-FUS-5050
-
Consistency check completion time correction -
WD-FUS-5057
-
LocalFS Debian client installation -
WD-FUS-5065
-
renameWithOptions support for native Azure file system -
WD-FUS-5069
-
Solr for HDP distributions -
WD-FUS-5081
-
Compatibility with EsgynDB -
WD-FUS-5084
-
LocalFS Big Replicate installer no longer uses outdated AWS Java SDK -
WD-FUS-5098
-
wasbs:// underlying file system configuration -
WD-FUS-5126
-
Update Kafka configuration setting for Ranger audit logging -
WD-FUS-5143
-
Talkback correction for HDP and Azure -
WD-FUS-5162, WD-FUS-5153
-
Failed execution no retry for malformed setfacl operations -
WD-FUS-5155
-
Correct defaults for client underlyingFsClass -
WD-FUS-5157
-
Improved stack checks on oozie processes -
WD-FUS-4129
2.2. Release 2.11.1.5 Build 1066
21 February 2018
IBM Big Replicate 2.11.1 is the first minor release following Big Replicate 2.11, and includes new features, issue resolutions, platform support, performance and usability improvements. These release notes include details on the specific improvements and enhancements to the product, and should be read in conjunction with the product documentation.
2.2.1. Installation
The release can be installed with updates of the IHC server RPM, the Big Replicate server RPM and the client stack or package. For example, the following packages should be updated for HDP 2.6.0:
fusion-hcfs-hdp-2.6.0-ihc-server-2.11.1.4.el6-xxxx.noarch.rpm
fusion-hcfs-hdp-2.6.0-server-2.11.1.4.el6-xxxx.noarch.rpm
fusion-hcfs-hdp-2.6.0-2.11.1.4.stack.tar.gz
Please contact IBM support for help with this process, and find details in the Knowledge Center, which contains updates and more information.
2.2.3. Highlighted Improvements
2.2.4. New Platform Support
IBM Big Replicate has added support for the following new platforms since Big Replicate 2.11:
-
CDH 5.13
-
HDP 2.6.3 - 2.6.4
2.2.5. Available Packages
This release of IBM Big Replicate supports the following versions of Hadoop:
-
ASF Apache hadoop 2.5.0 - 2.7.0
-
CDH 5.4.0 - CDH 5.13.0
-
HDP 2.1.0 - HDP 2.6.4
-
MapR 5.0.0 - MapR 5.2.0
-
IOP (IBM BigInsights) 4.0 - 4.2.5
The trial download includes the installation packages for CDH and HDP distributions only.
2.2.6. System Requirements
Before installing, ensure that your systems, software and hardware meet the requirements found in the Checklist in our online user guide, which contains updates and more information. Please contact IBM support for help with this process.
Certified Third-Party Components
IBM certifies the interoperability of Big Replicate with a wide variety of systems, including Hadoop distributions, object storage platforms, cloud environments, and applications.
-
Amazon S3
-
Amazon EMR 4.0 - 5.4
-
Ambari 1.6, 1.7, 2.0, 2.1
-
CDH 5.4 - 5.13
-
EMC Isilon 7.2, 8.0
-
Google Cloud Storage
-
Google Cloud Dataproc
-
HDP 2.1.0 - 2.6.4
-
IBM BI 2.1.2 - 4.2.5
-
MapR M4.0.1 - M5.2.0
-
Microsoft Azure Blob Storage
-
Microsoft Azure HDInsights 3.2 - 3.6
-
MySQL, PostgreSQL (Hive Metastore)
-
Oracle BDA
Client Applications Supported
IBM Big Replicate is architected for maximum compatibility and interoperability with applications that use standard Hadoop File System APIs. All applications that use the standard Hadoop Distributed File System API or any Hadoop-Compatible File System API should be interoperable with IBM Big Replicate, and will be treated as supported applications. Additionally, Big Replicate supports the replication of content with Amazon S3 and S3-compatible object stores, locally-mounted file systems, and NetApp NFS devices, but does not require or provide application compatibility libraries for these storage services.
2.2.7. Known Issues
Big Replicate 2.11.1 includes a small set of known issues with workarounds. In each case, resolution for the known issues is underway.
-
Big Replicate does not support truncate command -
WD-FUS-3022
The public boolean truncate(Path f, long newLength) operation in org.apache.hadoop.fs.FileSystem (> 2.7.0) is not yet supported. Files will be truncated only in the cluster where the operation is initiated. Consistency check and repair can be used to both detect and resolve any resulting inconsistencies.
-
Recursive parent directory creation with exclusions -
WD-FUS-4847
When an exclusion rule prevents the replication of specific files, applications that perform a mkdir() operation that includes the creation of parent directories will not create those parent directories. This may be an unexpected outcome from the definition of that exclusion rule.
-
Failed deployment of DSM will block removal -
WD-FUS-3781
A DSM that fails to deploy properly on a node is not capable of participating in its own removal, and thus blocks removal.
-
Non-recursive OnTap repair repairs recursively -
WD-FUS-3932, WD-FUS-3640
All subdirectories for an OnTap snapdiff repair are repaired when recursive is set to false.
-
[Azure deployments only] It’s not possible to set the replication exchange directory on the UI settings screen; any attempt results in an error -
WD-FUI-5828
2.2.8. Other Improvements
In addition to the highlighted features listed above, Big Replicate 2.1.2.1 includes a wide set of improvements in performance, functionality, scale, interoperability and general operation.
-
Check ownership in Big Replicate handshake token process -
WD-FUS-4006
-
Correct ownership for parent directories in Big Replicate handshake token process -
WD-FUS-4079
-
S3 Plugin to not retry certificate errors -
WD-FUS-3917, WD-FUS-4672, WD-FUS-4675
-
Ignore .fusion in chown and chmod -R of top-level directory -
WD-FUS-1964
-
Report total transfer size correctly -
WD-FUS-3218, WD-FUS-4681
-
Improve plugin loading error handling -
WD-FUS-3244, WD-FUS-4933
-
Make move across encryption zones NonRetriable -
WD-FUS-3630
-
Support replication between clusters with common fs.defaultFS configuration -
WD-FUS-3683, WD-FUS-1185, WD-HIVE-110
-
Update WADL representation for global excluded properties -
WD-FUS-3742, WD-FUS-3831
-
Workaround for HDFS-3545 -
WD-FUS-3853
-
Make AuthenticationException non-retriable -
WD-FUS-3865
-
Stalled transfer to report 0 bytes/s -
WD-FUS-3903
-
S3 plugin to not retry certificate errors -
WD-FUS-3917
-
Logging infrastructure change -
WD-FUS-3973
-
Log Jersey exceptions -
WD-FUS-4091
-
Support s3a:// as an HCFS -
WD-FUS-4114
-
Add Big Replicate jars to Spark classpath in CDH parcel -
WD-FUS-4130
-
Clean S3 buffer directory on startup -
WD-FUS-4770, WD-FUS-4162
-
CDH Big Replicate Parcels updates -
WD-FUS-4170 (WD-FUS-4130, WD-FUS-4134, WD-FUS-4142, WD-FUS-4152, WD-FUS-4157, WD-FUS-4245, WD-FUS-4246, WD-FUS-4490, WD-FUS-4507, WD-FUS-4508, WD-FUS-4633, WD-FUS-4757)
-
Improved consistency check metrics -
WD-FUS-4241
-
Improve bulk S3 object deletion -
WD-FUS-4293
-
UTC naming for thread-dump and task-gc files -
WD-FUS-4295, DCO-748
-
Configurable S3 ListObjectsRequest maxKeys -
WD-FUS-4300
-
Support multiple fs.s3.buffer.dir locations -
WD-FUS-4303
-
Allow custom consistency check for locations that are not present in all zones -
WD-FUS-4346
-
Log entry for clean shutdown -
WD-FUS-4359
-
Ensure maximum of single HFlush in AgreedProposalStore -
WD-FUS-4419
-
Logging for HFlush handling -
WD-FUS-4444
-
TransferManager NPE correction -
WD-FUS-4512
-
Prevent creation of replication rule that matches a default exclusion -
WD-FUS-4563
-
Correct talkback feedback on failure -
WD-FUS-4595
-
API for summary statistics of execution dependencies -
WD-FUS-4647, WD-FUS-3470
-
IHC server should fail to start if ports unavailable -
WD-FUS-4652
-
URISyntaxException should be non-retriable -
WD-FUS-4674
-
Safeguard against zero chunkSize -
WD-FUS-4677
-
Correct pull behavior for file systems that do not support appends -
WD-FUS-4805, WD-FUS-4691
-
Improve client messaging on license expiry -
WD-FUS-4483, WD-FUS-4693
-
File transfers to report whether total size is final -
WD-FUS-4736
-
Improve history management with KMS configuration -
WD-FUS-4763
-
Clean object storage buffer directories on startup -
WD-FUS-4770
-
Services search for PID if PID file missing -
WD-FUS-4796
-
Avoid potential Rename failure retry -
WD-FUS-4813
-
Correct silent discards of replicated rename operations -
WD-FUS-4814
-
Improved display for transfer of renamed files -
WD-FUS-4820, WD-FUS-4824
-
Correct percent remaining display -
WD-FUS-4821
-
Configurable early pull -
WD-FUS-4835
-
Cache checks on existence of bypass directory -
WD-FUS-4894
-
Minimize internal serialization -
WD-FUS-4915
-
Correct transfer name handling on unrelated renames -
WD-FUS-4928
-
Improve JAVA_HOME discovery -
WD-FUS-4964
-
Talkbacks request support ticket number -
WD-FUS-3838
-
Consolidate constants across components -
WD-FUS-4181
-
Clean up stale repl.dir.exchange entries following server restart -
WD-FUS-4413
-
Talkbacks include plugin information -
WD-FUS-4639
-
Correct file size display on completed file transfers -
WD-FUS-4668
-
Talkback consistency improvements -
WD-FUS-4669
-
Improved error handling on replicated directory info -
WD-FUS-4826
-
Support multiple URIs per DSM -
WD-FUS-4885, WD-FUS-4886
-
RequestID in logs -
WD-FUS-4844
-
Do not run scheduled consistency check after directory removal -
WD-FUS-4877
-
service fusion-server-stop return code correction -
WD-FUS-4760
-
Correction to FINEST logging target -
WD-FUS-4779
-
Support for HDP 2.6.3 -
WD-FUS-4429
-
Support for HDP 2.6.4 -
WD-FUS-4904, WD-FUI-5706
-
Simplify request event lifecycle -
WD-FUS-4766, WD-FUS-4791, WD-FUS-4792, WD-FUS-4919
-
Support appends for ADLS -
WD-FUS-4878
-
S3 Plugin support for v1 and v2 object listing (fs.fusion.s3.listing.method) -
WD-FUS-4859, WD-FUS-5061
-
Recover from MVStore interruption -
WD-FUS-5023, WD-FUS-5030
-
Replication of Sentry and Ranger policies -
WD-FUS-3956, WD-FUS-4450
-
Document removeAll for DELETE API -
WD-FUS-4953
-
Improve package status information -
WD-FUI-5779
-
Big Replicate client parcels for SLES 12 -
WD-FUI-5638
-
Correct proxyuser configuration at installation -
WD-FUI-5671
-
Fix to retry button for Hive installation -
WD-FUI-5163
-
Handle plugin states -
WD-FUI-5544
-
Minimize runtime exception logging -
WD-FUI-5704
-
Represent manual fast bypass state in UI -
WD-FUI-4557
-
Deploy client stack only to nodes with HDFS client -
WD-FUI-5312
-
Improve rule removal -
WD-FUI-5316
-
Allow throttle retry configuration for S3 plugin -
WD-FUI-5456
-
Correct addition of replicated directory for Azure -
WD-FUI-5564
-
Support fs.azure.enable.append.support -
WD-FUI-5618
-
Better handle multiple replicated files in UI -
WD-FUI-5629
-
Handle default Ranger replicated directory -
WD-FUI-5639
-
Provide content type for .deb packages -
WD-FUI-5699
-
Set fs.s3.consistent.throwExceptionOnInconsistency and fs.s3.consistent.retryCount for EMR -
WD-FUI-5703
-
Correct installer client page -
WD-FUI-5754
-
Allow transfer.chunk.size UI setting -
WD-FUI-5773
-
Improve handshake token validation -
WD-FUI-5803
-
Correct blank IHC network interface page -
WD-FUI-5367
2.3. Release 2.11.0.3 Build 991
26 January 2018
IBM Big Replicate 2.11.0.3 is a minor release for customers using 2.11.0.x versions of the product. It adds support for new platforms and addresses a small number of minor issues.
We advise all customers using IBM Big Replicate to apply this minor update to their environment.
2.3.1. Installation
IBM Big Replicate 2.11.0.3 can be installed as a full release, or with updates of the IHC server RPM, the Big Replicate server RPM and the client stack or package. For example, the following packages should be updated for HDP 2.6.0:
fusion-hcfs-hdp-2.6.0-ihc-server-2.11.0.3.el6-xxxx.noarch.rpm
fusion-hcfs-hdp-2.6.0-server-2.11.0.3.el6-xxxx.noarch.rpm
fusion-hcfs-hdp-2.6.0-2.11.0.3.stack.tar.gz
Please contact IBM support for help with this process, and find details in the Knowledge Center, which contains updates and more information.
2.3.2. Highlighted Improvements
FUS-4471 - Removing last replicated directory causes NPE
In previous versions, immediately adding a replication rule for a path that is a sub-directory of a removed replication rule could result in a null pointer exception causing the failure of a Big Replicate server. This is resolved in IBM Big Replicate 2.11.
2.3.4. New Platform Support
IBM Big Replicate has added support for the following new platforms since Big Replicate 2.1.2:
-
CDH 5.13
-
HDP 2.6.3
Support for CDH 5.2 and CDH 5.3 has been removed.
2.3.5. Available Packages
This release of IBM Big Replicate supports the following versions of Hadoop:
-
ASF Apache hadoop 2.5.0 - 2.7.0
-
CDH 5.4.0 - CDH 5.13.0
-
HDP 2.1.0 - HDP 2.6.3
-
MapR 5.0.0 - MapR 5.2.0
-
IOP (IBM BigInsights) 4.0 - 4.2.5
The trial download includes the installation packages for CDH and HDP distributions only.
2.3.6. System Requirements
Before installing, ensure that your systems, software and hardware meet the requirements found in the Checklist in our online user guide, which contains updates and more information. Please contact IBM support for help with this process.
Certified Third-Party Components
IBM certifies the interoperability of Big Replicate with a wide variety of systems, including Hadoop distributions, object storage platforms, cloud environments, and applications.
-
Amazon S3
-
Amazon EMR 4.0 - 5.4
-
Ambari 1.6, 1.7, 2.0, 2.1
-
CDH 4.4, 5.2 - 5.13
-
EMC Isilon 7.2, 8.0
-
Google Cloud Storage
-
Google Cloud Dataproc
-
HDP 2.1.0 - 2.6.3
-
IBM BI 2.1.2 - 4.2.5
-
MapR M4.0.1 - M5.2.0
-
Microsoft Azure Blob Storage
-
Microsoft Azure HDInsights 3.2 - 3.6
-
MySQL, PostgreSQL (Hive Metastore)
-
Oracle BDA
Client Applications Supported
IBM Big Replicate is architected for maximum compatibility and interoperability with applications that use standard Hadoop File System APIs. All applications that use the standard Hadoop Distributed File System API or any Hadoop-Compatible File System API should be interoperable with IBM Big Replicate, and will be treated as supported applications. Additionally, Big Replicate supports the replication of content with Amazon S3 and S3-compatible object stores, locally-mounted file systems, and NetApp NFS devices, but does not require or provide application compatibility libraries for these storage services.
2.3.7. Known Issues
IBM Big Replicate 2.11.0.3 includes a small set of known issues with workarounds. In each case, resolution for the known issues is underway.
-
IBM Big Replicate does not support truncate command - WD-FUS-3022
The public boolean truncate(Path f, long newLength) operation in org.apache.hadoop.fs.FileSystem (> 2.7.0) is not yet supported. Files will be truncated only in the cluster where the operation is initiated. Consistency check and repair can be used to both detect and resolve any resulting inconsistencies.
-
Recursive parent directory creation with exclusions - WD-FUS-4847
When an exclusion rule prevents the replication of specific files, applications that perform a mkdir() operation that includes the creation of parent directories will not create those parent directories. This may be an unexpected outcome from the definition of that exclusion rule.
2.3.8. Other Improvements
In addition to the highlighted features listed above, IBM Big Replicate 2.11.0.3 includes the following improvements.
-
Installer determination of Kerberos configuration -
WD-FUI-5568
-
Signature verification during install on SLES 12 -
WD-FUI-5571
-
Big Replicate installer hadoop.proxyuser.$fusionuser.hosts value -
WD-FUI-5671
-
Improve browser cache handling to resolve installation steps -
WD-FUI-5680
-
Bypass and replicated exchange directory configuration for HDInsight -
WD-FUI-5556
-
Show priority zone when editing a rule -
WD-FUI-5626
-
Correct plugin status display for plugins that fail to start -
WD-FUI-5647
-
Improve YARN configuration changes for Big Replicate install -
WD-FUI-5653
-
More robust display of replication rules -
WD-FUI-5655
-
Correct Ambari stack download links -
WD-FUI-5689
-
Zone name display in title -
WD-FUI-5509
-
Correct Hive plugin status display -
WD-HIVE-757
-
Replicated directory addition fix in Azure -
WD-FUI-5564
-
fs.azure.enable.append.support for Azure zones -
WD-FUI-5618
-
Correct repair tab warnings for replication rules -
WD-FUI-5669
-
Content-Type for client downloads -
WD-FUI-4272
-
Improved recovery from significant outages -
WD-FUS-4851
-
Talkbacks should not duplicate fusion server logs -
WD-FUS-4866
-
Support HDP 2.6.3 -
WD-FUS-4429
-
Improve content replication on source zone process restarts -
WD-FUS-4797
-
Correct license update instructions -
WD-FUS-4869
-
Talkback corrections for Azure -
WD-FUS-4343
-
Corrections to NetApp and LocalFs repair -
WD-FUS-4840
-
Improve behavior of setPermissionRequest -
WD-FUS-4845, WD-FUS-4846
-
LocalFs client packaging fix for Debian -
WD-FUS-4879
-
Per-DSM manual bypass script -
WD-FUS-4417, WD-FUS-3861
-
Improved replicate Metastore operation retries -
WD-HIVE-781
-
Hive replication corrections -
WD-HIVE-786
2.4. Release 2.11.0.2 Build 778
6 December 2017
IBM is pleased to present IBM Big Replicate 2.11 as the next major release of the Big Replicate platform, available now from the IBM file distribution site. This release includes key new features, platform support, installation, scale, performance and usability improvements, and establishes the basis for further product extensibility.
2.4.1. Installation
Find detailed installation instructions in the user guide at Installation.
2.4.2. Security Fixes
Potential exposure to the following security issues is resolved with IBM Big Replicate 2.11.
2.4.3. Highlighted New Features
This release includes the following major new features.
Big Replicate Kernel and Performance
IBM Big Replicate leverages WANdisco’s Distributed Coordination Engine (DConE). This release is the first to take advantage of improvements made in the core engine that is now referred to as Big Replicate Kernel.
The most significant impact of the Big Replicate Kernel is improved overall product performance. IBM testing that spans a variety of load types shows improved throughput and reduced memory requirements. You can expect benefits ranging from 40% to 75% compared to previous releases.
Replication Memberships
Replication rule creation is simplified by the removal of the membership concept.
The memberships used in previous versions of IBM Big Replicate have been replaced by simpler priority selection among zones, and the ability to control specific Big Replicate server roles in each zone. Memberships no longer need to be created, and there is no need to remove memberships that may no longer be in use by replication rules.
Non-Blocking Consistency Check
Consistency checks provide a mechanism to determine if there are any differences in the state of content within the scope of a replication rule. In versions of Big Replicate prior to 2.11, no change could be made via Big Replicate to the content being checked while a consistency check was running, so that the results of the check remained valid.
IBM Big Replicate 2.11 introduces an alternative, non-blocking consistency check that allows information on consistency state to be determined without blocking other activity while the check is underway. It takes advantage of tracking the state of changes to content under check during execution, and produces information for each item checked that covers the states: consistent, not-consistent, potentially inconsistent.
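The three result states can be illustrated with a small model. This is a sketch under stated assumptions, not the product's algorithm: it assumes an item is reported as potentially inconsistent when it was modified while the check was running, and it represents each zone as a dictionary of path to content. All names are hypothetical.

```python
CONSISTENT = "consistent"
NOT_CONSISTENT = "not-consistent"
POTENTIALLY_INCONSISTENT = "potentially inconsistent"


def classify(path, zone_a, zone_b, changed_during_check):
    """Classify one item without blocking writes to it."""
    if path in changed_during_check:
        # Assumption: a change seen mid-check means the comparison may be stale.
        return POTENTIALLY_INCONSISTENT
    if zone_a.get(path) == zone_b.get(path):
        return CONSISTENT
    return NOT_CONSISTENT


def non_blocking_check(zone_a, zone_b, changed_during_check):
    """Return a state for every path that exists in either zone."""
    paths = sorted(zone_a.keys() | zone_b.keys())
    return {p: classify(p, zone_a, zone_b, changed_during_check) for p in paths}


zone_a = {"/repl/a": b"1", "/repl/b": b"2", "/repl/c": b"3"}
zone_b = {"/repl/a": b"1", "/repl/b": b"x", "/repl/c": b"3"}
report = non_blocking_check(zone_a, zone_b, changed_during_check={"/repl/c"})
```

Here `/repl/a` is reported as consistent, `/repl/b` as not-consistent, and `/repl/c` as potentially inconsistent because it changed while the check was underway.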
Bulk Replication Rules
Multiple replication rules can be created at the same time when they share attributes other than file system location.
Sidelining
IBM Big Replicate versions before 2.11 included a feature called "sidelining". This allowed Big Replicate nodes that had fallen behind, to a configurable degree, in the agreement processing being performed among the network of nodes to be sidelined, such that they would no longer participate in agreement processing. The benefit of this approach was to ensure the overall health of a network under memory-constrained conditions, where the slow processing speed of an individual node was prevented from halting progress of the entire network.
A sidelined node required an intrusive process ("unsidelining") to bring it back into the network to continue processing agreements.
IBM Big Replicate 2.11 supports operation in a manner that eliminates the potential for sidelining when nodes exceed memory constraints for agreement processing.
Logging
IBM Big Replicate logging has been changed. Where the Big Replicate server logged information to a set of rolling log files in /var/log/fusion/server named fusion-dcone.log.<number>, that information is now logged to files in the same location that are timestamped on creation, e.g. fusion-server.log.2017-10-06T12:22:53.
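The timestamped naming scheme in the example file name above can be reproduced as follows. This is a minimal sketch for scripts that need to locate or sort these files; the function name is hypothetical, and the timestamp layout is inferred only from the single example shown.

```python
from datetime import datetime


def timestamped_log_name(created, prefix="fusion-server.log"):
    """Build a log file name in the form <prefix>.<YYYY-MM-DD>T<HH:MM:SS>."""
    return f"{prefix}.{created.strftime('%Y-%m-%dT%H:%M:%S')}"


name = timestamped_log_name(datetime(2017, 10, 6, 12, 22, 53))
```

Because the timestamp is written from the largest unit to the smallest, sorting these file names alphabetically also sorts them by creation time.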
Big Replicate client logging is disabled by default and can be re-enabled through the Settings > Log Settings view.
Non-Coordinated Notification of File Content
This version of IBM Big Replicate does not use coordinated activities to communicate information among zones about the availability of new file content. This removes a significant portion of communication through the Big Replicate Kernel related to progress in writing file content and reduces the overall load on the coordination engine as a result.
Broader HDFS API Support
The set of HDFS API methods coordinated by Big Replicate is extended to include the following methods, which were not previously coordinated:
-
public void concat(Path trg, Path[] psrcs)
-
public boolean mkdir(Path f, FsPermission permission)
-
public FSDataOutputStream append(Path f, final EnumSet<CreateFlag> flag, final int bufferSize, final Progressable progress)
-
public FSDataOutputStream create(Path f, FsPermission permission, EnumSet<CreateFlag> flags, int bufferSize, short replication, long blockSize, Progressable progress)
-
public FSDataOutputStream create(Path f, FsPermission permission, EnumSet<CreateFlag> flags, int bufferSize, short replication, long blockSize, Progressable progress, final Options.ChecksumOpt checksumOpt)
-
public HdfsDataOutputStream create(final Path f, final FsPermission permission, final boolean overwrite, final int bufferSize, final short replication, final long blockSize, final Progressable progress, final InetSocketAddress[] favoredNodes)
-
public void rename(Path src, Path dst, Rename... options)
2.4.4. Highlighted Improvements
FUS-4471 - Removing last replicated directory causes NPE
In previous versions, immediately adding a replication rule for a path that is a sub-directory of a removed replication rule could result in a null pointer exception causing the failure of a Big Replicate server. This is resolved in IBM Big Replicate 2.11.
FUS-3719 - User-Agent field in S3 requests
Object upload requests made to an AWS S3 endpoint now include information that identifies the source as IBM Big Replicate 2.11.
FUS-3897 - Repair task improvements
Information about repair tasks in progress includes details on the repair’s source of truth, type and the timestamp of task completion.
FUS-3901 - New default exclusions
HDFS/HCFS replication rules now exclude .tmp and .hive-staging locations by default.
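The effect of the new default exclusions can be sketched as a simple path test. The matching rule used here is an assumption made for illustration: a path is treated as excluded when any of its components starts with `.tmp` or `.hive-staging`. The actual pattern syntax used by the product is not described in these notes, and the names below are hypothetical.

```python
from pathlib import PurePosixPath

# Names taken from the release note; the prefix-matching rule is an assumption.
DEFAULT_EXCLUDED_PREFIXES = (".tmp", ".hive-staging")


def is_excluded_by_default(path):
    """Return True if any path component starts with a default-excluded name."""
    return any(part.startswith(DEFAULT_EXCLUDED_PREFIXES)
               for part in PurePosixPath(path).parts)
```

For example, a file under a Hive staging directory such as `/repl/table/.hive-staging_hive_2017/part-0` would be skipped under this rule, while `/repl/table/part-0` would be replicated.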
FUS-3968 - Move fusion.username.translation out of core-site.xml
The fusion.username.translation configuration property is now specified in application.properties rather than in core-site.xml, allowing it to be changed without impacting other cluster services.
FUS-4000 - UTC Timestamps
Log entries now include UTC-based time information rather than local timezone values.
FUS-4002 - Big Replicate to IHC connections use SO_REUSEADDR
The Big Replicate server can cope with a faster rate of connection recycling independently of the kernel settings.
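For readers unfamiliar with the option, SO_REUSEADDR lets a socket bind to a local address that is still held by an earlier connection in the TIME_WAIT state, which is what allows connections to be recycled quickly. The sketch below shows how the option is set on a TCP socket using Python's standard library; it illustrates the socket option only and is not taken from the Big Replicate code base.

```python
import socket


def make_reusable_socket():
    """Create a TCP socket with SO_REUSEADDR enabled before it is bound."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # The option must be set before bind() for it to affect address reuse.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    return sock


sock = make_reusable_socket()
reuse_enabled = sock.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR) != 0
sock.close()
```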
FUS-3999 - Scheduled Consistency Check
Consistency checks can be scheduled in cron format. The consistencyCheckPeriod that is specified for a given replication rule is now defined as a string with the form of a cron expression, e.g. 0 0 0/6 * * ?
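To read the example expression, it helps to expand its fields. The sketch below assumes the six-field, Quartz-style layout (seconds, minutes, hours, day of month, month, day of week) that the example appears to follow, and it handles only the field forms used in that example. The function names are hypothetical.

```python
def expand_field(field, low, high):
    """Expand one cron field into the sorted list of values it matches."""
    if field in ("*", "?"):
        return list(range(low, high + 1))
    if "/" in field:
        start, step = field.split("/")
        first = low if start == "*" else int(start)
        return list(range(first, high + 1, int(step)))
    return [int(field)]


def daily_fire_times(expression):
    """List the times of day at which the expression fires (day fields are ignored)."""
    seconds, minutes, hours, _day_of_month, _month, _day_of_week = expression.split()
    return [f"{h:02d}:{m:02d}:{s:02d}"
            for h in expand_field(hours, 0, 23)
            for m in expand_field(minutes, 0, 59)
            for s in expand_field(seconds, 0, 59)]


times = daily_fire_times("0 0 0/6 * * ?")
```

Under these assumptions the example schedules a check every six hours, at 00:00:00, 06:00:00, 12:00:00 and 18:00:00.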
FUS-4076 - Consistency Check results across nodes
The results of a consistency check are now available from any Big Replicate node, not just the node that initiated the check or the writer node.
FUS-4202 - Visibility of ongoing transfers
The IHC server now exposes an endpoint to report on the status of ongoing transfers, improving visibility of transfer status across a deployment.
2.4.5. Known Issues Resolved
Previous known issues that are resolved in this release are:
Consistency repair tool fails for files in Swift storage
Previous versions of IBM Big Replicate could not perform a consistency repair to content that was stored in an OpenStack Swift zone. This issue is resolved in IBM Big Replicate 2.11.
Renamed directory with incomplete file will never receive these files
In previous versions of Big Replicate, modifying the metadata of a parent directory within a replicated location could, in some circumstances, prevent the completion of content transfers that were underway for files beneath that directory. Big Replicate’s metadata consistency was unaffected, but file content might not have been available in full.
This issue is resolved in Big Replicate 2.11.
2.4.6. New Platform Support
IBM Big Replicate has added support for the following new platforms since Big Replicate 2.10:
-
ASF Apache Hadoop 2.5.0 - 2.7.0
-
CDH 5.12
-
HDP 2.6.2
Additionally, the Pivotal Hadoop Distribution is no longer a supported platform.
2.4.7. Available Packages
This release of IBM’s Big Replicate supports the following versions of Hadoop:
-
ASF Apache Hadoop 2.5.0 - 2.7.0
-
CDH 5.4.0 - CDH 5.12.0
-
HDP 2.1.0 - HDP 2.6.2
-
MapR 5.0.0 - MapR 5.2.0
-
IOP (IBM BigInsights) 4.0 - 4.2.5
The trial download includes the installation packages for CDH and HDP distributions only.
2.4.8. System Requirements
Before installing, ensure that your systems, software, and hardware meet the requirements found in our online user guide. Contact IBM support for help with this process, and see the detailed Deployment Checklist, which contains updates and more information.
Certified Third-Party Components
IBM certifies the interoperability of Big Replicate with a wide variety of systems, including Hadoop distributions, object storage platforms, cloud environments, and applications.
-
Amazon S3
-
Amazon EMR 4.0 - 5.4
-
Ambari 1.6, 1.7, 2.0, 2.1
-
CDH 4.4, 5.2 - 5.12
-
EMC Isilon 7.2, 8.0
-
Google Cloud Storage
-
Google Cloud Dataproc
-
HDP 2.1.0 - 2.6.2
-
IBM BI 2.1.2 - 4.2.5
-
MapR M4.0.1 - M5.2.0
-
Microsoft Azure Blob Storage
-
Microsoft Azure HDInsights 3.2 - 3.6
-
MySQL, PostgreSQL (Hive Metastore)
-
Oracle BDA
Client Applications Supported
IBM Big Replicate is architected for maximum compatibility and interoperability with applications that use standard Hadoop File System APIs. All applications that use the standard Hadoop Distributed File System API or any Hadoop-Compatible File System API should be interoperable with IBM Big Replicate, and will be treated as supported applications. Additionally, Big Replicate supports the replication of content with Amazon S3 and S3-compatible object stores, locally mounted file systems, and NetApp NFS devices, but does not require or provide application compatibility libraries for these storage services.
2.4.9. Known Issues
Big Replicate 2.11 includes a small set of known issues with workarounds. In each case, resolution of the known issues is underway.
-
Big Replicate does not support truncate command - WD-FUS-3022
The public boolean truncate(Path f, long newLength) operation in org.apache.hadoop.fs.FileSystem (> 2.7.0) is not yet supported. Files will be truncated only in the cluster where the operation is initiated. Consistency check and repair can be used to both detect and resolve any resulting inconsistencies.
-
Recursive parent directory creation with exclusions - WD-FUS-4847
When an exclusion rule prevents the replication of specific files, applications that perform a mkdir() operation that includes the creation of parent directories will not create those parent directories. This may be an unexpected outcome from the definition of that exclusion rule.
-
SetPermissionRequest should not retry on FileNotFoundException - WD-FUS-4846
Operations that attempt to set a permission on a non-existent file may be retried unnecessarily.
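The general idea of separating retriable from non-retriable failures can be sketched as follows. This is an illustrative pattern only, not Big Replicate's implementation: the class RetrySketch, the helper names and the retry limit are hypothetical. The non-retriable exception types are the ones named in this known issue and in the NPE and UnsupportedOperationException items under Other Improvements.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.concurrent.Callable;

public class RetrySketch {

    /** Returns true for failures that repeating the same call cannot fix. */
    static boolean isNonRetriable(Exception e) {
        return e instanceof FileNotFoundException
                || e instanceof NullPointerException
                || e instanceof UnsupportedOperationException;
    }

    /** Runs op up to maxAttempts times, failing fast on non-retriable exceptions. */
    static <T> T callWithRetry(Callable<T> op, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                if (isNonRetriable(e)) {
                    throw e;      // no point repeating the call
                }
                last = e;         // remember the failure and try again
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] attempts = {0};
        try {
            // Simulates setting a permission on a file that does not exist.
            callWithRetry(() -> {
                attempts[0]++;
                throw new FileNotFoundException("/no/such/file");
            }, 5);
        } catch (FileNotFoundException e) {
            System.out.println("gave up after " + attempts[0] + " attempt(s)");
        }
    }
}
```

With this classification, a missing file ends the operation after a single attempt, while other I/O failures are still retried up to the configured limit.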
2.4.10. Other Improvements
In addition to the highlighted features listed above, Big Replicate 2.11 includes a wide set of improvements in performance, functionality, scale, interoperability and general operation.
-
Create multiple replication rules at once -
WD-FUI-4443
-
Display current, failed and pending rules in one table -
WD-FUI-4470
-
FIX - Zone shown twice in UI after induction -
WD-FUI-4657
-
Remove/move detected IPs messages at the top of UI installer Server step -
WD-FUI-4681
-
Change json REST API output (do not encode URIs) -
WD-FUI-4741
-
Support multiple replication rule type in UI -
WD-FUI-4859
-
FIX - CC button for non-writer does nothing -
WD-FUI-4866
-
Log whenever an email alert is sent -
WD-FUI-4893
-
FIX - JAVA_HOME unused at console install -
WD-FUI-4894
-
PID file for Big Replicate UI -
WD-FUI-4906
-
Automatically strip protocol prefixes (and trailing paths) from domain inputs -
WD-FUI-5078
-
FIX - TypeError on replicated folder creation -
WD-FUI-5197
-
FIX - Typo: "relevant" -
WD-FUI-5220
-
FIX - Provide zone name in title -
WD-FUI-5509
-
FIX - MapR install fails to write core-site.xml -
WD-FUI-5388
-
FIX - Installer redirects to IP no host -
WD-FUI-5329
-
FIX - UI client sends wrong path name during replicated path deletion -
WD-FUI-5250
-
FIX - Consume core API to show default exclusions -
WD-FUI-5246
-
FIX - Root replication directory is marked consistent when only subdir is checked -
WD-FUS-3145
-
FIX - Repair fails to repair the files to Swift zone -
WD-FUS-3642
-
FIX - We should throw an error if a snapdiff does not exist -
WD-FUS-3645
-
Improve speed for listing written keys -
WD-FUS-3677
-
Add storage type as a Zone property -
WD-FUS-3708
-
FIX - mv 10000 files from non-replicated directory to replicated directory fails -
WD-FUS-3809
-
FIX - CC hanging when there is 10,000+ files on Cleversafe -
WD-FUS-3816
-
FIX - Talkback not able to customize setting for TALKBACKNAME, FUSION_MARKER variables -
WD-FUS-3866
-
HDP fusion-client RPM should remove symlinks for Oozie server when the RPM is uninstalled -
WD-FUS-3872
-
Better diagnostics for IHC SSL configuration problems -
WD-FUS-3894
-
Repair task improvements -
WD-FUS-3897
-
FIX - Can’t remove replication rule if created with special characters -
WD-FUS-3975
-
C118641: File size on target zone is larger than on source zone after replication finished on HDFS-S3 -
WD-FUS-3979
-
Add output of hdfs dfs -count to talkback -
WD-FUS-4004
-
Schedule consistency check at specific time(s) -
WD-FUS-4036
-
Repair API call defaults are most aggressive -
WD-FUS-4037
-
Distribute CC results across nodes -
WD-FUS-4076
-
Add support for CDH 5.12 -
WD-FUS-4078
-
FIX - fusion-server doesn’t like MB suffix for swift.segmentSize -
WD-FUS-4092
-
Return appropriate response for the /fs/repair endpoint if given task is not a repair -
WD-FUS-4105
-
FIX - RepairResource can associate the wrong RepairDetails with a task -
WD-FUS-4106
-
FIX - Rename of non-repl to repl for files with 0 bytes -
WD-FUS-4109
-
FIX - IHC throws UnsupportedOperationException when initializing S3Plugin -
WD-FUS-4117
-
FIX - Higher than expected heap with G1 GC -
WD-FUS-4122
-
FIX - Big Replicate parcel breaks CDH client config updates -
WD-FUS-4133
-
FIX - Big Replicate gsn directory FileNotFoundException upon PeriodicWriterProposal -
WD-FUS-4138
-
FIX - Move from non-replicated folder to replicated one produces inconsistent results -
WD-FUS-4146
-
DOC - Disable and remove DES, 3DES, and RC4 ciphers -
WD-FUS-4154
-
FIX - NetApp: setowner throws NPE can cause fusion to lock up -
WD-FUS-4155
-
FIX - Parcel hotfix for earlier versions of cloudera -
WD-FUS-4157
-
FIX - RPM upgrade is looking for htrace-core4.jar which should be htrace-core.jar -
WD-FUS-4168
-
FIX - Swift consistency check doesn’t notice sub-folders -
WD-FUS-4182
-
Provide IHC API for ongoing transfers -
WD-FUS-4202
-
[Talkback] Store Temporary Files within TMPDIR -
WD-FUS-4203
-
Make NPE nonretriable -
WD-FUS-4222
-
FIX - cloudera-scm-agent port is in-use because of ssh tunneling script -
WD-FUS-4225
-
Add new learners to an existing zone -
WD-FUS-4250
-
Provide a mechanism to clean completed, non-contiguous GSN ranges -
WD-FUS-4279
-
FIX - Talkback Doesn’t Correctly Grab Hive Configurations -
WD-FUS-4296
-
UnsupportedOperationException should be non-retriable -
WD-FUS-4317
-
FIX - CC check state set twice on triggering node -
WD-FUS-4331
-
FIX - Solr symlinks must be careful to only reference activated fusion parcel -
WD-FUS-4341
-
Expose default replication exclusions via the rest API -
WD-FUS-4407
-
[fusion-utility-script] per-DSM manual bypass script -
WD-FUS-4417
-
FIX - ACLs displaying as inconsistent due to reported order (on Sentry managed paths) -
WD-FUS-4453
-
FIX - null Big Replicate authority causes us to use string, 'null', as authority -
WD-FUS-4489
-
FIX - FusionUriUtils#normalize breaks with null scheme -
WD-FUS-4503
-
FIX - Talkback should use -p switch to netstat -
WD-FUS-4523
-
FIX - Client heap usage on large put -
WD-FUS-4767
-
FIX - Non writer knowledge of rename completion -
WD-FUS-4723
-
FIX - Rename file with space failed for LocalFS -
WD-FUS-4706
-
FIX - EMR sending 0 length request -
WD-FUS-4700
-
FIX - S3 Throttle Retries -
WD-FUS-4684
-
FIX - Some *.RENAME files not removed -
WD-FUS-4682
-
FIX - CDH parcel alternatives incorrectly sharing dictionary keys -
WD-FUS-4656
-
Expose additional repair parameters -
WD-FUS-4654
-
Support for ListStatusIterator() in FileSystem API -
WD-FUS-4541
-
FIX - null Big Replicate authority -
WD-FUS-4538