Specifying batch size thresholds for a subscription applying to Hadoop using Web HDFS

Batch size thresholds balance the need to apply data to the target quickly against the need to minimize resource utilization. Increase the threshold values to apply data less frequently and process larger amounts of data in each batch; decrease them to make data available sooner. The threshold values apply to both refresh and mirror operations.

Procedure

  1. Click Configuration > Subscriptions.
  2. Right-click a subscription and select Hadoop Properties.
  3. Enter the number of rows that can be changed before subscription data is hardened to file.

    The default value is 100000.

  4. Enter the amount of time that can elapse before subscription data is hardened to file.

    The default value is 600.

    CDC Replication uses the values that you specify to determine when a flat file is complete and can be made available to Hadoop for processing.

  5. Click OK.
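The following Python sketch models how a row threshold and a time threshold could interact to decide when a flat file is hardened. It is an illustration only, not CDC Replication's implementation. It assumes that a file is hardened as soon as either threshold is reached and that the time value is measured in seconds; this topic confirms neither. The names `BatchThreshold` and `simulate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BatchThreshold:
    """Hypothetical model of the two batch size thresholds.

    Assumption: the current file is hardened when EITHER threshold is reached,
    and the time threshold is in seconds. Neither is confirmed by this topic.
    """
    row_threshold: int = 100_000   # default number of rows (step 3)
    time_threshold: float = 600.0  # default time value (step 4), assumed seconds
    rows: int = 0                  # rows accumulated in the current file
    opened_at: float = 0.0         # time at which the current file was started

    def record(self, changed_rows: int) -> None:
        """Add changed rows to the current (not yet hardened) file."""
        self.rows += changed_rows

    def should_harden(self, now: float) -> bool:
        """True once the row count or the elapsed time reaches its threshold."""
        elapsed = now - self.opened_at
        return self.rows >= self.row_threshold or elapsed >= self.time_threshold

    def harden(self, now: float) -> int:
        """Close the current file, return its row count, and start a new one."""
        hardened = self.rows
        self.rows = 0
        self.opened_at = now
        return hardened


def simulate(rows_per_second: int, duration: int,
             t: BatchThreshold) -> list[tuple[int, int]]:
    """Return (second, rows) for each file hardened under a steady change rate."""
    files = []
    for second in range(1, duration + 1):
        t.record(rows_per_second)
        if t.should_harden(second):
            files.append((second, t.harden(second)))
    return files


if __name__ == "__main__":
    # High change rate: the row threshold is reached first, every 200 seconds.
    print(simulate(500, 1200, BatchThreshold()))
    # Low change rate: the time threshold is reached first, every 600 seconds.
    print(simulate(50, 1200, BatchThreshold()))
```

With the default values and a steady 500 changed rows per second, this model hardens a 100000-row file every 200 seconds. At 50 rows per second, the time threshold triggers first, producing a 30000-row file every 600 seconds. Raising either value in the model produces fewer, larger files, which mirrors the trade-off described at the start of this topic.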