Operations

Operators with access to the Manager interface can view the Manager Events that are generated when Replication encounters issues during an operation. Each of these events is scoped to a given Container vault and is generated by a Replication policy on a Bucket that is housed in that Container vault.

Sync Latency Alert Event

The Sync Latency Alert is triggered when replication sync operations take longer than the currently configured Sync Latency Alerting Threshold for the container vault. The latency is measured as the time between when the source object is written (that is, when the write has finished and the object is visible) and the successful completion of the replica write operation. In effect, it measures how long the system takes to replicate objects from source to destination.

Note: This measurement includes the time spent transferring the object to the target-bucket, so the size of the replicated objects affects it. If users commonly write large objects and the Manager generates this alert too frequently, the operator can either increase the replication rates on the Storage Pool or raise the latency threshold to a limit that better fits user workloads.

This latency measurement does not include internal sync retries. In other words, it measures only the first attempt, because retries would skew the measurement and make it less useful.
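The measurement described above can be illustrated with a minimal sketch. The function names, timestamps, and the five-minute threshold below are hypothetical and chosen only for illustration; they are not part of the product's interface.

```python
from datetime import datetime, timedelta, timezone


def first_attempt_sync_latency(source_visible_at, replica_completed_at):
    """Time from the source object becoming visible to the replica write
    completing. Hypothetical helper: internal retries are assumed to be
    excluded, so only a first attempt is ever passed in."""
    return replica_completed_at - source_visible_at


def exceeds_threshold(latency, threshold):
    """True when a sync took longer than the alerting threshold."""
    return latency > threshold


# Illustrative values only (assumed, not taken from a real system).
written = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
replicated = datetime(2024, 1, 1, 12, 7, 30, tzinfo=timezone.utc)
threshold = timedelta(minutes=5)

latency = first_attempt_sync_latency(written, replicated)
print(latency.total_seconds(), exceeds_threshold(latency, threshold))
```

Because transfer time is part of the interval, a large object can exceed the threshold even when the queue is otherwise healthy.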

When this event is produced, it indicates that the Replication queue is not being serviced as quickly as desired. This occurs when the Replication Sync Rate-Limits for the Storage Pool are set too low relative to the volume of sync operations in the container vault. The sync volume in the Storage Pool consists of both new sync operations and retries that need to be performed. This event can be remediated either by increasing the Rate-Limit or by changing the workload to lower the volume of sync operations.
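The relationship between sync volume and the Rate-Limit can be sketched with simple arithmetic. This is a simplified model under stated assumptions: all rates are expressed in operations per second and held constant, and the function name and parameters are hypothetical. The actual units and behavior of the Rate-Limit setting may differ.

```python
def backlog_after(seconds, new_ops_per_sec, retry_ops_per_sec,
                  rate_limit_ops_per_sec, initial_backlog=0):
    """Estimate the queue backlog after `seconds` of constant load.

    Hypothetical model: incoming volume is new syncs plus retries, and
    the queue drains at the rate limit. The backlog never goes negative.
    """
    net_growth = new_ops_per_sec + retry_ops_per_sec - rate_limit_ops_per_sec
    return max(0, initial_backlog + net_growth * seconds)


# Incoming volume (80 new + 30 retries) exceeds a limit of 100: backlog grows.
growing = backlog_after(60, 80, 30, 100)

# Raising the limit to 120 drains an existing backlog instead.
draining = backlog_after(60, 80, 30, 120, initial_backlog=600)

print(growing, draining)
```

In this model the backlog grows whenever new operations plus retries exceed the limit, which is why either raising the Rate-Limit or reducing the workload resolves the alert.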

Sync Network Error Event

The Sync Network Error is triggered when replication cannot successfully complete sync operations because of a networking issue. Several different network issues can cause this:

  • Bad Certificate configuration at the Storage-Pool level.
  • Target-buckets not being accessible via the Replication Endpoint that is configured at the container vault.
  • Accesser devices not having network connectivity to the configured Endpoint.

When this event is encountered, the operator must intervene to remediate the system.

See Configuring replication for various network setups for more help.

Sync System Error Event

The Sync System Error is triggered when replication encounters internal errors that prevent sync operations from completing successfully. These errors cannot be directly remediated by the Cloud Object Storage Operator and require the intervention of IBM Support.