Additional information

Configuring replication for various network setups

For the Cloud Object Storage system to replicate objects, the source-side Accesser® device must be able to establish TLS connections to the destination endpoint. The destination endpoint varies depending on the desired replication network setup.

Direct Connections

In this setup, the source Accesser device connects directly to the destination Accesser device. This is true in the following scenarios:

  1. The Replication Endpoint is not specified for the Destination Vault.
  2. Round-robin DNS is used for load balancing (for example, the Replication Endpoint points to a domain for which multiple A records exist in the DNS).
  3. A single Accesser device IP address is specified for the Replication Endpoint (uncommon).
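
To confirm which case applies to a given Replication Endpoint, it can help to see how many addresses its host name resolves to. The following is a minimal sketch that uses only the Python standard library; the function name is hypothetical, and the lookup goes through the local system resolver, so results can differ from what the source Accesser device sees.

```python
import socket


def resolve_ipv4_addresses(hostname, port=443):
    """Return the sorted, de-duplicated IPv4 addresses for `hostname`.

    More than one address suggests round-robin DNS; exactly one suggests a
    single device or a single proxy in front of the Access Pool.
    """
    infos = socket.getaddrinfo(
        hostname, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is an (address, port) tuple.
    return sorted({info[4][0] for info in infos})
```

Calling resolve_ipv4_addresses with the host name of the Replication Endpoint returns the list of addresses to which replication traffic might be sent.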

If all Accesser devices use the default preinstalled TLS certificates (that is, certificates issued by the internal CA), no further configuration is required for replication to work, because the internal CA is trusted by all devices by default.

If any destination-side Accesser device is configured with a certificate that is issued by an external CA, and that CA is not registered with the system, the CA certificate must be added to the Certificate PEM field on the destination Storage Pool. This allows the source-side (client) Accesser device to trust the specified external CA when replicating objects to that pool. A common scenario where this is required is when Access Pool HTTPS certificates are configured.
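
Before pasting a CA certificate into the Certificate PEM field, it can be useful to check that the text is a well-formed PEM bundle. The sketch below is only an illustration with a hypothetical function name: it verifies the BEGIN/END markers and the base64 encoding, and it does not verify that the content is a valid X.509 certificate or that it belongs to the expected CA.

```python
import base64
import binascii
import re

# Matches one PEM certificate block and captures its base64 body.
PEM_CERT_RE = re.compile(
    r"-----BEGIN CERTIFICATE-----\s*(.*?)\s*-----END CERTIFICATE-----",
    re.DOTALL,
)


def extract_certificates(pem_text):
    """Return the decoded bytes of every CERTIFICATE block in `pem_text`.

    Raises ValueError if a block contains malformed base64.
    """
    decoded = []
    for match in PEM_CERT_RE.finditer(pem_text):
        body = "".join(match.group(1).split())  # drop line breaks and spaces
        try:
            decoded.append(base64.b64decode(body, validate=True))
        except binascii.Error as exc:
            raise ValueError(f"malformed base64 in certificate block: {exc}") from exc
    return decoded
```

An empty result means that no certificate block was found in the text, which usually indicates a copy-and-paste problem such as missing marker lines.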

Proxied Connections

The destination Access Pools may also be preceded by one or more proxy servers (for example, a load balancer). To route replication traffic through these servers, set the Replication Endpoint on the destination Vault to the appropriate proxy address for the target Access Pools. An administrator should ensure that traffic for a given Vault is not accidentally routed to an Access Pool where that Vault is not deployed, because that results in sync failures (the replication log indicates “404 Not Found”).

Proxies can handle TLS in different modes, mainly pass-through and termination. The administrator should identify the first server in the traffic route that terminates the TLS connection from the source Access Pool, then ensure that the CA that issued its certificate is trusted by the Cloud Object Storage system. Using the above example, if Proxy 1 uses a certificate that is issued by an external CA, that CA’s public certificate should be added to the Certificate PEM field on the Storage Pool where Destination Vault 1 is located.
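
One way to check whether a given CA certificate would validate whatever server terminates TLS at the Replication Endpoint is to attempt a handshake that trusts only that CA. The following is a rough sketch under stated assumptions: it runs from a workstation with network access to the endpoint, the function names are hypothetical, and a Python client does not reproduce the Accesser device's own trust store or TLS settings.

```python
import socket
import ssl


def build_context(ca_pem_path=None):
    """Trust only the CA(s) in `ca_pem_path`, or the system defaults if None."""
    if ca_pem_path is None:
        return ssl.create_default_context()
    # PROTOCOL_TLS_CLIENT enables certificate and host name checks by default.
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    context.load_verify_locations(cafile=ca_pem_path)
    return context


def check_tls(host, port=443, ca_pem_path=None, timeout=5.0):
    """Attempt a TLS handshake and return (ok, detail)."""
    context = build_context(ca_pem_path)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                return True, tls.version()
    except ssl.SSLCertVerificationError as exc:
        # The chain presented by the first TLS terminator is not trusted.
        return False, f"certificate verification failed: {exc.verify_message}"
    except OSError as exc:
        # Covers refused connections, timeouts, and other handshake errors.
        return False, f"connection or handshake failed: {exc}"
```

A result of (False, "certificate verification failed: ...") when the candidate CA file is supplied suggests that the certificate presented at the endpoint was issued by a different CA.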

Troubleshooting

All replication-related activity is logged on the Accesser devices at:

/var/log/dsnet-core/access.log 
/var/log/dsnet-core/replication.log 

Refer to Replication logging to see where different replication operations are logged.

When a user makes a PutBucketReplication API call, the system performs preliminary checks to verify network connectivity to the destination bucket or buckets. A TLS-related issue that would prevent replication is often caught at this stage, which prevents the user from placing a replication policy on the bucket. The user receives a 500 Internal Server Error response, and the corresponding access log entry contains additional information. Here is a sample log entry (showing only the relevant fields):

{
...
  "request_uri": "/source-bucket-name?replication",
  "status": 500,
  "request_type": "REST.PUT.CONTAINER_REPLICATION",
  "error_message": "javax.net.ssl.SSLHandshakeException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target",
  "replication": {
    "num_failed_remote_buckets": 1,
    "failed_remotes": "destination-bucket-name",
    "num_sync_remote_buckets": 1
  },
...
}
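
Entries like this one can be picked out of the access log with a short script. The sketch below assumes that each log line is a single JSON object, which is not stated above, and it relies only on the field names shown in the sample; the function name is hypothetical.

```python
import json


def find_failed_replication_puts(lines):
    """Yield a summary for each failed PutBucketReplication log entry."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if not isinstance(entry, dict):
            continue
        if entry.get("request_type") != "REST.PUT.CONTAINER_REPLICATION":
            continue
        if entry.get("status") != 500:
            continue
        replication = entry.get("replication") or {}
        error_message = entry.get("error_message", "")
        yield {
            "request_uri": entry.get("request_uri"),
            "failed_remotes": replication.get("failed_remotes"),
            # The PKIX text marks an untrusted certificate chain.
            "tls_trust_failure": "PKIX path building failed" in error_message,
            "error_message": error_message,
        }
```

Passing an open file handle for /var/log/dsnet-core/access.log as the lines argument yields one summary per failed request, with tls_trust_failure set when the error is a certificate trust problem.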

The system configuration can also change after the policy is in place in a way that causes replication to fail (for example, a new certificate is installed, the vault deployment changes, or the network topology changes). In that case, network error events are generated on the manager console, and the replication log contains entries that indicate the failure:

{
...
  "replication": {
    "status": "internal-error",
    "message": "internal error",
    "internal_message": "Unable to execute HTTP request",
    "remote": {
      "destination_url": "https://{replication-endpoint}",
      "remote_request": {
        "num_attempts": 4,
        "error_message": "endResult=NETWORKING_ERROR ... PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target}"
      }
    }
  },
  "version_name": "0000019a-e08c-c040-bffe-7524c2f60744",
  "container_name": "F1780-bucker-001-source",
  "error": "... PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target}",
  "request_type": "S3.REPLICATION.SOURCE_OBJECT_ATTEMPT",
...
}
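
To see which destinations are affected, failed attempts in the replication log can be counted per destination URL. This is a minimal sketch that assumes one JSON object per log line and uses only the field names from the sample above; the function name is hypothetical.

```python
import json
from collections import Counter


def count_failures_by_destination(lines):
    """Count failed source-object replication attempts per destination URL."""
    counts = Counter()
    for line in lines:
        try:
            entry = json.loads(line)
        except ValueError:
            continue  # skip blank or non-JSON lines
        if not isinstance(entry, dict):
            continue
        if entry.get("request_type") != "S3.REPLICATION.SOURCE_OBJECT_ATTEMPT":
            continue
        replication = entry.get("replication") or {}
        if replication.get("status") != "internal-error":
            continue
        remote = replication.get("remote") or {}
        counts[remote.get("destination_url", "unknown")] += 1
    return counts
```

Running this over /var/log/dsnet-core/replication.log gives a quick view of whether failures are concentrated on one Replication Endpoint, which helps narrow the search to a specific proxy or Access Pool.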