Horizontal scaling of storage pods

Scale your storage pod replicas for increased API event data storage and throughput.

To take advantage of storage pod scaling, you must have a three-replica deployment and use dedicated storage. See Dedicated storage and scaling up.

Note:
  • The term replica, when used with data storage pods, refers to the Kubernetes storage pod replicas, not to OpenSearch replicas. OpenSearch replica settings are not configurable in API Connect.
  • The OpenSearch term node refers to OpenSearch data storage nodes. OpenSearch nodes have a one-to-one relationship with the API Connect storage pod replicas, and have the same names.

This operation requires the use of the Analytics CLI.

Increase number of storage pod replicas

  1. Edit your analytics CR:
    kubectl -n <namespace> edit a7s
  2. Add the following to the spec: section:
      template:
      - name: storage
        replicaCount: <new replica count>
    where <new replica count> must be less than or equal to the number of worker nodes you have available.
  3. Wait for the new storage pod replicas to start on your new worker nodes, and for the analytics cluster to rebalance your API event data to use your new replicas. The time the rebalancing takes depends on the size of your API event data and the network transmission speed between worker nodes. You can check the rebalancing status with the CLI health command:
    apic --mode analytics clustermgmt:getHealth --server <platform api endpoint> --analytics-service <analytics service name> --format json
    where:
    • <platform api endpoint> is the platform API endpoint.
    • <analytics service name> is the name of the analytics service as configured in the Cloud Manager UI.
    When rebalancing is complete, the number_of_data_nodes value matches your updated replicaCount, and the relocating_shards, initializing_shards, unassigned_shards, and delayed_unassigned_shards properties are all zero, as shown:
    {
       "cluster_name":"apic-analytics-cluster",
       "status":"green",
       "timed_out":false,
       "number_of_nodes":8,
       "number_of_data_nodes":5,
       "discovered_master":true,
       "discovered_cluster_manager":true,
       "active_primary_shards":8,
       "active_shards":24,
       "relocating_shards":0,
       "initializing_shards":0,
       "unassigned_shards":0,
       "delayed_unassigned_shards":0,
       "number_of_pending_tasks":0,
       "number_of_in_flight_fetch":0,
       "task_max_waiting_in_queue_millis":0,
       "active_shards_percent_as_number":100
    }
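
The completion check in step 3 can be scripted against the JSON that clustermgmt:getHealth returns. The following is a minimal Python sketch, not part of the product: the function name rebalancing_complete is hypothetical, and the choice of which shard counters to test is an assumption based on the sample output.

```python
import json

# Shard counters that should all be zero once rebalancing has finished.
# Assumption: these are the "_shards" properties that step 3 refers to.
SHARD_COUNTERS = (
    "relocating_shards",
    "initializing_shards",
    "unassigned_shards",
    "delayed_unassigned_shards",
)


def rebalancing_complete(health: dict, replica_count: int) -> bool:
    """Return True when the getHealth output indicates rebalancing is done."""
    # The number of data nodes must match the replicaCount set in the CR.
    if health.get("number_of_data_nodes") != replica_count:
        return False
    # No shard may still be moving, initializing, or waiting for a node.
    return all(health.get(name) == 0 for name in SHARD_COUNTERS)


# Trimmed copy of the sample getHealth output.
sample = json.loads("""
{
   "status": "green",
   "number_of_data_nodes": 5,
   "relocating_shards": 0,
   "initializing_shards": 0,
   "unassigned_shards": 0,
   "delayed_unassigned_shards": 0
}
""")

print(rebalancing_complete(sample, 5))  # True
```

In practice you would redirect the getHealth output to a file, load it with json.load, and repeat the check until it returns True.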
    
Note: If you previously increased and then reduced the number of storage pod replicas in your deployment, you might have an OpenSearch exclude list that contains the names of the replicas you removed. New storage pod replicas cannot be created if their names are in the OpenSearch exclude list. To empty the list, follow the same steps that you used to create it (see step 2 of Reduce number of storage pod replicas), but set persistent.cluster.routing.allocation.exclude._name to an empty string.
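
Setting or emptying the exclude list is a one-property change in the exported settings file. The following is a minimal Python sketch, assuming the settings were exported with clustermgmt:getSettings and loaded as a dictionary. The helper name set_exclude_names is hypothetical and is not part of the Analytics CLI.

```python
import json


def set_exclude_names(settings: dict, names: list[str]) -> dict:
    """Set persistent.cluster.routing.allocation.exclude._name in place.

    Missing intermediate objects are created. An empty list of names
    produces an empty string, which empties the exclude list.
    """
    node = settings
    for key in ("persistent", "cluster", "routing", "allocation", "exclude"):
        node = node.setdefault(key, {})
    node["_name"] = ",".join(names)
    return settings


# Settings that still exclude two previously removed replicas.
settings = {
    "persistent": {
        "cluster": {
            "routing": {
                "allocation": {
                    "exclude": {
                        "_name": "analytics-storage-3,analytics-storage-4"
                    }
                }
            }
        }
    },
    "transient": {},
}

set_exclude_names(settings, [])  # empty the exclude list
print(json.dumps(settings["persistent"]["cluster"], indent=2))
```

After the change, write the dictionary back to the settings file and apply it with clustermgmt:putSettings.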

Reduce number of storage pod replicas

  1. Verify that your analytics cluster health is green. Use the analytics CLI health command:
    apic --mode analytics clustermgmt:getHealth --server <platform api endpoint> --analytics-service <analytics service name> --format json
    where:
    • <platform api endpoint> is the platform API endpoint.
    • <analytics service name> is the name of the analytics service as configured in the Cloud Manager UI.
    Check that the status property is green:
    {
       "cluster_name":"apic-analytics-cluster",
       "status":"green",
       "timed_out":false,
       "number_of_nodes":8,
       "number_of_data_nodes":5,
       "discovered_master":true,
       "discovered_cluster_manager":true,
       "active_primary_shards":8,
       "active_shards":24,
       "relocating_shards":0,
       "initializing_shards":0,
       "unassigned_shards":0,
       "delayed_unassigned_shards":0,
       "number_of_pending_tasks":0,
       "number_of_in_flight_fetch":0,
       "task_max_waiting_in_queue_millis":0,
       "active_shards_percent_as_number":100
    }
    
    Do not proceed if the status is not green.
  2. Move all API event data off the replicas that you plan to remove. Replicas are removed by reducing the Kubernetes replicaCount in the analytics CR. When replicaCount is reduced, the replicas that are deleted are always the highest-numbered replicas. For example:
    kubectl -n <namespace> get pods | grep -E 'NAME|storage'
    
    NAME                                  READY   STATUS      RESTARTS   AGE
    analytics-storage-0                   1/1     Running     0          140d
    analytics-storage-1                   1/1     Running     0          140d
    analytics-storage-2                   1/1     Running     0          140d
    analytics-storage-3                   1/1     Running     0          140d
    analytics-storage-4                   1/1     Running     0          140d
    If you reduce replicaCount from 5 to 3, the two replicas that are deleted are analytics-storage-3 and analytics-storage-4.

    Before you reduce the replicaCount, you must ensure that the replicas to be deleted contain no API event data. To make OpenSearch relocate all API event data to the replicas that you are keeping, add the replicas that you want to delete to the OpenSearch exclude list:

    1. Export the current analytics cluster settings to a JSON file:
      apic -m analytics clustermgmt:getSettings --server <platform api endpoint> --analytics-service <analytics service name> --format json > a7s_cluster_settings.json

      This command creates a file called a7s_cluster_settings.json in your current directory.

    2. Edit the a7s_cluster_settings.json file, and set the persistent.cluster.routing.allocation.exclude._name property to a comma-separated list of replicas you plan to remove. For example:
      {
          "persistent": {
              "cluster": {
                  "routing": {
                      "allocation": {
                          "exclude": {
                              "_name": "analytics-storage-3,analytics-storage-4"
                          }
                      }
                  }
              },
              "plugins": {
                  "index_state_management": {
                      "metadata_migration": {
                          "status": "1"
                      },
                      "template_migration": {
                          "control": "-1"
                      },
                      "history": {
                          "enabled": "false"
                      }
                  }
              }
          },
          "transient": {}
      }

      You might need to add the cluster.routing.allocation.exclude structure if this property has not been set before.

    3. Apply the updated a7s_cluster_settings.json file:
      apic -m analytics clustermgmt:putSettings --server <platform api endpoint> --analytics-service <analytics service name> a7s_cluster_settings.json --format json 
  3. Wait for your analytics cluster to remove all data from the replicas you want to delete, which you specified in step 2. You can check that the data rebalancing is complete with the clustermgmt:catAllocation CLI command:
    apic -m analytics clustermgmt:catAllocation --server <platform api endpoint> --analytics-service <analytics service name> --format json --return_format json
    Replicas that don't have data appear as entries with a shards value of zero in the output (you can ignore the disk.* properties):
    [
      {
        "shards": "5",
        "disk.indices": "113.3kb",
        "disk.used": "27.3gb",
        "disk.avail": "222.5gb",
        "disk.total": "249.8gb",
        "disk.percent": "10",
        "host": "192.168.87.226",
        "ip": "192.168.87.226",
        "node": "analytics-storage-0"
      },
      {
        "shards": "0",
        "disk.indices": "0",
        "disk.used": "36.9kb",
        "disk.avail": "212.9kb",
        "disk.total": "249.8kb",
        "disk.percent": "1",
        "host": "192.168.64.225",
        "ip": "192.168.64.225",
        "node": "analytics-storage-3"
      },
      {
        "shards": "5",
        "disk.indices": "16.8kb",
        "disk.used": "28.2gb",
        "disk.avail": "221.6gb",
        "disk.total": "249.8gb",
        "disk.percent": "11",
        "host": "192.168.112.227",
        "ip": "192.168.112.227",
        "node": "analytics-storage-1"
      },
      {
        "shards": "5",
        "disk.indices": "23.3kb",
        "disk.used": "29.3gb",
        "disk.avail": "220.4gb",
        "disk.total": "249.8gb",
        "disk.percent": "11",
        "host": "192.168.24.102",
        "ip": "192.168.24.102",
        "node": "analytics-storage-2"
      },
      {
        "shards": "0",
        "disk.indices": "0",
        "disk.used": "35.4kb",
        "disk.avail": "214.4kb",
        "disk.total": "249.8kb",
        "disk.percent": "1",
        "host": "192.168.91.149",
        "ip": "192.168.91.149",
        "node": "analytics-storage-4"
      }
    ]
    The time rebalancing takes depends on the size of your API event data and the network transmission speed between pod replicas.
    Note: The value of the node property in each entry of the clustermgmt:catAllocation output matches the name of the corresponding Kubernetes storage pod replica.
  4. Verify that your analytics cluster health is green. Use the analytics CLI health command:
    apic --mode analytics clustermgmt:getHealth --server <platform api endpoint> --analytics-service <analytics service name> --format json
    where:
    • <platform api endpoint> is the platform API endpoint.
    • <analytics service name> is the name of the analytics service as configured in the Cloud Manager UI.
    Check that the status property is green:
    {
       "cluster_name":"apic-analytics-cluster",
       "status":"green",
       "timed_out":false,
       "number_of_nodes":8,
       "number_of_data_nodes":5,
       "discovered_master":true,
       "discovered_cluster_manager":true,
       "active_primary_shards":8,
       "active_shards":24,
       "relocating_shards":0,
       "initializing_shards":0,
       "unassigned_shards":0,
       "delayed_unassigned_shards":0,
       "number_of_pending_tasks":0,
       "number_of_in_flight_fetch":0,
       "task_max_waiting_in_queue_millis":0,
       "active_shards_percent_as_number":100
    }
    
    Do not proceed if the status is not green.
  5. Edit your analytics CR:
    kubectl -n <namespace> edit a7s
  6. Update the replicaCount:
      template:
      - name: storage
        replicaCount: <new replica count>
    where <new replica count> is the reduced number of replicas that you want.
    Note: Reducing replicaCount to less than 3 is not a supported operation. If the replicaCount property does not exist in the analytics CR, check that you are not attempting to reduce replicaCount below 3.
  7. Monitor the health status of your analytics cluster with the CLI health command:
    apic --mode analytics clustermgmt:getHealth --server <platform api endpoint> --analytics-service <analytics service name> --format json
    where:
    • <platform api endpoint> is the platform API endpoint.
    • <analytics service name> is the name of the analytics service as configured in the Cloud Manager UI.
    Confirm that the status is green, that the number_of_data_nodes value matches your updated replicaCount, and that the relocating_shards, initializing_shards, unassigned_shards, and delayed_unassigned_shards properties are all zero:
    {
       "cluster_name":"apic-analytics-cluster",
       "status":"green",
       "timed_out":false,
       "number_of_nodes":6,
       "number_of_data_nodes":3,
       "discovered_master":true,
       "discovered_cluster_manager":true,
       "active_primary_shards":8,
       "active_shards":24,
       "relocating_shards":0,
       "initializing_shards":0,
       "unassigned_shards":0,
       "delayed_unassigned_shards":0,
       "number_of_pending_tasks":0,
       "number_of_in_flight_fetch":0,
       "task_max_waiting_in_queue_millis":0,
       "active_shards_percent_as_number":100
    }
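
Two decisions in the reduce procedure lend themselves to a small script: working out which replicas a lower replicaCount deletes (step 2), and confirming from the clustermgmt:catAllocation output that those replicas hold no shards (step 3). The following is a minimal Python sketch. The function names replicas_to_remove and drained are hypothetical, and the allocation data is trimmed from the sample output in step 3.

```python
def replicas_to_remove(names: list[str], new_count: int) -> list[str]:
    """Return the highest-numbered replicas, which a reduced replicaCount deletes."""
    # Sort by the numeric suffix after the final "-", not alphabetically.
    ordered = sorted(names, key=lambda n: int(n.rsplit("-", 1)[1]))
    return ordered[new_count:]


def drained(allocation: list[dict], nodes: list[str]) -> bool:
    """Return True when every listed node reports zero shards."""
    shards = {entry["node"]: int(entry["shards"]) for entry in allocation}
    # A node that is missing from the output is treated as not drained.
    return all(shards.get(node) == 0 for node in nodes)


pods = [f"analytics-storage-{i}" for i in range(5)]
doomed = replicas_to_remove(pods, 3)
print(",".join(doomed))  # analytics-storage-3,analytics-storage-4

# Only the "node" and "shards" properties matter for this check.
allocation = [
    {"node": "analytics-storage-0", "shards": "5"},
    {"node": "analytics-storage-3", "shards": "0"},
    {"node": "analytics-storage-1", "shards": "5"},
    {"node": "analytics-storage-2", "shards": "5"},
    {"node": "analytics-storage-4", "shards": "0"},
]
print(drained(allocation, doomed))  # True
```

The comma-separated string printed first is the value to set in persistent.cluster.routing.allocation.exclude._name. Reduce replicaCount only after drained returns True and the cluster health is green.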