Horizontal scaling of storage pods
Scale your storage pod replicas for increased API event data storage and throughput.
To take advantage of storage pod scaling, you must have a three-replica deployment and be using dedicated storage. See Dedicated storage and scaling up.
Note:
- The term replica, when used with data storage pod, refers to the Kubernetes storage pod, not to OpenSearch replicas. OpenSearch replica settings are not configurable in API Connect.
- The OpenSearch term node refers to OpenSearch data storage nodes. OpenSearch nodes have a one-to-one relationship with the API Connect storage pod replicas, and have the same names.
This operation requires the use of the Analytics CLI.
Increase number of storage pod replicas
- Edit your analytics CR:
    kubectl -n <namespace> edit a7s
- Add the following to the spec section:
    template:
    - name: storage
      replicaCount: <new replica count>
  where <new replica count> must be less than or equal to the number of worker nodes that you have available.
- Wait for the new storage pod replicas to start on your new worker nodes, and for the analytics cluster to rebalance your API event data across the new replicas. The time that rebalancing takes depends on the size of your API event data and the network transmission speed between worker nodes.
  You can check the rebalancing status with the CLI health command:
    apic --mode analytics clustermgmt:getHealth --server <platform api endpoint> --analytics-service <analytics service name> --format json
  where <platform api endpoint> is the platform API endpoint, and <analytics service name> is the name of the analytics service as configured in the Cloud Manager UI.
  When rebalancing is complete, number_of_data_nodes matches your updated replicaCount, and the relocating_shards, initializing_shards, and unassigned_shards properties are all zero, as shown:
    {
      "cluster_name": "apic-analytics-cluster",
      "status": "green",
      "timed_out": false,
      "number_of_nodes": 8,
      "number_of_data_nodes": 5,
      "discovered_master": true,
      "discovered_cluster_manager": true,
      "active_primary_shards": 8,
      "active_shards": 24,
      "relocating_shards": 0,
      "initializing_shards": 0,
      "unassigned_shards": 0,
      "delayed_unassigned_shards": 0,
      "number_of_pending_tasks": 0,
      "number_of_in_flight_fetch": 0,
      "task_max_waiting_in_queue_millis": 0,
      "active_shards_percent_as_number": 100
    }
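The completion check in the last step can be scripted if you prefer not to rerun the health command by hand. The following Python sketch is an illustration, not part of API Connect or the Analytics CLI. It assumes that apic is on your PATH, that you are already logged in, and that getHealth prints only the JSON document shown above to standard output. The function names and the polling defaults are hypothetical. As in the reduce procedure, the sketch also requires the status to be green.

```python
import json
import subprocess
import time


def get_health(server: str, service: str) -> dict:
    """Run the getHealth command from this topic and parse its JSON output.

    Assumes apic is on PATH, you are already logged in, and the command
    prints only the JSON document to standard output.
    """
    cmd = [
        "apic", "--mode", "analytics", "clustermgmt:getHealth",
        "--server", server,
        "--analytics-service", service,
        "--format", "json",
    ]
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return json.loads(result.stdout)


def rebalance_complete(health: dict, replica_count: int) -> bool:
    """True when the cluster is green, every storage replica is a data node,
    and no shards are relocating, initializing, or unassigned."""
    if health.get("status") != "green":
        return False
    if health.get("number_of_data_nodes") != replica_count:
        return False
    moving = ("relocating_shards", "initializing_shards", "unassigned_shards")
    # A missing field counts as "not complete" rather than as zero.
    return all(health.get(field) == 0 for field in moving)


def wait_for_rebalance(server: str, service: str, replica_count: int,
                       interval_s: float = 60,
                       timeout_s: float = 6 * 3600) -> bool:
    """Poll getHealth until rebalancing completes or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if rebalance_complete(get_health(server, service), replica_count):
            return True
        time.sleep(interval_s)
    return False
```

For example, wait_for_rebalance("<platform api endpoint>", "<analytics service name>", 5) returns True once getHealth reports a green cluster with five data nodes and no moving shards, and returns False if the six-hour default timeout expires first. Rebalancing time depends on your data volume and network speed, so adjust interval_s and timeout_s to suit.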
Note: If in the past you increased and then reduced the number of data storage replicas in your deployment, you might have an OpenSearch exclude list that contains the names of the previous replicas that you removed. New storage pod replicas cannot be created if their names are in the OpenSearch exclude list. To empty the OpenSearch exclude list, follow the same steps that you used to create it (Create exclude list), but set persistent.cluster.routing.allocation.exclude._name to an empty string.

Reduce number of storage pod replicas
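Both the note above and the reduce steps that follow edit the same property, persistent.cluster.routing.allocation.exclude._name, in the file that clustermgmt:getSettings exports. The following Python sketch, which is not part of API Connect, shows that edit and also derives the names of the replicas that a smaller replicaCount deletes. It assumes that the settings file uses the nested layout shown in the exclude list example, and that your storage pods are named analytics-storage-<n> as in the kubectl example. The prefix parameter and all function names are hypothetical.

```python
import json
from pathlib import Path


def replicas_to_remove(current: int, target: int,
                       prefix: str = "analytics-storage") -> list[str]:
    """Names of the replicas deleted when replicaCount drops from current
    to target. Kubernetes always removes the highest-numbered replicas."""
    if not 3 <= target <= current:
        # Reducing replicaCount to fewer than 3 is not supported.
        raise ValueError("target must be at least 3 and at most current")
    return [f"{prefix}-{i}" for i in range(target, current)]


def set_exclude_list(settings: dict, names: list[str]) -> dict:
    """Set persistent.cluster.routing.allocation.exclude._name in place.

    Missing intermediate objects are created. An empty names list writes
    an empty string, which empties the exclude list.
    """
    node = settings.setdefault("persistent", {})
    for key in ("cluster", "routing", "allocation", "exclude"):
        node = node.setdefault(key, {})
    node["_name"] = ",".join(names)
    return settings


def update_settings_file(path: str, names: list[str]) -> None:
    """Rewrite the exported settings file with the new exclude list."""
    settings_path = Path(path)
    settings = json.loads(settings_path.read_text())
    set_exclude_list(settings, names)
    settings_path.write_text(json.dumps(settings, indent=2) + "\n")
```

For example, update_settings_file("a7s_cluster_settings.json", replicas_to_remove(5, 3)) sets the exclude list to analytics-storage-3,analytics-storage-4, and update_settings_file("a7s_cluster_settings.json", []) empties it. In both cases you still apply the file with clustermgmt:putSettings.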
- Verify that your analytics cluster health is green. Use the analytics CLI health command:
    apic --mode analytics clustermgmt:getHealth --server <platform api endpoint> --analytics-service <analytics service name> --format json
  where <platform api endpoint> is the platform API endpoint, and <analytics service name> is the name of the analytics service as configured in the Cloud Manager UI.
  Check that the status property is green:
    {
      "cluster_name": "apic-analytics-cluster",
      "status": "green",
      "timed_out": false,
      "number_of_nodes": 8,
      "number_of_data_nodes": 5,
      "discovered_master": true,
      "discovered_cluster_manager": true,
      "active_primary_shards": 8,
      "active_shards": 24,
      "relocating_shards": 0,
      "initializing_shards": 0,
      "unassigned_shards": 0,
      "delayed_unassigned_shards": 0,
      "number_of_pending_tasks": 0,
      "number_of_in_flight_fetch": 0,
      "task_max_waiting_in_queue_millis": 0,
      "active_shards_percent_as_number": 100
    }
  Do not proceed if the status is not green.
- Replicas are removed by reducing the Kubernetes replicaCount in the analytics CR. When replicaCount is reduced, the replicas that are deleted are always the highest-numbered replicas. For example:
    kubectl get pods | grep storage
    NAME                  READY   STATUS    RESTARTS   AGE
    analytics-storage-0   1/1     Running   0          140d
    analytics-storage-1   1/1     Running   0          140d
    analytics-storage-2   1/1     Running   0          140d
    analytics-storage-3   1/1     Running   0          140d
    analytics-storage-4   1/1     Running   0          140d
  If you reduce replicaCount from 5 to 3, the two replicas that are deleted are analytics-storage-3 and analytics-storage-4.
  Before you reduce replicaCount, you must ensure that the replicas to be deleted contain no API event data. To make OpenSearch relocate all API event data to the replicas that you are keeping, add the replicas that you want to delete to the OpenSearch exclude list:
  - Export the current analytics cluster settings to a JSON file:
      apic -m analytics clustermgmt:getSettings --server <platform api endpoint> --analytics-service <analytics service name> --format json > a7s_cluster_settings.json
    This command creates a file called a7s_cluster_settings.json in your current directory.
  - Edit the a7s_cluster_settings.json file, and set the persistent.cluster.routing.allocation.exclude._name property to a comma-separated list of the replicas that you plan to remove. For example:
      {
        "persistent": {
          "cluster": {
            "routing": {
              "allocation": {
                "exclude": {
                  "_name": "analytics-storage-3,analytics-storage-4"
                }
              }
            }
          },
          "plugins": {
            "index_state_management": {
              "metadata_migration": { "status": "1" },
              "template_migration": { "control": "-1" },
              "history": { "enabled": "false" }
            }
          }
        },
        "transient": {}
      }
    You might need to add the cluster.routing.allocation.exclude structure if this property has not been set before.
  - Apply the updated a7s_cluster_settings.json file:
      apic -m analytics clustermgmt:putSettings --server <platform api endpoint> --analytics-service <analytics service name> a7s_cluster_settings.json --format json
- Wait for your analytics cluster to remove all data from the replicas that you want to delete, which you added to the exclude list. You can check that the data rebalancing is complete with the clustermgmt:catAllocation CLI command:
    apic -m analytics clustermgmt:catAllocation --server <platform api endpoint> --analytics-service <analytics service name> --format json --return_format json
  Replicas that don't have data appear as entries with a shards value of zero in the output (you can ignore the disk.* properties):
    [
      { "shards": "5", "disk.indices": "113.3kb", "disk.used": "27.3gb", "disk.avail": "222.5gb", "disk.total": "249.8gb", "disk.percent": "10", "host": "192.168.87.226", "ip": "192.168.87.226", "node": "analytics-storage-0" },
      { "shards": "0", "disk.indices": "0", "disk.used": "36.9kb", "disk.avail": "212.9kb", "disk.total": "249.8kb", "disk.percent": "1", "host": "192.168.64.225", "ip": "192.168.64.225", "node": "analytics-storage-3" },
      { "shards": "5", "disk.indices": "16.8kb", "disk.used": "28.2gb", "disk.avail": "221.6gb", "disk.total": "249.8gb", "disk.percent": "11", "host": "192.168.112.227", "ip": "192.168.112.227", "node": "analytics-storage-1" },
      { "shards": "5", "disk.indices": "23.3kb", "disk.used": "29.3gb", "disk.avail": "220.4gb", "disk.total": "249.8gb", "disk.percent": "11", "host": "192.168.24.102", "ip": "192.168.24.102", "node": "analytics-storage-2" },
      { "shards": "0", "disk.indices": "0", "disk.used": "35.4kb", "disk.avail": "214.4kb", "disk.total": "249.8kb", "disk.percent": "1", "host": "192.168.91.149", "ip": "192.168.91.149", "node": "analytics-storage-4" }
    ]
  The time that rebalancing takes depends on the size of your API event data and the network transmission speed between pod replicas.
  Note: The value in the node property of the clustermgmt:catAllocation output is the same as the name of the corresponding Kubernetes storage pod replica.
- Verify that your analytics cluster health is green. Use the analytics CLI health command:
    apic --mode analytics clustermgmt:getHealth --server <platform api endpoint> --analytics-service <analytics service name> --format json
  where <platform api endpoint> is the platform API endpoint, and <analytics service name> is the name of the analytics service as configured in the Cloud Manager UI.
  Check that the status property is green:
    {
      "cluster_name": "apic-analytics-cluster",
      "status": "green",
      "timed_out": false,
      "number_of_nodes": 8,
      "number_of_data_nodes": 5,
      "discovered_master": true,
      "discovered_cluster_manager": true,
      "active_primary_shards": 8,
      "active_shards": 24,
      "relocating_shards": 0,
      "initializing_shards": 0,
      "unassigned_shards": 0,
      "delayed_unassigned_shards": 0,
      "number_of_pending_tasks": 0,
      "number_of_in_flight_fetch": 0,
      "task_max_waiting_in_queue_millis": 0,
      "active_shards_percent_as_number": 100
    }
  Do not proceed if the status is not green.
- Edit your analytics CR:
    kubectl -n <namespace> edit a7s
- Update the replicaCount:
    template:
    - name: storage
      replicaCount: <new replica count>
  where <new replica count> is the reduced number of replicas that you want.
  Note: If the replicaCount property does not exist in the analytics CR, check that you are not attempting to reduce replicaCount to 2. Reducing replicaCount to less than 3 is not a supported operation.
- Monitor the health status of your analytics cluster with the CLI health command:
    apic --mode analytics clustermgmt:getHealth --server <platform api endpoint> --analytics-service <analytics service name> --format json
  where <platform api endpoint> is the platform API endpoint, and <analytics service name> is the name of the analytics service as configured in the Cloud Manager UI.
  Confirm that the status is green, that number_of_data_nodes matches your updated replicaCount, and that the relocating_shards, initializing_shards, and unassigned_shards properties are all zero:
    {
      "cluster_name": "apic-analytics-cluster",
      "status": "green",
      "timed_out": false,
      "number_of_nodes": 6,
      "number_of_data_nodes": 3,
      "discovered_master": true,
      "discovered_cluster_manager": true,
      "active_primary_shards": 8,
      "active_shards": 24,
      "relocating_shards": 0,
      "initializing_shards": 0,
      "unassigned_shards": 0,
      "delayed_unassigned_shards": 0,
      "number_of_pending_tasks": 0,
      "number_of_in_flight_fetch": 0,
      "task_max_waiting_in_queue_millis": 0,
      "active_shards_percent_as_number": 100
    }
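The drain check that uses clustermgmt:catAllocation can also be scripted. The following Python sketch, which is not part of the Analytics CLI, takes the parsed JSON array that the command prints with --return_format json, and reports which of the replicas that you plan to remove still hold shards. It assumes that each entry has node and shards fields, with shards as a string, as in the example output. The function name still_holding_data is hypothetical. Reduce replicaCount only when the result is empty and the cluster health is green.

```python
import json


def still_holding_data(cat_allocation: list[dict],
                       removing: list[str]) -> dict[str, int]:
    """Return {replica: shard count} for each replica in removing that is
    not yet drained. An empty dict means the replicas hold no shards.

    A replica missing from the output is reported with -1 because its
    state cannot be confirmed.
    """
    counts = {row["node"]: int(row["shards"]) for row in cat_allocation}
    pending = {}
    for name in removing:
        shards = counts.get(name, -1)
        if shards != 0:
            pending[name] = shards
    return pending


if __name__ == "__main__":
    # Trimmed copy of the catAllocation example output in this topic.
    sample = json.loads("""[
        {"shards": "5", "node": "analytics-storage-0"},
        {"shards": "0", "node": "analytics-storage-3"},
        {"shards": "5", "node": "analytics-storage-1"},
        {"shards": "5", "node": "analytics-storage-2"},
        {"shards": "0", "node": "analytics-storage-4"}
    ]""")
    print(still_holding_data(sample, ["analytics-storage-3", "analytics-storage-4"]))  # prints {}
```

The list of replicas to check is the same comma-separated list that you set in persistent.cluster.routing.allocation.exclude._name.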