Logstash persistent queue

Configuring the Logstash persistent queue feature.

The Logstash persistent queue provides enhanced reliability in the data pipeline. The ingestion pod buffers incoming API event data to persistent storage so that no events are lost if the downstream local storage pods or offload targets are temporarily unavailable.

For example, when you restart the analytics storage pods, they take a few seconds before they can receive API event data again. While the storage pods are restarting, incoming API events are held in the persistent queue until the storage pods are available again. If the storage pods are unavailable and the persistent queue is not enabled, then new API event data is lost during the storage pod outage.

If you find you are missing API event records, it is possible that the default persistent queue size is insufficient. This topic provides the steps to increase your persistent queue size.

Note: For OpenShift® users: The example steps in this topic use the Kubernetes kubectl command. On OpenShift, use the equivalent oc command in its place. If you are using a top-level CR you must edit the APIConnectCluster CR (the top-level CR), instead of directly editing the subsystem CRs. If the subsystem section is not included in the top-level CR, copy and paste the section from the subsystem CR to the APIConnectCluster CR.

Customizing the persistent queue size

By default, the size of the persistent queue is 8 Gi. If you want to change this size, follow these steps according to your platform:
Note: Configuring the persistent queue size is supported only on API Connect 10.0.8.1 and later.
  1. Calculate the storage space required based on the queue size you want to set.
    Note: If local storage is disabled and no offload targets are configured, the required storage is 1 * queue size + 5 Gi. However, this configuration is not useful because it means that no analytics data is recorded anywhere.
  2. Check whether the required storage exceeds the current value of ingestion.queue.volumeClaimTemplate.volumeSize. By default this is 50 Gi, so if you are increasing the queue size above 11 Gi, you must also increase the storage.
    On Kubernetes and OpenShift, you can check this by looking at the analytics CR:
    kubectl -n <namespace> edit a7s

    On VMware, check your analytics-extra-values.yaml file. If this file does not exist, or does not contain ingestion.queue.volumeClaimTemplate.volumeSize, then the value is still the 50 Gi default.
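    If you prefer not to open the CR in an editor, the following sketch shows one way to read the current value directly on Kubernetes or OpenShift. It assumes the a7s short name shown above, and <analytics CR name> is a placeholder for the name of your analytics CR. If the command returns no output, the field is not set explicitly and the 50 Gi default applies.
      kubectl -n <namespace> get a7s <analytics CR name> -o jsonpath='{.spec.ingestion.queue.volumeClaimTemplate.volumeSize}'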

  3. If you need more storage than you have defined in ingestion.queue.volumeClaimTemplate.volumeSize, then follow these steps:
    Note: Increasing the ingestion storage size causes analytics ingestion downtime. No incoming API event data is stored or offloaded while you complete these steps.
    1. VMware: Log in to one of your analytics VMs and switch to the root user:
      ssh apicadm@<analytics VM>
      sudo -i
    2. Scale down the ingestion pods to 0 replicas.
      Identify the name of the ingestion StatefulSet:
      kubectl get sts | grep ingestion
      Set ingestion replicas to zero:
      kubectl scale sts <ingestion statefulset> --replicas=0
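      Before you delete the PVCs in the next step, you can optionally confirm that the scale-down is complete by reusing the pod listing command shown elsewhere in this topic:
      kubectl get pods | grep ingestion
      When no ingestion pods are listed, continue to the next step.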
    3. Delete the ingestion pod PVCs.
      Identify the ingestion pod PVCs:
      kubectl get pvc | grep ingestion
      Delete all ingestion PVCs:
      kubectl delete pvc <name>-analytics-ingestion-<integer>
    4. Update the spec.ingestion.queue section of the analytics CR with the new defaultQueueSize and volumeSize:
          queue:
            type: persisted
            defaultQueueSize: <queue size>Gi
            volumeClaimTemplate:
              ...
              volumeSize: <volumeSize>Gi
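      If you prefer not to use an interactive editor for this update, a merge patch along the following lines can set the same fields. This is a sketch only: it assumes the a7s short name and the spec.ingestion.queue field path shown in this topic, <analytics CR name> is a placeholder for the name of your analytics CR, and you should adjust the namespace flag to match how you run the other kubectl commands in this procedure.
      kubectl -n <namespace> patch a7s <analytics CR name> --type merge -p '{"spec":{"ingestion":{"queue":{"type":"persisted","defaultQueueSize":"<queue size>Gi","volumeClaimTemplate":{"volumeSize":"<volumeSize>Gi"}}}}}'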
    5. Scale the ingestion pods back up:
      kubectl scale sts <ingestion statefulset> --replicas=<original number of replicas>
    6. Verify that the ingestion pods and PVCs are re-created:
      kubectl get pods | grep ingestion
      kubectl get pvc | grep ingestion
    7. VMware: Exit from the VM and update analytics-extra-values.yaml to contain your new settings:
      ingestion:
        queue:
          type: persisted
          defaultQueueSize: <queue size>Gi
          volumeClaimTemplate:
            storageClassName: local-storage
            volumeSize: <volumeSize>Gi

      You do not need to apply this change with apicup because you already made the update within the VM. Updating analytics-extra-values.yaml ensures that the next time apicup subsys install is run, it does not overwrite your ingestion configuration with old settings.
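      If your analytics subsystem is not already configured to use this file, also set the extra-values-file property (as shown later in this topic) so that future runs of apicup subsys install pick up the file:
        apicup subsys set <analytics subsystem> extra-values-file analytics-extra-values.yaml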

  4. If you have sufficient storage defined in ingestion.queue.volumeClaimTemplate.volumeSize, then follow these steps:
    • Kubernetes, OpenShift, and Cloud Pak for Integration: Edit your analytics CR:
      kubectl -n <namespace> edit a7s
      Add the spec.ingestion.queue.defaultQueueSize property to your analytics CR, and set the volumeClaimTemplate.volumeSize appropriately. For example:
          queue:
            type: persisted
            defaultQueueSize: <queue size>Gi
            volumeClaimTemplate:
              ...
              volumeSize: <volumeSize>Gi
    • VMware: Configure the persistent queue size in the analytics-extra-values.yaml file.
      1. From your project directory, open the analytics-extra-values.yaml file for editing. If this file does not exist, then create it; see Analytics extra-values file.
      2. Add the following text to the file, inside the spec object:
        ingestion:
          queue:
            type: persisted
            defaultQueueSize: <queue size>Gi
            volumeClaimTemplate:
              storageClassName: local-storage
              volumeSize: <volumeSize>Gi
      3. Ensure that your extra-values-file property is set to point to your analytics-extra-values.yaml file:
        apicup subsys set <analytics subsystem> extra-values-file analytics-extra-values.yaml
      4. Apply the analytics-extra-values.yaml to your analytics subsystem:
        apicup subsys install <analytics subsystem>
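        Optionally, after the install completes, you can confirm that the analytics subsystem is healthy. This assumes the standard apicup health check available in your release:
          apicup subsys health-check <analytics subsystem>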

Important points:

  1. defaultQueueSize is optional. If it is not specified, it defaults to 8 Gi.
  2. defaultQueueSize must be specified in Gi, and must be an integer.
  3. The minimum value for defaultQueueSize is 8 Gi.
  4. The maximum value for defaultQueueSize is 100 Gi.
  5. The ingestion.queue.volumeClaimTemplate.volumeSize must be at least 30 Gi.