Configuring local storage

If you are using local storage, you must first create a storage class and then set up the persistent volumes.

You must create at least six persistent volumes and at least one storage class when you are using local storage. Preferably, create one storage class for each of the six statefulsets that use these persistent volumes. The statefulsets are: Cassandra, CouchDB, Datalayer, Zookeeper, Kafka, and Elasticsearch.

Before you begin

Review the storage guidance in Choosing your storage solution for Monitoring.
Review the procedure for Configuring drives for local storage.
Review the procedure for Setting readahead on a drive or volume for local storage.

Procedure

Complete the following steps to set up local storage:

  1. Create a storage class for each statefulset by running the following bash script:

     #!/bin/bash
     classes=('monitoring-cassandra' 'monitoring-couchdb' 'monitoring-datalayer' 'monitoring-elasticsearch' 'monitoring-kafka' 'monitoring-zookeeper')
     template='{"apiVersion":"storage.k8s.io/v1","kind":"StorageClass","metadata":{"name":"<SC_NAME>","labels":{"release":"monitoring"}},"provisioner":"kubernetes.io/no-provisioner","reclaimPolicy":"Delete","volumeBindingMode":"WaitForFirstConsumer"}'
     for c in "${classes[@]}"; do
       echo "$template" | sed "s/<SC_NAME>/$c/" | oc apply -f -
     done
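Before you pipe the rendered manifests to oc apply, you can dry-run the substitution step to confirm that the template renders as expected. This sketch prints one rendered StorageClass manifest (for one of the six class names listed above) instead of applying it:

```shell
#!/bin/sh
# Dry run of the templating used above: substitute one class name and print
# the rendered StorageClass manifest instead of piping it to `oc apply`.
template='{"apiVersion":"storage.k8s.io/v1","kind":"StorageClass","metadata":{"name":"<SC_NAME>","labels":{"release":"monitoring"}},"provisioner":"kubernetes.io/no-provisioner","reclaimPolicy":"Delete","volumeBindingMode":"WaitForFirstConsumer"}'
manifest=$(echo "$template" | sed "s/<SC_NAME>/monitoring-cassandra/")
echo "$manifest"
```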
    
  2. Check that the storage classes were created successfully by running the get storage class command: oc get sc

    This example output shows that six storage classes were created - one for each statefulset:

     NAME                                  PROVISIONER                     AGE
     monitoring-cassandra                  kubernetes.io/no-provisioner    84s
     monitoring-couchdb                    kubernetes.io/no-provisioner    83s
     monitoring-datalayer                  kubernetes.io/no-provisioner    82s
     monitoring-elasticsearch              kubernetes.io/no-provisioner    81s
     monitoring-kafka                      kubernetes.io/no-provisioner    80s
     monitoring-zookeeper                  kubernetes.io/no-provisioner    2m38s
    
  3. Determine where each statefulset pod will reside. In a standard configuration, each statefulset has one pod.

    In a high availability configuration, each statefulset has three pods.

    Note: Cassandra has high RAM requirements (6 GB in a size0 environment, 16 GB in a size1 environment). High availability environments require a size1 environment.
    On standard OpenShift clusters, only worker nodes can be used.
    Run the following command to see which nodes are available: oc get node

  4. Modify and run the following bash script to create the local storage directories on each node and the corresponding persistent volumes:

    Note: For non-HA environments, a single node is sufficient. If you are using network-attached drives, mount them to the created folders (or modify the script to point at the drives).

    #!/bin/bash
    username=core # Standard username on OpenShift environments
    base_dir='/var/home/core/local-storage' # Location where persistent volume data will be stored
    max_size='500Gi' # Maximum size for any persistent volume (only affects which persistent volume claims are bound, not actual disk usage).
    
    cassandra_nodes=('worker6.magnolia3.os.fyre.ibm.com' 'worker7.magnolia3.os.fyre.ibm.com' 'worker8.magnolia3.os.fyre.ibm.com')
    couchdb_nodes=('worker0.magnolia3.os.fyre.ibm.com' 'worker1.magnolia3.os.fyre.ibm.com' 'worker2.magnolia3.os.fyre.ibm.com')
    datalayer_nodes=('worker0.magnolia3.os.fyre.ibm.com' 'worker1.magnolia3.os.fyre.ibm.com' 'worker2.magnolia3.os.fyre.ibm.com')
    elasticsearch_nodes=('worker0.magnolia3.os.fyre.ibm.com' 'worker1.magnolia3.os.fyre.ibm.com' 'worker2.magnolia3.os.fyre.ibm.com')
    kafka_nodes=('worker3.magnolia3.os.fyre.ibm.com' 'worker4.magnolia3.os.fyre.ibm.com' 'worker5.magnolia3.os.fyre.ibm.com')
    zookeeper_nodes=('worker3.magnolia3.os.fyre.ibm.com' 'worker4.magnolia3.os.fyre.ibm.com' 'worker5.magnolia3.os.fyre.ibm.com')
    
    pv_template='{"apiVersion":"v1","kind":"PersistentVolume","metadata":{"name":"<NAME>","labels":{"release":"monitoring"}},"spec":{"accessModes":["ReadWriteOnce"],"capacity":{"storage":"<MAX_SIZE>"},"local":{"path":"<PATH>"},"nodeAffinity":{"required":{"nodeSelectorTerms":[{"matchExpressions":[{"key":"kubernetes.io/hostname","operator":"In","values":["<NODE>"]}]}]}},"persistentVolumeReclaimPolicy":"Retain","storageClassName":"<SC_NAME>","volumeMode":"Filesystem"}}'
    
    function setup_storage() {
      sc=$1
      counter=0
      for node in "${nodes[@]}"; do
        ssh ${username}@${node} mkdir -p "${base_dir}/${sc}-${counter}"
        echo "$pv_template" | sed "s/<NAME>/monitoring-${sc}-${counter}/" | sed "s/<MAX_SIZE>/${max_size}/" | sed "s|<PATH>|${base_dir}/${sc}-${counter}|" | sed "s/<NODE>/$node/" | sed "s/<SC_NAME>/monitoring-${sc}/" | oc apply -f -
        counter=$((counter+1))
      done
    }
    
    nodes=("${cassandra_nodes[@]}")
    setup_storage cassandra
    
    nodes=("${couchdb_nodes[@]}")
    setup_storage couchdb
    
    nodes=("${datalayer_nodes[@]}")
    setup_storage datalayer
    
    nodes=("${elasticsearch_nodes[@]}")
    setup_storage elasticsearch
    
    nodes=("${kafka_nodes[@]}")
    setup_storage kafka
    
    nodes=("${zookeeper_nodes[@]}")
    setup_storage zookeeper
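    Note that the script substitutes <PATH> with `|` as the sed delimiter, because the replacement value contains `/` characters and the usual `/` delimiter would break. You can check that substitution in isolation:

```shell
#!/bin/sh
# Check the <PATH> substitution from the script above in isolation: "|" is
# used as the sed delimiter because the replacement path contains "/".
base_dir='/var/home/core/local-storage'
fragment='{"local":{"path":"<PATH>"}}'
rendered=$(echo "$fragment" | sed "s|<PATH>|${base_dir}/cassandra-0|")
echo "$rendered"
```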
    
  5. Verify that the persistent volumes were created by running: oc get pv
    For example,

     NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM                    STORAGECLASS                REASON   AGE
     monitoring-cassandra-0                     500Gi      RWO            Retain           Available                            monitoring-cassandra                 1m
     monitoring-cassandra-1                     500Gi      RWO            Retain           Available                            monitoring-cassandra                 1m
     monitoring-cassandra-2                     500Gi      RWO            Retain           Available                            monitoring-cassandra                 1m
     monitoring-couchdb-0                       500Gi      RWO            Retain           Available                            monitoring-couchdb                   1m
     monitoring-couchdb-1                       500Gi      RWO            Retain           Available                            monitoring-couchdb                   1m
     monitoring-couchdb-2                       500Gi      RWO            Retain           Available                            monitoring-couchdb                   1m
     monitoring-datalayer-0                     500Gi      RWO            Retain           Available                            monitoring-datalayer                 1m
     monitoring-datalayer-1                     500Gi      RWO            Retain           Available                            monitoring-datalayer                 1m
     monitoring-datalayer-2                     500Gi      RWO            Retain           Available                            monitoring-datalayer                 1m
     monitoring-elasticsearch-0                 500Gi      RWO            Retain           Available                            monitoring-elasticsearch             1m
     monitoring-elasticsearch-1                 500Gi      RWO            Retain           Available                            monitoring-elasticsearch             1m
     monitoring-elasticsearch-2                 500Gi      RWO            Retain           Available                            monitoring-elasticsearch             1m
     monitoring-kafka-0                         500Gi      RWO            Retain           Available                            monitoring-kafka                     1m
     monitoring-kafka-1                         500Gi      RWO            Retain           Available                            monitoring-kafka                     1m
     monitoring-kafka-2                         500Gi      RWO            Retain           Available                            monitoring-kafka                     1m
     monitoring-zookeeper-0                     500Gi      RWO            Retain           Available                            monitoring-zookeeper                 1m
     monitoring-zookeeper-1                     500Gi      RWO            Retain           Available                            monitoring-zookeeper                 1m
     monitoring-zookeeper-2                     500Gi      RWO            Retain           Available                            monitoring-zookeeper                 1m
    

    For more information, see Persistentvolumeclaims for statefulsets and binding.

Cassandra example

The following is an example of a custom storage class and persistentvolume for Cassandra. In this example, the user decided that they want to use local-storage-cassandra as the value for the monitoringDeploy.global.persistence.storageClassOption.cassandradata parameter, and 500Gi as the value for the monitoringDeploy.global.persistence.storageSize.cassandradata parameter.

The Cassandra statefulset creates a persistentvolumeclaim that successfully binds to this persistentvolume. The same configuration can be applied to the other statefulsets. That is, if the values that are provided for these parameters in the Monitoring custom resource correspond to the persistentvolumes that are created, the persistentvolumeclaims that the statefulsets create can successfully bind to them.
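The matching rule can be summarized: a persistentvolumeclaim binds only to a persistentvolume whose storage class name matches and whose capacity is at least the requested size. A minimal sketch of that check, using the 500Gi values from this topic:

```shell
#!/bin/sh
# Sketch of the binding rule described above: the storage class names must
# match and the volume must be at least as large as the request.
pvc_class='monitoring-cassandra'; pvc_request_gi=500   # from the claim
pv_class='monitoring-cassandra';  pv_capacity_gi=500   # from the volume
if [ "$pvc_class" = "$pv_class" ] && [ "$pv_capacity_gi" -ge "$pvc_request_gi" ]; then
  echo "claim can bind"
else
  echo "claim cannot bind"
fi
```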

StorageClass:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: local-storage-cassandra
  labels:
    release: ibmcloudappmgmt
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer

PersistentVolume:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: ibmcloudappmgmt-cassandra0
  labels:
    release: ibmcloudappmgmt
spec:
  capacity:
    storage: 500Gi
  storageClassName: local-storage-cassandra
  local:
    path: /data/k8s/cassandra0
  nodeAffinity:
      required:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values: ["10.10.10.1"]
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
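For the binding in this example to succeed, three names must agree: the StorageClass metadata.name, the persistentvolume spec.storageClassName, and the value of the cassandradata parameter in the Monitoring custom resource. A sketch using the values from the example above:

```shell
#!/bin/sh
# The three names that must agree in the Cassandra example above.
sc_name='local-storage-cassandra'    # StorageClass metadata.name
pv_class='local-storage-cassandra'   # PersistentVolume spec.storageClassName
cr_value='local-storage-cassandra'   # ...persistence.storageClassOption.cassandradata
if [ "$sc_name" = "$pv_class" ] && [ "$pv_class" = "$cr_value" ]; then
  echo "names agree"
fi
```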

Persistentvolumeclaims for statefulsets and binding

When Monitoring is installed, it creates persistentvolumeclaims for each of the statefulsets. Each of these persistentvolumeclaims needs a persistentvolume to bind to. IBM Cloud Pak for Multicloud Management defaults to using one replica for each statefulset, and therefore, one persistentvolumeclaim for each statefulset. If the number of replicas for any of the statefulsets is increased, more persistentvolumeclaims per statefulset are created, and more persistentvolumes must be created for the persistentvolumeclaims to bind to.

Each of the persistentvolumeclaims searches for a persistentvolume matching its requirements to bind to. The following example provides a persistentvolumeclaim, and the persistentvolume created for it to bind to.
For more detail about statefulsets and planning a high availability Monitoring environment, see Planning for a high availability installation.

Example: A persistentvolumeclaim and a persistentvolume

The following persistentvolumeclaim was created when Monitoring was installed. The keys to focus on are spec.storageClassName, spec.accessModes, spec.resources.requests.storage, and spec.volumeName.

The values for these keys are the values that the persistentvolume needs to reference to ensure that it gets bound to the persistentvolumeclaim.

[root@cp4mcm-installer-chicharron-inf ~]# oc get pvc data-ibmcloudappmgmt-cassandra-0 -o json
{
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {
        "annotations": {
            "pv.kubernetes.io/bind-completed": "yes",
            "pv.kubernetes.io/bound-by-controller": "yes",
            "volume.beta.kubernetes.io/storage-provisioner": "rook-ceph.rbd.csi.ceph.com"
        },
        "creationTimestamp": "2020-03-13T21:20:27Z",
        "finalizers": [
            "kubernetes.io/pvc-protection"
        ],
        "labels": {
            "app": "cassandra",
            "chart": "cassandra",
            "heritage": "Tiller",
            "release": "ibmcloudappmgmt"
        },
        "name": "data-ibmcloudappmgmt-cassandra-0",
        "namespace": "kube-system",
        "resourceVersion": "1911203",
        "selfLink": "/api/v1/namespaces/kube-system/persistentvolumeclaims/data-ibmcloudappmgmt-cassandra-0",
        "uid": "e34209f8-dd78-4c34-956c-651a5eb7adaa"
    },
    "spec": {
        "accessModes": [
            "ReadWriteOnce"
        ],
        "resources": {
            "requests": {
                "storage": "50Gi"
            }
        },
        "storageClassName": "rook-ceph-block-internal",
        "volumeMode": "Filesystem",
        "volumeName": "pvc-e34209f8-dd78-4c34-956c-651a5eb7adaa"
    },
    "status": {
        "accessModes": [
            "ReadWriteOnce"
        ],
        "capacity": {
            "storage": "50Gi"
        },
        "phase": "Bound"
    }
}

This is the persistentvolume that the persistentvolumeclaim is bound to:

{
    "apiVersion": "v1",
    "kind": "PersistentVolume",
    "metadata": {
        "annotations": {
            "pv.kubernetes.io/provisioned-by": "rook-ceph.rbd.csi.ceph.com"
        },
        "creationTimestamp": "2020-03-13T21:20:29Z",
        "finalizers": [
            "kubernetes.io/pv-protection"
        ],
        "name": "pvc-e34209f8-dd78-4c34-956c-651a5eb7adaa",
        "resourceVersion": "1911201",
        "selfLink": "/api/v1/persistentvolumes/pvc-e34209f8-dd78-4c34-956c-651a5eb7adaa",
        "uid": "f29e1f09-f07a-4107-9037-3c3919fc359c"
    },
    "spec": {
        "accessModes": [
            "ReadWriteOnce"
        ],
        "capacity": {
            "storage": "50Gi"
        },
        "claimRef": {
            "apiVersion": "v1",
            "kind": "PersistentVolumeClaim",
            "name": "data-ibmcloudappmgmt-cassandra-0",
            "namespace": "kube-system",
            "resourceVersion": "1910984",
            "uid": "e34209f8-dd78-4c34-956c-651a5eb7adaa"
        },
        "csi": {
            "driver": "rook-ceph.rbd.csi.ceph.com",
            "fsType": "ext4",
            "nodeStageSecretRef": {
                "name": "rook-csi-rbd-node",
                "namespace": "rook-ceph"
            },
            "volumeAttributes": {
                "clusterID": "rook-ceph",
                "imageFeatures": "layering",
                "imageFormat": "2",
                "pool": "rbd",
                "storage.kubernetes.io/csiProvisionerIdentity": "1583863004786-8081-rook-ceph.rbd.csi.ceph.com"
            },
            "volumeHandle": "0001-0009-rook-ceph-0000000000000001-763700a9-6570-11ea-bdc2-0a580afe080f"
        },
        "persistentVolumeReclaimPolicy": "Delete",
        "storageClassName": "rook-ceph-block-internal",
        "volumeMode": "Filesystem"
    },
    "status": {
        "phase": "Bound"
    }
}

The persistentvolume that rook-ceph created has values that correspond to our persistentvolumeclaim: the spec.claimRef.name and spec.claimRef.uid of the persistentvolume match the metadata.name and metadata.uid of the persistentvolumeclaim, the persistentvolumeclaim's spec.volumeName matches the persistentvolume's metadata.name, and both specify the rook-ceph-block-internal storage class, the ReadWriteOnce access mode, and 50Gi of storage.

The values match, therefore, the persistentvolumeclaim is successfully bound to the persistentvolume.
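This correspondence can be verified mechanically. The following sketch uses the values taken from the JSON output above:

```shell
#!/bin/sh
# Sketch of the correspondence shown above: the claim's spec.volumeName equals
# the volume's metadata.name, and both reference the same storage class.
pvc_volume_name='pvc-e34209f8-dd78-4c34-956c-651a5eb7adaa'  # PVC spec.volumeName
pv_name='pvc-e34209f8-dd78-4c34-956c-651a5eb7adaa'          # PV metadata.name
pvc_class='rook-ceph-block-internal'                        # PVC spec.storageClassName
pv_class='rook-ceph-block-internal'                         # PV spec.storageClassName
if [ "$pvc_volume_name" = "$pv_name" ] && [ "$pvc_class" = "$pv_class" ]; then
  echo "bound"
fi
```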

Other useful references

Important: If IBM Cloud Pak® for Multicloud Management is installed on the IBM® Cloud, it might be using dynamically provisioned storage such as ibmc-file-gold, which is always available. You can also use ibmc-file-gold for your storage solution when you are installing Monitoring. For more information, see the Storage section in Preparing to install the IBM Cloud Pak® for Multicloud Management. If IBM Cloud Pak® for Multicloud Management is not installed on the IBM® Cloud, you must choose another storage solution for your Monitoring installation.