Managing etcd clusters

IBM® Cloud Private uses etcd. Use the etcd documentation as a guide to maintaining etcd in IBM Cloud Private.

Space quota

Use the --quota-backend-bytes flag to set the space quota. The default value for the space quota is 2 GB, which is a conservative value that is suitable for most applications. The maximum value is 8 GB.

You can change the space quota value before or after installation.
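
For example, you can compare each member's current database size against the quota with etcdctl. The following is a minimal sketch; the certificate paths and the endpoint match the defragmentation job later in this topic and must be adjusted for your cluster.

    # Check the current database size of a member and compare it against the quota.
    export ETCDCTL_API=3
    etcdctl --cacert=/etc/cfc/conf/etcd/ca.pem \
            --cert=/etc/cfc/conf/etcd/client.pem \
            --key=/etc/cfc/conf/etcd/client-key.pem \
            --endpoints="https://10.10.25.10:4001" \
            --write-out=table endpoint status

    # The quota itself is set on the etcd server command line.
    # For example, a 4 GB quota (all other flags are omitted):
    #   etcd --quota-backend-bytes=4294967296 ...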

For more information about the space quota, see the etcd documentation.

History compaction

IBM® Cloud Private configures the etcd compaction interval with the --etcd-compaction-interval flag on the API server. The default compaction interval is 5 minutes, which is also the value that IBM Cloud Private uses.

You can change the interval value before or after installation.
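
The flag belongs to the kube-apiserver command line. The following fragment is only a sketch that shows where the flag fits; the 10-minute value is an example and all other required API server flags are omitted.

    # Fragment of a kube-apiserver invocation that sets the compaction interval.
    kube-apiserver --etcd-servers=https://10.10.25.10:4001 --etcd-compaction-interval=10m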

For more information about compaction, see the etcd documentation.

Defragmentation

Defragmentation releases storage space back to the file system.

IBM Cloud Private version 3.10 does not provide a default configuration for defragmentation. You can run a job to perform defragmentation on demand. Alternatively, depending on your cluster workload, run a cron job to defragment periodically so that you avoid reaching the space quota.

Note: Defragmenting a live member blocks the system from reading and writing data while it rebuilds its state. Consider running your job during a maintenance window.
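
If you want to defragment a single member manually before you set up a job, the core command is a plain etcdctl defrag call. The following is a minimal sketch; the endpoint and certificate paths match the job example that follows and must be adjusted for your cluster.

    # Defragment one member; repeat the command for each etcd node IP.
    export ETCDCTL_API=3
    etcdctl --cacert=/etc/cfc/conf/etcd/ca.pem \
            --cert=/etc/cfc/conf/etcd/client.pem \
            --key=/etc/cfc/conf/etcd/client-key.pem \
            --endpoints="https://10.10.25.10:4001" defrag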

For more information about defragmentation, see the etcd documentation.

Running a defragmentation job

Complete the following steps to create a job and run the defragmentation process.

  1. In the following etcd-defrag-job.yaml job example, replace 10.10.25.10 10.10.25.11 10.10.25.12 with your etcd node IP addresses (separated by spaces).

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: etcd-defrag-job
    spec:
      template:
        spec:
          containers:
          - name: etcd
            image: ibmcom/etcd:v3.2.18
            args:
            - /bin/sh
            - -c
            - etcdctl='etcdctl --cacert=/etc/cfc/conf/etcd/ca.pem --cert=/etc/cfc/conf/etcd/client.pem --key=/etc/cfc/conf/etcd/client-key.pem';
              export ETCDCTL_API=3;
              for endpoint in 10.10.25.10 10.10.25.11 10.10.25.12;
              do
                $etcdctl --endpoints="https://${endpoint}:4001" defrag;
                $etcdctl --endpoints="https://${endpoint}:4001" --write-out=table endpoint status;
              done;
              $etcdctl --endpoints="https://${endpoint}:4001" alarm disarm;
              $etcdctl --endpoints="https://${endpoint}:4001" alarm list;
            volumeMounts:
            - mountPath: /etc/cfc/conf/etcd
              name: etcd-certs
          volumes:
          - hostPath:
              path: /etc/cfc/conf/etcd
              type: ""
            name: etcd-certs
          restartPolicy: OnFailure
          nodeSelector:
            etcd: "true"
          tolerations:
            - key: "dedicated"
              operator: "Exists"
              effect: "NoSchedule"
    
  2. Create the job from the web UI or by running the following command:

    $ kubectl create -f ./etcd-defrag-job.yaml -n kube-system
    job.batch/etcd-defrag-job created
    
  3. After you create the job, enter the following command to see the job status:

    $ kubectl get job -n kube-system | grep etcd-defrag-job
    NAME              DESIRED   SUCCESSFUL   AGE
    etcd-defrag-job   1         1            1m
    
  4. Check the pod logs to view the defragmentation details. The pod name is generated by the job; see the command after this procedure to find it.

    $ kubectl logs etcd-defrag-job-48kxs -n kube-system
    Finished defragmenting etcd member[https://10.10.25.10:4001]
    +--------------------------+------------------+---------+---------+-----------+-----------+------------+
    |         ENDPOINT         |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
    +--------------------------+------------------+---------+---------+-----------+-----------+------------+
    | https://10.10.25.10:4001 | 8271bc8ee51f9f39 |  3.2.18 |  9.4 MB |     false |      6051 |     255138 |
    +--------------------------+------------------+---------+---------+-----------+-----------+------------+
    Finished defragmenting etcd member[https://10.10.25.11:4001]
    +--------------------------+------------------+---------+---------+-----------+-----------+------------+
    |         ENDPOINT         |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
    +--------------------------+------------------+---------+---------+-----------+-----------+------------+
    | https://10.10.25.11:4001 | 6e235e51838ea635 |  3.2.18 |  9.3 MB |     false |      6051 |     255152 |
    +--------------------------+------------------+---------+---------+-----------+-----------+------------+
    …
    
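The pod name that is used in step 4 (etcd-defrag-job-48kxs in the example) is generated when the job runs. One way to find it is to filter by the job-name label that Kubernetes adds to the pods that a job creates:

    $ kubectl get pods -n kube-system -l job-name=etcd-defrag-job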

Using a cron job for defragmentation

You can run a cron job during a scheduled maintenance window. The following example cron job runs every minute, for testing purposes only. Complete the following steps before you create the cron job.

  1. Replace 10.10.25.10 10.10.25.11 10.10.25.12 with your etcd node IP addresses (separated by spaces).
  2. Modify spec.schedule to set your own schedule. See the example after the manifest.

    apiVersion: batch/v1beta1
    kind: CronJob
    metadata:
      name: etcd-defrag-cronjob
    spec:
      schedule: "*/1 * * * *"
      jobTemplate:
        spec:
          template:
            spec:
              containers:
              - name: etcd
                image: ibmcom/etcd:v3.2.18
                args:
                - /bin/sh
                - -c
                - etcdctl='etcdctl --cacert=/etc/cfc/conf/etcd/ca.pem --cert=/etc/cfc/conf/etcd/client.pem --key=/etc/cfc/conf/etcd/client-key.pem';
                  export ETCDCTL_API=3;
                  for endpoint in 10.10.25.10 10.10.25.11 10.10.25.12 ;
                  do
                    $etcdctl --endpoints="https://${endpoint}:4001" defrag;
                    $etcdctl --endpoints="https://${endpoint}:4001" --write-out=table endpoint status;
                  done;
                  $etcdctl --endpoints="https://${endpoint}:4001" alarm disarm;
                  $etcdctl --endpoints="https://${endpoint}:4001" alarm list;
                volumeMounts:
                - mountPath: /etc/cfc/conf/etcd
                  name: etcd-certs
              volumes:
              - hostPath:
                  path: /etc/cfc/conf/etcd
                  type: ""
                name: etcd-certs
              restartPolicy: OnFailure
              nodeSelector:
                etcd: "true"
              tolerations:
                - key: "dedicated"
                  operator: "Exists"
                  effect: "NoSchedule"
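
For real use, replace the every-minute test schedule with one that matches your maintenance window. For example, a spec.schedule value of 0 2 * * 0 runs the job at 02:00 every Sunday. Assuming that you save the manifest as etcd-defrag-cronjob.yaml (the file name is only an example), you can create and check the cron job with commands such as the following:

    $ kubectl create -f ./etcd-defrag-cronjob.yaml -n kube-system
    $ kubectl get cronjob etcd-defrag-cronjob -n kube-system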