CephMdsCpuUsageHigh

The metadata server (MDS) provides metadata for the file system. The MDS is crucial for all file creation, rename, deletion, and update operations. By default, the MDS is allocated two or three CPUs, which is usually sufficient as long as there are not too many metadata operations in progress. If the metadata operation load increases enough to trigger this alert, the default CPU allocation cannot cope with the load. In that case, increase the CPU allocation or run multiple active MDS servers.

Impact: High

Diagnosis

  1. In the OpenShift web console, click Workloads > Pods.
  2. Select the corresponding MDS pod (its name starts with rook-ceph-mds) and click the Metrics tab.

On the Metrics tab, you can see the allocated and used CPU. When the CPU usage exceeds 67% of the allocated CPU for 6 hours, the CephMdsCpuUsageHigh alert is triggered.
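The threshold comparison behind this alert can also be checked by hand with integer arithmetic. The following is a minimal sketch, not part of the product. The function name mds_cpu_over_threshold is hypothetical. Both inputs are assumed to be in millicores (for example, usage as reported by oc adm top pods -n openshift-storage and the CPU limit from the pod spec). The function evaluates a single sample, not the 6-hour window that the alert requires.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) when CPU usage exceeds 67% of
# the CPU limit. Both arguments are integers in millicores (1 CPU = 1000m).
# It checks one sample only; the real alert also requires the condition to
# hold for 6 hours.
mds_cpu_over_threshold() {
  usage_m=$1
  limit_m=$2
  # usage / limit > 0.67  <=>  usage * 100 > limit * 67 (avoids floating point)
  [ $((usage_m * 100)) -gt $((limit_m * 67)) ]
}

# Example: with a 3-CPU limit (3000m), the threshold is 2010m.
if mds_cpu_over_threshold 2100 3000; then
  echo "above threshold"
else
  echo "below threshold"
fi
```

Multiplying both sides by 100 keeps the comparison in integer arithmetic, which POSIX sh supports natively.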

Mitigation

Scale the MDS either vertically (allocate more CPU to the MDS) or horizontally (run more active MDS servers). For more information, see the Description and Runbook sections of the alert.

Run the following command to set the number of CPUs allocated to the MDS. This example allocates eight CPUs:

 oc patch -n openshift-storage storagecluster ocs-storagecluster \
    --type merge \
    --patch '{"spec": {"resources": {"mds": {"limits": {"cpu": "8"},
    "requests": {"cpu": "8"}}}}}'
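To use a CPU count other than eight, you can generate the patch body from a variable. The following is a minimal sketch that assumes a POSIX shell. The helper name mds_cpu_patch is hypothetical. Like the command above, the helper keeps the CPU limit and the CPU request equal. The commented lines show how the helper might be used against a cluster and how the resulting setting might be read back. Those lines are not executed here.

```shell
#!/bin/sh
# Hypothetical helper: prints the merge patch that sets both the CPU limit and
# the CPU request of the MDS to the same value ($1, for example "8" or "4500m").
mds_cpu_patch() {
  cpu=$1
  printf '{"spec": {"resources": {"mds": {"limits": {"cpu": "%s"}, "requests": {"cpu": "%s"}}}}}\n' "$cpu" "$cpu"
}

mds_cpu_patch 8

# Usage against a cluster (not run here):
#   oc patch -n openshift-storage storagecluster ocs-storagecluster \
#     --type merge --patch "$(mds_cpu_patch 8)"
# Read back the configured MDS resources:
#   oc get storagecluster ocs-storagecluster -n openshift-storage \
#     -o jsonpath='{.spec.resources.mds}'
```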

To run multiple active MDS servers, run the following command:

 oc patch -n openshift-storage storagecluster ocs-storagecluster \
    --type merge \
    --patch '{"spec": {"managedResources": {"cephFilesystems":{"activeMetadataServers": 2}}}}'

Depending on the load, make sure that enough CPUs are allocated to support multiple active MDS servers.

Important: Always increase activeMetadataServers by one at a time. Scaling activeMetadataServers is effective only if you have more than one PersistentVolume (PV). If a single PV is causing the CPU load, increase the allocated CPU instead, as described earlier in this section.
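The one-at-a-time rule can be followed by reading the current value and adding one. The following is a minimal sketch, not a documented procedure. The helper names next_active_mds and active_mds_patch are hypothetical. The sketch also assumes that an unset activeMetadataServers field means one active MDS, which is based on the default CephFS configuration and should be confirmed on your cluster. The commented oc lines are not executed here.

```shell
#!/bin/sh
# Hypothetical helper: prints the next activeMetadataServers value, given the
# current one. An empty value (field not set) is ASSUMED to mean 1 active MDS.
next_active_mds() {
  n=${1:-1}
  echo $((n + 1))
}

# Hypothetical helper: prints the merge patch for a given activeMetadataServers count.
active_mds_patch() {
  printf '{"spec": {"managedResources": {"cephFilesystems": {"activeMetadataServers": %s}}}}\n' "$1"
}

next_active_mds ""
active_mds_patch "$(next_active_mds 2)"

# Usage against a cluster (not run here):
#   current=$(oc get storagecluster ocs-storagecluster -n openshift-storage \
#     -o jsonpath='{.spec.managedResources.cephFilesystems.activeMetadataServers}')
#   oc patch -n openshift-storage storagecluster ocs-storagecluster \
#     --type merge --patch "$(active_mds_patch "$(next_active_mds "$current")")"
```

Computing the new value from the current one, instead of hard-coding a target count, avoids jumping by more than one active MDS at a time.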