Logging and monitoring
Red Hat OpenShift Container Platform has built-in capabilities for monitoring and logging. You can monitor the behavior of cluster resources and applications in real time and create alerts if resource consumption exceeds limits. For application behavior and operational purposes, RHOCP can monitor system and application logs and either display them or forward them to an external logging tool for analysis.
Monitoring in RHOCP
To learn more about monitoring, its components, and how to deploy and configure it, see Monitoring (Red Hat documentation).
Red Hat OpenShift Container Platform includes a preconfigured, preinstalled, and self-updating monitoring stack that is based on the Prometheus open source project and its wider ecosystem. It provides monitoring of cluster components and includes a set of alerts that immediately notify the cluster administrator about problems as they occur. The cluster monitoring stack is only supported for monitoring Red Hat OpenShift Container Platform clusters.

Red Hat OpenShift Logging
As a cluster administrator, you can deploy Red Hat OpenShift Logging to aggregate all the logs from your RHOCP cluster, such as node system audit logs, application container logs, and infrastructure logs. Logging aggregates these logs from throughout your cluster and stores them in a default log store.
To learn more about Red Hat OpenShift Logging, see Red Hat OpenShift Logging (Red Hat documentation).
Red Hat OpenShift Logging aggregates the following types of logs:
- Application - Container logs generated by user applications running in the cluster, except infrastructure container applications.
- Receiver - The receiver input type enables the Logging system to accept logs from external sources. It supports two formats for receiving logs: http and syslog.
- Infrastructure - Logs generated by infrastructure components running in the cluster and Red Hat OpenShift Container Platform nodes, such as journal logs. Infrastructure components are pods that run in the openshift*, kube*, or default projects.
- Audit - Logs generated by the node audit system (auditd), which are stored in the /var/log/audit/audit.log file, and the audit logs from the Kubernetes apiserver and the Red Hat OpenShift API Server.
Sizing estimates for Monitoring and Logging
The monitoring stack imposes additional resource requirements. Consult the computing resources recommendations in Scaling the Cluster Monitoring Operator (Red Hat documentation) and verify that you have sufficient resources.
| Number of Nodes | Number of Pods | Prometheus storage growth per day | Prometheus storage growth per 15 days | RAM Space (per scale size) | Network (per tsdb chunk) |
|---|---|---|---|---|---|
| 50 | 1800 | 6.3 GB | 94 GB | 6 GB | 16 MB |
| 100 | 3600 | 13 GB | 195 GB | 10 GB | 26 MB |
| 150 | 5400 | 19 GB | 283 GB | 12 GB | 36 MB |
| 200 | 7200 | 25 GB | 375 GB | 14 GB | 46 MB |
Approximately 20% of the expected size is added as overhead to ensure that the storage requirements do not exceed the calculated value.
The calculation is based on the default Red Hat OpenShift Container Platform Cluster Monitoring Operator.
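As a rough illustration of how the table can be used, the following Python sketch scales the per-day growth figure to a chosen retention period. It assumes that growth is linear in the number of retention days, that cluster sizes between table rows are rounded up to the next row, and that the table figures already include the 20% overhead. The function name and the lookup approach are hypothetical and are not part of any Red Hat tooling.

```python
# Per-day Prometheus storage growth in GB, keyed by node count (from the table above).
GROWTH_GB_PER_DAY = {50: 6.3, 100: 13.0, 150: 19.0, 200: 25.0}


def estimate_prometheus_storage_gb(nodes: int, retention_days: int = 15) -> float:
    """Estimate Prometheus storage growth over the retention period.

    Assumptions: growth is linear in retention days, and a cluster size that
    falls between two table rows is rounded up to the next listed row.
    """
    for size in sorted(GROWTH_GB_PER_DAY):
        if nodes <= size:
            return GROWTH_GB_PER_DAY[size] * retention_days
    raise ValueError(f"no sizing data for {nodes} nodes (table ends at 200)")


print(estimate_prometheus_storage_gb(100))  # 195.0, matching the 15-day column
```

Treat the result as a starting point only. Actual growth depends on the number of pods and on the metrics that the workloads expose.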
Recommendations for Red Hat OpenShift Container Platform
- Use at least three infrastructure (infra) nodes.
- Use at least three RHOCP container storage nodes with fast storage drives.
Configuration of Monitoring and Logging
The supported way of configuring Red Hat OpenShift Container Platform Monitoring is documented in About OpenShift Container Platform monitoring (Red Hat documentation). Do not use other configurations, as they are unsupported. Configuration paradigms might change across Prometheus releases, and such cases can be handled gracefully only if all configuration possibilities are controlled. If you use configurations other than the supported ones, your changes disappear because the cluster-monitoring-operator reconciles any differences. By default and by design, the operator reverts everything to the defined state.
Explicitly unsupported cases include:
- Creating extra ServiceMonitor objects in the openshift-* namespaces. This extends the set of targets that the cluster monitoring Prometheus instance scrapes, which can cause collisions and load differences that cannot be accounted for. These factors might make the Prometheus setup unstable.
- Creating unexpected ConfigMap objects or PrometheusRule objects. This causes the cluster monitoring Prometheus instance to include extra alerting and recording rules.
- Modifying resources of the stack. The Prometheus Monitoring Stack ensures that its resources are always in the state it expects them to be. If they are modified, the stack resets them.
- Using the resources of the stack for your own purposes. The resources that are created by the Prometheus Cluster Monitoring stack are not meant to be used by any other resources, as there are no guarantees about their backward compatibility.
- Stopping the Cluster Monitoring Operator from reconciling the monitoring stack.
- Modifying the monitoring stack Grafana instance.
Configuring persistent storage
Running cluster monitoring with persistent storage means that your metrics are stored on a persistent volume (PV) and can survive a pod being restarted or re-created. This is ideal if you need your metrics or alerting data to be protected against data loss. For production environments, it is highly recommended to configure persistent storage. Because of the high I/O demands, it is advantageous to use local storage.
For details, see Optimizing storage.
Prerequisites
- Dedicate sufficient local persistent storage to ensure that the disk does not become full. How much storage you need depends on the number of pods.
- Make sure that you have a persistent volume (PV) ready to be claimed by the persistent volume claim (PVC), one PV for each replica. Because Prometheus has two replicas and Alertmanager has three replicas, you need five PVs to support the entire monitoring stack. The PVs should be available from the Local Storage Operator. This does not apply if you enable dynamically provisioned storage.
- Use the block type of storage.
- Persistent storage using local volumes.
By default, the Prometheus Cluster Monitoring stack configures the retention time for Prometheus data to be 15 days. You might want to modify the retention time to change how soon the data is deleted.
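The retention time and the persistent volume claim are typically set through the cluster-monitoring-config ConfigMap in the openshift-monitoring namespace. The following Python sketch renders such a manifest as YAML text. The helper name and the example values (a 15d retention, a storage class called local-storage, and a 40Gi request) are hypothetical, so verify the field names against the Red Hat documentation for your release before applying anything.

```python
import textwrap


def monitoring_configmap(retention: str, storage_class: str, size: str) -> str:
    """Render a cluster-monitoring-config ConfigMap manifest as YAML text.

    All three arguments are illustrative placeholders supplied by the caller.
    """
    return textwrap.dedent(f"""\
        apiVersion: v1
        kind: ConfigMap
        metadata:
          name: cluster-monitoring-config
          namespace: openshift-monitoring
        data:
          config.yaml: |
            prometheusK8s:
              retention: {retention}
              volumeClaimTemplate:
                spec:
                  storageClassName: {storage_class}
                  resources:
                    requests:
                      storage: {size}
        """)


# Hypothetical values; adjust them to your sizing estimate and storage class.
print(monitoring_configmap("15d", "local-storage", "40Gi"))
```

The printed manifest can be saved to a file and reviewed before it is applied to the cluster. Generating it from one function keeps the retention time and the storage request in a single place.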
Alertmanager
The Prometheus Alertmanager is a component that manages incoming alerts, including:
- Alert silencing
- Alert inhibition
- Alert aggregation
- Reliable deduplication of alerts
- Grouping alerts
- Sending grouped alerts as notifications through receivers such as email and PagerDuty
Red Hat OpenShift Container Platform monitoring includes the Watchdog alert, which fires continuously. Alertmanager repeatedly sends notifications for the Watchdog alert to the notification provider, for example, PagerDuty. The provider is configured to notify the administrator when it stops receiving the Watchdog alert. This mechanism helps ensure continuous operation of Prometheus as well as continuous communication between Alertmanager and the notification provider.
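The provider-side check behind the Watchdog mechanism can be pictured as a simple timeout test. The following Python sketch is a simplified model only: the function name and the five-minute timeout are hypothetical, and real notification providers implement this check in their own way.

```python
from datetime import datetime, timedelta


def watchdog_is_silent(last_received: datetime, now: datetime,
                       timeout: timedelta = timedelta(minutes=5)) -> bool:
    """Return True when no Watchdog notification arrived within the timeout.

    A True result is the condition under which the provider would notify
    the administrator that the alerting pipeline may be broken.
    """
    return now - last_received > timeout


last_seen = datetime(2024, 1, 1, 12, 0)
print(watchdog_is_silent(last_seen, datetime(2024, 1, 1, 12, 3)))   # False
print(watchdog_is_silent(last_seen, datetime(2024, 1, 1, 12, 10)))  # True
```

The logic is inverted compared with ordinary alerts: the absence of a notification is the signal.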
Red Hat OpenShift Container Platform Cluster Monitoring by default ships with a set of pre-defined alerting rules.
- The default alerting rules are used specifically for the Red Hat OpenShift Container Platform cluster and nothing else. For example, you get alerts for a persistent volume in the cluster, but you do not get them for a persistent volume in your custom namespace.
- Some alerting rules have identical names. This is intentional. They send alerts about the same event with different thresholds, with different severity, or both.
- With the inhibition rules, the lower severity is inhibited when the higher severity is firing.
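The effect of inhibition on identically named alerts can be modeled in a few lines. The following Python sketch is a simplified illustration and not Alertmanager's implementation: real inhibition rules are declared in the Alertmanager configuration with source and target matchers. The severity ranking, the choice of alertname and namespace as the labels that must match, and the sample alerts are assumptions made for this example.

```python
# Assumed severity ordering, lowest to highest.
SEVERITY_RANK = {"info": 0, "warning": 1, "critical": 2}


def apply_inhibition(firing):
    """Drop alerts that share a name and namespace with a higher-severity firing alert."""
    highest = {}
    for alert in firing:
        key = (alert["alertname"], alert["namespace"])
        rank = SEVERITY_RANK[alert["severity"]]
        highest[key] = max(highest.get(key, rank), rank)
    return [
        a for a in firing
        if SEVERITY_RANK[a["severity"]] == highest[(a["alertname"], a["namespace"])]
    ]


# Illustrative alerts: two share a name but differ in severity.
firing = [
    {"alertname": "KubePersistentVolumeFillingUp", "namespace": "openshift-monitoring", "severity": "warning"},
    {"alertname": "KubePersistentVolumeFillingUp", "namespace": "openshift-monitoring", "severity": "critical"},
    {"alertname": "KubePersistentVolumeFillingUp", "namespace": "openshift-logging", "severity": "warning"},
]
for alert in apply_inhibition(firing):
    print(alert["namespace"], alert["severity"])
```

In this model, the warning in openshift-monitoring is suppressed because a critical alert with the same name is firing there, while the warning in openshift-logging is still delivered.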
Monitoring your own services
You can use RHOCP monitoring for your own services in addition to monitoring the cluster. This way, you do not need to use an extra monitoring solution. This helps keep monitoring centralized. Additionally, you can extend the access to the metrics of your services beyond cluster administrators. This enables developers and arbitrary users to access these metrics. You can also export custom application metrics for the horizontal pod autoscaler.
To get the technical details on how to enable monitoring of your own services, see Configure user workload monitoring.
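For a service to be monitored this way, it generally has to expose its metrics in the Prometheus text exposition format so that they can be scraped. The following Python sketch renders a single counter in that format using only the standard library. The function name, the metric name app_requests_total, and the labels are hypothetical, and label values are not escaped, so a maintained Prometheus client library is likely the better choice for real services.

```python
def render_counter(name: str, help_text: str, samples: dict) -> str:
    """Render one counter in the Prometheus text exposition format.

    `samples` maps a tuple of (label, value) pairs to the counter value.
    Label values are not escaped in this sketch.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples.items():
        label_str = ",".join(f'{k}="{v}"' for k, v in labels)
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical metric for an example application.
print(render_counter(
    "app_requests_total",
    "Total HTTP requests handled.",
    {(("method", "GET"),): 3, (("method", "POST"),): 1},
))
```

A service would return this text from an HTTP endpoint, conventionally /metrics, which the monitoring stack then scrapes at a regular interval.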