SNMP response codes
If you send alerts to your SNMP server, you can use the following information to interpret the response codes that are sent by IBM® Software Hub.
You can monitor the state of IBM Software Hub and your services with platform monitors, service monitors, and privileged monitors. Platform monitors are installed automatically when you install IBM Software Hub. You must manually install service monitors and privileged monitors.
Deployment status check codes
Monitor type: Platform
Each service is configured to maintain a specific number of Deployment replicas.
The check-deployment-status event monitors the status of
Deployment replicas that are associated with IBM Software
Hub and reports any issues.
| Response code | Severity | Description |
|---|---|---|
| 102 | Critical | The service does not have enough replicas. |
| 100 | Information | The monitor that checks the status of the replicas ran. There are no issues to report. |
StatefulSet status check codes
Monitor type: Platform
Each service is configured to maintain a specific number of StatefulSet replicas. The check-statefulset-status event monitors the status of
StatefulSet replicas that are associated with IBM Software
Hub and reports any issues.
| Response code | Severity | Description |
|---|---|---|
| 202 | Critical | The service does not have enough replicas. |
| 200 | Information | The monitor that checks the status of the replicas ran. There are no issues to report. |
PVC status check codes
Monitor type: Platform
A persistent volume claim (PVC) is a request for storage that meets specific criteria, such as a
minimum size or a specific access mode. The check-pvc-status event monitors the
status of the PVCs that are associated with IBM Software
Hub and reports any issues.
| Response code | Severity | Description |
|---|---|---|
| 302 | Critical | The PVC is not associated with a storage volume, which means that the service cannot store data. |
| 300 | Information | The monitor that checks the status of the PVCs ran. There are no issues to report. |
Quota status check codes
Monitor type: Platform
An administrator sets the vCPU quota and memory quota for services or for the platform. The
check-quota-status event monitors the quotas and requests that are associated with
IBM Software
Hub to determine whether services have
sufficient resources to fulfill requests.
| Response code | Severity | Description |
|---|---|---|
| 402 | Critical | The service has insufficient resources to fulfill requests. The service cannot create new pods if the new pods will push the service over the memory quota or the vCPU quota. These pods remain in pending state until sufficient resources are available. |
| 401 | Warning | Check the quota settings and the available resources on the cluster. |
| 400 | Information | The monitor that checks the status of the quotas ran. There are no issues to report. |
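The rule in the 402 description (a new pod stays pending if it would push the service over a quota) can be illustrated with a small check. This is a sketch under stated assumptions, not how IBM Software Hub implements the monitor: the `Resources` type and the `would_exceed_quota` function are hypothetical names, and "push over" is assumed to mean strictly greater than the quota.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """vCPU and memory amounts (hypothetical helper type for this sketch)."""
    vcpu: float
    memory_gib: float


def would_exceed_quota(in_use: Resources, new_pod: Resources, quota: Resources) -> bool:
    """Return True if adding new_pod would push usage over the vCPU or memory quota."""
    return (
        in_use.vcpu + new_pod.vcpu > quota.vcpu
        or in_use.memory_gib + new_pod.memory_gib > quota.memory_gib
    )


quota = Resources(vcpu=8, memory_gib=32)
in_use = Resources(vcpu=7, memory_gib=20)

# A 2-vCPU pod would bring the total to 9 vCPU, which is over the 8-vCPU quota.
print(would_exceed_quota(in_use, Resources(vcpu=2, memory_gib=4), quota))  # True
# A 1-vCPU pod brings the total to exactly 8 vCPU, which is not over the quota.
print(would_exceed_quota(in_use, Resources(vcpu=1, memory_gib=4), quota))  # False
```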
Monitor status check codes
Monitor type: Platform
A monitor is a script that checks the state of an entity periodically and generates events based
on the state of the entity. The check-monitor-status event monitors the status of
monitoring jobs to determine whether the jobs completed successfully.
| Response code | Severity | Description |
|---|---|---|
| 502 | Critical | One or more jobs did not complete successfully. |
| 500 | Information | The monitor that checks the status of monitoring jobs ran. There are no issues to report. |
Service check status codes
Monitor type: Platform
A service is comprised of pods and one or more service instances. The
check-service-status event monitors the status of services to determine whether the
pods and instances that are associated with the service are running as expected.
| Response code | Severity | Description |
|---|---|---|
| 602 | Critical | A service instance is in a failed state or a pod is in a failed or unknown state. |
| 601 | Warning | Check the status of the service. A pod that is associated with the service might be pending. |
| 600 | Information | The monitor that checks the status of each service ran. There are no issues to report. |
Service instance check status codes
Monitor type: Platform
A service instance is comprised of one or more pods. The check-instance-status
event monitors the status of service instances to determine whether the pods that are associated
with the instance are running as expected.
| Response code | Severity | Description |
|---|---|---|
| 702 | Critical | One or more pods that are associated with the instance are in a failed or unknown state. |
| 701 | Warning | Check the status of the instance. A pod that is associated with the instance might be pending. |
| 700 | Information | The monitor that checks the status of each instance ran. There are no issues to report. |
Service health check codes
Monitor type: Service
The service-health-check event monitors the functional health of a service to
determine whether the service is healthy.
| Response code | Severity | Description |
|---|---|---|
| 802 | Critical | The service is not functioning properly or is not functioning at all. |
| 801 | Warning | The service is partially operational, but some functionality is unavailable. |
| 800 | Information | The service is healthy. |
Node status check codes
Monitor type: Privileged
Each node hosts the pods that run the platform and services, and the overall cluster health
depends on the health of its nodes. The check-node-status event monitors the
health and status of all cluster nodes by monitoring the node conditions and usage statistics. A
critical state indicates that one or more nodes are not in a Ready state or are
consuming excessive resources.
| Response code | Severity | Description |
|---|---|---|
| 902 | Critical | One or more nodes are not ready or are utilizing excessive resources. |
| 901 | Warning | A node health warning condition was detected. |
| 900 | Information | All nodes are healthy. |
Volume status check codes
Monitor type: Privileged
A persistent volume claim (PVC) is a request for storage. The
check-volume-status event monitors whether the PVCs associated with the deployment
are running out of space. A warning or critical state indicates that volume usage exceeds the
configured thresholds.
| Response code | Severity | Description |
|---|---|---|
| 1002 | Critical | Volume usage exceeds the critical threshold. (The default threshold is 90% of the total capacity.) |
| 1001 | Warning | Volume usage exceeds the warning threshold. (The default threshold is 80% of the total capacity.) |
| 1000 | Information | Volume usage is within normal range. |
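The threshold logic in this table can be sketched as follows. The function name and parameters are hypothetical; only the response codes and the default thresholds (80% and 90% of total capacity) come from the table, and "exceeds" is assumed to mean strictly greater than the threshold.

```python
VOLUME_OK, VOLUME_WARNING, VOLUME_CRITICAL = 1000, 1001, 1002


def volume_status_code(used_bytes, capacity_bytes, warning_pct=80.0, critical_pct=90.0):
    """Map volume usage to a response code using percentage thresholds."""
    if capacity_bytes <= 0:
        raise ValueError("capacity_bytes must be positive")
    used_pct = 100.0 * used_bytes / capacity_bytes
    if used_pct > critical_pct:
        return VOLUME_CRITICAL
    if used_pct > warning_pct:
        return VOLUME_WARNING
    return VOLUME_OK


print(volume_status_code(95, 100))  # 1002
print(volume_status_code(85, 100))  # 1001
print(volume_status_code(80, 100))  # 1000 (usage equals, but does not exceed, the warning threshold)
```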
Operator namespace status check codes
Monitor type: Privileged
The check-operator-namespace-status event checks whether the resources in the
operators project for the deployment are healthy.
| Response code | Severity | Description |
|---|---|---|
| 1102 | Critical | One or more operator resources are not running as expected. |
| 1101 | Warning | A warning condition was detected in operator namespace resources. |
| 1100 | Information | All operator resources are healthy. |
EDB cluster status check codes
Monitor type: Privileged
The check-edb-cluster-status event checks whether any instances of EDB Postgres that are associated with the deployment
are healthy.
| Response code | Severity | Description |
|---|---|---|
| 1203 | Critical | Cluster is unhealthy or the replicas are significantly out of sync. Restriction: The replica out of sync check applies only to the zen-metastore-edb storage cluster. |
| 1201 | Warning | One or more replicas are unavailable. |
| 1200 | Information | EDB cluster is healthy. |
Cluster operator status check codes
Monitor type: Privileged
The check-cluster-operator-status event checks the status of the cluster
operators that comprise the Red Hat®
OpenShift® Container Platform
infrastructure to determine whether:
- All of the operators are AVAILABLE
- Any of the operators are DEGRADED
| Response code | Severity | Description |
|---|---|---|
| 1302 | Critical | A cluster operator is unavailable (Available=False) or degraded (Degraded=True). |
| 1301 | Warning | A warning condition exists for the cluster operator. |
| 1300 | Information | The cluster operators are healthy. |
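The critical condition for code 1302 can be expressed as a check over operator conditions. The sketch below assumes each operator's conditions are available as a list of `type`/`status` pairs, as they appear under `status.conditions` of a ClusterOperator object; the function names and sample data are hypothetical. The sketch does not model the warning state (1301), because this page does not define what triggers it.

```python
def operator_is_critical(conditions):
    """Return True if the conditions show Available=False or Degraded=True.

    conditions: list of {"type": ..., "status": ...} dicts.
    A missing condition is treated as not critical (an assumption of this sketch).
    """
    status = {c["type"]: c["status"] for c in conditions}
    return status.get("Available") == "False" or status.get("Degraded") == "True"


def critical_operators(operators):
    """Return the sorted names of operators that would trigger a critical event."""
    return sorted(name for name, conds in operators.items() if operator_is_critical(conds))


# Hypothetical sample data for illustration only.
operators = {
    "dns": [
        {"type": "Available", "status": "True"},
        {"type": "Degraded", "status": "False"},
    ],
    "ingress": [
        {"type": "Available", "status": "True"},
        {"type": "Degraded", "status": "True"},
    ],
    "etcd": [
        {"type": "Available", "status": "False"},
        {"type": "Degraded", "status": "False"},
    ],
}

print(critical_operators(operators))  # ['etcd', 'ingress']
```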
Node imbalance status check codes
Monitor type: Privileged
The check-node-imbalance-status event checks whether vCPU requests are balanced
across nodes or whether one node is supporting a disproportionately high load.
A warning state indicates that CPU requests on one node exceed the maximum threshold and that other nodes fall below the minimum threshold. A critical state indicates that CPU imbalance exceeds defined thresholds.
| Response code | Severity | Description |
|---|---|---|
| 1402 | Critical | Node CPU imbalance exceeds defined thresholds. |
| 1401 | Warning | CPU usage imbalance detected. |
| 1400 | Information | CPU requests across nodes are balanced. |
Network status check codes
Monitor type: Privileged
The check-network-status event checks the status of the
PodNetworkConnectivityCheck objects for cluster resources to determine whether the
objects are Reachable.
| Response code | Severity | Description |
|---|---|---|
| 1502 | Critical | Network is not reachable. |
| 1501 | Warning | Network connectivity warning detected. |
| 1500 | Information | Network connectivity is healthy. |
Certificate status check codes
Monitor type: Platform
The check-certificate-status event monitors certificates to:
- Ensure that the certificates are valid
- Identify when the certificates will expire
- Identify when the certificates will be renewed
- Determine whether certificates were renewed successfully
For certificates that do not have a renewal date, warning and critical events are generated as the certificate approaches its expiration date.
For certificates that have a renewal date, warning and critical events are generated if the certificate is not automatically renewed by the specified date.
| Response code | Severity | Description |
|---|---|---|
| 1602 | Critical | Certificate is close to expiry (default: 7 days) or renewal is significantly overdue (default: 24 hours). |
| 1601 | Warning | Certificate is expiring soon (default: 21 days) or renewal is slightly overdue (default: 1 hour). |
| 1600 | Information | Certificate is valid and not expiring soon. |
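The two cases described above (certificates with and without a renewal date) can be sketched as a single classification function. The function and parameter names are hypothetical; the default windows come from the table. The sketch assumes that the thresholds are inclusive and that a certificate with a renewal date is evaluated only against that date. The actual monitor might differ on both points.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

CERT_OK, CERT_WARNING, CERT_CRITICAL = 1600, 1601, 1602


def certificate_status_code(
    now: datetime,
    expires_at: datetime,
    renew_by: Optional[datetime] = None,
    warning_expiry: timedelta = timedelta(days=21),
    critical_expiry: timedelta = timedelta(days=7),
    warning_overdue: timedelta = timedelta(hours=1),
    critical_overdue: timedelta = timedelta(hours=24),
) -> int:
    """Classify a certificate by time to expiry or by how overdue its renewal is."""
    if renew_by is None:
        # No renewal date: events are based on the time left before expiry.
        remaining = expires_at - now
        if remaining <= critical_expiry:
            return CERT_CRITICAL
        if remaining <= warning_expiry:
            return CERT_WARNING
        return CERT_OK
    # Renewal date present: events are based on how long renewal is overdue.
    overdue = now - renew_by
    if overdue >= critical_overdue:
        return CERT_CRITICAL
    if overdue >= warning_overdue:
        return CERT_WARNING
    return CERT_OK


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(certificate_status_code(now, now + timedelta(days=30)))  # 1600
print(certificate_status_code(now, now + timedelta(days=10)))  # 1601
print(certificate_status_code(now, now + timedelta(days=3)))   # 1602
print(certificate_status_code(now, now + timedelta(days=60),
                              renew_by=now - timedelta(hours=2)))  # 1601
```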
Certificate renewal check codes
Monitor type: Platform
The check-certificate-renewal event monitors upcoming certificate renewals so
that you can identify renewals that might cause service disruptions.
| Response code | Severity | Description |
|---|---|---|
| 1701 | Warning | Certificate renewal approaching (default: 3 days before renewal time). |
| 1700 | Information | No certificate renewal events pending. |
Workload quota status check codes
Monitor type: Platform
The check-workload-quota-status event monitors the quotas and requests that are
associated with the following objects:
- Projects
- Remote physical locations
- Data planes
The event determines whether the workloads associated with the objects have sufficient resources to fulfill requests.
| Response code | Severity | Description |
|---|---|---|
| 1802 | Critical | Workload has insufficient CPU, memory, or GPU resources. |
| 1801 | Warning | Workload quota warning detected. |
| 1800 | Information | Workload quotas are sufficient. |