SNMP response codes

If you send alerts to your SNMP server, you can use the following information to interpret the response codes that are sent by IBM® Software Hub.

You can monitor the state of IBM Software Hub and your services with platform monitors, service monitors, and privileged monitors. Platform monitors are installed automatically when you install IBM Software Hub. You must manually install service monitors and privileged monitors.

`Deployment` status check codes

Monitor type: Platform

Each service is configured to maintain a specific number of Deployment replicas. The check-deployment-status event monitors the status of Deployment replicas that are associated with IBM Software Hub and reports any issues.

Response code	Severity	Description
102	Critical	The service does not have enough replicas.
100	Information	The monitor that checks the status of the replicas ran. There are no issues to report.
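
The following minimal sketch shows how the two codes in the table relate to replica counts. The function name and parameters are hypothetical, and reading "not enough replicas" as "fewer ready replicas than desired replicas" is an assumption rather than the documented behavior of the monitor.

```python
def deployment_status_code(desired_replicas: int, ready_replicas: int) -> int:
    """Map replica counts to the Deployment status codes in the table above.

    Assumption: the service is short of replicas when fewer replicas are
    ready than the Deployment is configured to maintain.
    """
    if desired_replicas < 0 or ready_replicas < 0:
        raise ValueError("replica counts must not be negative")
    # 102 = Critical (not enough replicas), 100 = Information (no issues)
    return 102 if ready_replicas < desired_replicas else 100
```

For example, a Deployment that is configured for three replicas but has only two ready replicas would map to 102 under this assumption.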

StatefulSet status check codes

Monitor type: Platform

Each service is configured to maintain a specific number of StatefulSet replicas. The check-statefulset-status event monitors the status of StatefulSet replicas that are associated with IBM Software Hub and reports any issues.

Response code	Severity	Description
202	Critical	The service does not have enough replicas.
200	Information	The monitor that checks the status of the replicas ran. There are no issues to report.

PVC status check codes

Monitor type: Platform

A persistent volume claim (PVC) is a request for storage that meets specific criteria, such as a minimum size or a specific access mode. The check-pvc-status event monitors the status of the PVCs that are associated with IBM Software Hub and reports any issues.

Response code	Severity	Description
302	Critical	The PVC is not associated with a storage volume, which means that the service cannot store data.
300	Information	The monitor that checks the status of the PVCs ran. There are no issues to report.

Quota status check codes

Monitor type: Platform

An administrator sets the vCPU quota and memory quota for services or for the platform. The check-quota-status event monitors the quotas and requests that are associated with IBM Software Hub to determine whether services have sufficient resources to fulfill requests.

Response code	Severity	Description
402	Critical	The service has insufficient resources to fulfill requests. The service cannot create new pods if the new pods will push the service over the memory quota or the vCPU quota. These pods remain in pending state until sufficient resources are available.
401	Warning	Check the quota settings and the available resources on the cluster.
400	Information	The monitor that checks the status of the quotas ran. There are no issues to report.
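
The condition behind code 402 can be illustrated with a small sketch: a new pod is not created if its requests would push the service over the vCPU quota or the memory quota. The `Quota` record, its field names, and the function name are hypothetical and are not part of IBM Software Hub.

```python
from dataclasses import dataclass


@dataclass
class Quota:
    """Hypothetical quota record; the field names are illustrative only."""
    vcpu_limit: float        # vCPU quota set by the administrator
    memory_limit_gi: float   # memory quota, in GiB
    vcpu_used: float         # vCPU already requested by existing pods
    memory_used_gi: float    # memory already requested by existing pods, in GiB


def pod_fits_quota(quota: Quota, pod_vcpu: float, pod_memory_gi: float) -> bool:
    """Return True if the new pod keeps the service within both quotas."""
    within_vcpu = quota.vcpu_used + pod_vcpu <= quota.vcpu_limit
    within_memory = quota.memory_used_gi + pod_memory_gi <= quota.memory_limit_gi
    # A pod that fails either test would remain in pending state.
    return within_vcpu and within_memory
```

The pod must fit within both quotas. Exceeding either one is enough to leave the pod pending.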

Monitor status check codes

Monitor type: Platform

A monitor is a script that checks the state of an entity periodically and generates events based on the state of the entity. The check-monitor-status event monitors the status of monitoring jobs to determine whether the jobs completed successfully.

Response code	Severity	Description
502	Critical	One or more jobs did not complete successfully.
500	Information	The monitor that checks the status of monitoring jobs ran. There are no issues to report.

Service check status codes

Monitor type: Platform

A service consists of pods and one or more service instances. The check-service-status event monitors the status of services to determine whether the pods and instances that are associated with the service are running as expected.

Response code	Severity	Description
602	Critical	A service instance is in a failed state or a pod is in a failed or unknown state.
601	Warning	Check the status of the service. A pod that is associated with the service might be pending.
600	Information	The monitor that checks the status of each service ran. There are no issues to report.

Service instance check status codes

Monitor type: Platform

A service instance consists of one or more pods. The check-instance-status event monitors the status of service instances to determine whether the pods that are associated with the instance are running as expected.

Response code	Severity	Description
702	Critical	One or more pods that are associated with the instance are in a failed or unknown state.
701	Warning	Check the status of the instance. A pod that is associated with the instance might be pending.
700	Information	The monitor that checks the status of each instance ran. There are no issues to report.
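
As a rough illustration of the table above, the following sketch maps the phases of the pods in an instance to a response code. The function name is hypothetical. The phase names follow the standard Kubernetes pod phases, and treating any pending pod as a warning is an assumption based on the description of code 701.

```python
from typing import Iterable


def instance_status_code(pod_phases: Iterable[str]) -> int:
    """Map pod phases for one service instance to a response code.

    Assumption: phases use the Kubernetes names, such as "Running",
    "Pending", "Failed", and "Unknown".
    """
    phases = set(pod_phases)
    if phases & {"Failed", "Unknown"}:
        return 702  # Critical: at least one pod is failed or unknown
    if "Pending" in phases:
        return 701  # Warning: at least one pod is pending
    return 700      # Information: no issues to report
```

The critical test runs first, so an instance with both a pending pod and a failed pod reports 702.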

Service health check codes

Monitor type: Service

The service-health-check event monitors the functional health of a service to determine whether the service is healthy.

Response code	Severity	Description
802	Critical	The service is not functioning properly or is not functioning at all.
801	Warning	The service is partially operational, but some functionality is unavailable.
800	Information	The service is healthy.

Node status check codes

Monitor type: Privileged

Each node hosts the pods that run the platform and services, and the overall cluster health depends on the health of its nodes. The check-network-status event monitors the health and status of all cluster nodes by monitoring the node conditions and usage statistics. A critical state indicates that one or more nodes are not in a Ready state or are consuming excessive resources.

Response code	Severity	Description
902	Critical	One or more nodes are not ready or are utilizing excessive resources.
901	Warning	A node health warning condition was detected.
900	Information	All nodes are healthy.

Volume status check codes

Monitor type: Privileged

A persistent volume claim (PVC) is a request for storage. The check-volume-status event monitors whether the PVCs associated with the deployment are running out of space. A warning or critical state indicates that volume usage exceeds the configured thresholds.

Response code	Severity	Description
1002	Critical	Volume usage exceeds the critical threshold. (The default threshold is 90% of the total capacity.)
1001	Warning	Volume usage exceeds the warning threshold. (The default threshold is 80% of the total capacity.)
1000	Information	Volume usage is within normal range.
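
The default thresholds in the table can be expressed as a short sketch. The function name and parameters are hypothetical. The 80% and 90% defaults come from the table, and the strict comparison reflects the word "exceeds".

```python
def volume_status_code(used_bytes: int, capacity_bytes: int,
                       warning_pct: float = 80.0,
                       critical_pct: float = 90.0) -> int:
    """Map volume usage to a response code by using percentage thresholds."""
    if capacity_bytes <= 0:
        raise ValueError("capacity_bytes must be positive")
    usage_pct = 100.0 * used_bytes / capacity_bytes
    if usage_pct > critical_pct:
        return 1002  # Critical: usage exceeds the critical threshold
    if usage_pct > warning_pct:
        return 1001  # Warning: usage exceeds the warning threshold
    return 1000      # Information: usage is within normal range
```

With the defaults, a volume at exactly 90% of capacity reports 1001 in this sketch, because usage must exceed the critical threshold to report 1002.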

Operator namespace status check codes

Monitor type: Privileged

The check-operator-namespace-status event checks whether the resources in the operators project for the deployment are healthy.

Response code	Severity	Description
1102	Critical	One or more operator resources are not running as expected.
1101	Warning	A warning condition was detected in operator namespace resources.
1100	Information	All operator resources are healthy.

EDB Cluster Status check codes

Monitor type: Privileged

The check-edb-cluster-status event checks whether any instances of EDB Postgres that are associated with the deployment are healthy.

Response code	Severity	Description
1203	Critical	Cluster is unhealthy or the replicas are significantly out of sync. Restriction: The replica out of sync check applies only to the `zen-metastore-edb` storage cluster.
1201	Warning	One or more replicas are unavailable.
1200	Information	EDB cluster is healthy.

Cluster operator status check codes

Monitor type: Privileged

The check-cluster-operator-status event checks the status of the cluster operators that comprise the Red Hat® OpenShift® Container Platform infrastructure to determine whether:

All of the operators are AVAILABLE
Any of the operators are DEGRADED

Response code	Severity	Description
1302	Critical	A cluster operator is unavailable (`Available=False`) or degraded (`Degraded=True`).
1301	Warning	A warning condition exists for the cluster operator.
1300	Information	The cluster operators are healthy.
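
The critical condition for code 1302 can be sketched as follows. The function names and the dictionary layout are hypothetical. The sketch assumes that each operator's conditions are available as a mapping from condition type to a status string. It does not model code 1301, because the table does not define the warning condition.

```python
from typing import Dict, List


def operator_is_critical(conditions: Dict[str, str]) -> bool:
    """Return True when the conditions match the description of code 1302."""
    unavailable = conditions.get("Available") == "False"
    degraded = conditions.get("Degraded") == "True"
    return unavailable or degraded


def critical_operators(operators: Dict[str, Dict[str, str]]) -> List[str]:
    """Return the sorted names of the operators in a critical state."""
    return sorted(name for name, conditions in operators.items()
                  if operator_is_critical(conditions))
```

An empty result means that no operator meets the critical condition. It does not rule out a warning condition.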

Node imbalance status check codes

Monitor type: Privileged

The check-node-imbalance-status event checks whether vCPU requests are balanced across nodes or whether one node is supporting a disproportionately high load.

A warning state indicates that CPU requests on one node exceed the maximum threshold and that other nodes fall below the minimum threshold. A critical state indicates that CPU imbalance exceeds defined thresholds.

Response code	Severity	Description
1402	Critical	Node CPU imbalance exceeds defined thresholds.
1401	Warning	CPU usage imbalance detected.
1400	Information	CPU requests across nodes are balanced.

Network status check codes

Monitor type: Privileged

The check-network-status event checks the status of the PodNetworkConnectivityCheck objects for cluster resources to determine whether the objects are Reachable.

Response code	Severity	Description
1502	Critical	Network is not reachable.
1501	Warning	Network connectivity warning detected.
1500	Information	Network connectivity is healthy.

Certificate status check codes

Monitor type: Platform

The check-certificate-status event monitors certificates to:

Ensure that the certificates are valid
Identify when the certificates will expire
Identify when the certificates will be renewed
Determine whether certificates were renewed successfully

For certificates that do not have a renewal date, warning and critical events are generated as the certificate approaches its expiration date.

For certificates that have a renewal date, warning and critical events are generated if the certificate is not automatically renewed by the specified date.

Response code	Severity	Description
1602	Critical	Certificate is close to expiry (default: 7 days) or renewal is significantly overdue (default: 24 hours).
1601	Warning	Certificate is expiring soon (default: 21 days) or renewal is slightly overdue (default: 1 hour).
1600	Information	Certificate is valid and not expiring soon.
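
The default thresholds in the table can be read as the following sketch. The function name and parameters are hypothetical. Two points are assumptions: a certificate with a renewal date is evaluated only against its renewal date, and each threshold is inclusive.

```python
from datetime import datetime, timedelta
from typing import Optional


def certificate_status_code(
    now: datetime,
    expires_at: datetime,
    renew_by: Optional[datetime] = None,
    expiry_warning: timedelta = timedelta(days=21),
    expiry_critical: timedelta = timedelta(days=7),
    overdue_warning: timedelta = timedelta(hours=1),
    overdue_critical: timedelta = timedelta(hours=24),
) -> int:
    """Map certificate dates to a response code by using the default thresholds."""
    if renew_by is not None:
        # The certificate has a renewal date: measure how overdue the renewal is.
        overdue = now - renew_by
        if overdue >= overdue_critical:
            return 1602
        if overdue >= overdue_warning:
            return 1601
        return 1600
    # No renewal date: measure the time that remains until expiry.
    remaining = expires_at - now
    if remaining <= expiry_critical:
        return 1602
    if remaining <= expiry_warning:
        return 1601
    return 1600
```

For example, a certificate without a renewal date that expires in 14 days maps to 1601, and one that expires in 3 days maps to 1602.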

Certificate renewal check codes

Monitor type: Platform

The check-certificate-renewal event monitors upcoming certificate renewals so that you can identify renewals that might cause service disruptions.

Response code	Severity	Description
1701	Warning	Certificate renewal approaching (default: 3 days before renewal time).
1700	Information	No certificate renewal events pending.

Workload quota status check codes

Monitor type: Platform

The check-workload-quota-status event monitors the quotas and requests that are associated with the following objects:

Projects
Remote physical locations
Data planes

The event determines whether the workloads associated with the objects have sufficient resources to fulfill requests.

Response code	Severity	Description
1802	Critical	Workload has insufficient CPU, memory, or GPU resources.
1801	Warning	Workload quota warning detected.
1800	Information	Workload quotas are sufficient.
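
The tables in this topic follow a visible pattern: each check owns a block of 100 codes, and the last two digits indicate the severity (00 for Information, 01 for Warning, 02 for Critical). The one listed exception is code 1203, which is Critical. The following sketch decodes a response code on that basis. The names are hypothetical and the pattern is an observation from the tables, not a documented guarantee. The sketch does not verify that a check defines every severity. For example, it accepts 101 even though no such code is listed.

```python
from typing import Dict, Tuple

# Check name and monitor type for each block of 100 codes, taken from the tables.
CHECKS: Dict[int, Tuple[str, str]] = {
    1: ("Deployment status", "Platform"),
    2: ("StatefulSet status", "Platform"),
    3: ("PVC status", "Platform"),
    4: ("Quota status", "Platform"),
    5: ("Monitor status", "Platform"),
    6: ("Service status", "Platform"),
    7: ("Service instance status", "Platform"),
    8: ("Service health", "Service"),
    9: ("Node status", "Privileged"),
    10: ("Volume status", "Privileged"),
    11: ("Operator namespace status", "Privileged"),
    12: ("EDB cluster status", "Privileged"),
    13: ("Cluster operator status", "Privileged"),
    14: ("Node imbalance status", "Privileged"),
    15: ("Network status", "Privileged"),
    16: ("Certificate status", "Platform"),
    17: ("Certificate renewal", "Platform"),
    18: ("Workload quota status", "Platform"),
}

# Observed pattern for the last two digits of a code.
SEVERITY_BY_SUFFIX: Dict[int, str] = {0: "Information", 1: "Warning", 2: "Critical"}

# Codes that are listed in the tables but do not follow the pattern.
LISTED_EXCEPTIONS: Dict[int, str] = {1203: "Critical"}


def decode_response_code(code: int) -> Tuple[str, str, str]:
    """Return (check name, monitor type, severity) for a response code."""
    block, suffix = divmod(code, 100)
    if block not in CHECKS:
        raise ValueError(f"unknown response code: {code}")
    severity = LISTED_EXCEPTIONS.get(code, SEVERITY_BY_SUFFIX.get(suffix))
    if severity is None:
        raise ValueError(f"unknown severity for response code: {code}")
    name, monitor_type = CHECKS[block]
    return name, monitor_type, severity
```

A trap handler could use a lookup like this to route Critical codes to an on-call channel and to log Information codes only.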
