What does IBM monitor?

IBM monitors three levels: system and infrastructure, application (including application delay), and synthetic.

System and infrastructure monitoring

System and infrastructure monitoring checks the health of the physical server hardware, virtual machine resources, and network. This monitoring covers CPU and operating system memory usage, file system health, network availability, throughput, and more. Infrastructure monitoring, and alerting of the IBM operations team, are standard services that are provided as part of your application accounts.
Note: System and infrastructure monitoring is performed for all environments. However, alerts are raised for the production account only.
The following table identifies some of the system and infrastructure monitoring that IBM conducts.
Monitoring level | Description
Server disk usage | IBM monitors the server file systems to ensure that sufficient disk space is available.
Disk I/O | IBM tracks I/O issues in file system operations.
Physical memory | IBM monitors physical memory usage of the servers.
Server CPU usage | IBM monitors CPU usage for spikes and tracks data for performance trends.
Network | IBM monitors network connectivity and bandwidth in various ways and at multiple layers, including synthetic monitoring and internal methods. Coverage includes servers, firewalls, routers, proxy servers, and load balancers.
Load balancer (HAProxy) | IBM monitors the application load balancers (HAProxy) to ensure that they are up and listening. URL monitoring ensures that requests are reaching the application servers.
Traffic abnormality | IBM monitors for abnormal web traffic and prevents DDoS attacks.
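
As an illustration of the kind of check that the table describes, the following minimal Python sketch reads disk usage and classifies it against thresholds. It is not IBM's implementation: the function names, the 80% and 90% thresholds, and the environment labels are assumptions chosen for the example. The `should_alert` helper mirrors the earlier note that monitoring runs in all environments but alerts are raised for production only.

```python
import shutil


def disk_usage_percent(path="."):
    """Return the percentage of used space on the file system that holds `path`."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def classify(value, warn_at=80.0, critical_at=90.0):
    """Classify a percentage reading. The thresholds are hypothetical defaults."""
    if value >= critical_at:
        return "critical"
    if value >= warn_at:
        return "warning"
    return "ok"


def should_alert(status, environment):
    """Monitor every environment, but raise alerts for production only."""
    return environment == "production" and status != "ok"


if __name__ == "__main__":
    percent = disk_usage_percent(".")
    status = classify(percent)
    print(f"disk usage {percent:.1f}% -> {status}, "
          f"alert={should_alert(status, 'production')}")
```

The same `classify` and `should_alert` pair could be reused for CPU or memory readings, because only the source of the percentage changes.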

Application level monitoring

Application level monitoring covers the software and services that run on the server infrastructure to support the application. It includes monitoring of the application JVM, application server node instances, database logical servers, messaging systems, and application components.
Note: Application monitoring is always performed in production environments. Only limited monitoring is performed in pre-production environments.
The following table identifies some of the application monitoring that IBM conducts.
Table 1. Application level monitoring
Monitoring level | Description
Docker containers | IBM monitors Docker containers to ensure that they are always up and running.
Middleware components |
  • IBM monitors the application server nodes.
  • IBM monitors all background services.
  • IBM monitors the database server for performance spikes and abnormalities.
  • IBM monitors messaging systems to ensure satisfactory response time and message propagation rate. If you encounter a delayed response, open a ticket with IBM Support to gain more insight into the particular process.
  • Web and application servers are clustered for high availability. High-availability load balancers (proxy servers) distribute traffic across clusters and handle failover.
Application error rate | IBM monitors all application and background services to evaluate the error rate. An alert is triggered when the error rate exceeds a threshold within a defined time frame.
Message propagation delay | IBM monitors the message propagation rate to ensure minimal delay.
Aggregate delay | IBM monitors the total delay between the start and end of a background process.
Web server response time | IBM monitors API response times to ensure that they fall within satisfactory ratings, similar to Apdex.
Service throughput | IBM ensures that the number of requests that are processed per minute is within a satisfactory threshold.
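
Two rows of the table describe calculations that a short sketch can make concrete: an error rate that is evaluated over a defined time frame, and a response-time rating similar to Apdex. The Python below is an illustration only. The class and function names, the window length, and the thresholds are assumptions, and the `apdex` function uses the standard public Apdex formula, which might differ from the rating that IBM applies.

```python
from collections import deque


class ErrorRateMonitor:
    """Track request outcomes in a sliding time window (hypothetical helper)."""

    def __init__(self, window_seconds, max_error_rate):
        self.window_seconds = window_seconds
        self.max_error_rate = max_error_rate
        self._events = deque()  # (timestamp, is_error) pairs, oldest first

    def record(self, timestamp, is_error):
        """Add one outcome and drop outcomes that fell out of the window."""
        self._events.append((timestamp, is_error))
        cutoff = timestamp - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            self._events.popleft()

    def error_rate(self):
        """Return the fraction of outcomes in the window that were errors."""
        if not self._events:
            return 0.0
        errors = sum(1 for _, is_error in self._events if is_error)
        return errors / len(self._events)

    def should_alert(self):
        """Return True when the error rate exceeds the configured threshold."""
        return self.error_rate() > self.max_error_rate


def apdex(response_times, target_seconds):
    """Standard Apdex score: (satisfied + tolerating / 2) / total samples.

    A sample is satisfied at or below the target and tolerating up to
    four times the target. Slower samples count as frustrated.
    """
    if not response_times:
        raise ValueError("need at least one response time")
    satisfied = sum(1 for t in response_times if t <= target_seconds)
    tolerating = sum(
        1 for t in response_times if target_seconds < t <= 4 * target_seconds
    )
    return (satisfied + tolerating / 2) / len(response_times)
```

For example, with a 0.5-second target, the samples 0.2, 0.4, 1.0, and 3.0 seconds contain two satisfied, one tolerating, and one frustrated request, which gives a score of (2 + 0.5) / 4 = 0.625.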

Application delay monitoring

Some API requests are processed asynchronously, so it is important to monitor their processing times. For example, in a supply update, the user receives an accepted status but has no way to confirm when the process is completed.

Monitoring plays an important role in ensuring that the processing time falls within a satisfactory threshold. When a threshold is exceeded, the operations team is alerted immediately to resolve the issue. Refer to your tenant's service level agreement for threshold or tolerance level details.

Delay monitoring uses a few key metrics:
  • Message propagation delay
  • Network delay
  • Response time (web and background process)
  • Aggregate delay
The following diagram illustrates the monitored metrics for each service and the associated background process.
[Diagram: network delay, response time, propagation delay, and aggregate delay]

The metrics are tracked at a global level across all tenant accounts.
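
The delay metrics listed above can be pictured as differences between timestamps that are recorded along the path of one asynchronous request. The Python sketch below shows one possible way to derive them. The timestamp names, the exact boundaries of each metric, and the threshold values are assumptions made for the example, because the source does not define them. Actual thresholds come from your tenant's service level agreement.

```python
from dataclasses import dataclass


@dataclass
class AsyncRequestTrace:
    """Timestamps in milliseconds for one request. Field names are hypothetical."""
    sent: int        # client sends the API request
    received: int    # web server receives the request
    accepted: int    # web server returns the accepted status
    published: int   # message is published for background processing
    consumed: int    # background process picks up the message
    completed: int   # background process finishes


def delay_metrics(trace):
    """Derive the delay metrics from a trace (assumed metric boundaries)."""
    return {
        "network_delay": trace.received - trace.sent,
        "web_response_time": trace.accepted - trace.received,
        "propagation_delay": trace.consumed - trace.published,
        "background_response_time": trace.completed - trace.consumed,
        "aggregate_delay": trace.completed - trace.published,
    }


def exceeded(metrics, thresholds):
    """Return the sorted names of metrics that are above their threshold."""
    return sorted(
        name
        for name, value in metrics.items()
        if name in thresholds and value > thresholds[name]
    )


if __name__ == "__main__":
    trace = AsyncRequestTrace(
        sent=0, received=20, accepted=50,
        published=55, consumed=255, completed=1055,
    )
    metrics = delay_metrics(trace)
    print(metrics)
    print(exceeded(metrics, {"propagation_delay": 150, "aggregate_delay": 2000}))
```

In this sketch the aggregate delay spans the whole background process, from publication of the message to completion, so it is always at least as large as the propagation delay.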

Synthetic monitoring

Synthetic monitoring is a technique that uses a specialized tool to simulate web requests to each of the supported APIs. Synthetic monitoring is implemented for the following levels:
Table 2. Synthetic monitoring
Monitoring level | Frequency | Description
Connectivity | Every 5 minutes | Synthetic monitoring invokes API requests to each supported API and verifies the expected response status and a satisfactory response time.
Data integrity | Every 15 minutes | Synthetic monitoring simulates typical use cases and ensures that data that is accepted by the system is retrievable. The computed values are also validated for correctness.
Backup propagation | Hourly | Synthetic monitoring ensures that the data processed at a location is correctly stored and retrievable from a backup location.
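
The checks in Table 2 can be sketched in Python as follows. This is a simplified illustration, not the specialized tool that IBM uses. The function names, the 2-second response limit, and the scheduling helper are assumptions. The `fetch`, `write`, and `read` parameters are injectable so that the checks can be exercised without a live API.

```python
import time
import urllib.error
import urllib.request

# Probe intervals in seconds, taken from the frequencies in Table 2.
PROBE_INTERVALS = {
    "connectivity": 5 * 60,
    "data_integrity": 15 * 60,
    "backup_propagation": 60 * 60,
}


def http_get_status(url, timeout=10):
    """Return the HTTP status code of a GET request to `url`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # urllib raises 4xx and 5xx responses as HTTPError


def connectivity_check(url, expected_status=200, max_seconds=2.0,
                       fetch=http_get_status):
    """Verify the response status and the response time for one API URL."""
    started = time.monotonic()
    try:
        status = fetch(url)
    except OSError:  # covers URLError, connection failures, and timeouts
        status = None
    elapsed = time.monotonic() - started
    ok = status == expected_status and elapsed <= max_seconds
    return {"url": url, "status": status, "elapsed": elapsed, "ok": ok}


def data_integrity_check(write, read, key, value):
    """Write a value, read it back, and confirm that it is unchanged."""
    write(key, value)
    return read(key) == value


def due_probes(elapsed_seconds):
    """Return the probes that are due at a whole number of elapsed seconds."""
    return [
        name
        for name, interval in PROBE_INTERVALS.items()
        if elapsed_seconds % interval == 0
    ]
```

A backup propagation check could reuse `data_integrity_check` by passing a `write` that targets the primary location and a `read` that targets the backup location.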