What does IBM monitor?
IBM monitors all three levels of infrastructure.
System and infrastructure monitoring
|Server disk usage||IBM monitors and ensures that the server file systems have disk space availability.|
|Disk I/O||IBM tracks I/O issues in file system operations.|
|Physical memory||IBM monitors physical memory usage of the servers.|
|Server CPU usage||IBM monitors CPU for spikes and tracks data for performance trends.|
|Network||IBM monitors network connectivity and bandwidth at multiple layers, using both synthetic monitoring and internal methods. Monitoring covers servers, firewalls, routers, proxy servers, and load balancers.|
|Load balancer (HAProxy)||IBM monitors and ensures that the application load balancers (HAProxy) are up and listening. URL monitoring ensures that requests are going to the application servers.|
|Traffic abnormality||IBM monitors for abnormal web traffic and prevents DDoS attacks.|
Application-level monitoring
|Docker containers||IBM monitors Docker containers to ensure that they are always up and running.|
|Application error rate||IBM monitors all application and background services to evaluate their error rates. An alert is triggered when the error rate exceeds a threshold within a defined time frame.|
|Message propagation delay||IBM monitors message propagation rate to ensure minimal delay.|
|Aggregated delay||IBM monitors the total delay between the start and end of a background process.|
|Web server response time||IBM monitors API response times to ensure that they fall within satisfactory ratings, similar to Apdex.|
|Service throughput||IBM ensures that the number of requests that are processed per minute is within a satisfactory threshold.|
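The error rate and response time rows above can be illustrated with a small sketch. It is not IBM's implementation: the function and class names, the window length, and the thresholds are all assumptions chosen for illustration. The score uses the standard Apdex formula (satisfied samples count fully, tolerating samples count half), and the error rate is evaluated over a sliding time window.

```python
from collections import deque


def apdex_score(response_times, target_seconds):
    """Standard Apdex: satisfied <= T, tolerating <= 4T, frustrated otherwise."""
    if not response_times:
        raise ValueError("at least one sample is required")
    satisfied = sum(1 for t in response_times if t <= target_seconds)
    tolerating = sum(
        1 for t in response_times if target_seconds < t <= 4 * target_seconds
    )
    return (satisfied + tolerating / 2) / len(response_times)


class ErrorRateMonitor:
    """Hypothetical helper: tracks request outcomes in a sliding time window."""

    def __init__(self, window_seconds, max_error_rate):
        self.window_seconds = window_seconds
        self.max_error_rate = max_error_rate
        self._events = deque()  # (timestamp, is_error) pairs, oldest first

    def record(self, timestamp, is_error):
        self._events.append((timestamp, is_error))
        # Drop events that have fallen out of the window.
        cutoff = timestamp - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            self._events.popleft()

    def error_rate(self):
        if not self._events:
            return 0.0
        errors = sum(1 for _, is_error in self._events if is_error)
        return errors / len(self._events)

    def should_alert(self):
        # Alert only when the rate is strictly above the threshold.
        return self.error_rate() > self.max_error_rate
```

In a real deployment the timestamps would come from request logs or a metrics pipeline; passing them in explicitly keeps the sketch deterministic and easy to test.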
Application delay monitoring
Some API requests are processed asynchronously, so it is important to monitor these process times. For example, in a supply update, the user receives an accepted status but does not have a way to confirm when the process is completed.
Monitoring plays an important role in ensuring that the processing time falls within a satisfactory threshold. When a threshold is exceeded, the operations team is alerted immediately to resolve the issue. Refer to your tenant's service level agreement for threshold or tolerance level details.
The following delay metrics are monitored:
- Message propagation delay
- Network delay
- Response time (web and background process)
- Aggregate delay
The metrics are tracked at a global level across all tenant accounts.
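As a rough illustration of delay monitoring for asynchronous requests, the sketch below compares the time between an "accepted" response and the end of background processing against a threshold. The function name, the dictionary inputs, and the threshold value are assumptions for illustration only; actual thresholds are defined in the tenant's service level agreement.

```python
def find_delayed_requests(accepted_at, completed_at, now, threshold_seconds):
    """Return IDs of requests whose processing time exceeds the threshold.

    accepted_at: dict of request ID -> time the API returned "accepted"
    completed_at: dict of request ID -> time background processing finished
    Requests that are still pending are measured against `now`.
    """
    delayed = []
    for request_id, start in accepted_at.items():
        end = completed_at.get(request_id, now)
        if end - start > threshold_seconds:
            delayed.append(request_id)
    return sorted(delayed)
```

Measuring pending requests against the current time means a stuck process is flagged even though it never reports completion.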
|Connectivity||Every 5 minutes||Synthetic monitoring invokes API requests to each supported API and verifies for expected response status and satisfactory response time.|
|Data Integrity||Every 15 minutes||Synthetic monitoring simulates typical use cases and ensures data that is accepted by the system is retrievable. Also, the computed values are validated for correctness.|
|Backup propagation||Hourly||Ensures that the data processed at a location is correctly stored and retrievable from a backup location.|
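The connectivity check described in the table can be sketched as follows. This is a minimal illustration, not the actual synthetic monitoring: `run_connectivity_check`, the `call_api` callable, the expected status, and the time limit are hypothetical names and values, and a real check would run on a schedule with a real API client injected.

```python
import time


def run_connectivity_check(endpoints, call_api, expected_status=200, max_seconds=2.0):
    """Call each endpoint once and collect failures.

    call_api is a hypothetical callable that takes an endpoint name and
    returns an HTTP-like status code; inject a real client in practice.
    """
    failures = {}
    for endpoint in endpoints:
        started = time.monotonic()
        try:
            status = call_api(endpoint)
        except Exception as exc:  # any client error counts as a failed check
            failures[endpoint] = f"error: {exc}"
            continue
        elapsed = time.monotonic() - started
        if status != expected_status:
            failures[endpoint] = f"unexpected status {status}"
        elif elapsed > max_seconds:
            failures[endpoint] = f"slow response ({elapsed:.2f}s)"
    return failures
```

An empty result means every endpoint returned the expected status within the time limit; any entry would be a candidate for an alert.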