Infrastructure metrics include system metrics and container metrics.
For information about container metrics, see Container Metrics.
System Metrics
Monitor the following metrics to analyze the health of the Terracotta server:
- CPU usage
- Disk usage
- Memory usage
If a metric exceeds its threshold value, consider the severity as described below and perform the actions that Software AG recommends to identify and debug the problem. Contact Software AG for further support.
Note: The threshold values, configurations, and severities mentioned throughout this section are guidelines that Software AG suggests for optimal performance of API Gateway. You can modify these thresholds or define actions based on your operational requirements.
To generate a thread dump and a heap dump for monitoring various system metrics, see Troubleshooting: Monitoring Terracotta Server Array.
Monitor
|
Description
|
CPU usage
|
If the CPU usage of the system is above the recommended threshold value, consider the severity as mentioned:
- Above 80% threshold for 15 minutes continuously, Severity: WARNING
- Above 90% threshold for 15 minutes continuously, Severity: CRITICAL
The steps to identify the causes of higher CPU usage are as follows:
- Identify the process that consumes the highest CPU.
- Generate the thread dump.
- Analyze the thread dump and logs to identify the problem.
- Monitor the process closely. If the process fails, it should be recreated.
- Check if the active-passive quorum is intact using the following script:
SAGInstallDirectory/Terracotta/server/bin/server-stat.sh
- Check if API Gateway clients can establish a connection to the Terracotta cluster using the following REST endpoint:
GET /rest/apigateway/health/engine
|
Disk usage
|
If the disk usage of the Terracotta server is high, rotate logs based on a fixed size and limit the number of rotated files that are retained.
|
Memory usage
|
If the memory usage is above the recommended threshold value, consider the severity as mentioned:
- Above 80% threshold, Severity: WARNING
- Above 90% threshold, Severity: CRITICAL
The steps to identify the causes of higher memory usage are as follows:
- Identify the process that consumes the most memory.
- Start the Terracotta Management Console (TMC) and check the heap usage, off-heap usage, and warnings.
- Analyze the memory dump and Terracotta logs to identify the issue.
- Monitor the process closely.
- Check if the active-passive quorum is intact using the following script:
SAGInstallDirectory/Terracotta/server/bin/server-stat.sh
- Check if API Gateway clients can establish a connection to the Terracotta cluster using the following REST endpoint:
GET /rest/apigateway/health/engine
|
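The severity rules in the table above can be sketched in a few lines of Python. This is an illustrative sketch only, not part of API Gateway or Terracotta. The function names, the "OK" label for values within limits, and the assumption of one usage sample per minute are all hypothetical. "Above" is read as strictly greater than the threshold.

```python
from typing import Sequence

# Threshold values taken from the table above; adjust to your requirements.
WARNING_PCT = 80.0
CRITICAL_PCT = 90.0
CPU_WINDOW_MINUTES = 15  # CPU must stay above a threshold this long


def classify(percent: float,
             warning: float = WARNING_PCT,
             critical: float = CRITICAL_PCT) -> str:
    """Map one usage percentage to a severity (memory usage rule)."""
    if percent > critical:
        return "CRITICAL"
    if percent > warning:
        return "WARNING"
    return "OK"  # hypothetical label; the table defines no such severity


def classify_sustained(samples: Sequence[float],
                       window: int = CPU_WINDOW_MINUTES,
                       warning: float = WARNING_PCT,
                       critical: float = CRITICAL_PCT) -> str:
    """Classify CPU usage from per-minute samples, oldest first.

    A severity applies only if every sample in the most recent
    `window` minutes is above the corresponding threshold.
    """
    if len(samples) < window:
        return "OK"  # not enough history to call the usage continuous
    recent = samples[-window:]
    # The lowest recent sample exceeds a threshold exactly when all do.
    return classify(min(recent), warning, critical)


if __name__ == "__main__":
    print(classify(85.0))                              # memory at 85%
    print(classify_sustained([95.0] * 14 + [85.0]))    # CPU, 15 samples
```

In the second call the last sample falls to 85%, so usage was not above 90% for the full 15 minutes and the result is WARNING rather than CRITICAL. Passing different `warning`, `critical`, or `window` values reflects the note that these thresholds can be adapted to operational requirements.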