Metrics of data planes

Monitor key performance metrics such as status, performance, throughput, response time, latency, and error rate to assess the health and efficiency of data planes. It is crucial to maintain the reliability, security, and optimal performance of all runtimes associated with data planes.

The widgets on the Monitor data planes page are explained in detail as follows.

Status, performance, and used capacity

The widget provides a comprehensive view of all data planes by displaying their status, performance, and used capacity.

Status

The status of a data plane indicates its health. It is determined by the availability of all runtimes that are associated with that data plane. The following are the health statuses of a data plane.

Status  Description
Green   Indicates that the data plane is in a healthy state. It means that the status of all runtimes in that data plane is green and available.
Amber   Indicates that the status of at least one runtime in that data plane is red.
Red     Indicates that the data plane is not in a healthy state. It means that the status of all runtimes in that data plane is red.
Gray    Indicates that there is not enough data to compute the availability of runtimes in the data plane.

For example, if you registered a runtime an hour ago and view the dashboard with the Last 24 hours filter applied, the status appears gray because not enough data is available for that time period.

For details about how to determine the availability of runtimes, see the Availability section in Metrics of runtimes.
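The status rules described above can be sketched in Python. This is an illustration only, not product code: the function name `data_plane_status`, the lowercase color labels, and the use of "amber" for the mixed state are assumptions. The sketch also assumes that runtimes without data are ignored when at least one runtime has data, which the text does not state.

```python
from typing import Optional, Sequence


def data_plane_status(runtime_available: Sequence[Optional[bool]]) -> str:
    """Classify a data plane from the availability of its runtimes.

    Each entry is True (runtime is green and available), False (runtime
    is red), or None (no data for the selected period). Hypothetical helper.
    """
    # Assumption: runtimes without data are left out of the decision.
    known = [state for state in runtime_available if state is not None]
    if not known:
        return "gray"   # not enough data to compute availability
    if all(known):
        return "green"  # every runtime is green and available
    if not any(known):
        return "red"    # every runtime is red
    return "amber"      # at least one runtime is red, but not all


print(data_plane_status([True, False, True]))
```

A data plane with no runtimes, or with runtimes that have not reported yet, falls into the gray case in this sketch.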

Performance

Performance shows how well each data plane is performing based on two metrics: error rate and latency. The thresholds for latency and error rate are specified on the Settings page. Set your preferences by customizing the display settings of the widgets and threshold values for the parameters that control the monitoring capability of the assets.

The following are the performance statuses of a data plane.

Performance status  Description
Green               Green indicates that both latency and error rate are within the acceptable limit.
Amber               Amber indicates that either latency or error rate is greater than the acceptable limit and within the warning limit.
Red                 Red indicates that both latency and error rate are greater than the unacceptable limit.
Gray                Gray indicates that there is not enough data to compute the performance of the runtimes in the data plane.

For example, if you registered a runtime an hour ago and view the dashboard with the Last 24 hours filter applied, the status appears gray because not enough data is available for that time period.

How to determine the latency for all data planes?

Latency of a data plane = (Total latency of runtime A + Total latency of runtime B + ...) / (Total transactions of runtime A + Total transactions of runtime B + ...)

Where,

Total latency of a runtime = Average latency of the runtime × Total transactions of the runtime

Example,

To determine the latency of all data planes for 6 hours, assume,

Runtime  Total latency  Total transactions
A        0.05 ms        60
B        0.08 ms        80

Latency of the data plane for 6 hours = (0.05+0.08)/(60+80) ≈ 0.00093 ms
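The latency formula is a transaction-weighted average, which a short Python sketch can make concrete. The function names `total_latency_ms` and `data_plane_latency_ms` are hypothetical, and returning `None` when there are no transactions is an assumption; the text does not say how that case is handled.

```python
from typing import Iterable, Optional, Tuple


def total_latency_ms(average_latency_ms: float, transactions: int) -> float:
    """Total latency of a runtime = average latency x total transactions."""
    return average_latency_ms * transactions


def data_plane_latency_ms(
    runtimes: Iterable[Tuple[float, int]],
) -> Optional[float]:
    """Latency of a data plane from (total_latency_ms, total_transactions) pairs.

    Divides the summed total latency by the summed transaction count.
    Returns None when no transactions were recorded (assumed behavior).
    """
    rows = list(runtimes)
    total_transactions = sum(transactions for _, transactions in rows)
    if total_transactions == 0:
        return None
    return sum(latency for latency, _ in rows) / total_transactions


# Values from the example table: runtime A and runtime B.
print(data_plane_latency_ms([(0.05, 60), (0.08, 80)]))
```

Summing the totals before dividing means that a runtime with more transactions has a proportionally larger influence on the result than a lightly used one.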

Used capacity

The used capacity of the data plane shows the percentage of capacity that is used by all runtimes in the data plane. The following are the used capacity statuses of a data plane.

Used capacity status  Description
Green                 Green indicates that the capacity rate is less than or equal to 30%.
Amber                 Amber indicates that the capacity rate is greater than 30% and less than or equal to 70%.
Red                   Red indicates that the capacity rate is greater than 70%.
Gray                  Gray indicates one of the following:
  • There is not enough data to compute the capacity rate. For example, if you registered a runtime an hour ago and view the dashboard with the Last 24 hours filter applied, the status appears gray because not enough data is available for that time period.
  • The capacity is not defined for all runtimes in the data plane.

How to determine the used capacity of the data plane?

Used capacity of the data plane = [(Actual TPS of runtime A + Actual TPS of runtime B) / (Expected TPS of runtime A + Expected TPS of runtime B)] × 100

Example,

To determine the used capacity of a data plane, assume that the capacity is defined as 10000 transactions per hour for runtime A. For more details about how to define the capacity of a runtime, see Managing runtimes in federated API management.

The expected TPS of the runtime A = 10000 / (1 hr * 60 min * 60 secs) = 2.77 transactions per second.

Assume that runtime A performed 39000 transactions in the last 6 hours.

The actual TPS of the runtime A = 39000 / (6 hrs * 60 mins * 60 secs) = 1.81 transactions per second.

Assume that the expected TPS of the runtime B = 3.33 transactions per second.

Actual TPS of the runtime B = 1.89 transactions per second

Used capacity of the data plane = [(1.81+1.89)/(2.77+3.33)]*100 ≈ 60%

The capacity of the data plane is displayed as 60% in amber.
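The capacity calculation and the color bands from the used capacity table can be sketched together in Python. All function names here are hypothetical. The sketch simplifies the gray case: it returns `None` (shown as gray) only when the summed expected TPS is not positive, and it does not model missing data for individual runtimes.

```python
from typing import Optional, Sequence, Tuple


def tps(transactions: float, hours: float) -> float:
    """Convert a transaction count over a number of hours to transactions per second."""
    return transactions / (hours * 60 * 60)


def used_capacity_percent(
    runtimes: Sequence[Tuple[float, float]],
) -> Optional[float]:
    """Used capacity from (actual_tps, expected_tps) pairs, as a percentage."""
    expected = sum(expected_tps for _, expected_tps in runtimes)
    if expected <= 0:
        return None  # assumption: capacity is undefined, so nothing to compute
    actual = sum(actual_tps for actual_tps, _ in runtimes)
    return actual / expected * 100


def capacity_status(percent: Optional[float]) -> str:
    """Map a used capacity percentage to the color bands in the table."""
    if percent is None:
        return "gray"
    if percent <= 30:
        return "green"
    if percent <= 70:
        return "amber"
    return "red"


# Runtime A and runtime B from the example, as (actual TPS, expected TPS).
percent = used_capacity_percent([(1.81, 2.77), (1.89, 3.33)])
print(capacity_status(percent))
```

With the example values the result falls between 30% and 70%, so the sketch reports amber, in line with the example.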

Problematic data planes by

The widget helps identify issues across data planes by analyzing total transaction volumes, error rate, response time, and latency of all associated runtimes. Choose a business metric to view the corresponding least and most problematic data planes. The data plane split and line graph display the data planes in sequence from the most problematic to the least. Click the Show as table icon to view the line graph in table format. Click the ellipsis icon to download the table in CSV, PNG, or JPG format.

Transactions

The widget provides a comprehensive overview of total transaction volumes across all data planes with a trend percentage. It provides detailed insights into the most-performing and least-performing data planes to help you make informed business decisions. It presents both pictorial and graphical representations of the data planes based on performance, highlighting maximum and minimum transaction counts.

How to determine the total transactions and trend analysis for all data planes?

Total transactions of the data plane = Total transactions of runtime A + Total transactions of runtime B + ...

With Transactions trend analysis, you can compare current and past transactions based on the filter you choose to spot changes.

Transactions trend (%) for all data planes for 24 hours = [(Total transactions of data plane in last 24h - Total transactions of data plane in previous 24h) / Total transactions of data plane in previous 24h] × 100

You can replace 24 hours with the time range that you select in the filter. If the determined trend percentage of transactions is:
  • A positive value, it is displayed in green with an upward arrow, which indicates that the transaction volume has increased.
  • A negative value, it is displayed in red with a downward arrow, which indicates that the transaction volume has decreased.

Example,

To determine the transactions trend percentage for all data planes for 24 hours, assume,

Time period                                                           Total transactions of data plane
Last 24 hours                                                         1000
Previous 24 hours (that is, 24 hours prior to the last 24 hours)      500

Transactions trend percentage for 24 hours for all data planes = [(1000-500)/500]*100 = 100%

The determined transactions trend percentage, 100% (positive), appears in green with an upward arrow indicating increased transactions compared to the previous 24 hours.
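The trend calculation compares the selected period with the period of equal length immediately before it. The following Python sketch shows one way to express it; `trend_percent` and `transactions_trend_display` are hypothetical names. Returning `None` when the previous period has no transactions, and when the trend is exactly zero, is an assumption, because the text describes only positive and negative values.

```python
from typing import Optional, Tuple


def trend_percent(current: float, previous: float) -> Optional[float]:
    """Percentage change from the previous period to the current period."""
    if previous == 0:
        return None  # assumption: the trend is undefined without a baseline
    return (current - previous) / previous * 100


def transactions_trend_display(
    trend: Optional[float],
) -> Optional[Tuple[str, str]]:
    """Return (color, arrow direction) for a transactions trend value."""
    if trend is None or trend == 0:
        return None  # the text does not define a display for these cases
    return ("green", "up") if trend > 0 else ("red", "down")


# Values from the example: 1000 transactions now, 500 in the previous period.
trend = trend_percent(1000, 500)
print(trend, transactions_trend_display(trend))
```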

Availability

With the widget, you can monitor the availability of all data planes and the health status of their associated runtimes.

How to determine the availability and trend analysis for all data planes?

Availability of the data plane = [((Sum of the heartbeats from runtime A in the selected period / Number of minutes in the selected period) + (Sum of the heartbeats from runtime B in the selected period / Number of minutes in the selected period) + ...) / Number of runtimes] × 100

The availability of a data plane is determined based on the sum of the heartbeat status values of the associated runtimes.
  • If federated API management receives the heartbeat status from the runtime, the value of the heartbeat status is 1.
  • If federated API management does not receive the heartbeat status from the runtime, the value of the heartbeat status is 0.

The frequency at which the runtime must send the heartbeat status to federated API management is defined in the federated API management Agent configuration. For details about how to configure the federated API management Agent, see the Connecting IBM webMethods API Gateway to federated API management section in the IBM webMethods API Gateway Administration guide. For details about how to develop an agent by using the SDK, see Agent implementation approaches.

Example,

To determine the availability of a data plane with 2 runtimes for a 1-hour time period, assume,

Runtime  Sum of the heartbeats received  Sum of the heartbeats expected
A        55                              60
B        60                              60

Number of minutes in 1 hour = 60

Availability of a data plane = [((55/60) + (60/60))/2]*100 = 95.8%

You can replace 1 hour with the time range that you select in the filter.
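The availability calculation averages the per-runtime heartbeat ratios, which the following Python sketch illustrates. The function name `data_plane_availability` is hypothetical. The sketch assumes one expected heartbeat per minute, as in the example; the actual frequency comes from the agent configuration. Returning `None` for an empty runtime list is also an assumption.

```python
from typing import Optional, Sequence


def data_plane_availability(
    heartbeats_received: Sequence[int],
    minutes_in_period: int,
) -> Optional[float]:
    """Availability (%) of a data plane for the selected period.

    heartbeats_received holds, per runtime, the sum of heartbeat status
    values (1 when a heartbeat was received, 0 when it was not).
    """
    if not heartbeats_received or minutes_in_period <= 0:
        return None  # assumption: nothing to compute without data
    # Availability ratio of each runtime, then the mean across runtimes.
    ratios = [received / minutes_in_period for received in heartbeats_received]
    return sum(ratios) / len(ratios) * 100


# Runtime A sent 55 heartbeats and runtime B sent 60 in a 60-minute period.
print(round(data_plane_availability([55, 60], 60), 1))
```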

With Availability trend analysis, you can compare current and past availability status based on the filter you choose to spot changes.

Availability trend (%) for all data planes for 24 hours = [(Availability of all data planes in last 24h - Availability of all data planes in previous 24h) / Availability of all data planes in previous 24h] × 100

You can replace 24 hours with the time range that you select in the filter. If the determined trend percentage of availability is:
  • A positive value, it is displayed in green with an upward arrow, which indicates that the availability has increased.
  • A negative value, it is displayed in red with a downward arrow, which indicates that the availability has decreased.

Example,

To determine the availability trend percentage for all data planes for 24 hours, assume,

Time period                                                           Availability of all data planes
Last 24 hours                                                         98%
Previous 24 hours (that is, 24 hours prior to the last 24 hours)      80%

Availability trend percentage for data planes for 24 hours filter = [(0.98-0.80)/0.80]*100 ≈ 22%

The determined availability trend percentage, 22% (positive), appears in green with an upward arrow, which indicates that the availability has increased compared to the previous 24 hours.

Error rate

The error rate indicates the percentage of errors that occurred during API transactions. This widget provides a comprehensive overview of the error rate of all data planes. It shows the data planes with the maximum and minimum error rates to help you make informed business decisions. Click the Top data planes or All data planes tab to view the respective error rates in the data plane split and line graph, ordered from maximum to minimum. Click the Show as table icon to view the line graph in table format. Click the ellipsis icon to download the table in CSV, PNG, or JPG format.

How to determine the error rate and trend analysis for all data planes?

Error count of the data plane = Total error count of runtime A + Total error count of runtime B + ...

Error rate of the data plane = [(Total error count of runtime A + Total error count of runtime B + ...) / (Total transactions of runtime A + Total transactions of runtime B + ...)] × 100

Example,

To determine the error rate of all data planes for 6 hours, assume,

Runtime Total error count Total transactions
A 30 60
B 25 80

Error rate of all data planes for 6 hours = [(30+25)/(60+80)] * 100 = 39%

With Error rate trend analysis, you can compare current and past error rates based on the filter you choose to spot changes.

Error rate trend (%) for all data planes for 24 hours = [(Error rate of all data planes in last 24h - Error rate of all data planes in previous 24h) / Error rate of all data planes in previous 24h] × 100

You can replace 24 hours with the time range that you select in the filter. If the determined error rate trend percentage is:
  • A positive value, it is displayed in red with an upward arrow, which indicates that the error rate has increased.
  • A negative value, it is displayed in green with a downward arrow, which indicates that the error rate has decreased.

Example,

To determine the error rate trend percentage for all data planes for 24 hours, assume,

Time period                                                           Error rate of all data planes
Last 24 hours                                                         39%
Previous 24 hours (that is, 24 hours prior to the last 24 hours)      52%

Error rate trend percentage for 24 hours filter for all data planes = [(0.39-0.52)/0.52] * 100 = -25%

The determined error rate trend percentage, -25% (negative), appears in green with a downward arrow, which indicates that the error rate has decreased compared to the previous 24 hours.
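The error rate and its trend can be sketched in Python as follows. Note that the color mapping is inverted compared with transactions and availability, because a falling error rate is the desirable direction. The function names `data_plane_error_rate` and `error_rate_trend` are hypothetical, and returning `None` for an empty or zero baseline is an assumption.

```python
from typing import Iterable, Optional, Tuple


def data_plane_error_rate(
    runtimes: Iterable[Tuple[int, int]],
) -> Optional[float]:
    """Error rate (%) from (total_error_count, total_transactions) pairs."""
    rows = list(runtimes)
    transactions = sum(count for _, count in rows)
    if transactions == 0:
        return None  # assumption: no transactions means no error rate
    return sum(errors for errors, _ in rows) / transactions * 100


def error_rate_trend(
    current_rate: float,
    previous_rate: float,
) -> Optional[Tuple[float, str, str]]:
    """Return (trend %, color, arrow direction) for the error rate."""
    if previous_rate == 0:
        return None  # assumption: the trend is undefined without a baseline
    trend = (current_rate - previous_rate) / previous_rate * 100
    if trend > 0:
        return trend, "red", "up"      # error rate increased
    if trend < 0:
        return trend, "green", "down"  # error rate decreased
    return trend, "", ""               # zero change is not described in the text


# Runtime A and runtime B from the 6-hour example, then the 24-hour trend.
print(data_plane_error_rate([(30, 60), (25, 80)]))
print(error_rate_trend(39, 52))
```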