Monitor SLA

This policy monitors a set of run-time performance conditions for an API and sends alerts to a specified destination when those conditions are violated. You can apply it to one or more specified applications. Use the policy to define a Service Level Agreement (SLA), which is a set of conditions that defines the level of performance that an application should expect from an API, and to identify whether the API threshold rules are met or exceeded. For example, you might define an agreement with a particular application that sends an alert to the application if responses are not sent within a certain maximum response time. You can configure SLAs for each API and application combination.

Success count, fault count, and total request count are immediate monitoring parameters: they are evaluated as soon as the limit is breached. The remaining parameters are aggregated monitoring parameters, which are evaluated once the configured interval is over. If any parameter is breached, an event notification (Monitor event) is sent to the configured destination. Within a single policy, multiple action configurations are combined as an AND condition. To achieve an OR condition, configure multiple policies.
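
The AND and OR behavior described above can be sketched as follows. This is a minimal illustration, not the API Gateway implementation: the function names, the dictionary layout of an action configuration, and the mapping of operator names to Python comparisons are all assumptions made for the example.

```python
import operator

# Hypothetical mapping of the policy's operator names to Python comparisons.
OPERATORS = {
    "Greater Than": operator.gt,
    "Less Than": operator.lt,
    "Equals To": operator.eq,
}

def policy_breached(action_configs, metrics):
    """AND semantics: True only if every action configuration is violated."""
    return all(
        OPERATORS[cfg["operator"]](metrics[cfg["name"]], cfg["value"])
        for cfg in action_configs
    )

def any_policy_breached(policies, metrics):
    """OR semantics: each policy is evaluated independently of the others."""
    return any(policy_breached(configs, metrics) for configs in policies)

# Example metric values for one interval (illustrative numbers only).
metrics = {"Fault Count": 12, "Average Response Time": 25}
fault_rule = {"name": "Fault Count", "operator": "Greater Than", "value": 10}
time_rule = {"name": "Average Response Time", "operator": "Greater Than", "value": 30}

one_policy = policy_breached([fault_rule, time_rule], metrics)
two_policies = any_policy_breached([[fault_rule], [time_rule]], metrics)
```

With both rules in one policy, no alert is raised because only the fault rule is violated. With each rule in its own policy, the fault rule alone is enough to raise an alert.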

The table lists the properties that you can specify for this policy:

Parameter Value
Action Configuration. Specifies the type of action to be configured.  
Name Specifies the name of the metric to be monitored.

You can select one of the available metrics:

  • Availability. Indicates whether the native API is available to the clients as specified in the current interval. webMethods API Gateway calculates the availability of the native API over the specified alert interval, starting from the instant the API is activated.

    The availability of the API is calculated as: Availability = (time for which the native API is up / total interval of time) x 100. This value is measured in %.

    For example, if you set Availability as less than 90, webMethods API Gateway generates an alert whenever the availability of the native API falls below 90% in the specified time interval. Suppose the alert interval is set to 1 minute (60 seconds) and there are 7 API invocations at various times in that minute, with a combination of up and down states as shown in the table. The availability is then calculated as follows.

    Request#  Invocation time (s)  Service status  Up time (s)
    1         5                    Up              5 (from start to now)
    2         15                   Up              10 (between 1 and 2)
    3         30                   Down            15 (between 2 and 3)
    4         40                   Down            0 (since last state is down)
    5         45                   Up              0
    6         50                   Down            5 (between 5 and 6)
    7         55                   Up              0
                                                   5 (remaining 5 seconds considered as Up, in line with the last state)
    Total                                          40 (Availability is 66.67%)

    As the calculated availability of the native API is 66.67%, which is below 90%, API Gateway generates an alert. The API is considered to be down for the ongoing request when API Gateway receives a connection-related error from the native API in the outbound call. If the API does not respond with an HTTP response, it is considered to be down.
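
    The worked example can be reproduced with a short sketch. The function name and input format are hypothetical, and the sketch assumes that the API counts as up before the first invocation and that the time between two invocations is credited according to the state seen at the earlier one, which is how the table above reads.

```python
def availability_percent(invocations, interval_seconds):
    """Compute availability for one alert interval.

    invocations: list of (time_in_seconds, is_up) tuples sorted by time.
    """
    up_time = 0
    last_time = 0
    last_up = True  # assumption: the API counts as up before the first invocation
    for time, is_up in invocations:
        # Credit the elapsed time only if the previous state was up.
        if last_up:
            up_time += time - last_time
        last_time, last_up = time, is_up
    # The remainder of the interval follows the last observed state.
    if last_up:
        up_time += interval_seconds - last_time
    return up_time / interval_seconds * 100

# The seven invocations from the table, within a 60-second interval.
example = [
    (5, True), (15, True), (30, False), (40, False),
    (45, True), (50, False), (55, True),
]
availability = availability_percent(example, 60)
alert = availability < 90
```

    For the table data, the up time adds up to 40 seconds, so the availability is about 66.67% and the "less than 90" condition is violated.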

  • Average Response Time. Indicates the average time taken by the service to complete all invocations in the current interval. The average is calculated from the instant the API activation takes place for the configured interval.

    For example, if you set an alert for Average Response Time greater than 30 ms with an interval of 1 minute, the monitoring interval starts on API activation, and the average response time of all runtime invocations for this API in that minute is calculated. If the average is greater than 30 ms, a monitor event is generated. If applications are specified in the Monitor SLA policy so that application-specific SLA monitoring can be done, the average response time is monitored only for the specified applications.

  • Fault Count. Indicates the number of faults returned in the current interval. Responses returned from webMethods API Gateway with an HTTP status code greater than or equal to 400 are considered fault request transactions. This includes downtime errors.
  • Maximum Response Time. Indicates the maximum time in milliseconds (ms) to respond to a request in the current interval.
  • Minimum Response Time. Indicates the minimum time in milliseconds (ms) to respond to a request in the current interval.
  • Success Count. Indicates the number of successful requests in the current interval.
  • Total Request Count. Indicates the total number of requests (successful and unsuccessful) in the current interval.
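
The count and response-time metrics above can be derived from the requests seen in one interval, as in this minimal sketch. The function name and record format are hypothetical, and treating every non-fault request as a success is an assumption, since the policy description does not define success explicitly.

```python
def summarize_interval(records):
    """Summarize one interval.

    records: non-empty list of (http_status, response_time_ms) tuples.
    """
    times = [response_time for _, response_time in records]
    # A status code of 400 or above counts as a fault, as described above.
    faults = sum(1 for status, _ in records if status >= 400)
    return {
        "Total Request Count": len(records),
        "Fault Count": faults,
        # Assumption: every request that is not a fault is a success.
        "Success Count": len(records) - faults,
        "Average Response Time": sum(times) / len(times),
        "Minimum Response Time": min(times),
        "Maximum Response Time": max(times),
    }

# Four illustrative requests: two successes and two faults.
summary = summarize_interval([(200, 20), (200, 40), (500, 60), (404, 10)])
```
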
Operator Specifies the operator applicable to the metric selected.

Select one of the available operators: Greater Than, Less Than, or Equals To.

Value Specifies the alert value for which the monitoring is applied.
Destination Specifies the destination where the alert is to be logged.

Select the required options:

  • webMethods API Gateway
  • Developer Portal
  • CentraSite
    Note: This option is applicable only for the APIs published from CentraSite to API Gateway.
  • Elasticsearch
  • Email. You can add multiple email addresses by clicking the add icon.
    Note: If an email alias is available, you can type the email alias in the Email Address field with the following syntax: {emailalias}. For example, if test is the email alias, type {test}.
  • Local Log. You can select the severity of the messages to be logged (logging level) from the Log Level list. The available log levels are ERROR, INFO, and WARN.

    For details on publishing to custom destinations, see Custom destination.

Alert Interval

Specifies the time period in which to monitor performance before sending an alert if a condition is violated.

The timer starts once the API is activated and resets after the configured time interval. If an API is deactivated, the interval is reset, and it starts afresh when the API is activated again.

Unit

Specifies the unit of measurement for the configured Alert Interval. You can provide one of the following units:

  • Minutes
  • Hours
  • Days
  • Calendar Week. The time interval starts on the first day of the week and ends on the last day of the week. By default, the start day of the week is set to Monday.

    Example

    • If an API is activated on a Wednesday and Alert Interval is set to 1, the time interval ends on Sunday, that is, 5 days.
    • If an API is activated on a Wednesday and Alert Interval is set to 2, the time interval ends on the Sunday of the following week. The period spans two calendar weeks, that is, 12 days.

    You can change the start day of the week using the extended setting startDayOfTheWeek in the Administration > General > Extended settings section. This extended setting takes effect only after the webMethods API Gateway server is restarted. Contact the IBM® support team to restart the webMethods API Gateway server.

  • Calendar Month. The time interval starts on the first day of the month and ends on the last day of the month.

    Example

    • If an API is activated on a Wednesday in August and Alert Interval is set to 1, the time interval ends on the last day of August.
    • If an API is activated on a Wednesday in August and Alert Interval is set to 2, the time interval ends in two calendar months, that is, on the last day of September.
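
The Calendar Week and Calendar Month examples can be checked with a small sketch. The function names are hypothetical, the start_day parameter only mimics the effect of the startDayOfTheWeek setting (using Python's Monday = 0 numbering), and the activation date is an arbitrary Wednesday in August chosen for illustration.

```python
import calendar
from datetime import date, timedelta

def calendar_week_end(activation, interval=1, start_day=0):
    """Last day of an interval measured in calendar weeks (Monday = 0)."""
    days_into_week = (activation.weekday() - start_day) % 7
    week_start = activation - timedelta(days=days_into_week)
    return week_start + timedelta(days=7 * interval - 1)

def calendar_month_end(activation, interval=1):
    """Last day of an interval measured in calendar months."""
    months = activation.month - 1 + interval - 1
    year = activation.year + months // 12
    month = months % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, last_day)

activation = date(2023, 8, 16)  # a Wednesday

one_week = calendar_week_end(activation, 1)     # Sunday of the same week
two_weeks = calendar_week_end(activation, 2)    # Sunday of the following week
one_month = calendar_month_end(activation, 1)   # last day of August
two_months = calendar_month_end(activation, 2)  # last day of September

# Number of days covered, counting the activation day itself.
days_one_week = (one_week - activation).days + 1
days_two_weeks = (two_weeks - activation).days + 1
```

Counting the activation day, the two weekly intervals cover 5 and 12 days, which matches the examples above.
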
Alert Frequency Specifies how frequently to issue alerts for the counter-based metrics (Total Request Count, Success Count, Fault Count).

Select one of the options:

  • Only Once. Triggers an alert only the first time one of the specified conditions is violated.
  • Every Time. Triggers an alert every time one of the specified conditions is violated.
Alert Message Specifies the text to be included in the alert.
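
The difference between the two Alert Frequency options can be illustrated for a counter-based metric with a Greater Than condition. This is a sketch under stated assumptions, not the product's implementation: the class and method names are hypothetical, and it assumes that the Only Once state is cleared when a new alert interval starts.

```python
class CounterAlert:
    """Tracks a request counter against a Greater Than limit."""

    def __init__(self, limit, frequency="Only Once"):
        self.limit = limit
        self.frequency = frequency  # "Only Once" or "Every Time"
        self.count = 0
        self.alerted = False

    def record(self):
        """Count one request and return True if an alert should be sent."""
        self.count += 1
        if self.count <= self.limit:
            return False
        if self.frequency == "Only Once" and self.alerted:
            return False
        self.alerted = True
        return True

    def reset(self):
        """Start a new alert interval (assumed to clear the alert state)."""
        self.count = 0
        self.alerted = False

once = CounterAlert(limit=2, frequency="Only Once")
every = CounterAlert(limit=2, frequency="Every Time")
once_alerts = [once.record() for _ in range(5)]
every_alerts = [every.record() for _ in range(5)]
```

With a limit of 2 and five requests, Only Once alerts on the third request only, while Every Time alerts on the third, fourth, and fifth requests.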