Tutorial: Getting the error rate for a service

You can use Instana's REST API to retrieve the error rate for a specific service. The error rate is a crucial metric for monitoring the health and reliability of applications and infrastructure. It's calculated as the number of error responses divided by the total number of requests over a specific time period, often expressed as a percentage.

Context

In the observability space, the error rate refers to the percentage of all requests to a service that result in errors. This metric is crucial to monitor the health and reliability of applications and infrastructure. The following list explains what the error rate can indicate and how it is used:

Definition: The error rate is calculated as the number of error responses divided by the total number of requests over a specific time period, often expressed as a percentage. For example, if a service receives 1000 requests in an hour and 100 of those requests result in errors, the error rate is 10%.
Types of errors: Errors can include client-side errors (4XX HTTP status codes), server-side errors (5XX HTTP status codes), timeouts, and application-specific errors.
Importance: A high error rate might indicate issues, such as bugs in the code, resource limitations (like CPU or memory), network problems, or upstream service failures. Teams can quickly identify and address issues by monitoring this rate.
Thresholds and alerts: Teams often set thresholds for acceptable error rates based on the criticality of the service and the user impact. If the error rate exceeds these thresholds, an alert triggers for teams to investigate.
Analysis and response: Observability tools provide detailed error diagnostics to help pinpoint the source of the problem, such as stack traces, logs, or transaction traces. This enables a more effective and targeted response to incidents.
Continuous improvement: By analyzing trends and patterns in the error rate, organizations can proactively improve their codebase and infrastructure, leading to a more stable and reliable service.
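
The definition and threshold items in the preceding list can be made concrete with a small calculation. The following is a minimal sketch, not part of the Instana API; the function names and the 5% threshold are illustrative assumptions.

```python
def error_rate(error_count: int, total_requests: int) -> float:
    """Return the error rate as a fraction between 0 and 1."""
    if total_requests == 0:
        return 0.0  # no traffic: treated as no errors (an assumption of this sketch)
    return error_count / total_requests


def exceeds_threshold(rate: float, threshold: float) -> bool:
    """Return True when the rate is strictly above the acceptable threshold."""
    return rate > threshold


# Example from the text: 100 errors out of 1000 requests
rate = error_rate(100, 1000)
print(f"{rate:.0%}")  # prints 10%
print(exceeds_threshold(rate, threshold=0.05))  # 0.05 is a hypothetical threshold; prints True
```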

Error rate is a fundamental metric in any observability or monitoring strategy, as it directly impacts user experience and service reliability.

The following details describe how to pull the error rate for a particular service running on your system and being monitored by Instana.

Prerequisites

To use the identified Instana REST API endpoints in this tutorial, see General prerequisites. There are no specific prerequisites for this tutorial.

API endpoints

In this tutorial, two different API endpoints from Application Monitoring are used.

| Endpoint | Description | Documentation | Required Permissions |
| --- | --- | --- | --- |
| `GET /api/application-monitoring/catalog/metrics` | Retrieves a list of metric types that Instana monitors; from here you can select the `metricId` for the data that you want to retrieve for a specific service. | GET application catalog metrics | General Applications permission. |
| `POST /api/application-monitoring/metrics/services` | Retrieves the specified metrics for a service. Each metric includes an aggregation, which identifies the statistical summary method to use. | GET service metrics | General Applications permission. |

The tutorial

There are two steps that are required to retrieve metric data, such as the error rate for a service in Instana:

Pull the metric catalog to grab a list of supported metrics. From here you can find the metricId for the metric data you want to get for a particular service.
Get the data for a specified metric type -- metricId -- and aggregation. In this example we're looking for the average error rate for a service.

1. Getting the metric ID and aggregation type from the metric catalog

To list all types of metrics that are available, send a GET request to the `/api/application-monitoring/catalog/metrics` endpoint.

Here are the details for that request:

GET /api/application-monitoring/catalog/metrics
Host: {tenant}-{unit}.instana.io
Authorization: apiToken {api_token}
Accept: application/json

Sample curl request

A curl request to this endpoint requires no query parameters and no request payload.

curl -XGET https://{tenant}-{unit}.instana.io/api/application-monitoring/catalog/metrics \
  -H "Accept: application/json" \
  -H "authorization: apiToken {apiToken}"
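
The same catalog request can be sketched in Python. This version uses only the standard library (`urllib.request`) so that it runs without extra packages; the function names `build_catalog_request` and `fetch_metric_catalog` are hypothetical helpers, and the 30-second timeout is an arbitrary choice.

```python
import json
import urllib.request


def build_catalog_request(base_url: str, api_token: str) -> urllib.request.Request:
    """Build (but do not send) the GET request for the metric catalog."""
    url = f"{base_url.rstrip('/')}/api/application-monitoring/catalog/metrics"
    headers = {
        "Authorization": f"apiToken {api_token}",
        "Accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_metric_catalog(base_url: str, api_token: str) -> list:
    """Send the request and return the parsed JSON catalog."""
    request = build_catalog_request(base_url, api_token)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Separating the request construction from the network call makes it possible to inspect the URL and headers before anything is sent.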

Sample response payload

The response is a metric catalog, which is a list of metric types that are supported. You can scroll through it to find the metricId and aggregation for the metric data that you want to pull for a service.

[
    {
        "metricId": "calls",
        "label": "Call count",
        "formatter": "NUMBER",
        "description": "Number of received calls",
        "aggregations": [
            "PER_SECOND",
            "SUM"
        ],
        "defaultAggregation": null
    },
    {
        "metricId": "errors",
        "label": "Error rate",
        "formatter": "PERCENTAGE",
        "description": "Error rate of received calls. A value between 0 and 1.",
        "aggregations": [
            "MEAN"
        ],
        "defaultAggregation": "MEAN"
    },
    // More metric types...
]

What data do I need?

Consider one of the metric types from the catalog that was returned in the preceding section. You must retrieve two pieces of information from its metric catalog entry:

metricId - a unique identifier for the type of metric
aggregation - available statistical aggregations for the metric. A metric type can have one or more aggregations.

For the desired "error rate" metric type, you can see there's only one available aggregation -- "MEAN".
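
Instead of scrolling through the catalog, the lookup can be done in code. The following is a minimal sketch that assumes the catalog has the shape shown in the sample response payload; `find_metric` is a hypothetical helper name, and matching on the `label` field is one possible choice.

```python
def find_metric(catalog: list, label: str):
    """Return (metricId, aggregations) for the entry whose label matches, or None."""
    for entry in catalog:
        if entry.get("label", "").lower() == label.lower():
            return entry["metricId"], entry["aggregations"]
    return None


# Two entries copied from the sample response payload (fields trimmed for brevity)
sample_catalog = [
    {"metricId": "calls", "label": "Call count", "aggregations": ["PER_SECOND", "SUM"]},
    {"metricId": "errors", "label": "Error rate", "aggregations": ["MEAN"]},
]

print(find_metric(sample_catalog, "Error rate"))  # ('errors', ['MEAN'])
```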

2. Getting the metric data for a service, such as error rate

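To get the metric data for a service, send a POST request to the `/api/application-monitoring/metrics/services` endpoint. Here are the details for that request: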

POST /api/application-monitoring/metrics/services
Host: {tenant}-{unit}.instana.io
Authorization: apiToken {api_token}
Accept: application/json

Sample curl request

You can test this endpoint from the command line to quickly determine whether you have the right information to make the HTTP REST request and the appropriate access permissions. The command also returns the response payload, which you can investigate.

curl -XPOST https://{tenant}-{unit}.instana.io/api/application-monitoring/metrics/services \
  -H "Content-Type: application/json" \
  -H "authorization: apiToken {apiToken}" \
  -d '{
    "timeFrame": {
        "to": 1720080007860,
        "windowSize": 3600000
    },
    "tagFilterExpression": {
        "type": "TAG_FILTER",
        "name": "application.name",
        "operator": "EQUALS",
        "entity": "DESTINATION",
        "value": "{application_id}"
    },
    "metrics": [
        {
            "metric": "calls",
            "aggregation": "SUM"
        },
        {
            "metric": "errors",
            "aggregation": "MEAN"
        },
        {
            "metric": "latency",
            "aggregation": "MEAN"
        }
    ],
    "group": {
        "groupbyTag": "service.name",
        "groupbyTagEntity": "DESTINATION"
    }
  }'
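
The `timeFrame` values in the request appear to be in milliseconds: a `windowSize` of 3600000 corresponds to one hour, and `to` looks like a Unix epoch timestamp in milliseconds. Under that assumption, the following sketch builds a request body for a window that ends at the current time. The helper names `build_time_frame` and `build_metrics_payload` are hypothetical, and the payload contains only the `timeFrame` and `metrics` fields from the example.

```python
import time


def build_time_frame(window_minutes: int, to_ms=None) -> dict:
    """Build a timeFrame object that ends at to_ms (defaults to the current time)."""
    if to_ms is None:
        to_ms = int(time.time() * 1000)  # current time in epoch milliseconds
    return {"to": to_ms, "windowSize": window_minutes * 60 * 1000}


def build_metrics_payload(metrics: list, time_frame: dict) -> dict:
    """Combine (metric, aggregation) pairs and a time frame into a request body."""
    return {
        "timeFrame": time_frame,
        "metrics": [{"metric": m, "aggregation": a} for m, a in metrics],
    }


# Reproduce the one-hour window from the curl example
payload = build_metrics_payload(
    [("errors", "MEAN")],
    build_time_frame(60, to_ms=1720080007860),
)
print(payload["timeFrame"])  # {'to': 1720080007860, 'windowSize': 3600000}
```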

Sample Python code

To programmatically automate retrieval of metric data for a specific service, you can try the following Python function, which uses the requests library to call the GET service metrics endpoint.

If you do not have a Python environment set up on your local machine, you can try this function in a Jupyter Notebook in Google Colab, which provides an environment in the browser to write and run Python code. To use Google Colab, you need a Google account.

Requirements

Make sure that the following criteria are met:

Python 3 is installed on your system
requests library is installed (pip install requests if you have not installed it yet)

Python function


# import the required library
import requests

def get_service_metrics(base_url, api_token, service_id, metric_id, aggregation):
    """
    Retrieves metric data for a service from the Instana REST API by using the GET service metrics endpoint.

    Args:
        base_url (str): The base URL of the Instana API, for example 'https://{tenant}-{unit}.instana.io'.
        api_token (str): The API token for authentication.
        service_id (str): The unique identifier for a service being monitored in your instance of Instana.
        metric_id (str): The metricId from the metric catalog, for example 'errors'.
        aggregation (str): The aggregation to apply to the metric, for example 'MEAN'.

    Returns:
        dict: A dictionary containing the JSON response with the requested metric data for the service.
              Returns None if the request fails.
    """

    # url for the POST service metrics endpoint, built from the base_url argument
    api_endpoint_url = f"{base_url}/api/application-monitoring/metrics/services"

    headers = {
        "Content-Type": "application/json",
        "Authorization": f"apiToken {api_token}"
    }

    # request payload; the function arguments are passed as values, not as literal placeholder strings
    data = {
      "metrics": [
        {
          "aggregation": aggregation,
          "metric": metric_id
        }
      ],
      "applicationBoundaryScope": "INBOUND",
      "serviceId": service_id
    }

    try:
        response = requests.request("POST", api_endpoint_url, headers=headers, json=data)
        response.raise_for_status()  # Raise error for bad status codes

        return response.json()  # Return JSON response

    except requests.exceptions.RequestException as e:
        print(f"Error: {e}")
        return None  # Return None on error

Sample usage of the Python function

You can use the get_service_metrics function as follows to obtain the error rate for a service:

BASE_URL = "https://{your_tenant}-{your_unit}.instana.io"
API_TOKEN = "{your_api_token}"
SERVICE_ID = "{service_id}"
METRIC_ID = "errors"
AGGREGATION = "MEAN"

service_metrics = get_service_metrics(BASE_URL, API_TOKEN, SERVICE_ID, METRIC_ID, AGGREGATION)

if service_metrics is not None:
    print(service_metrics)

Sample response

After you make the API call, you might receive a JSON response that looks similar to the following code block:

{
    "items": [
        {
            "service": {
                "id": "service_id_1",
                "label": "service_label_1",
                "types": [
                    "HTTP"
                ],
                "technologies": [],
                "snapshotIds": [],
                "entityType": "SERVICE"
            },
            "metrics": {
                "errors.mean": [
                    [
                        1720629650000,
                        0.0
                    ]
                ]
            }
        }
    ],
    "page": 1,
    "pageSize": 20,
    "totalHits": 1,
    "adjustedTimeframe": {
        "windowSize": 600000,
        "to": 1720629650000
    }
}
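
To turn a response like this into a readable error rate, you can walk the `items` list and read the data points under the `errors.mean` key. The following is a minimal sketch that assumes the response has the shape of the sample, with each data point being a `[timestamp, value]` pair and the value being a fraction between 0 and 1 (as the catalog description states). The helper name `extract_error_rates` is hypothetical.

```python
def extract_error_rates(response: dict, metric_key: str = "errors.mean") -> dict:
    """Map each service label to its latest error rate as a percentage."""
    rates = {}
    for item in response.get("items", []):
        points = item.get("metrics", {}).get(metric_key, [])
        if not points:
            continue  # no data points for this service in the time frame
        _timestamp, value = points[-1]  # last [timestamp, value] pair
        rates[item["service"]["label"]] = value * 100
    return rates


# Trimmed copy of the sample response
sample_response = {
    "items": [
        {
            "service": {"id": "service_id_1", "label": "service_label_1"},
            "metrics": {"errors.mean": [[1720629650000, 0.0]]},
        }
    ]
}

print(extract_error_rates(sample_response))  # {'service_label_1': 0.0}
```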

Summary and additional resources

Additional information about what data and analysis Instana provides for traces and calls can be found in Analyzing traces and calls.

For more information about API usage and best practices, see API documentation.

You can also join the IBM TechXchange Community.