Tutorial: Getting the error rate for a service

Context

In the observability space, the error rate typically refers to the percentage of all requests to a service that result in errors. This metric is crucial for monitoring the health and reliability of applications and infrastructure. The following list explains what the error rate can indicate and how it is used:

  • Definition: The error rate is calculated as the number of error responses divided by the total number of requests over a specific time period, often expressed as a percentage. For instance, if a service receives 1000 requests in an hour and 100 of those requests result in errors, the error rate is 10%.
  • Types of errors: Errors can include client-side errors (4XX HTTP status codes), server-side errors (5XX HTTP status codes), timeouts, and application-specific errors.
  • Importance: A high error rate might indicate issues such as bugs in the code, resource limitations (like CPU or memory), network problems, or upstream service failures. Monitoring this rate helps teams quickly identify and address these issues.
  • Thresholds and alerts: Teams often set thresholds for acceptable error rates based on the criticality of the service and the user impact. If the error rate exceeds these thresholds, it can trigger alerts for teams to investigate.
  • Analysis and response: Observability tools provide detailed error diagnostics to help pinpoint the source of the problem, such as stack traces, logs, or transaction traces. This enables a more effective and targeted response to incidents.
  • Continuous improvement: By analyzing trends and patterns in the error rate, organizations can proactively improve their codebase and infrastructure, leading to a more stable and reliable service.
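
The definition and threshold points in the preceding list can be sketched in a few lines of Python. The function names and the 5% threshold are illustrative assumptions for this sketch, not Instana defaults.

```python
def error_rate(error_count: int, total_requests: int) -> float:
    """Return the error rate as a fraction between 0 and 1."""
    if total_requests == 0:
        return 0.0  # no traffic: treated as no errors (an assumption of this sketch)
    return error_count / total_requests


def exceeds_threshold(rate: float, threshold: float = 0.05) -> bool:
    """Return True when the rate is above the (hypothetical) alert threshold."""
    return rate > threshold


rate = error_rate(100, 1000)
print(f"{rate:.0%}")            # 10%, matching the example in the definition
print(exceeds_threshold(rate))  # True with the assumed 5% threshold
```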

Error rate is a fundamental metric in any observability or monitoring strategy, as it directly impacts user experience and service reliability.

In this tutorial, you learn how to pull the error rate for a particular service that runs on your system and is monitored by Instana.

Prerequisites

To use the identified Instana REST API endpoints in this tutorial, see General prerequisites. There are no specific prerequisites for this tutorial.

API endpoints

In this tutorial, two different API endpoints from Application Monitoring are used.

| Endpoint | Description | Documentation | Required permissions |
| --- | --- | --- | --- |
| GET /api/application-monitoring/catalog/metrics | Retrieves a list of metric types being monitored by Instana; from here you can select the metricId for the data you want to retrieve for a specific service. | GET application catalog metrics | General Applications permission. |
| POST /api/application-monitoring/metrics/services | Retrieves the specified metrics for a service. Each metric includes an aggregation, which identifies the statistical summary method to use. | GET service metrics | General Applications permission. |

The tutorial

There are two steps that are required to retrieve metric data, such as the error rate for a service in Instana:

  1. Pull the metric catalog to grab a list of supported metrics. From here you can find the metricId for the metric data you want to get for a particular service.
  2. Get the data for a specified metric type -- metricId -- and aggregation. In this example we're looking for the average error rate for a service.

1. Getting the metric ID and aggregation type from the metric catalog

You can skip to the next step if you already know the metric ID (also known as metricId) and aggregation type (also known as aggregation).

To list all types of metrics that are available, send a GET request to the `/api/application-monitoring/catalog/metrics` endpoint.

Here are the details for that request:

GET /api/application-monitoring/catalog/metrics
Host: {tenant}-{unit}.instana.io
Authorization: apiToken {api_token}
Accept: application/json

Sample curl request

A curl request to this endpoint requires no query parameters and no request payload.

curl -XGET "https://{tenant}-{unit}.instana.io/api/application-monitoring/catalog/metrics" \
  -H "Accept: application/json" \
  -H "Authorization: apiToken {api_token}"
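
The same catalog request can be sketched in Python. This version uses only the standard library (urllib) so that it runs without extra packages; the function names build_catalog_request and fetch_metric_catalog are hypothetical, and the 30-second timeout is an arbitrary choice.

```python
import json
import urllib.request


def build_catalog_request(base_url: str, api_token: str) -> urllib.request.Request:
    """Build (but do not send) the GET request for the metric catalog."""
    url = f"{base_url.rstrip('/')}/api/application-monitoring/catalog/metrics"
    headers = {
        "Accept": "application/json",
        "Authorization": f"apiToken {api_token}",
    }
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_metric_catalog(base_url: str, api_token: str) -> list:
    """Send the request and return the parsed catalog (a list of metric types)."""
    request = build_catalog_request(base_url, api_token)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

Splitting the request construction from the network call makes it possible to inspect the URL and headers before anything is sent.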

Sample response payload

The response is a metric catalog, which is a list of metric types that are supported. You can scroll through it to find the metricId and aggregation for the metric data you want to pull for a service.

[
    {
        "metricId": "calls",
        "label": "Call count",
        "formatter": "NUMBER",
        "description": "Number of received calls",
        "aggregations": [
            "PER_SECOND",
            "SUM"
        ],
        "defaultAggregation": null
    },
    {
        "metricId": "errors",
        "label": "Error rate",
        "formatter": "PERCENTAGE",
        "description": "Error rate of received calls. A value between 0 and 1.",
        "aggregations": [
            "MEAN"
        ],
        "defaultAggregation": "MEAN"
    },
    // More metric types...
]

What data do I need?

Consider one of the metric types from the catalog returned in the preceding section. There are two pieces of information that you must retrieve from the metrics catalog entry:

  • metricId - a unique identifier for the type of metric
  • aggregation - available statistical aggregations for the metric. A metric type can have one or more aggregations.

For the desired "error rate" metric type, we can see there's only one available aggregation -- "MEAN". Hold onto this information for the next step.
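
Instead of scrolling through the catalog, you could look up the entry by its label. The following is a minimal sketch that assumes the catalog has the shape of the sample response payload; find_metric is a hypothetical helper, and the catalog list is abbreviated from that sample.

```python
def find_metric(catalog, label):
    """Return (metricId, aggregations) for the first entry whose label matches."""
    for entry in catalog:
        if entry.get("label", "").lower() == label.lower():
            return entry["metricId"], entry["aggregations"]
    return None  # no metric type with that label


# Abbreviated copy of the sample catalog shown earlier
catalog = [
    {"metricId": "calls", "label": "Call count", "aggregations": ["PER_SECOND", "SUM"]},
    {"metricId": "errors", "label": "Error rate", "aggregations": ["MEAN"]},
]

print(find_metric(catalog, "Error rate"))  # ('errors', ['MEAN'])
```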

2. Getting the metric data for a service, such as error rate

Here are the details for that request:

POST /api/application-monitoring/metrics/services
Host: {tenant}-{unit}.instana.io
Authorization: apiToken {api_token}
Accept: application/json

Sample curl request

The most basic way that you can test out this endpoint is from the command line. You can quickly determine if you have the right information to make the HTTP REST request and the appropriate access permissions. It also gives you the response payload, which you can investigate.

curl -XPOST "https://{tenant}-{unit}.instana.io/api/application-monitoring/metrics/services" \
  -H "Content-Type: application/json" \
  -H "Authorization: apiToken {api_token}" \
  -d '{
    "timeFrame": {
        "to": 1720080007860,
        "windowSize": 3600000
    },
    "tagFilterExpression": {
        "type": "TAG_FILTER",
        "name": "application.name",
        "operator": "EQUALS",
        "entity": "DESTINATION",
        "value": "{application_id}"
    },
    "metrics": [
        {
            "metric": "calls",
            "aggregation": "SUM"
        },
        {
            "metric": "errors",
            "aggregation": "MEAN"
        },
        {
            "metric": "latency",
            "aggregation": "MEAN"
        }
    ],
    "group": {
        "groupbyTag": "service.name",
        "groupbyTagEntity": "DESTINATION"
    }
  }'
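
The timeFrame values in the request above look like an end timestamp in epoch milliseconds (to) and a window length in milliseconds (windowSize, where 3600000 is one hour). Under that assumption, the following sketch builds a comparable request body in Python that asks only for the mean error rate. The function names and the "my-application" value are hypothetical.

```python
import json
import time


def build_time_frame(window_minutes, now_ms=None):
    """Return a timeFrame ending at now_ms (epoch milliseconds)."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)  # current time in milliseconds
    return {"to": now_ms, "windowSize": window_minutes * 60 * 1000}


def build_error_rate_payload(application, window_minutes=60, now_ms=None):
    """Assemble a request body that asks only for the mean error rate."""
    return {
        "timeFrame": build_time_frame(window_minutes, now_ms),
        "tagFilterExpression": {
            "type": "TAG_FILTER",
            "name": "application.name",
            "operator": "EQUALS",
            "entity": "DESTINATION",
            "value": application,  # value matched against application.name
        },
        "metrics": [{"metric": "errors", "aggregation": "MEAN"}],
    }


payload = build_error_rate_payload("my-application", now_ms=1720080007860)
print(json.dumps(payload["timeFrame"]))  # {"to": 1720080007860, "windowSize": 3600000}
```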

Sample Python code

To retrieve metric data for a service programmatically, you can try out the following Python function, which uses the requests library to call the GET service metrics endpoint.

If you do not yet have a Python environment set up on your local machine, you can try out this function by using a Jupyter Notebook in Google Colab, which provides an environment in the browser to write and run Python code. To use Google Colab, you need a Google account. Use this link to create a new Jupyter Notebook in Colab.

Requirements

  • Python 3 is installed on your system
  • requests library is installed (run pip install requests if you have not installed it yet)

Python function

# import the required library
import requests

def get_service_metrics(base_url, api_token, service_id, metric_id, aggregation):
    """
    Retrieves metric data for a service from the Instana REST API by using the service metrics endpoint.

    Args:
        base_url (str): The base URL of the Instana API, for example 'https://{tenant}-{unit}.instana.io'.
        api_token (str): The API token for authentication.
        service_id (str): The unique identifier for a service being monitored in your instance of Instana.
        metric_id (str): The metricId from the metric catalog, for example 'errors'.
        aggregation (str): The aggregation to apply to the metric, for example 'MEAN'.

    Returns:
        dict: A dictionary containing the JSON response with the requested metric data for the service.
              Returns None if the request fails.
    """

    # URL for the POST service metrics endpoint, built from the base URL argument
    api_endpoint_url = f"{base_url}/api/application-monitoring/metrics/services"

    headers = {
        "Content-Type": "application/json",
        "Authorization": f"apiToken {api_token}"
    }

    # request payload; pass the argument values themselves, not placeholder strings
    data = {
      "metrics": [
        {
          "aggregation": aggregation,
          "metric": metric_id
        }
      ],
      "applicationBoundaryScope": "INBOUND",
      "serviceId": service_id
    }

    try:
        response = requests.post(api_endpoint_url, headers=headers, json=data)
        response.raise_for_status()  # Raise error for bad status codes

        return response.json()  # Return JSON response

    except requests.exceptions.RequestException as e:
        print(f"Error: {e}")
        return None  # Return None on error

Sample usage of the Python function

You can use the get_service_metrics function as follows to get the error rate for a service:

BASE_URL = "https://{your_tenant}-{your_unit}.instana.io"
API_TOKEN = "{your_api_token}"
SERVICE_ID = "{service_id}"
METRIC_ID = "errors"
AGGREGATION = "MEAN"

service_metrics = get_service_metrics(BASE_URL, API_TOKEN, SERVICE_ID, METRIC_ID, AGGREGATION)

if service_metrics is not None:
    print(service_metrics)

Sample response

After you make the API call, you might receive a JSON response that looks similar to the following code block.

{
    "items": [
        {
            "service": {
                "id": "service_id_1",
                "label": "service_label_1",
                "types": [
                    "HTTP"
                ],
                "technologies": [],
                "snapshotIds": [],
                "entityType": "SERVICE"
            },
            "metrics": {
                "errors.mean": [
                    [
                        1720629650000,
                        0.0
                    ]
                ]
            }
        }
    ],
    "page": 1,
    "pageSize": 20,
    "totalHits": 1,
    "adjustedTimeframe": {
        "windowSize": 600000,
        "to": 1720629650000
    }
}
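
A small helper can pull the latest error rate for each service out of a response shaped like the preceding sample. This is a sketch under the assumption that each data point is a [timestamp, value] pair and that, as the catalog description states, the value is between 0 and 1. The name extract_error_rates is hypothetical, and the sample dictionary is abbreviated from the response above.

```python
def extract_error_rates(response, metric_key="errors.mean"):
    """Map each service label to its latest error rate as a percentage."""
    rates = {}
    for item in response.get("items", []):
        points = item.get("metrics", {}).get(metric_key, [])
        if not points:
            continue  # no data points for this service in the time frame
        _timestamp, value = points[-1]  # last [timestamp, value] pair
        rates[item["service"]["label"]] = value * 100  # fraction to percent
    return rates


# Abbreviated copy of the sample response
sample = {
    "items": [
        {
            "service": {"id": "service_id_1", "label": "service_label_1"},
            "metrics": {"errors.mean": [[1720629650000, 0.0]]},
        }
    ]
}

print(extract_error_rates(sample))  # {'service_label_1': 0.0}
```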

Summary and additional resources

Additional information about what data and analysis Instana provides for traces and calls can be found in Analyzing traces and calls.

For more information on API usage and best practices, see API documentation.

You can also join the IBM TechXchange Community.