Tutorial: Getting the error rate for a service
- Context
- Prerequisites
- API endpoints
- The tutorial
- Summary and additional resources
Context
In the observability space, the error rate typically refers to the percentage of all requests to a service that result in errors. This metric is crucial for monitoring the health and reliability of applications and infrastructure. The following list explains what the error rate can indicate and how it is used:
- Definition: The error rate is calculated as the number of error responses divided by the total number of requests over a specific time period, often expressed as a percentage. For instance, if a service receives 1000 requests in an hour and 100 of those requests result in errors, the error rate is 10%.
- Types of errors: Errors can include client-side errors (4XX HTTP status codes), server-side errors (5XX HTTP status codes), timeouts, and application-specific errors.
- Importance: A high error rate might indicate issues such as bugs in the code, resource limitations (like CPU or memory), network problems, or upstream service failures. Monitoring this rate helps teams quickly identify and address these issues.
- Thresholds and alerts: Teams often set thresholds for acceptable error rates based on the criticality of the service and the user impact. If the error rate exceeds these thresholds, it can trigger alerts for teams to investigate.
- Analysis and response: Observability tools provide detailed error diagnostics to help pinpoint the source of the problem, such as stack traces, logs, or transaction traces. This enables a more effective and targeted response to incidents.
- Continuous improvement: By analyzing trends and patterns in the error rate, organizations can proactively improve their codebase and infrastructure, leading to a more stable and reliable service.
Error rate is a fundamental metric in any observability or monitoring strategy, as it directly impacts user experience and service reliability.
In this tutorial, you learn how to retrieve the error rate for a particular service that runs on your system and is monitored by Instana.
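The definition given in the list above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not part of the Instana API; the function name `error_rate_percent` and its parameters are hypothetical.

```python
def error_rate_percent(error_count, total_count):
    """Return the error rate as a percentage of all requests.

    Hypothetical helper: returns 0.0 when there were no requests at all,
    which is an assumption made here to avoid dividing by zero.
    """
    if total_count == 0:
        return 0.0
    return 100.0 * error_count / total_count


# The example from the definition: 100 errors out of 1000 requests
print(error_rate_percent(100, 1000))
```

For the numbers used in the definition (100 errors in 1000 requests), the function returns 10.0.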
Prerequisites
To use the identified Instana REST API endpoints in this tutorial, see General prerequisites. There are no specific prerequisites for this tutorial.
API endpoints
In this tutorial, two different API endpoints from Application Monitoring are used.
Endpoint | Description | Documentation | Required Permissions |
---|---|---|---|
GET /api/application-monitoring/catalog/metrics | Retrieves a list of metric types being monitored by Instana; from here you can select the metricId for the data you want to retrieve for a specific service. | GET application catalog metrics | General Applications permission. |
POST /api/application-monitoring/metrics/services | Retrieves the specified metrics for a service. Each metric includes an aggregation, which identifies the statistical summary method to use. | GET service metrics | General Applications permission. |
The tutorial
There are two steps that are required to retrieve metric data, such as the error rate for a service in Instana:
- Pull the metric catalog to grab a list of supported metrics. From here you can find the metricId for the metric data you want to get for a particular service.
- Get the data for a specified metric type (metricId) and aggregation. In this example we're looking for the average error rate for a service.
1. Getting the metric ID and aggregation type from the metric catalog
You can skip to the next step if you already know the metric ID (also known as metricId) and aggregation type (also known as aggregation).
To list all types of metrics that are available, send a GET request to the `/api/application-monitoring/catalog/metrics` endpoint.
Here are the details for that request:
GET /api/application-monitoring/catalog/metrics
Host: {tenant}-{unit}.instana.io
Authorization: apiToken {api_token}
Accept: application/json
Sample curl request
A curl request to this endpoint requires no query parameters and no request payload.
curl -XGET https://{tenant}-{unit}.instana.io/api/application-monitoring/catalog/metrics \
  -H "Content-Type: application/json" \
  -H "authorization: apiToken {apiToken}"
Sample response payload
The response is a metric catalog, which is a list of metric types that are supported. You can scroll through it to find the metricId and aggregation for the metric data you want to pull for a service.
[
  {
    "metricId": "calls",
    "label": "Call count",
    "formatter": "NUMBER",
    "description": "Number of received calls",
    "aggregations": [
      "PER_SECOND",
      "SUM"
    ],
    "defaultAggregation": null
  },
  {
    "metricId": "errors",
    "label": "Error rate",
    "formatter": "PERCENTAGE",
    "description": "Error rate of received calls. A value between 0 and 1.",
    "aggregations": [
      "MEAN"
    ],
    "defaultAggregation": "MEAN"
  },
  // More metric types...
]
What data do I need?
Consider one of the metric types from the catalog returned in the preceding section. There are two pieces of information that you must retrieve from the metrics catalog entry:
- metricId: a unique identifier for the type of metric
- aggregation: available statistical aggregations for the metric. A metric type can have one or more aggregations.
For the "Error rate" metric type, the metricId is "errors" and only one aggregation is available: "MEAN". Note both values for the next step.
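Instead of scrolling through the catalog by hand, the lookup can be scripted. The following is a minimal sketch that works on the already-parsed JSON list; the helper names `find_metric` and `pick_aggregation` are hypothetical, and preferring `defaultAggregation` over the first listed aggregation is an assumption made for this example, not documented Instana behavior.

```python
def find_metric(catalog, metric_id):
    """Return the catalog entry whose metricId matches, or None if absent."""
    for entry in catalog:
        if entry.get("metricId") == metric_id:
            return entry
    return None


def pick_aggregation(entry):
    """Choose an aggregation for a catalog entry.

    Assumption: use defaultAggregation when it is set, otherwise fall back
    to the first item in the aggregations list.
    """
    default = entry.get("defaultAggregation")
    if default:
        return default
    aggregations = entry.get("aggregations") or []
    return aggregations[0] if aggregations else None


# Abbreviated copy of the sample catalog response from this tutorial
sample_catalog = [
    {"metricId": "calls", "aggregations": ["PER_SECOND", "SUM"], "defaultAggregation": None},
    {"metricId": "errors", "aggregations": ["MEAN"], "defaultAggregation": "MEAN"},
]

errors_entry = find_metric(sample_catalog, "errors")
print(errors_entry["metricId"], pick_aggregation(errors_entry))
```

With the sample catalog, the script selects the "errors" metric with the "MEAN" aggregation.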
2. Getting the metric data for a service, such as error rate
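To get the metric data for a service, send a POST request to the /api/application-monitoring/metrics/services endpoint. Here are the details for that request: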
POST /api/application-monitoring/metrics/services
Host: {tenant}-{unit}.instana.io
Authorization: apiToken {api_token}
Accept: application/json
Sample curl request
The most basic way that you can test out this endpoint is from the command line. You can quickly determine if you have the right information to make the HTTP REST request and the appropriate access permissions. It also gives you the response payload, which you can investigate.
curl -XPOST https://{tenant}-{unit}.instana.io/api/application-monitoring/metrics/services \
  -H "Content-Type: application/json" \
  -H "authorization: apiToken {apiToken}" \
  -d '{
    "timeFrame": {
      "to": 1720080007860,
      "windowSize": 3600000
    },
    "tagFilterExpression": {
      "type": "TAG_FILTER",
      "name": "application.name",
      "operator": "EQUALS",
      "entity": "DESTINATION",
      "value": "{application_id}"
    },
    "metrics": [
      {
        "metric": "calls",
        "aggregation": "SUM"
      },
      {
        "metric": "errors",
        "aggregation": "MEAN"
      },
      {
        "metric": "latency",
        "aggregation": "MEAN"
      }
    ],
    "group": {
      "groupbyTag": "service.name",
      "groupbyTagEntity": "DESTINATION"
    }
  }'
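The timeFrame values in the sample request appear to be Unix epoch milliseconds: "to" marks the end of the window and "windowSize" is its length (3600000 ms is one hour). Assuming that interpretation, a small helper can build the object for "the last N minutes" rather than hard-coding a timestamp. The name `build_time_frame` and its parameters are hypothetical.

```python
import time


def build_time_frame(window_minutes=60, now_ms=None):
    """Build a timeFrame object that ends now and spans window_minutes.

    Assumption: "to" and "windowSize" are expressed in milliseconds,
    as the sample values in the curl request suggest.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)  # current time in epoch milliseconds
    return {"to": now_ms, "windowSize": window_minutes * 60 * 1000}


# A one-hour window ending at the current time
print(build_time_frame(60))
```

Passing a fixed `now_ms` makes the result reproducible, which is useful when you compare responses for the same window.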
Sample Python code
To automate retrieval of metric data for a specific service, you can try out the following Python function, which uses the requests library to call the GET service metrics endpoint.
If you do not yet have a Python environment set up on your local machine, you can try out this function by using a Jupyter Notebook in Google Colab, which provides an environment in the browser to write and run Python code. To use Google Colab, you need a Google account. Use this link to create a new Jupyter Notebook in Colab.
Requirements
- Python 3 is installed on your system
- The requests library is installed (run pip install requests if you have not installed it yet)
Python function
# import the required library
import requests


def get_service_metrics(base_url, api_token, service_id, metric_id, aggregation):
    """
    Retrieves metric data for a service from the Instana REST API by using the
    POST /api/application-monitoring/metrics/services endpoint.

    Args:
        base_url (str): The base URL of the Instana API, for example 'https://{tenant}-{unit}.instana.io'.
        api_token (str): The API token for authentication.
        service_id (str): The unique identifier of the service being monitored in your instance of Instana.
        metric_id (str): The metricId from the metric catalog, for example 'errors'.
        aggregation (str): The aggregation to apply to the metric, for example 'MEAN'.

    Returns:
        dict: A dictionary containing the JSON response with the requested metric data.
        Returns None if the request fails.
    """
    # URL for the service metrics endpoint, built from the base URL that is passed in
    api_endpoint_url = f"{base_url.rstrip('/')}/api/application-monitoring/metrics/services"
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"apiToken {api_token}"
    }
    # request payload; the argument values are passed directly, not as quoted placeholders
    data = {
        "metrics": [
            {
                "aggregation": aggregation,
                "metric": metric_id
            }
        ],
        "applicationBoundaryScope": "INBOUND",
        "serviceId": service_id
    }
    try:
        response = requests.post(api_endpoint_url, headers=headers, json=data, timeout=30)
        response.raise_for_status()  # Raise error for bad status codes
        return response.json()  # Return JSON response
    except requests.exceptions.RequestException as e:
        print(f"Error: {e}")
        return None  # Return None on error
Sample usage of the Python function
You can use the get_service_metrics function as follows to get the error rate for a service:
BASE_URL = "https://{your_tenant}-{your_unit}.instana.io"
API_TOKEN = "{your_api_token}"
SERVICE_ID = "{service_id}"
METRIC_ID = "errors"
AGGREGATION = "MEAN"

service_metrics = get_service_metrics(BASE_URL, API_TOKEN, SERVICE_ID, METRIC_ID, AGGREGATION)
if service_metrics is not None:
    print(service_metrics)
Sample response
After you make the API call, you might receive a JSON response that looks similar to the following code block.
{
  "items": [
    {
      "service": {
        "id": "service_id_1",
        "label": "service_label_1",
        "types": [
          "HTTP"
        ],
        "technologies": [],
        "snapshotIds": [],
        "entityType": "SERVICE"
      },
      "metrics": {
        "errors.mean": [
          [
            1720629650000,
            0.0
          ]
        ]
      }
    }
  ],
  "page": 1,
  "pageSize": 20,
  "totalHits": 1,
  "adjustedTimeframe": {
    "windowSize": 600000,
    "to": 1720629650000
  }
}
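To turn a response like this into readable error rates, you can walk the items list and read the data points under the "errors.mean" key. The sketch below assumes the response shape shown in the sample, where each data point is a [timestamp, value] pair, and converts the value to a percentage because the catalog describes it as a value between 0 and 1. The function name `extract_error_rates` is hypothetical.

```python
def extract_error_rates(response, metric_key="errors.mean"):
    """Return (service label, timestamp in ms, error rate in percent) tuples.

    Assumption: the response has the same structure as the sample response,
    with each data point given as a [timestamp, value] pair.
    """
    rows = []
    for item in response.get("items", []):
        label = item.get("service", {}).get("label")
        for timestamp_ms, value in item.get("metrics", {}).get(metric_key, []):
            rows.append((label, timestamp_ms, value * 100.0))
    return rows


# Trimmed copy of the sample response from this tutorial
sample_response = {
    "items": [
        {
            "service": {"id": "service_id_1", "label": "service_label_1"},
            "metrics": {"errors.mean": [[1720629650000, 0.0]]},
        }
    ]
}

for label, timestamp_ms, percent in extract_error_rates(sample_response):
    print(f"{label}: {percent:.2f}% at {timestamp_ms}")
```

Services that return no data points for the metric are skipped, so an empty result means no error rate data was available for the requested time frame.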
Summary and additional resources
Additional information about what data and analysis Instana provides for traces and calls can be found in Analyzing traces and calls.
For more information on API usage and best practices, see API documentation.
You can also join the IBM TechXchange Community.