OpenTelemetry metrics are numerical assessments—collected using the OpenTelemetry (OTel) standard—that indicate how IT systems and software applications behave over time.
Like OpenTelemetry logs and traces, OTel metrics provide standardized telemetry signals that use a common, language-neutral data model and transmission protocol—the OpenTelemetry Protocol (OTLP), to be exact—to ingest data from different sources, systems and formats and send it to observability backends.
OTel metrics rely on OpenTelemetry instrumentation to gather time-series data—such as CPU and memory usage, throughput, error rates, request counts and response times—from applications and infrastructure components while they run. After developers instrument code to record the necessary metrics, OTel allows those metrics to be aggregated and exported to any (or every) backend observability tool for storage, querying and visualization, all without changing the instrumentation.
Metrics are typically the first telemetry signals to reveal a system issue. In the OTel framework, they are integrated with logs (immutable records of discrete system events) and traces (the end-to-end journey of a data request) that rely on the same concepts and metadata.
These dynamics enable developers and site reliability engineers (SREs) to use metrics as a high‑level early warning system. They can follow the full path of a failure (including the metrics for each service that the failed component interacted with) in one coherent view and quickly pivot to traces and logs for root cause analysis.
In practice that means IT teams can jump from “this endpoint is slow” to “these specific requests and dependencies are the problem” in just a few clicks.
Furthermore, OTel’s standardization protocols facilitate automated signal correlation across observability vendors, giving enterprises the flexibility to switch observability tools whenever they choose or use multiple tools simultaneously.
OpenTelemetry metrics are a foundational component of OpenTelemetry. OpenTelemetry is an open-source observability framework that includes a collection of software development kits (SDKs), vendor-neutral application programming interfaces (APIs) and other tools for application, system and device instrumentation.
Instrumentation code used to vary widely, and no single commercial provider offered a tool capable of gathering data from every app and service on a network. This functionality gap made it difficult (and often, impossible) for teams to collect data from different programming languages, formats and runtime environments.
Traditional observability approaches also made changing backend infrastructure and components a time-consuming, labor-intensive process.
Say a development team wanted to switch out backend tools. They would have to completely reinstrument their code and configure new agents (software components that collect and forward telemetry data) to send telemetry data to the new servers. Fragmented approaches created data silos and confusion, making it difficult to resolve performance issues effectively.
OpenTelemetry represented a significant advancement in observability tools because it standardized the way that telemetry data is gathered, analyzed and transmitted to backend platforms. It provided an open-source solution—based on community-driven standards—for collecting data about system behavior and security, helping teams streamline monitoring and observability in distributed ecosystems.
OpenTelemetry metrics rely on a group of related components that together define how metrics are produced, described and shipped to observability backends. They include:
Instruments are the tools that code uses to record metric values. They record values in two ways: synchronously (directly in the execution path, as the code runs) and asynchronously (whenever the SDK requests values from the code).
Because they run inline with the operation, synchronous instruments can capture current trace context and attributes about the event, making the metrics they collect easier to correlate with logs and traces.
Asynchronous instruments are updated by pull-based callback functions that the SDK calls on a predetermined collection interval. These instruments are useful for recording values that are sampled periodically, instead of at each user request (CPU usage or memory, for example).
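The contrast can be sketched with a toy model in plain Python (not the real OTel SDK; the class names here are invented for illustration): a synchronous counter is updated inline by the request handler, while an asynchronous gauge exposes a callback that a collection loop invokes on its own schedule.

```python
class SyncCounter:
    """Toy synchronous instrument: updated inline as the code runs."""
    def __init__(self):
        self.value = 0

    def add(self, amount):
        self.value += amount


class AsyncGauge:
    """Toy asynchronous instrument: a value is pulled via a callback."""
    def __init__(self, callback):
        self.callback = callback

    def collect(self):
        # In a real SDK this runs on the configured collection interval.
        return self.callback()


requests_served = SyncCounter()

def handle_request():
    requests_served.add(1)  # recorded directly in the execution path

def read_cpu_percent():
    return 42.0  # stand-in for a real CPU probe

cpu_gauge = AsyncGauge(read_cpu_percent)

for _ in range(3):
    handle_request()

print(requests_served.value)  # 3
print(cpu_gauge.collect())    # 42.0
```

The key difference: the counter is updated at the moment each event happens, while the gauge's callback is only invoked when the "SDK" decides to collect.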
OTel instrumentation sits between application logic and the metrics pipeline, turning events, such as “HTTP request completed,” into structured measurements that the SDK can aggregate and export. Each type of metric instrument is optimized for a specific measurement pattern and has an associated aggregation strategy that describes how the SDK should combine individual measurements into exported metrics.
Common instrument types are:

- Counter: a value that only increases, such as total requests served or errors raised.
- UpDownCounter: a value that can increase or decrease, such as the number of active sessions or items in a queue.
- Histogram: a statistical distribution of recorded values, such as request latencies, grouped into buckets.
- Gauge: the most recent value of a measurement at collection time, such as current memory usage.

Counters, UpDownCounters and Gauges also have asynchronous variants that record values through callbacks instead of inline calls.
OTel defines a standardized metrics data model that turns individual metric events into clean, queryable, time-based series of numbers that IT teams can store and analyze in observability platforms.
OpenTelemetry describes metrics by using three related models: the Event model (raw measurements from instruments), the OTLP stream model (how those measurements are bundled and sent) and the time series model (how backends store them).
In practice, application code records events using the Event model, the SDK/Collector turns them into OTLP streams and the backend platform stores them as time series, such as “requests per second for service X with status 200” over time.
Semantic conventions—also called naming conventions—define standardized names, units and attribute keys for common metrics so that different services and libraries produce consistent telemetry. For instance, conventions prescribe names such as “http.server.duration” for HTTP server request latency and recommend units such as seconds or milliseconds for duration.
Using OTel’s semantic conventions enables teams to plug metrics into prebuilt dashboards and alerting systems that expect a specific schema, which reduces the time they spend doing custom configuration work. It also makes cross‑team collaboration easier because everyone speaks the same “metrics language” across microservices.
Furthermore, context propagation features can link metrics with related traces (by using exemplars) and logs by carrying shared context, such as trace or span identifiers, across service boundaries.
For example, an OTel propagator can extract incoming context from a request handler, start a trace span with it and use the same context to record latency metrics and write log entries. That way, both the metric point and the log line are tagged with the current trace.
When app code makes a downstream call, the propagator injects the same context into the outgoing request. The code extracts it, starts its own child span and uses that context for its own metrics and logs, which keeps everything from both services tied to the same end‑to‑end trace.
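A minimal sketch of that handoff, using a plain dictionary as the "headers" and an invented inject/extract pair rather than the real OTel propagator API:

```python
import uuid

def new_trace_context():
    """Start a new trace; in real OTel the tracer does this."""
    return {"trace_id": uuid.uuid4().hex}

def inject(context, headers):
    """Toy propagator: write the trace context into outgoing headers."""
    headers["traceparent"] = context["trace_id"]

def extract(headers):
    """Toy propagator: read the trace context from incoming headers."""
    return {"trace_id": headers["traceparent"]}

# Service A handles a request and tags a metric point with the trace.
ctx = new_trace_context()
metric_point_a = {"name": "http.server.duration", "value": 120,
                  "trace_id": ctx["trace_id"]}

# Service A calls service B; the propagator injects the context.
outgoing_headers = {}
inject(ctx, outgoing_headers)

# Service B extracts the same context and tags its own telemetry with it.
ctx_b = extract(outgoing_headers)
log_line_b = {"msg": "login ok", "trace_id": ctx_b["trace_id"]}

# Both services' signals now share one end-to-end trace identifier.
assert metric_point_a["trace_id"] == log_line_b["trace_id"]
```

The real W3C `traceparent` header carries more than a trace ID (span ID, flags), but the inject-on-the-way-out, extract-on-the-way-in pattern is the same.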
Attributes—also known as labels, tags or dimensions—are key-value pairs attached to individual measurements or aggregated data points. They fully or partially define a time series alongside the metric name, aggregation type and unit. Without attributes, all measurements for a metric would collapse into one undifferentiated series.
Adding attributes (such as service name, endpoint, HTTP status code, region or tenant ID) turns simple metrics into dimensional, specialized metric streams that teams can slice and dice in observability tools. Teams can get high-resolution views filtered or grouped by any dimension they choose.
Good attribute design also balances detail with metric cardinality (the number of unique values an attribute can take), so teams can avoid label explosion while still running useful queries.
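A quick back-of-the-envelope sketch shows why cardinality matters; the attribute values below are hypothetical:

```python
from itertools import product

# Each unique attribute combination creates a distinct time series.
routes = ["/login", "/checkout", "/search"]
status_codes = [200, 404, 500]
regions = ["us-east", "eu-west"]

series = {(r, s, g) for r, s, g in product(routes, status_codes, regions)}
print(len(series))  # 18 series: 3 routes x 3 statuses x 2 regions

# Adding a high-cardinality attribute such as user_id multiplies that count:
user_ids = range(10_000)
exploded = len(routes) * len(status_codes) * len(regions) * len(user_ids)
print(exploded)  # 180000 distinct series: label explosion
```

Eighteen series is cheap to store and query; 180,000 is not, which is why per-user or per-request identifiers usually belong in traces and logs, not metric attributes.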
A measurement is a single event-level reading taken by an instrument at a specific point in time, often accompanied by attributes. For example, when one HTTP request finishes, you might record a latency of 120 milliseconds with attributes such as route="/login" and status_code=200.
Each individual recording is a separate measurement that describes what happened during that one operation.
Measurements are briefly stored inside the SDK, which then converts the raw measurements into data points within larger metric streams. Instead of showing every single request, an aggregated data point might show “for route /login with status 200, from 10:00:00-10:00:10, there were 500 requests with a total duration of 25 seconds.” It’s these data points that are eventually exported to and stored in observability backends.
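That windowed aggregation can be sketched in a few lines of plain Python (a simplified stand-in for the SDK's aggregator; the timestamps and latencies are invented):

```python
from collections import defaultdict

# Raw event-level measurements: (timestamp_s, route, status, latency_s).
measurements = [(36000 + i % 10, "/login", 200, 0.05) for i in range(500)]

# Aggregate into one data point per (route, status) per 10-second window.
points = defaultdict(lambda: {"count": 0, "sum": 0.0})
for ts, route, status, latency in measurements:
    window_start = ts - (ts % 10)
    points[(route, status, window_start)]["count"] += 1
    points[(route, status, window_start)]["sum"] += latency

point = points[("/login", 200, 36000)]
print(point["count"], round(point["sum"], 6))  # 500 25.0
```

Five hundred individual readings collapse into one exported data point per series per interval, which is what keeps metric pipelines cheap relative to traces and logs.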
The OpenTelemetry metrics API is the set of interfaces that application code calls to produce metrics. It is designed to be vendor-agnostic, decoupling instrumentation from the concrete SDK implementation. IT teams can write instrumentation once (by using the standard API) and then swap SDKs, exporters and backends without taking extra steps.
Three core interfaces form the structure of the OTel metrics API: Meter Providers, Meters and metric instruments.
Meter Providers are the entry point of the API and are responsible for creating meter instances. They determine which metric readers and exporters to use, what views and aggregations to apply and how often to collect and export data. The Meter Provider typically loads once during application startup (often as a global provider), so all Meters and instruments share consistent resource definitions and export pipelines.
Meters are the objects application code uses to create counters, histograms and gauges. Each Meter is typically named for a library or service (for example, “bank.payment”), so metrics coming from that code can be grouped and identified consistently.
Meters don’t store data themselves; instead, they serve as factories for instruments and pass recorded measurements to the underlying SDK for aggregation and export. Because Meters are obtained from a Meter Provider, they always operate with the configuration, views and Exporters defined by the Provider.
Instruments, obtained from Meters, are the objects through which application code actually reports measurements, along with any attributes that describe them.
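The Provider, Meter and instrument chain can be modeled with a toy implementation (plain Python with invented class bodies; the real SDK adds readers, views and exporters on top of this skeleton):

```python
class Counter:
    """Toy instrument: reports measurements to shared SDK storage."""
    def __init__(self, name, sdk_storage):
        self.name = name
        self._storage = sdk_storage

    def add(self, amount, attributes=None):
        # One entry per unique (name, attributes) combination.
        key = (self.name, tuple(sorted((attributes or {}).items())))
        self._storage[key] = self._storage.get(key, 0) + amount


class Meter:
    """Toy meter: a factory for instruments, scoped to one library/service."""
    def __init__(self, name, sdk_storage):
        self.name = name
        self._storage = sdk_storage

    def create_counter(self, name):
        return Counter(f"{self.name}.{name}", self._storage)


class MeterProvider:
    """Toy provider: the single entry point that owns shared state."""
    def __init__(self):
        self.storage = {}  # stands in for the SDK's aggregation state

    def get_meter(self, name):
        return Meter(name, self.storage)


provider = MeterProvider()             # created once at startup
meter = provider.get_meter("bank.payment")
counter = meter.create_counter("transactions")
counter.add(1, {"status": "ok"})
counter.add(1, {"status": "ok"})
print(provider.storage)
# {('bank.payment.transactions', (('status', 'ok'),)): 2}
```

Because every Meter and instrument is reached through the one Provider, configuration (aggregations, export intervals, destinations) lives in a single place.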
The OpenTelemetry SDK is a concrete implementation of the OpenTelemetry API that turns API calls into actual telemetry data points. It provides a language-specific library (such as Java™ or Python) that collects, aggregates and prepares metric data for export.
SDKs manage instrumentation lifecycles, handle in-memory storage for measurements, run callbacks for asynchronous instruments, apply configured aggregations and views and control metric collection intervals (typically 10–60 seconds).
Within the SDK, processors apply aggregation and additional logic, including filtering, rate limiting or transforming metric streams before sending them onward.
Exporters convert processed metric data points into backend-specific formats and send them to external backends or observability platforms by using supported protocols (such as OTLP).
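A toy exporter might look like the following. The JSON shape is loosely inspired by OTLP's JSON encoding, but the real schema is far richer (resources, scopes, per-point timestamps), and production exporters typically emit protobuf over gRPC or HTTP:

```python
import json

def export_otlp_like(data_points):
    """Toy exporter: serialize aggregated data points into a wire format."""
    payload = {"resourceMetrics": [
        {"name": name, "attributes": attrs, "value": value}
        for (name, attrs, value) in data_points
    ]}
    return json.dumps(payload)

points = [("http.server.requests", {"route": "/login"}, 500)]
wire = export_otlp_like(points)
print(wire)
```

The important property is the division of labor: the aggregator produces neutral data points, and the exporter alone knows how a particular backend wants them encoded and delivered.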
Teams might choose to chain together multiple Exporters in the SDK for multi-backend support in environments that require high availability and redundancy. In these instances, OpenTelemetry Collectors act as smart proxies that can send metrics to multiple backends simultaneously, retry export attempts or authenticate data at scale.
Aggregations define how raw individual measurements are mathematically combined into summarized statistics that backends store and query as time series data points. Raw data is amassed by the SDK’s aggregator into a single data point per collection interval for each unique time series.
Common aggregation patterns include:

- Sum: adds measurements together (the default for Counters and UpDownCounters).
- Last value: keeps only the most recent measurement (the default for Gauges).
- Explicit bucket histogram: counts measurements that fall into predefined value ranges (the default for Histograms).
- Drop: discards measurements entirely, which is useful for disabling unwanted metrics.
Aggregations also carry temporality metadata, which tells backends how to interpret a metric value over time. The two main types are:

- Cumulative temporality: each data point reports the total accumulated since a fixed start time, such as process startup.
- Delta temporality: each data point reports only the change since the previous collection interval.
SDKs automatically apply the right aggregation based on the instrument type. They also enable IT teams to switch from simple totals to rich histograms or other algorithms while leaving the instrumentation unchanged.
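Cumulative and delta temporality relate through a simple transformation, sketched here in plain Python (the series values are invented):

```python
def cumulative_to_delta(points):
    """Convert a cumulative series (running totals since start)
    into deltas (change since the previous collection)."""
    deltas = []
    previous = 0
    for value in points:
        deltas.append(value - previous)
        previous = value
    return deltas

# A counter exported with cumulative temporality: totals since process start.
cumulative = [100, 250, 250, 400]
print(cumulative_to_delta(cumulative))  # [100, 150, 0, 150]
```

Backends that prefer one temporality can derive it from the other, which is why the exported metadata matters: the same four numbers mean very different things depending on which interpretation applies.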
Metrics show trends and send alerts, traces reveal the path of requests, and logs provide rich event‑level context. Correlating metrics with traces and logs connects the three main observability signals (known as the “pillars of observability”) so that teams can get a layered view of system behavior.
Metrics can, for example, show aggregated trends such as “error rate spiked at 2:15 PM,” but they can’t reveal which requests failed or why. Traces shed light on problematic individual request paths across services, but without metrics, they can’t help teams determine whether a bad request is an outlier or a systemic issue. Logs provide detailed error messages and events, but without context they are just noise.
Correlating all three signals enables IT teams to move seamlessly from high-level metric trends to the affected trace and the related logs in seconds.
OTel metrics differ fundamentally from traditional metrics. Traditional metrics typically refer to legacy monitoring approaches that focus on simple, predefined counters for resource usage. These metrics emphasize host-level or application-specific measurements.
OpenTelemetry metrics, by contrast, form part of a unified, vendor-agnostic observability framework that standardizes collection across traces, logs and metrics for distributed IT environments. They build on traditional metrics by providing a more flexible data model and instrumentation, including UpDownCounters for bidirectional changes and histograms that support exponential buckets. They also provide explicit attributes, units and timestamps to enrich metrics.
OTel metrics are also different from traditional metrics in terms of their design, collection methods and applicability to modern systems.
Traditional metrics feature a rigid, full-stack model that tightly couples collection, storage and querying, often using cumulative-only values from basic counters, gauges, summaries and histograms with explicit buckets (which require manual bucketing).
OpenTelemetry metrics employ a modular, three-layer design—with an API, an SDK and a protocol—that separates signal generation from processing and export. This configuration enables OTel metrics to support advanced features, such as delta temporality, integer values, minimum and maximum metadata on histograms and exponential histograms for dynamic scaling.
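The exponential-histogram idea can be illustrated with a sketch of the bucket-index mapping: at scale s, bucket boundaries are powers of base = 2^(2^-s), so higher scales give finer buckets without anyone predefining boundaries. Real SDKs use bit-manipulation fast paths and handle edge cases (zero, subnormal values) that this sketch ignores.

```python
import math

def bucket_index(value, scale):
    """Map a positive value to an exponential-histogram bucket index.

    With base = 2**(2**-scale), bucket i covers (base**i, base**(i+1)],
    so index = ceil(log2(value) * 2**scale) - 1.
    """
    return math.ceil(math.log2(value) * 2**scale) - 1

# scale 0 -> base 2: buckets (1, 2], (2, 4], (4, 8], ...
print(bucket_index(3, 0))   # 1 -> bucket (2, 4]
print(bucket_index(4, 0))   # 1 -> still (2, 4] (upper bound inclusive)
print(bucket_index(5, 0))   # 2 -> bucket (4, 8]

# scale 3 -> base = 2**(1/8), roughly 9% relative error per bucket.
print(bucket_index(5, 3))   # 18
```

Raising the scale by one doubles the number of buckets per power of two, which is how the histogram trades memory for resolution dynamically.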
Traditional approaches rely on pull-based data scraping from exposed API endpoints, primarily for metrics in isolation. They use manual instrumentation and offer limited support for traces or logs, which leads to fragmented telemetry and observability pipelines.
OpenTelemetry uses OTLP (a binary protocol) for push-based data collection, enabling unified collection of traces, metrics and logs through automatic and manual instrumentation. OTel also provides a Collector to facilitate data transformation, batching and routing to backend tools, which helps IT teams address application storage concerns.
Traditional telemetry data works well for simple, monolithic setups. However, it falters in microservices environments due to poor tracing capabilities, vendor lock-in issues and siloed signals that obscure dependencies and fail to uncover cascading failures.
OpenTelemetry excels in cloud-native, distributed environments, because it provides correlated, vendor-agnostic telemetry for end-to-end visibility into request flows across services, latency bottlenecks and resource diagnostics. OTel also supports sampling and hybrid tool integration, which helps telemetry and instrumentation scale alongside the IT environment.
OpenTelemetry metrics provide standardized, quantitative measurements that help improve observability in modern architectures. They offer detailed insights into system performance, enabling developers and engineers to implement proactive issue resolution and optimization strategies.
Other benefits include:
OTel metrics enable precise quantification of errors, failure rates and exceptions. Teams can set up real-time alerts for when systems exceed error thresholds, which helps them address issues before they become larger problems that affect the user experience.
By providing a vendor-neutral framework, OTel metrics help ensure consistent data collection across tools and platforms, eliminating the discrepancies that occur when metrics are gathered from different proprietary systems.
This approach streamlines observability pipelines and supports seamless compatibility and integration with backend services.
The same OTel ecosystem handles logs, metrics, traces—and now, continuous profiling—enabling teams to instrument once and reuse the instrumentation across the entire stack.
This feature is especially valuable in microservices and cloud-native environments, where teams would otherwise end up stitching together several incompatible agents and formats.
OTel can export metrics over OTLP to a wide range of observability platforms, which decouples application instrumentation from the vendor and helps prevent vendor lock-in.
OTel metrics are designed for distributed architectures that rely on Docker containers, Kubernetes clusters, serverless computing and other dynamic technologies.
Metric instruments can consistently collect data from these components, making it easier to maintain observability without creating bespoke configurations as the ecosystem grows.