Configuring OpenTelemetry for monitoring with z/OS Connect

How to configure OpenTelemetry for monitoring with IBM® z/OS® Connect.

Applies to zosConnect-3.0.

z/OS Connect uses the Liberty MicroProfile Telemetry 2.0 feature to provide OpenTelemetry monitoring. OpenTelemetry can be configured at either the server level or the API level. Even when the feature is enabled, the OpenTelemetry SDK remains disabled by default and must be enabled by configuration. You can collect OpenTelemetry logs, metrics, and traces, but only traces are enhanced with z/OS Connect specific data; metrics and logs contain only the base Liberty data.

Add the following feature to the server configuration file to allow OpenTelemetry to be used in z/OS Connect:
<featureManager>
  <feature>mpTelemetry-2.0</feature>
</featureManager>

How to enable OpenTelemetry for all APIs

OpenTelemetry can be enabled by setting system properties or environment variables. The corresponding environment variable name for any given system property can be obtained by changing the name to uppercase, and replacing all '.' characters with '_' characters. For example, the otel.sdk.disabled system property is equivalent to the OTEL_SDK_DISABLED environment variable.

Configuration can be set in one or more of the following ways:
  • Set system properties in bootstrap.properties. These are per server settings.
  • Set system properties in a JVM options <name>.options file. For more information, see Specifying JVM options. These options can be shared across servers.
  • Set environment variables in server.env or the server JCL STDENV DD statement. These are per server settings.
For more information about the bootstrap.properties, <name>.options and the server.env server configuration files, see Open Liberty configuration files.
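For example, a minimal sketch of the environment-variable form in server.env (the values shown are hypothetical):

```
# server.env - per-server environment variables
OTEL_SDK_DISABLED=false
OTEL_SERVICE_NAME=zcServer1
```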

The values cannot be changed in a running server. If running the server as a started task, a server restart is required to change the values. If deployed as a container image, the image must be rebuilt and redeployed for configuration changes. If a property is defined as both a system property and environment variable, the system property takes priority.

You might need, or want, to set the following system properties. For more information about these and other properties, see Configure the SDK in the OpenTelemetry documentation.

otel.sdk.disabled

Enable OpenTelemetry within the z/OS Connect Server by setting otel.sdk.disabled=false. The default value is otel.sdk.disabled=true, which disables OpenTelemetry.

otel.traces.sampler
Set otel.traces.sampler to determine which spans are recorded and sampled. The default value is otel.traces.sampler=parentbased_always_on, which starts a new trace context for all requests that do not already have one. To stop this context from being initialized, so that only requests that include a trace context are traced, set otel.traces.sampler=parentbased_always_off.
For more information about otel.traces.sampler options, see https://opentelemetry.io/docs/languages/java/configuration/#properties-traces in the OpenTelemetry documentation. For more information about limiting the number of requests that are sampled, see Using Sampling to limit span export.
otel.service.name

Set the otel.service.name property to the server name. This name appears in spans and identifies the span creator. A default of unknown_service is used when this is not configured. When configuring OpenTelemetry with z/OS Connect for all APIs, make sure that each server has a unique otel.service.name set, to aid problem determination.

For example, set otel.service.name to the server name in the bootstrap.properties file for each z/OS Connect Server and add the common properties in the shared <name>.options file.
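A minimal sketch of that split (the server name ZC01 and the options file name are hypothetical):

```
# bootstrap.properties for server ZC01 - per-server setting
otel.service.name=ZC01

# shared zosconnect.options JVM options file - settings common to all servers
-Dotel.sdk.disabled=false
-Dotel.traces.exporter=otlp
-Dotel.exporter.otlp.endpoint=http://localhost:4317
-Dotel.logs.exporter=none
-Dotel.metrics.exporter=none
```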

otel.traces.exporter

The type of span exporter. This depends on which OpenTelemetry Collector you choose to use. The default is otel.traces.exporter=otlp. If you have not yet set up an OpenTelemetry Collector, you can temporarily export spans to messages.log by configuring otel.traces.exporter=console.
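For example, a temporary bootstrap.properties fragment for verifying that spans are produced before a Collector is available (a sketch; switch back to otlp when the Collector is ready):

```
otel.sdk.disabled=false
otel.traces.exporter=console
```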

otel.exporter.otlp.endpoint

The endpoint to which all OTLP traces are sent, which is often the address of an OpenTelemetry Collector. This configuration property must be a URL with a scheme of either http or https, depending on whether TLS is used. For example, otel.exporter.otlp.endpoint=http://localhost:4317.

Use the following properties as required when configuring a TLS connection. The Liberty ssl configuration is not applicable.

  • otel.exporter.otlp.{signal}.endpoint
  • otel.exporter.otlp.certificate
  • otel.exporter.otlp.{signal}.certificate
  • otel.exporter.otlp.client.key
  • otel.exporter.otlp.{signal}.client.key
  • otel.exporter.otlp.client.certificate
  • otel.exporter.otlp.{signal}.client.certificate
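A sketch of a TLS configuration, with hypothetical host and certificate paths ({signal} can be replaced by traces, metrics, or logs when a setting applies to a single signal):

```
otel.exporter.otlp.endpoint=https://collector.example.com:4317
# CA certificate used to verify the Collector's server certificate
otel.exporter.otlp.certificate=/u/zosconn/certs/collector-ca.pem
# Client key and certificate for mutual TLS, if the Collector requires it
otel.exporter.otlp.client.key=/u/zosconn/certs/client-key.pem
otel.exporter.otlp.client.certificate=/u/zosconn/certs/client-cert.pem
```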
otel.logs.exporter

Enable the OpenTelemetry Logs Exporter type. For z/OS Connect, set otel.logs.exporter=none.

otel.metrics.exporter

Enable the Metrics Exporter type. For z/OS Connect, set otel.metrics.exporter=none.

For example, set the following properties in bootstrap.properties for a single server.
otel.sdk.disabled=false
otel.service.name=<server_name>
otel.traces.exporter=otlp 
otel.exporter.otlp.endpoint=http://localhost:4317 
otel.logs.exporter=none
otel.metrics.exporter=none

How to enable OpenTelemetry for specific APIs

If OpenTelemetry is only required for specific APIs in a server, it can be configured by using the following steps.

  1. Configure the properties that are common for all APIs in a server, at the server level. It is important not to set the otel.sdk.disabled property at the server level, in this instance. For example, set the following properties in bootstrap.properties or a JVM options <name>.options file:
    otel.traces.exporter=otlp
    otel.exporter.otlp.endpoint=http://localhost:4317
    otel.logs.exporter=none
    otel.metrics.exporter=none
  2. For each API that you want to enable OpenTelemetry for, in the webApplication element, set the otel.sdk.disabled property to false, for example:
    <webApplication id="ExampleApi" contextRoot="/example" location="${server.config.dir}/apps/example-api.war" name="example-api">
        <appProperties>
            <property name="otel.sdk.disabled" value="false" />
        </appProperties>
    </webApplication>
    
    Important: All OpenTelemetry configuration in the webApplication element is ignored if otel.sdk.disabled is configured at the server level.

    By default, the value of otel.service.name is the application name. To change it, set the otel.service.name property, which names any telemetry that this API produces. If you have the same APIs deployed in multiple servers, you might want to include a server identifier in the otel.service.name value, in addition to an API identifier.
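    For example, a sketch that enables OpenTelemetry for one API and overrides the service name with a server and API identifier (the value ZC01-example-api is hypothetical):

    ```
    <webApplication id="ExampleApi" contextRoot="/example" location="${server.config.dir}/apps/example-api.war" name="example-api">
        <appProperties>
            <property name="otel.sdk.disabled" value="false" />
            <property name="otel.service.name" value="ZC01-example-api" />
        </appProperties>
    </webApplication>
    ```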

Started task: Changing this configuration requires a refresh of the Liberty configuration by using the Liberty REFRESH,CONFIG command. This stops the API and then starts it again with the updated configuration.

Containers: Changing this configuration requires the z/OS Connect Server image to be rebuilt and redeployed.

Using Sampling to limit span export

The otel.traces.sampler option can be configured to apply a ratio to incoming requests so that only a specified proportion of requests have their spans exported. This reduces the total number of spans that are exported from a server.
Important: All requests are treated equally when probabilistic sampling is used. There is no mechanism to ensure that slow requests or failures are always exported.
Example 1
Configuring the following will result in 50% of requests being exported.
otel.traces.sampler=traceidratio
otel.traces.sampler.arg=0.5
Example 2
Configuring the following will result in 10% of requests being exported.
otel.traces.sampler=traceidratio
otel.traces.sampler.arg=0.1
Example 3
By using parentbased_traceidratio, you can take into account whether the request into the z/OS Connect Server contained a trace context in the traceparent header. For more information about the W3C Trace Context sampled flag, see https://www.w3.org/TR/trace-context/#sampled-flag.
Configuring the following will result in these conditions being applied to requests:
  • For requests into z/OS Connect that contain a trace context, the sampling decision is based on the sampled flag in the request's trace context.
  • For requests into z/OS Connect that do not contain a trace context, the probabilistic sampling decision is applied; in this case, a 25% sampling rate.
otel.traces.sampler=parentbased_traceidratio
otel.traces.sampler.arg=0.25
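As a rough worked illustration of what these ratios mean for export volume (the rates used are hypothetical; z/OS Connect produces four spans per request, as described in How to ensure that every span produced is exported):

```python
# Rough estimate of span export volume under probabilistic sampling.
# z/OS Connect produces four spans per sampled request.
SPANS_PER_REQUEST = 4

def exported_spans_per_second(tps: float, sampler_ratio: float) -> float:
    """Expected number of spans exported per second at a given transaction rate."""
    return tps * sampler_ratio * SPANS_PER_REQUEST

# At 1000 TPS, otel.traces.sampler.arg=0.25 reduces the export volume
# from 4000 spans per second to about 1000.
print(exported_spans_per_second(1000, 0.25))
```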

Additional recommended OpenTelemetry configuration options

  1. It is recommended to disable the ProcessResourceProvider because it adds attributes to every span that include long strings, such as the z/OS Connect Server JVM command-line arguments. To reduce the memory requirements for OpenTelemetry and improve export performance, add the following to the configuration file.

    otel.java.disabled.resource.providers=io.opentelemetry.instrumentation.resources.ProcessResourceProvider
  2. The default configuration for span export is too small for some workloads, which leads to dropped spans that are then lost. The default maximum number of spans that can be queued before batching is 2048, and the default maximum number of spans to export in a single batch is 512. The following configuration can be used as a reasonable starting point for performance testing. For more information, see How to ensure that every span produced is exported.

    otel.bsp.max.queue.size=10000
    otel.bsp.max.export.batch.size=2048

How to ensure that every span produced is exported

The z/OS Connect Server produces four spans for every API provider or API requester request that it receives. The number of spans that are produced correlates directly with the transaction rate, often measured in transactions per second (TPS). For example, at 1000 TPS, 4000 spans are produced each second (assuming that all API provider requests complete without error). These spans are exported in batches to improve performance. The size of the batches depends on the otel.bsp.max.export.batch.size configuration. Spans are written to a queue for export, and the otel.bsp.max.queue.size property sets the queue size. In a busy server where the TPS rate is higher than the span export rate, the queue gradually grows. When the queue reaches the otel.bsp.max.queue.size limit, spans start to be dropped, resulting in OpenTelemetry traces with missing spans.

This is a deliberate design decision by the OpenTelemetry community and as such no errors are written to logs in this case. It is possible to detect whether spans are being dropped through performance testing.
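A back-of-the-envelope sketch of how quickly the default queue overflows when span production outpaces export (all rates are hypothetical):

```python
# Estimate how long the BatchSpanProcessor queue lasts when spans are
# produced faster than they are exported.
def seconds_until_spans_drop(queue_size: int, produced_per_sec: float,
                             exported_per_sec: float) -> float:
    """Seconds until the queue fills and spans start to be dropped."""
    surplus = produced_per_sec - exported_per_sec
    if surplus <= 0:
        return float("inf")  # the exporter keeps up; the queue never fills
    return queue_size / surplus

# 1000 TPS x 4 spans = 4000 spans/s produced; suppose the exporter
# sustains 3500 spans/s. The default queue of 2048 overflows in ~4 seconds.
print(seconds_until_spans_drop(2048, 4000, 3500))
```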

The mpTelemetry-2.0 feature provides OpenTelemetry metrics that can be observed within messages.log by configuring otel.metrics.exporter=console. The collected metrics are written to messages.log at one-minute intervals by the io.opentelemetry.exporter.logging.LoggingMetricExporter class. Two of these metrics, queueSize and processedSpans, indicate the span queue size and the number of spans that were exported or dropped (dropped=true). An example of the OpenTelemetry metrics that are written to messages.log follows. Many lines are redacted for simplicity, and some annotations explain the content.
// 5 minutes with no spans dropped. Configuration is OK for this TPS.
[6/30/25, 12:27:31:618 GMT] 0000003d io.opentelemetry.exporter.logging.LoggingMetricExporter      I metric: ImmutableMetricData{resource=Resource{schemaUrl=null, attributes={host.arch="s390x", host.name="WINMVS22", os.description="z/OS 02.05.00", os.type="z_os", process.runtime.description="IBM Corporation IBM J9 VM ibm-jdk21-zOS-21.0.4-f45de8e9eb0", process.runtime.name="IBM Semeru Runtime Certified Edition for z/OS", process.runtime.version="21.0.4+7", service.instance.id="b102f051-ad26-49eb-8c57-7fe42a8e395b", service.name="ZCPerf", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.39.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=io.opentelemetry.sdk.trace, version=null, schemaUrl=null, attributes={}}, name=processedSpans, description=The number of spans processed by the BatchSpanProcessor. [dropped=true if they were dropped due to high throughput], unit=1, type=LONG_SUM, data=ImmutableSumData{points=[ImmutableLongPointData{startEpochNanos=1751285851603000000, epochNanos=1751286451612000000, attributes={dropped=false, processorType="BatchSpanProcessor"}, value=447344, exemplars=[]}], monotonic=true, aggregationTemporality=CUMULATIVE}}
[6/30/25, 12:27:31:618 GMT] 0000003d io.opentelemetry.exporter.logging.LoggingMetricExporter      I metric: ImmutableMetricData{resource=Resource{schemaUrl=null, attributes={host.arch="s390x", host.name="WINMVS22", os.description="z/OS 02.05.00", os.type="z_os", process.runtime.description="IBM Corporation IBM J9 VM ibm-jdk21-zOS-21.0.4-f45de8e9eb0", process.runtime.name="IBM Semeru Runtime Certified Edition for z/OS", process.runtime.version="21.0.4+7", service.instance.id="b102f051-ad26-49eb-8c57-7fe42a8e395b", service.name="ZCPerf", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.39.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=io.opentelemetry.sdk.trace, version=null, schemaUrl=null, attributes={}}, name=queueSize, description=The number of items queued, unit=1, type=LONG_GAUGE, data=ImmutableGaugeData{points=[ImmutableLongPointData{startEpochNanos=1751285851603000000, epochNanos=1751286451612000000, attributes={processorType="BatchSpanProcessor"}, value=556, exemplars=[]}]}}
[6/30/25, 12:28:31:615 GMT] 0000003d io.opentelemetry.exporter.logging.LoggingMetricExporter      I metric: ImmutableMetricData{resource=Resource{schemaUrl=null, attributes={host.arch="s390x", host.name="WINMVS22", os.description="z/OS 02.05.00", os.type="z_os", process.runtime.description="IBM Corporation IBM J9 VM ibm-jdk21-zOS-21.0.4-f45de8e9eb0", process.runtime.name="IBM Semeru Runtime Certified Edition for z/OS", process.runtime.version="21.0.4+7", service.instance.id="b102f051-ad26-49eb-8c57-7fe42a8e395b", service.name="ZCPerf", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.39.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=io.opentelemetry.sdk.trace, version=null, schemaUrl=null, attributes={}}, name=processedSpans, description=The number of spans processed by the BatchSpanProcessor. [dropped=true if they were dropped due to high throughput], unit=1, type=LONG_SUM, data=ImmutableSumData{points=[ImmutableLongPointData{startEpochNanos=1751285851603000000, epochNanos=1751286511612000000, attributes={dropped=false, processorType="BatchSpanProcessor"}, value=494448, exemplars=[]}], monotonic=true, aggregationTemporality=CUMULATIVE}}
[6/30/25, 12:28:31:615 GMT] 0000003d io.opentelemetry.exporter.logging.LoggingMetricExporter      I metric: ImmutableMetricData{resource=Resource{schemaUrl=null, attributes={host.arch="s390x", host.name="WINMVS22", os.description="z/OS 02.05.00", os.type="z_os", process.runtime.description="IBM Corporation IBM J9 VM ibm-jdk21-zOS-21.0.4-f45de8e9eb0", process.runtime.name="IBM Semeru Runtime Certified Edition for z/OS", process.runtime.version="21.0.4+7", service.instance.id="b102f051-ad26-49eb-8c57-7fe42a8e395b", service.name="ZCPerf", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.39.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=io.opentelemetry.sdk.trace, version=null, schemaUrl=null, attributes={}}, name=queueSize, description=The number of items queued, unit=1, type=LONG_GAUGE, data=ImmutableGaugeData{points=[ImmutableLongPointData{startEpochNanos=1751285851603000000, epochNanos=1751286511612000000, attributes={processorType="BatchSpanProcessor"}, value=1284, exemplars=[]}]}}
[6/30/25, 12:29:31:615 GMT] 0000003d io.opentelemetry.exporter.logging.LoggingMetricExporter      I metric: ImmutableMetricData{resource=Resource{schemaUrl=null, attributes={host.arch="s390x", host.name="WINMVS22", os.description="z/OS 02.05.00", os.type="z_os", process.runtime.description="IBM Corporation IBM J9 VM ibm-jdk21-zOS-21.0.4-f45de8e9eb0", process.runtime.name="IBM Semeru Runtime Certified Edition for z/OS", process.runtime.version="21.0.4+7", service.instance.id="b102f051-ad26-49eb-8c57-7fe42a8e395b", service.name="ZCPerf", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.39.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=io.opentelemetry.sdk.trace, version=null, schemaUrl=null, attributes={}}, name=processedSpans, description=The number of spans processed by the BatchSpanProcessor. [dropped=true if they were dropped due to high throughput], unit=1, type=LONG_SUM, data=ImmutableSumData{points=[ImmutableLongPointData{startEpochNanos=1751285851603000000, epochNanos=1751286571612000000, attributes={dropped=false, processorType="BatchSpanProcessor"}, value=543600, exemplars=[]}], monotonic=true, aggregationTemporality=CUMULATIVE}}
[6/30/25, 12:29:31:615 GMT] 0000003d io.opentelemetry.exporter.logging.LoggingMetricExporter      I metric: ImmutableMetricData{resource=Resource{schemaUrl=null, attributes={host.arch="s390x", host.name="WINMVS22", os.description="z/OS 02.05.00", os.type="z_os", process.runtime.description="IBM Corporation IBM J9 VM ibm-jdk21-zOS-21.0.4-f45de8e9eb0", process.runtime.name="IBM Semeru Runtime Certified Edition for z/OS", process.runtime.version="21.0.4+7", service.instance.id="b102f051-ad26-49eb-8c57-7fe42a8e395b", service.name="ZCPerf", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.39.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=io.opentelemetry.sdk.trace, version=null, schemaUrl=null, attributes={}}, name=queueSize, description=The number of items queued, unit=1, type=LONG_GAUGE, data=ImmutableGaugeData{points=[ImmutableLongPointData{startEpochNanos=1751285851603000000, epochNanos=1751286571612000000, attributes={processorType="BatchSpanProcessor"}, value=244, exemplars=[]}]}}
[6/30/25, 12:30:31:615 GMT] 0000003d io.opentelemetry.exporter.logging.LoggingMetricExporter      I metric: ImmutableMetricData{resource=Resource{schemaUrl=null, attributes={host.arch="s390x", host.name="WINMVS22", os.description="z/OS 02.05.00", os.type="z_os", process.runtime.description="IBM Corporation IBM J9 VM ibm-jdk21-zOS-21.0.4-f45de8e9eb0", process.runtime.name="IBM Semeru Runtime Certified Edition for z/OS", process.runtime.version="21.0.4+7", service.instance.id="b102f051-ad26-49eb-8c57-7fe42a8e395b", service.name="ZCPerf", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.39.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=io.opentelemetry.sdk.trace, version=null, schemaUrl=null, attributes={}}, name=processedSpans, description=The number of spans processed by the BatchSpanProcessor. [dropped=true if they were dropped due to high throughput], unit=1, type=LONG_SUM, data=ImmutableSumData{points=[ImmutableLongPointData{startEpochNanos=1751285851603000000, epochNanos=1751286631612000000, attributes={dropped=false, processorType="BatchSpanProcessor"}, value=727920, exemplars=[]}], monotonic=true, aggregationTemporality=CUMULATIVE}}
[6/30/25, 12:30:31:615 GMT] 0000003d io.opentelemetry.exporter.logging.LoggingMetricExporter      I metric: ImmutableMetricData{resource=Resource{schemaUrl=null, attributes={host.arch="s390x", host.name="WINMVS22", os.description="z/OS 02.05.00", os.type="z_os", process.runtime.description="IBM Corporation IBM J9 VM ibm-jdk21-zOS-21.0.4-f45de8e9eb0", process.runtime.name="IBM Semeru Runtime Certified Edition for z/OS", process.runtime.version="21.0.4+7", service.instance.id="b102f051-ad26-49eb-8c57-7fe42a8e395b", service.name="ZCPerf", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.39.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=io.opentelemetry.sdk.trace, version=null, schemaUrl=null, attributes={}}, name=queueSize, description=The number of items queued, unit=1, type=LONG_GAUGE, data=ImmutableGaugeData{points=[ImmutableLongPointData{startEpochNanos=1751285851603000000, epochNanos=1751286631612000000, attributes={processorType="BatchSpanProcessor"}, value=1181, exemplars=[]}]}}
[6/30/25, 12:31:31:615 GMT] 0000003d io.opentelemetry.exporter.logging.LoggingMetricExporter      I metric: ImmutableMetricData{resource=Resource{schemaUrl=null, attributes={host.arch="s390x", host.name="WINMVS22", os.description="z/OS 02.05.00", os.type="z_os", process.runtime.description="IBM Corporation IBM J9 VM ibm-jdk21-zOS-21.0.4-f45de8e9eb0", process.runtime.name="IBM Semeru Runtime Certified Edition for z/OS", process.runtime.version="21.0.4+7", service.instance.id="b102f051-ad26-49eb-8c57-7fe42a8e395b", service.name="ZCPerf", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.39.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=io.opentelemetry.sdk.trace, version=null, schemaUrl=null, attributes={}}, name=processedSpans, description=The number of spans processed by the BatchSpanProcessor. [dropped=true if they were dropped due to high throughput], unit=1, type=LONG_SUM, data=ImmutableSumData{points=[ImmutableLongPointData{startEpochNanos=1751285851603000000, epochNanos=1751286691612000000, attributes={dropped=false, processorType="BatchSpanProcessor"}, value=1110896, exemplars=[]}], monotonic=true, aggregationTemporality=CUMULATIVE}}
[6/30/25, 12:31:31:615 GMT] 0000003d io.opentelemetry.exporter.logging.LoggingMetricExporter      I metric: ImmutableMetricData{resource=Resource{schemaUrl=null, attributes={host.arch="s390x", host.name="WINMVS22", os.description="z/OS 02.05.00", os.type="z_os", process.runtime.description="IBM Corporation IBM J9 VM ibm-jdk21-zOS-21.0.4-f45de8e9eb0", process.runtime.name="IBM Semeru Runtime Certified Edition for z/OS", process.runtime.version="21.0.4+7", service.instance.id="b102f051-ad26-49eb-8c57-7fe42a8e395b", service.name="ZCPerf", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.39.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=io.opentelemetry.sdk.trace, version=null, schemaUrl=null, attributes={}}, name=queueSize, description=The number of items queued, unit=1, type=LONG_GAUGE, data=ImmutableGaugeData{points=[ImmutableLongPointData{startEpochNanos=1751285851603000000, epochNanos=1751286691612000000, attributes={processorType="BatchSpanProcessor"}, value=632, exemplars=[]}]}}
// TPS has increased. Spans are dropped (dropped=true and count increasing)
[6/30/25, 12:32:31:617 GMT] 0000003d io.opentelemetry.exporter.logging.LoggingMetricExporter      I metric: ImmutableMetricData{resource=Resource{schemaUrl=null, attributes={host.arch="s390x", host.name="WINMVS22", os.description="z/OS 02.05.00", os.type="z_os", process.runtime.description="IBM Corporation IBM J9 VM ibm-jdk21-zOS-21.0.4-f45de8e9eb0", process.runtime.name="IBM Semeru Runtime Certified Edition for z/OS", process.runtime.version="21.0.4+7", service.instance.id="b102f051-ad26-49eb-8c57-7fe42a8e395b", service.name="ZCPerf", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.39.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=io.opentelemetry.sdk.trace, version=null, schemaUrl=null, attributes={}}, name=processedSpans, description=The number of spans processed by the BatchSpanProcessor. [dropped=true if they were dropped due to high throughput], unit=1, type=LONG_SUM, data=ImmutableSumData{points=[ImmutableLongPointData{startEpochNanos=1751285851603000000, epochNanos=1751286751612000000, attributes={dropped=true, processorType="BatchSpanProcessor"}, value=29703, exemplars=[ImmutableLongExemplarData{filteredAttributes={}, epochNanos=1751286740188000000, spanContext=ImmutableSpanContext{traceId=af4de24242b7c84751903c8eb8f6ddfe, spanId=a454a9dd8ae8b1b5, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, value=1}, ImmutableLongExemplarData{filteredAttributes={}, epochNanos=1751286744356000000, spanContext=ImmutableSpanContext{traceId=4fdfb533097a6fa4c0165e38129fbffe, spanId=6108ed087b5be999, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, value=1}, ImmutableLongExemplarData{filteredAttributes={}, epochNanos=1751286747334000000, spanContext=ImmutableSpanContext{traceId=dbd05d5274902ee3da214458cfe61e91, spanId=e0a0d406be960ed1, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, value=1}, 
ImmutableLongExemplarData{filteredAttributes={}, epochNanos=1751286744956000000, spanContext=ImmutableSpanContext{traceId=06046f798e5c63e7957fc310dedbdf77, spanId=43394f3fbe67efae, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, value=1}, ImmutableLongExemplarData{filteredAttributes={}, epochNanos=1751286744974000000, spanContext=ImmutableSpanContext{traceId=ff3cbdbbc941987b38ba39300233a807, spanId=5f6473e4711324be, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, value=1}, ImmutableLongExemplarData{filteredAttributes={}, epochNanos=1751286750006000000, spanContext=ImmutableSpanContext{traceId=eec0c7b166d6de811f61dcbbc77c74a3, spanId=7429e749111343e4, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, value=1}]}, ImmutableLongPointData{startEpochNanos=1751285851603000000, epochNanos=1751286751612000000, attributes={dropped=false, processorType="BatchSpanProcessor"}, value=1512304, exemplars=[]}], monotonic=true, aggregationTemporality=CUMULATIVE}}
[6/30/25, 12:32:31:617 GMT] 0000003d io.opentelemetry.exporter.logging.LoggingMetricExporter      I metric: ImmutableMetricData{resource=Resource{schemaUrl=null, attributes={host.arch="s390x", host.name="WINMVS22", os.description="z/OS 02.05.00", os.type="z_os", process.runtime.description="IBM Corporation IBM J9 VM ibm-jdk21-zOS-21.0.4-f45de8e9eb0", process.runtime.name="IBM Semeru Runtime Certified Edition for z/OS", process.runtime.version="21.0.4+7", service.instance.id="b102f051-ad26-49eb-8c57-7fe42a8e395b", service.name="ZCPerf", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.39.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=io.opentelemetry.sdk.trace, version=null, schemaUrl=null, attributes={}}, name=queueSize, description=The number of items queued, unit=1, type=LONG_GAUGE, data=ImmutableGaugeData{points=[ImmutableLongPointData{startEpochNanos=1751285851603000000, epochNanos=1751286751612000000, attributes={processorType="BatchSpanProcessor"}, value=14425, exemplars=[]}]}}
[6/30/25, 12:33:31:616 GMT] 0000003d io.opentelemetry.exporter.logging.LoggingMetricExporter      I metric: ImmutableMetricData{resource=Resource{schemaUrl=null, attributes={host.arch="s390x", host.name="WINMVS22", os.description="z/OS 02.05.00", os.type="z_os", process.runtime.description="IBM Corporation IBM J9 VM ibm-jdk21-zOS-21.0.4-f45de8e9eb0", process.runtime.name="IBM Semeru Runtime Certified Edition for z/OS", process.runtime.version="21.0.4+7", service.instance.id="b102f051-ad26-49eb-8c57-7fe42a8e395b", service.name="ZCPerf", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.39.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=io.opentelemetry.sdk.trace, version=null, schemaUrl=null, attributes={}}, name=processedSpans, description=The number of spans processed by the BatchSpanProcessor. [dropped=true if they were dropped due to high throughput], unit=1, type=LONG_SUM, data=ImmutableSumData{points=[ImmutableLongPointData{startEpochNanos=1751285851603000000, epochNanos=1751286811612000000, attributes={dropped=true, processorType="BatchSpanProcessor"}, value=128432, exemplars=[ImmutableLongExemplarData{filteredAttributes={}, epochNanos=1751286752383000000, spanContext=ImmutableSpanContext{traceId=74dcf4ac46b800547e77795323be83e4, spanId=6588dc2fed00a0b2, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, value=1}, ImmutableLongExemplarData{filteredAttributes={}, epochNanos=1751286759903000000, spanContext=ImmutableSpanContext{traceId=193bc71a6952e26ec08119cb58cd0b49, spanId=2042273c6a36571f, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, value=1}, ImmutableLongExemplarData{filteredAttributes={}, epochNanos=1751286756998000000, spanContext=ImmutableSpanContext{traceId=3a229b76874f583173d183c7bfe5e3c0, spanId=bfe8ddb4fe5e9927, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, value=1}, 
ImmutableLongExemplarData{filteredAttributes={}, epochNanos=1751286773424000000, spanContext=ImmutableSpanContext{traceId=0125c4c5fb6087b46b0aef3e0ddd83f3, spanId=79bd0f13f80d67c2, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, value=1}, ImmutableLongExemplarData{filteredAttributes={}, epochNanos=1751286760263000000, spanContext=ImmutableSpanContext{traceId=38989c2f71d596d430e21a06d11e1ca4, spanId=64c860cbddb4ea5a, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, value=1}, ImmutableLongExemplarData{filteredAttributes={}, epochNanos=1751286811551000000, spanContext=ImmutableSpanContext{traceId=451c7614da64096faf80a6cd7296219a, spanId=a02d04cfe3058755, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, value=1}]}, ImmutableLongPointData{startEpochNanos=1751285851603000000, epochNanos=1751286811612000000, attributes={dropped=false, processorType="BatchSpanProcessor"}, value=1940336, exemplars=[]}], monotonic=true, aggregationTemporality=CUMULATIVE}}
[6/30/25, 12:33:31:616 GMT] 0000003d io.opentelemetry.exporter.logging.LoggingMetricExporter      I metric: ImmutableMetricData{resource=Resource{schemaUrl=null, attributes={host.arch="s390x", host.name="WINMVS22", os.description="z/OS 02.05.00", os.type="z_os", process.runtime.description="IBM Corporation IBM J9 VM ibm-jdk21-zOS-21.0.4-f45de8e9eb0", process.runtime.name="IBM Semeru Runtime Certified Edition for z/OS", process.runtime.version="21.0.4+7", service.instance.id="b102f051-ad26-49eb-8c57-7fe42a8e395b", service.name="ZCPerf", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.39.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=io.opentelemetry.sdk.trace, version=null, schemaUrl=null, attributes={}}, name=queueSize, description=The number of items queued, unit=1, type=LONG_GAUGE, data=ImmutableGaugeData{points=[ImmutableLongPointData{startEpochNanos=1751285851603000000, epochNanos=1751286811612000000, attributes={processorType="BatchSpanProcessor"}, value=14357, exemplars=[]}]}}
[6/30/25, 12:34:31:615 GMT] 0000003d io.opentelemetry.exporter.logging.LoggingMetricExporter      I metric: ImmutableMetricData{resource=Resource{schemaUrl=null, attributes={host.arch="s390x", host.name="WINMVS22", os.description="z/OS 02.05.00", os.type="z_os", process.runtime.description="IBM Corporation IBM J9 VM ibm-jdk21-zOS-21.0.4-f45de8e9eb0", process.runtime.name="IBM Semeru Runtime Certified Edition for z/OS", process.runtime.version="21.0.4+7", service.instance.id="b102f051-ad26-49eb-8c57-7fe42a8e395b", service.name="ZCPerf", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.39.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=io.opentelemetry.sdk.trace, version=null, schemaUrl=null, attributes={}}, name=processedSpans, description=The number of spans processed by the BatchSpanProcessor. [dropped=true if they were dropped due to high throughput], unit=1, type=LONG_SUM, data=ImmutableSumData{points=[ImmutableLongPointData{startEpochNanos=1751285851603000000, epochNanos=1751286871612000000, attributes={dropped=true, processorType="BatchSpanProcessor"}, value=196064, exemplars=[ImmutableLongExemplarData{filteredAttributes={}, epochNanos=1751286813607000000, spanContext=ImmutableSpanContext{traceId=a41a3d9bd4e09a7e9564592f6c7a17cf, spanId=255703a0b4826de3, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, value=1}, ImmutableLongExemplarData{filteredAttributes={}, epochNanos=1751286812706000000, spanContext=ImmutableSpanContext{traceId=219f877611e618f8f0397aa4d9524a33, spanId=106e8763fee56a82, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, value=1}, ImmutableLongExemplarData{filteredAttributes={}, epochNanos=1751286841245000000, spanContext=ImmutableSpanContext{traceId=f59bfb09953f725bbe286aad71c34a89, spanId=758ab2dda8bae1b9, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, value=1}, 
ImmutableLongExemplarData{filteredAttributes={}, epochNanos=1751286837669000000, spanContext=ImmutableSpanContext{traceId=a0c234a9c5bc2a2fd0da13886aeb62d4, spanId=73edfda08eced5c2, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, value=1}, ImmutableLongExemplarData{filteredAttributes={}, epochNanos=1751286820306000000, spanContext=ImmutableSpanContext{traceId=851242a63af3f5f1efbb46edc298aeb8, spanId=7dfe95deb6033ace, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, value=1}, ImmutableLongExemplarData{filteredAttributes={}, epochNanos=1751286841302000000, spanContext=ImmutableSpanContext{traceId=a17c084a4030670886353678277b7b8e, spanId=112e666a2505b5a4, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, value=1}]}, ImmutableLongPointData{startEpochNanos=1751285851603000000, epochNanos=1751286871612000000, attributes={dropped=false, processorType="BatchSpanProcessor"}, value=2292592, exemplars=[]}], monotonic=true, aggregationTemporality=CUMULATIVE}}
[6/30/25, 12:34:31:615 GMT] 0000003d io.opentelemetry.exporter.logging.LoggingMetricExporter      I metric: ImmutableMetricData{resource=Resource{schemaUrl=null, attributes={host.arch="s390x", host.name="WINMVS22", os.description="z/OS 02.05.00", os.type="z_os", process.runtime.description="IBM Corporation IBM J9 VM ibm-jdk21-zOS-21.0.4-f45de8e9eb0", process.runtime.name="IBM Semeru Runtime Certified Edition for z/OS", process.runtime.version="21.0.4+7", service.instance.id="b102f051-ad26-49eb-8c57-7fe42a8e395b", service.name="ZCPerf", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.39.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=io.opentelemetry.sdk.trace, version=null, schemaUrl=null, attributes={}}, name=queueSize, description=The number of items queued, unit=1, type=LONG_GAUGE, data=ImmutableGaugeData{points=[ImmutableLongPointData{startEpochNanos=1751285851603000000, epochNanos=1751286871612000000, attributes={processorType="BatchSpanProcessor"}, value=988, exemplars=[]}]}}
// TPS has decreased. Spans are no longer being dropped (the dropped=true count is not increasing)
[6/30/25, 12:35:31:615 GMT] 0000003d io.opentelemetry.exporter.logging.LoggingMetricExporter      I metric: ImmutableMetricData{resource=Resource{schemaUrl=null, attributes={host.arch="s390x", host.name="WINMVS22", os.description="z/OS 02.05.00", os.type="z_os", process.runtime.description="IBM Corporation IBM J9 VM ibm-jdk21-zOS-21.0.4-f45de8e9eb0", process.runtime.name="IBM Semeru Runtime Certified Edition for z/OS", process.runtime.version="21.0.4+7", service.instance.id="b102f051-ad26-49eb-8c57-7fe42a8e395b", service.name="ZCPerf", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.39.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=io.opentelemetry.sdk.trace, version=null, schemaUrl=null, attributes={}}, name=processedSpans, description=The number of spans processed by the BatchSpanProcessor. [dropped=true if they were dropped due to high throughput], unit=1, type=LONG_SUM, data=ImmutableSumData{points=[ImmutableLongPointData{startEpochNanos=1751285851603000000, epochNanos=1751286931612000000, attributes={dropped=true, processorType="BatchSpanProcessor"}, value=196064, exemplars=[]}, ImmutableLongPointData{startEpochNanos=1751285851603000000, epochNanos=1751286931612000000, attributes={dropped=false, processorType="BatchSpanProcessor"}, value=2435952, exemplars=[]}], monotonic=true, aggregationTemporality=CUMULATIVE}}
[6/30/25, 12:35:31:615 GMT] 0000003d io.opentelemetry.exporter.logging.LoggingMetricExporter      I metric: ImmutableMetricData{resource=Resource{schemaUrl=null, attributes={host.arch="s390x", host.name="WINMVS22", os.description="z/OS 02.05.00", os.type="z_os", process.runtime.description="IBM Corporation IBM J9 VM ibm-jdk21-zOS-21.0.4-f45de8e9eb0", process.runtime.name="IBM Semeru Runtime Certified Edition for z/OS", process.runtime.version="21.0.4+7", service.instance.id="b102f051-ad26-49eb-8c57-7fe42a8e395b", service.name="ZCPerf", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.39.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=io.opentelemetry.sdk.trace, version=null, schemaUrl=null, attributes={}}, name=queueSize, description=The number of items queued, unit=1, type=LONG_GAUGE, data=ImmutableGaugeData{points=[ImmutableLongPointData{startEpochNanos=1751285851603000000, epochNanos=1751286931612000000, attributes={processorType="BatchSpanProcessor"}, value=780, exemplars=[]}]}}
[6/30/25, 12:36:31:615 GMT] 0000003d io.opentelemetry.exporter.logging.LoggingMetricExporter      I metric: ImmutableMetricData{resource=Resource{schemaUrl=null, attributes={host.arch="s390x", host.name="WINMVS22", os.description="z/OS 02.05.00", os.type="z_os", process.runtime.description="IBM Corporation IBM J9 VM ibm-jdk21-zOS-21.0.4-f45de8e9eb0", process.runtime.name="IBM Semeru Runtime Certified Edition for z/OS", process.runtime.version="21.0.4+7", service.instance.id="b102f051-ad26-49eb-8c57-7fe42a8e395b", service.name="ZCPerf", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.39.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=io.opentelemetry.sdk.trace, version=null, schemaUrl=null, attributes={}}, name=processedSpans, description=The number of spans processed by the BatchSpanProcessor. [dropped=true if they were dropped due to high throughput], unit=1, type=LONG_SUM, data=ImmutableSumData{points=[ImmutableLongPointData{startEpochNanos=1751285851603000000, epochNanos=1751286991612000000, attributes={dropped=true, processorType="BatchSpanProcessor"}, value=196064, exemplars=[]}, ImmutableLongPointData{startEpochNanos=1751285851603000000, epochNanos=1751286991612000000, attributes={dropped=false, processorType="BatchSpanProcessor"}, value=2579312, exemplars=[]}], monotonic=true, aggregationTemporality=CUMULATIVE}}
[6/30/25, 12:36:31:615 GMT] 0000003d io.opentelemetry.exporter.logging.LoggingMetricExporter      I metric: ImmutableMetricData{resource=Resource{schemaUrl=null, attributes={host.arch="s390x", host.name="WINMVS22", os.description="z/OS 02.05.00", os.type="z_os", process.runtime.description="IBM Corporation IBM J9 VM ibm-jdk21-zOS-21.0.4-f45de8e9eb0", process.runtime.name="IBM Semeru Runtime Certified Edition for z/OS", process.runtime.version="21.0.4+7", service.instance.id="b102f051-ad26-49eb-8c57-7fe42a8e395b", service.name="ZCPerf", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.39.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=io.opentelemetry.sdk.trace, version=null, schemaUrl=null, attributes={}}, name=queueSize, description=The number of items queued, unit=1, type=LONG_GAUGE, data=ImmutableGaugeData{points=[ImmutableLongPointData{startEpochNanos=1751285851603000000, epochNanos=1751286991612000000, attributes={processorType="BatchSpanProcessor"}, value=92, exemplars=[]}]}}

A summary of the OpenTelemetry metrics that are written to messages.log is in the following table. In this example,
  • A consistent workload (TPS) is used for the first 4 minutes (12:27-12:30).
  • During minute 12:31, the TPS is increased, which is visible as a large increase in the exported spans count compared to the first 4 minutes.
  • At minute 12:32, the TPS is increased further and spans cannot be exported fast enough. As a result, the maximum queue size of 10,000 is exceeded and the dropped spans count increases.
  • At minute 12:33, the dropped spans count is still increasing.
  • During minute 12:34, the TPS is decreased and, as a result, the queue size falls below the maximum queue size. The dropped spans count has still increased because part of this minute interval ran at the higher TPS. The smaller queue size suggests that this TPS allows spans to be exported without loss.
  • At minutes 12:35 and 12:36, the dropped spans count does not change. The TPS is at a rate that is suitable for export of all spans.
Table 1. Summary of dropped spans data from the messages.log file

  Time          Exported spans count   Queue size   Dropped spans count
  12:27:31:618  447344                 556          0
  12:28:31:615  494448                 1284         0
  12:29:31:615  543600                 244          0
  12:30:31:615  727920                 1181         0
  12:31:31:615  1110896                632          0
  12:32:31:617  1512304                14425        29703 (Note 1.)
  12:33:31:616  1940336                14357        128432
  12:34:31:615  2292592                988          196064
  12:35:31:615  2435952                780          196064 (Note 2.)
  12:36:31:615  2579312                92           196064
Note:
  1. Once spans start to be dropped, the dropped=true data point is written before the dropped=false (exported) data point in the processedSpans metric output.
  2. The TPS rate is no longer higher than the span export rate and spans are no longer being dropped, so the dropped spans count is not increasing.
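
The counts summarized in Table 1 can also be pulled out of messages.log programmatically. The following sketch is illustrative only: the regular expressions assume the LoggingMetricExporter output format shown in the example above and are not part of any z/OS Connect or OpenTelemetry API.

```python
import re

def summarize_metric_line(line):
    """Extract the queue size or span counts from one LoggingMetricExporter
    line in messages.log (format as in the example output above)."""
    summary = {}
    if "name=queueSize" in line:
        # The gauge value is the first value=<n> field in the point data.
        match = re.search(r"value=(\d+)", line)
        if match:
            summary["queue_size"] = int(match.group(1))
    elif "name=processedSpans" in line:
        # Each data point's dropped=true/false attribute block is followed
        # by its cumulative value=<n> count.
        for dropped, value in re.findall(
                r"dropped=(true|false)[^}]*\}, value=(\d+)", line):
            key = "dropped_spans" if dropped == "true" else "exported_spans"
            summary[key] = int(value)
    return summary

sample = 'name=queueSize, data=ImmutableGaugeData{points=[..., value=988, exemplars=[]}]}'
print(summarize_metric_line(sample))  # → {'queue_size': 988}
```

Applying this to each metric line in the log yields the per-minute queue size, exported spans count, and dropped spans count used in Table 1.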

To determine the TPS at which your server starts to drop spans, you need to run a consistent workload at a known TPS.

  1. Within each metrics output interval, there are two entries with InstrumentationScopeInfo{name=io.opentelemetry.sdk.trace}. One of these lines has name=queueSize and the other has name=processedSpans.
  2. The line with name=queueSize reports the number of items queued in the value=<number> field.
  3. The line with name=processedSpans reports if spans are dropped, the number of dropped spans, and the number of exported spans.
  4. Let the server run for several minutes at a consistent TPS. Then refresh the messages.log and check that the name=processedSpans lines still report only a dropped=false data point. This confirms that all spans are being exported and none are being dropped at this TPS.
  5. Increase the TPS and repeat step 4 until you start to see dropped=true. This indicates that the current TPS is too high for all spans to be exported.
  6. When you find a TPS at which spans are dropped, assess whether that TPS covers your maximum expected production workload. If it is not yet high enough, increase one or both of otel.bsp.max.export.batch.size and otel.bsp.max.queue.size. It is recommended to change only one property at a time and repeat the test. Increasing the queue size only delays the point at which spans are dropped, but a larger queue can absorb short-term peaks in workload.
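
For example, the two properties from step 6 could be set in bootstrap.properties. The values below are illustrative, not recommendations; as a point of reference, the OpenTelemetry SDK defaults are otel.bsp.max.queue.size=2048 and otel.bsp.max.export.batch.size=512.

otel.bsp.max.queue.size=15000
otel.bsp.max.export.batch.size=1024

As with any bootstrap.properties change, a server restart is required for the new values to take effect.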

It is recommended that you ensure your server can handle, without dropping spans, a TPS that is at least 25% greater than your current expected maximum production TPS. This allows for future growth of your environment.
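
The 25% headroom recommendation amounts to a simple calculation; the function name and the example TPS value below are illustrative.

```python
def required_no_drop_tps(max_production_tps, headroom=0.25):
    """TPS the server should sustain without dropping spans,
    given the 25% growth headroom recommended above."""
    return max_production_tps * (1 + headroom)

# For an expected maximum production workload of 400 TPS, the server
# should be tested to export all spans at 500 TPS.
print(required_no_drop_tps(400))  # → 500.0
```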