Configuring OpenTelemetry for monitoring with z/OS Connect
How to configure OpenTelemetry for monitoring with IBM® z/OS® Connect.
Applies to zosConnect-3.0.
z/OS Connect uses the Liberty MicroProfile Telemetry 2.0 feature to provide OpenTelemetry monitoring. OpenTelemetry can be configured at either the server level or the API level. Even when the feature is enabled, OpenTelemetry itself remains disabled by default and must be enabled by configuration. You can collect OpenTelemetry logs, metrics, and traces, but only traces are enhanced with z/OS Connect specific data; metrics and logs contain only the base Liberty data. Enable the feature in server.xml:
<featureManager>
    <feature>mpTelemetry-2.0</feature>
</featureManager>

How to enable OpenTelemetry for all APIs
OpenTelemetry can be enabled by setting system properties or environment variables. The
corresponding environment variable name for any given system property can be obtained by changing
the name to uppercase, and replacing all '.' characters with '_'
characters. For example, the otel.sdk.disabled system property is equivalent to the
OTEL_SDK_DISABLED environment variable.
- Set system properties in bootstrap.properties. These are per-server settings.
- Set system properties in a JVM options <name>.options file. For more information, see Specifying JVM options. These options can be shared across servers.
- Set environment variables in server.env or the server JCL STDENV DD statement. These are per-server settings.
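For example, a sketch of the same setting expressed both ways; the value MYSERVER1 is an illustrative server name, not a value from this topic:

```properties
# In bootstrap.properties (system property form):
otel.service.name=MYSERVER1

# In server.env (equivalent environment variable form):
OTEL_SERVICE_NAME=MYSERVER1
```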
The values cannot be changed in a running server. If running the server as a started task, a server restart is required to change the values. If deployed as a container image, the image must be rebuilt and redeployed for configuration changes. If a property is defined as both a system property and environment variable, the system property takes priority.
You might need, or want, to set the following system properties. For more information on these and other properties, see Configure the SDK.
- otel.sdk.disabled
- Enable OpenTelemetry within the z/OS Connect Server by setting otel.sdk.disabled=false. The default value is otel.sdk.disabled=true, which disables OpenTelemetry.
- otel.traces.sampler
- Set otel.traces.sampler to determine which spans are recorded and sampled. The default value is otel.traces.sampler=parentbased_always_on, which initiates a new trace context for all requests that do not have a context. To stop this context initializing, so that only requests that already include a trace context are traced, set otel.traces.sampler=parentbased_always_off.
- otel.service.name
- Set the otel.service.name property to the server name. This name appears in spans and identifies the span creator. The default unknown_service is used when this is not configured. When configuring OpenTelemetry with z/OS Connect for all APIs, make sure that each server has a unique otel.service.name, to aid problem determination. For example, set otel.service.name to the server name in the bootstrap.properties file for each z/OS Connect Server, and add the common properties in the shared <name>.options file.
- otel.traces.exporter
- The type of span exporter, which depends on which OpenTelemetry Collector you choose to use. The default is otel.traces.exporter=otlp. If you have not yet set up an OpenTelemetry Collector, you can temporarily export spans to messages.log by configuring otel.traces.exporter=console.
- otel.exporter.otlp.endpoint
- The endpoint to send all OTLP traces to, which is often the address of an OpenTelemetry Collector. This configuration property must be a URL with a scheme of either http or https, based on the use of TLS. For example, otel.exporter.otlp.endpoint=http://localhost:4317. Use the following properties as required when configuring a TLS connection. The Liberty ssl configuration is not applicable.
- otel.exporter.otlp.{signal}.endpoint
- otel.exporter.otlp.certificate
- otel.exporter.otlp.{signal}.certificate
- otel.exporter.otlp.client.key
- otel.exporter.otlp.{signal}.client.key
- otel.exporter.otlp.client.certificate
- otel.exporter.otlp.{signal}.client.certificate
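As a minimal sketch of a TLS connection to a Collector, assuming a hypothetical Collector host name and certificate path (both illustrative, not values from this topic):

```properties
# https scheme indicates that TLS is used for the connection
otel.exporter.otlp.endpoint=https://collector.example.com:4317

# Hypothetical path to the certificate used to verify the Collector
otel.exporter.otlp.certificate=/var/zosconnect/tls/collector-ca.pem
```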
- otel.logs.exporter
- The type of logs exporter. For z/OS Connect, set otel.logs.exporter=none.
- otel.metrics.exporter
- The type of metrics exporter. For z/OS Connect, set otel.metrics.exporter=none.
For example, the following settings enable OpenTelemetry for all APIs in a server:
otel.sdk.disabled=false
otel.service.name=<server_name>
otel.traces.exporter=otlp
otel.exporter.otlp.endpoint=http://localhost:4317
otel.logs.exporter=none
otel.metrics.exporter=none

How to enable OpenTelemetry for specific APIs
If OpenTelemetry is only required for specific APIs in a server, it can be configured by using the following steps.
- Configure the properties that are common for all APIs in a server, at the server level. It is important not to set the otel.sdk.disabled property at the server level, in this instance. For example, set the following properties in bootstrap.properties or a JVM options <name>.options file:
otel.traces.exporter=otlp
otel.exporter.otlp.endpoint=http://localhost:4317
otel.logs.exporter=none
otel.metrics.exporter=none
- For each API that you want to enable OpenTelemetry for, in the
webApplication element, set the otel.sdk.disabled property
to false, for example:
<webApplication id="ExampleApi" contextRoot="/example"
                location="${server.config.dir}/apps/example-api.war" name="example-api">
    <appProperties>
        <property name="otel.sdk.disabled" value="false" />
    </appProperties>
</webApplication>

Important: All OpenTelemetry configuration in the webApplication element is ignored if otel.sdk.disabled is configured at the server level.

By default, the value of otel.service.name is the application name. This can be changed by setting the otel.service.name property to name any telemetry produced by this API. If you have the same APIs deployed in multiple servers, you might want to include a server identifier in the otel.service.name, in addition to an API identifier.
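As a sketch of that naming suggestion, otel.service.name can be set in the same appProperties element; the combined server and API identifier MYSERVER1_example-api is an illustrative assumption:

```xml
<webApplication id="ExampleApi" contextRoot="/example"
                location="${server.config.dir}/apps/example-api.war" name="example-api">
    <appProperties>
        <property name="otel.sdk.disabled" value="false" />
        <!-- Hypothetical name combining a server identifier and an API identifier -->
        <property name="otel.service.name" value="MYSERVER1_example-api" />
    </appProperties>
</webApplication>
```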
Started task: Changing this configuration requires a refresh of the Liberty configuration by using the Liberty REFRESH,CONFIG command. This stops the API and then starts it with the updated configuration.
Containers: Changing this configuration requires a rebuild and redeployment of the z/OS Connect Server image.
Using Sampling to limit span export
otel.traces.sampler can be configured to apply a ratio to the incoming requests such that a specific rate of requests have their spans exported. This can be used to reduce the total number of spans being exported from a server.
- Example 1
- Configuring the following results in 50% of requests being exported:
otel.traces.sampler=traceidratio
otel.traces.sampler.arg=0.5
- Example 2
- Configuring the following results in 10% of requests being exported:
otel.traces.sampler=traceidratio
otel.traces.sampler.arg=0.1
- Example 3
- By using parentbased_traceidratio, you can take into account whether the request into the z/OS Connect Server contained a trace context, by using the traceparent header. For more information about the W3C Trace Context sampled flag, see https://www.w3.org/TR/trace-context/#sampled-flag.
Additional recommended OpenTelemetry configuration options
- It is recommended to disable the ProcessResourceProvider, as this provider adds attributes to every span that include long strings, such as the z/OS Connect Server JVM command line arguments. To reduce the memory requirements for OpenTelemetry and improve export performance, add the following to the configuration file:
otel.java.disabled.resource.providers=io.opentelemetry.instrumentation.resources.ProcessResourceProvider
- The default configuration for span export is too small for some workloads, which leads to dropped spans that are then lost. The default maximum number of spans that can be queued before batching is 2048, and the default maximum number of spans to export in a single batch is 512. The following configuration can be used as a reasonable starting point for performance testing. For more information, see How to ensure that every span produced is exported.
otel.bsp.max.queue.size=10000
otel.bsp.max.export.batch.size=2048
How to ensure that every span produced is exported
The z/OS Connect Server produces four spans for every API provider or API requester request that is received into the server. The number of
spans that are produced correlates directly with the transaction rate, often measured in
transactions per second (TPS). For example, with a TPS of 1000, there will be 4000 spans produced
each second (assuming that all API provider requests are completing without error). These spans are
exported in batches to improve performance. The size of the batches depends on the
otel.bsp.max.export.batch.size configuration. Spans are written to a queue for
export. The otel.bsp.max.queue.size property sets the queue size. In a busy server
where the TPS rate is higher than the span export rate, the queue will gradually grow. Once the
queue size reaches the otel.bsp.max.queue.size limit, spans start to be dropped,
resulting in OpenTelemetry traces with missing spans.
This is a deliberate design decision by the OpenTelemetry community and as such no errors are written to logs in this case. It is possible to detect whether spans are being dropped through performance testing.
The mpTelemetry-2.0 feature provides OpenTelemetry metrics that can be observed in messages.log by configuring otel.metrics.exporter=console. The collected metrics are written to messages.log at one-minute intervals by class io.opentelemetry.exporter.logging.LoggingMetricExporter. Two of these metrics indicate the span queue size (queueSize) and the number of spans processed (processedSpans); dropped spans are reported with the attribute dropped=true. An example of the OpenTelemetry metrics that are written to messages.log follows. Many lines are redacted for simplicity, and annotations explain the content.
// 5 minutes with no spans dropped. Configuration is OK for this TPS.
[6/30/25, 12:27:31:618 GMT] 0000003d io.opentelemetry.exporter.logging.LoggingMetricExporter I metric: ImmutableMetricData{resource=Resource{schemaUrl=null, attributes={host.arch="s390x", host.name="WINMVS22", os.description="z/OS 02.05.00", os.type="z_os", process.runtime.description="IBM Corporation IBM J9 VM ibm-jdk21-zOS-21.0.4-f45de8e9eb0", process.runtime.name="IBM Semeru Runtime Certified Edition for z/OS", process.runtime.version="21.0.4+7", service.instance.id="b102f051-ad26-49eb-8c57-7fe42a8e395b", service.name="ZCPerf", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.39.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=io.opentelemetry.sdk.trace, version=null, schemaUrl=null, attributes={}}, name=processedSpans, description=The number of spans processed by the BatchSpanProcessor. [dropped=true if they were dropped due to high throughput], unit=1, type=LONG_SUM, data=ImmutableSumData{points=[ImmutableLongPointData{startEpochNanos=1751285851603000000, epochNanos=1751286451612000000, attributes={dropped=false, processorType="BatchSpanProcessor"}, value=447344, exemplars=[]}], monotonic=true, aggregationTemporality=CUMULATIVE}}
[6/30/25, 12:27:31:618 GMT] 0000003d io.opentelemetry.exporter.logging.LoggingMetricExporter I metric: ImmutableMetricData{resource=Resource{schemaUrl=null, attributes={host.arch="s390x", host.name="WINMVS22", os.description="z/OS 02.05.00", os.type="z_os", process.runtime.description="IBM Corporation IBM J9 VM ibm-jdk21-zOS-21.0.4-f45de8e9eb0", process.runtime.name="IBM Semeru Runtime Certified Edition for z/OS", process.runtime.version="21.0.4+7", service.instance.id="b102f051-ad26-49eb-8c57-7fe42a8e395b", service.name="ZCPerf", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.39.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=io.opentelemetry.sdk.trace, version=null, schemaUrl=null, attributes={}}, name=queueSize, description=The number of items queued, unit=1, type=LONG_GAUGE, data=ImmutableGaugeData{points=[ImmutableLongPointData{startEpochNanos=1751285851603000000, epochNanos=1751286451612000000, attributes={processorType="BatchSpanProcessor"}, value=556, exemplars=[]}]}}
[6/30/25, 12:28:31:615 GMT] 0000003d io.opentelemetry.exporter.logging.LoggingMetricExporter I metric: ImmutableMetricData{resource=Resource{schemaUrl=null, attributes={host.arch="s390x", host.name="WINMVS22", os.description="z/OS 02.05.00", os.type="z_os", process.runtime.description="IBM Corporation IBM J9 VM ibm-jdk21-zOS-21.0.4-f45de8e9eb0", process.runtime.name="IBM Semeru Runtime Certified Edition for z/OS", process.runtime.version="21.0.4+7", service.instance.id="b102f051-ad26-49eb-8c57-7fe42a8e395b", service.name="ZCPerf", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.39.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=io.opentelemetry.sdk.trace, version=null, schemaUrl=null, attributes={}}, name=processedSpans, description=The number of spans processed by the BatchSpanProcessor. [dropped=true if they were dropped due to high throughput], unit=1, type=LONG_SUM, data=ImmutableSumData{points=[ImmutableLongPointData{startEpochNanos=1751285851603000000, epochNanos=1751286511612000000, attributes={dropped=false, processorType="BatchSpanProcessor"}, value=494448, exemplars=[]}], monotonic=true, aggregationTemporality=CUMULATIVE}}
[6/30/25, 12:28:31:615 GMT] 0000003d io.opentelemetry.exporter.logging.LoggingMetricExporter I metric: ImmutableMetricData{resource=Resource{schemaUrl=null, attributes={host.arch="s390x", host.name="WINMVS22", os.description="z/OS 02.05.00", os.type="z_os", process.runtime.description="IBM Corporation IBM J9 VM ibm-jdk21-zOS-21.0.4-f45de8e9eb0", process.runtime.name="IBM Semeru Runtime Certified Edition for z/OS", process.runtime.version="21.0.4+7", service.instance.id="b102f051-ad26-49eb-8c57-7fe42a8e395b", service.name="ZCPerf", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.39.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=io.opentelemetry.sdk.trace, version=null, schemaUrl=null, attributes={}}, name=queueSize, description=The number of items queued, unit=1, type=LONG_GAUGE, data=ImmutableGaugeData{points=[ImmutableLongPointData{startEpochNanos=1751285851603000000, epochNanos=1751286511612000000, attributes={processorType="BatchSpanProcessor"}, value=1284, exemplars=[]}]}}
[6/30/25, 12:29:31:615 GMT] 0000003d io.opentelemetry.exporter.logging.LoggingMetricExporter I metric: ImmutableMetricData{resource=Resource{schemaUrl=null, attributes={host.arch="s390x", host.name="WINMVS22", os.description="z/OS 02.05.00", os.type="z_os", process.runtime.description="IBM Corporation IBM J9 VM ibm-jdk21-zOS-21.0.4-f45de8e9eb0", process.runtime.name="IBM Semeru Runtime Certified Edition for z/OS", process.runtime.version="21.0.4+7", service.instance.id="b102f051-ad26-49eb-8c57-7fe42a8e395b", service.name="ZCPerf", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.39.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=io.opentelemetry.sdk.trace, version=null, schemaUrl=null, attributes={}}, name=processedSpans, description=The number of spans processed by the BatchSpanProcessor. [dropped=true if they were dropped due to high throughput], unit=1, type=LONG_SUM, data=ImmutableSumData{points=[ImmutableLongPointData{startEpochNanos=1751285851603000000, epochNanos=1751286571612000000, attributes={dropped=false, processorType="BatchSpanProcessor"}, value=543600, exemplars=[]}], monotonic=true, aggregationTemporality=CUMULATIVE}}
[6/30/25, 12:29:31:615 GMT] 0000003d io.opentelemetry.exporter.logging.LoggingMetricExporter I metric: ImmutableMetricData{resource=Resource{schemaUrl=null, attributes={host.arch="s390x", host.name="WINMVS22", os.description="z/OS 02.05.00", os.type="z_os", process.runtime.description="IBM Corporation IBM J9 VM ibm-jdk21-zOS-21.0.4-f45de8e9eb0", process.runtime.name="IBM Semeru Runtime Certified Edition for z/OS", process.runtime.version="21.0.4+7", service.instance.id="b102f051-ad26-49eb-8c57-7fe42a8e395b", service.name="ZCPerf", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.39.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=io.opentelemetry.sdk.trace, version=null, schemaUrl=null, attributes={}}, name=queueSize, description=The number of items queued, unit=1, type=LONG_GAUGE, data=ImmutableGaugeData{points=[ImmutableLongPointData{startEpochNanos=1751285851603000000, epochNanos=1751286571612000000, attributes={processorType="BatchSpanProcessor"}, value=244, exemplars=[]}]}}
[6/30/25, 12:30:31:615 GMT] 0000003d io.opentelemetry.exporter.logging.LoggingMetricExporter I metric: ImmutableMetricData{resource=Resource{schemaUrl=null, attributes={host.arch="s390x", host.name="WINMVS22", os.description="z/OS 02.05.00", os.type="z_os", process.runtime.description="IBM Corporation IBM J9 VM ibm-jdk21-zOS-21.0.4-f45de8e9eb0", process.runtime.name="IBM Semeru Runtime Certified Edition for z/OS", process.runtime.version="21.0.4+7", service.instance.id="b102f051-ad26-49eb-8c57-7fe42a8e395b", service.name="ZCPerf", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.39.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=io.opentelemetry.sdk.trace, version=null, schemaUrl=null, attributes={}}, name=processedSpans, description=The number of spans processed by the BatchSpanProcessor. [dropped=true if they were dropped due to high throughput], unit=1, type=LONG_SUM, data=ImmutableSumData{points=[ImmutableLongPointData{startEpochNanos=1751285851603000000, epochNanos=1751286631612000000, attributes={dropped=false, processorType="BatchSpanProcessor"}, value=727920, exemplars=[]}], monotonic=true, aggregationTemporality=CUMULATIVE}}
[6/30/25, 12:30:31:615 GMT] 0000003d io.opentelemetry.exporter.logging.LoggingMetricExporter I metric: ImmutableMetricData{resource=Resource{schemaUrl=null, attributes={host.arch="s390x", host.name="WINMVS22", os.description="z/OS 02.05.00", os.type="z_os", process.runtime.description="IBM Corporation IBM J9 VM ibm-jdk21-zOS-21.0.4-f45de8e9eb0", process.runtime.name="IBM Semeru Runtime Certified Edition for z/OS", process.runtime.version="21.0.4+7", service.instance.id="b102f051-ad26-49eb-8c57-7fe42a8e395b", service.name="ZCPerf", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.39.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=io.opentelemetry.sdk.trace, version=null, schemaUrl=null, attributes={}}, name=queueSize, description=The number of items queued, unit=1, type=LONG_GAUGE, data=ImmutableGaugeData{points=[ImmutableLongPointData{startEpochNanos=1751285851603000000, epochNanos=1751286631612000000, attributes={processorType="BatchSpanProcessor"}, value=1181, exemplars=[]}]}}
[6/30/25, 12:31:31:615 GMT] 0000003d io.opentelemetry.exporter.logging.LoggingMetricExporter I metric: ImmutableMetricData{resource=Resource{schemaUrl=null, attributes={host.arch="s390x", host.name="WINMVS22", os.description="z/OS 02.05.00", os.type="z_os", process.runtime.description="IBM Corporation IBM J9 VM ibm-jdk21-zOS-21.0.4-f45de8e9eb0", process.runtime.name="IBM Semeru Runtime Certified Edition for z/OS", process.runtime.version="21.0.4+7", service.instance.id="b102f051-ad26-49eb-8c57-7fe42a8e395b", service.name="ZCPerf", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.39.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=io.opentelemetry.sdk.trace, version=null, schemaUrl=null, attributes={}}, name=processedSpans, description=The number of spans processed by the BatchSpanProcessor. [dropped=true if they were dropped due to high throughput], unit=1, type=LONG_SUM, data=ImmutableSumData{points=[ImmutableLongPointData{startEpochNanos=1751285851603000000, epochNanos=1751286691612000000, attributes={dropped=false, processorType="BatchSpanProcessor"}, value=1110896, exemplars=[]}], monotonic=true, aggregationTemporality=CUMULATIVE}}
[6/30/25, 12:31:31:615 GMT] 0000003d io.opentelemetry.exporter.logging.LoggingMetricExporter I metric: ImmutableMetricData{resource=Resource{schemaUrl=null, attributes={host.arch="s390x", host.name="WINMVS22", os.description="z/OS 02.05.00", os.type="z_os", process.runtime.description="IBM Corporation IBM J9 VM ibm-jdk21-zOS-21.0.4-f45de8e9eb0", process.runtime.name="IBM Semeru Runtime Certified Edition for z/OS", process.runtime.version="21.0.4+7", service.instance.id="b102f051-ad26-49eb-8c57-7fe42a8e395b", service.name="ZCPerf", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.39.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=io.opentelemetry.sdk.trace, version=null, schemaUrl=null, attributes={}}, name=queueSize, description=The number of items queued, unit=1, type=LONG_GAUGE, data=ImmutableGaugeData{points=[ImmutableLongPointData{startEpochNanos=1751285851603000000, epochNanos=1751286691612000000, attributes={processorType="BatchSpanProcessor"}, value=632, exemplars=[]}]}}
// TPS has increased. Spans are dropped (dropped=true and count increasing)
[6/30/25, 12:32:31:617 GMT] 0000003d io.opentelemetry.exporter.logging.LoggingMetricExporter I metric: ImmutableMetricData{resource=Resource{schemaUrl=null, attributes={host.arch="s390x", host.name="WINMVS22", os.description="z/OS 02.05.00", os.type="z_os", process.runtime.description="IBM Corporation IBM J9 VM ibm-jdk21-zOS-21.0.4-f45de8e9eb0", process.runtime.name="IBM Semeru Runtime Certified Edition for z/OS", process.runtime.version="21.0.4+7", service.instance.id="b102f051-ad26-49eb-8c57-7fe42a8e395b", service.name="ZCPerf", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.39.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=io.opentelemetry.sdk.trace, version=null, schemaUrl=null, attributes={}}, name=processedSpans, description=The number of spans processed by the BatchSpanProcessor. [dropped=true if they were dropped due to high throughput], unit=1, type=LONG_SUM, data=ImmutableSumData{points=[ImmutableLongPointData{startEpochNanos=1751285851603000000, epochNanos=1751286751612000000, attributes={dropped=true, processorType="BatchSpanProcessor"}, value=29703, exemplars=[ImmutableLongExemplarData{filteredAttributes={}, epochNanos=1751286740188000000, spanContext=ImmutableSpanContext{traceId=af4de24242b7c84751903c8eb8f6ddfe, spanId=a454a9dd8ae8b1b5, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, value=1}, ImmutableLongExemplarData{filteredAttributes={}, epochNanos=1751286744356000000, spanContext=ImmutableSpanContext{traceId=4fdfb533097a6fa4c0165e38129fbffe, spanId=6108ed087b5be999, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, value=1}, ImmutableLongExemplarData{filteredAttributes={}, epochNanos=1751286747334000000, spanContext=ImmutableSpanContext{traceId=dbd05d5274902ee3da214458cfe61e91, spanId=e0a0d406be960ed1, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, value=1}, 
ImmutableLongExemplarData{filteredAttributes={}, epochNanos=1751286744956000000, spanContext=ImmutableSpanContext{traceId=06046f798e5c63e7957fc310dedbdf77, spanId=43394f3fbe67efae, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, value=1}, ImmutableLongExemplarData{filteredAttributes={}, epochNanos=1751286744974000000, spanContext=ImmutableSpanContext{traceId=ff3cbdbbc941987b38ba39300233a807, spanId=5f6473e4711324be, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, value=1}, ImmutableLongExemplarData{filteredAttributes={}, epochNanos=1751286750006000000, spanContext=ImmutableSpanContext{traceId=eec0c7b166d6de811f61dcbbc77c74a3, spanId=7429e749111343e4, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, value=1}]}, ImmutableLongPointData{startEpochNanos=1751285851603000000, epochNanos=1751286751612000000, attributes={dropped=false, processorType="BatchSpanProcessor"}, value=1512304, exemplars=[]}], monotonic=true, aggregationTemporality=CUMULATIVE}}
[6/30/25, 12:32:31:617 GMT] 0000003d io.opentelemetry.exporter.logging.LoggingMetricExporter I metric: ImmutableMetricData{resource=Resource{schemaUrl=null, attributes={host.arch="s390x", host.name="WINMVS22", os.description="z/OS 02.05.00", os.type="z_os", process.runtime.description="IBM Corporation IBM J9 VM ibm-jdk21-zOS-21.0.4-f45de8e9eb0", process.runtime.name="IBM Semeru Runtime Certified Edition for z/OS", process.runtime.version="21.0.4+7", service.instance.id="b102f051-ad26-49eb-8c57-7fe42a8e395b", service.name="ZCPerf", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.39.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=io.opentelemetry.sdk.trace, version=null, schemaUrl=null, attributes={}}, name=queueSize, description=The number of items queued, unit=1, type=LONG_GAUGE, data=ImmutableGaugeData{points=[ImmutableLongPointData{startEpochNanos=1751285851603000000, epochNanos=1751286751612000000, attributes={processorType="BatchSpanProcessor"}, value=14425, exemplars=[]}]}}
[6/30/25, 12:33:31:616 GMT] 0000003d io.opentelemetry.exporter.logging.LoggingMetricExporter I metric: ImmutableMetricData{resource=Resource{schemaUrl=null, attributes={host.arch="s390x", host.name="WINMVS22", os.description="z/OS 02.05.00", os.type="z_os", process.runtime.description="IBM Corporation IBM J9 VM ibm-jdk21-zOS-21.0.4-f45de8e9eb0", process.runtime.name="IBM Semeru Runtime Certified Edition for z/OS", process.runtime.version="21.0.4+7", service.instance.id="b102f051-ad26-49eb-8c57-7fe42a8e395b", service.name="ZCPerf", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.39.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=io.opentelemetry.sdk.trace, version=null, schemaUrl=null, attributes={}}, name=processedSpans, description=The number of spans processed by the BatchSpanProcessor. [dropped=true if they were dropped due to high throughput], unit=1, type=LONG_SUM, data=ImmutableSumData{points=[ImmutableLongPointData{startEpochNanos=1751285851603000000, epochNanos=1751286811612000000, attributes={dropped=true, processorType="BatchSpanProcessor"}, value=128432, exemplars=[ImmutableLongExemplarData{filteredAttributes={}, epochNanos=1751286752383000000, spanContext=ImmutableSpanContext{traceId=74dcf4ac46b800547e77795323be83e4, spanId=6588dc2fed00a0b2, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, value=1}, ImmutableLongExemplarData{filteredAttributes={}, epochNanos=1751286759903000000, spanContext=ImmutableSpanContext{traceId=193bc71a6952e26ec08119cb58cd0b49, spanId=2042273c6a36571f, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, value=1}, ImmutableLongExemplarData{filteredAttributes={}, epochNanos=1751286756998000000, spanContext=ImmutableSpanContext{traceId=3a229b76874f583173d183c7bfe5e3c0, spanId=bfe8ddb4fe5e9927, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, value=1}, 
ImmutableLongExemplarData{filteredAttributes={}, epochNanos=1751286773424000000, spanContext=ImmutableSpanContext{traceId=0125c4c5fb6087b46b0aef3e0ddd83f3, spanId=79bd0f13f80d67c2, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, value=1}, ImmutableLongExemplarData{filteredAttributes={}, epochNanos=1751286760263000000, spanContext=ImmutableSpanContext{traceId=38989c2f71d596d430e21a06d11e1ca4, spanId=64c860cbddb4ea5a, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, value=1}, ImmutableLongExemplarData{filteredAttributes={}, epochNanos=1751286811551000000, spanContext=ImmutableSpanContext{traceId=451c7614da64096faf80a6cd7296219a, spanId=a02d04cfe3058755, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, value=1}]}, ImmutableLongPointData{startEpochNanos=1751285851603000000, epochNanos=1751286811612000000, attributes={dropped=false, processorType="BatchSpanProcessor"}, value=1940336, exemplars=[]}], monotonic=true, aggregationTemporality=CUMULATIVE}}
[6/30/25, 12:33:31:616 GMT] 0000003d io.opentelemetry.exporter.logging.LoggingMetricExporter I metric: ImmutableMetricData{resource=Resource{schemaUrl=null, attributes={host.arch="s390x", host.name="WINMVS22", os.description="z/OS 02.05.00", os.type="z_os", process.runtime.description="IBM Corporation IBM J9 VM ibm-jdk21-zOS-21.0.4-f45de8e9eb0", process.runtime.name="IBM Semeru Runtime Certified Edition for z/OS", process.runtime.version="21.0.4+7", service.instance.id="b102f051-ad26-49eb-8c57-7fe42a8e395b", service.name="ZCPerf", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.39.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=io.opentelemetry.sdk.trace, version=null, schemaUrl=null, attributes={}}, name=queueSize, description=The number of items queued, unit=1, type=LONG_GAUGE, data=ImmutableGaugeData{points=[ImmutableLongPointData{startEpochNanos=1751285851603000000, epochNanos=1751286811612000000, attributes={processorType="BatchSpanProcessor"}, value=14357, exemplars=[]}]}}
[6/30/25, 12:34:31:615 GMT] 0000003d io.opentelemetry.exporter.logging.LoggingMetricExporter I metric: ImmutableMetricData{resource=Resource{schemaUrl=null, attributes={host.arch="s390x", host.name="WINMVS22", os.description="z/OS 02.05.00", os.type="z_os", process.runtime.description="IBM Corporation IBM J9 VM ibm-jdk21-zOS-21.0.4-f45de8e9eb0", process.runtime.name="IBM Semeru Runtime Certified Edition for z/OS", process.runtime.version="21.0.4+7", service.instance.id="b102f051-ad26-49eb-8c57-7fe42a8e395b", service.name="ZCPerf", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.39.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=io.opentelemetry.sdk.trace, version=null, schemaUrl=null, attributes={}}, name=processedSpans, description=The number of spans processed by the BatchSpanProcessor. [dropped=true if they were dropped due to high throughput], unit=1, type=LONG_SUM, data=ImmutableSumData{points=[ImmutableLongPointData{startEpochNanos=1751285851603000000, epochNanos=1751286871612000000, attributes={dropped=true, processorType="BatchSpanProcessor"}, value=196064, exemplars=[ImmutableLongExemplarData{filteredAttributes={}, epochNanos=1751286813607000000, spanContext=ImmutableSpanContext{traceId=a41a3d9bd4e09a7e9564592f6c7a17cf, spanId=255703a0b4826de3, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, value=1}, ImmutableLongExemplarData{filteredAttributes={}, epochNanos=1751286812706000000, spanContext=ImmutableSpanContext{traceId=219f877611e618f8f0397aa4d9524a33, spanId=106e8763fee56a82, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, value=1}, ImmutableLongExemplarData{filteredAttributes={}, epochNanos=1751286841245000000, spanContext=ImmutableSpanContext{traceId=f59bfb09953f725bbe286aad71c34a89, spanId=758ab2dda8bae1b9, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, value=1}, 
ImmutableLongExemplarData{filteredAttributes={}, epochNanos=1751286837669000000, spanContext=ImmutableSpanContext{traceId=a0c234a9c5bc2a2fd0da13886aeb62d4, spanId=73edfda08eced5c2, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, value=1}, ImmutableLongExemplarData{filteredAttributes={}, epochNanos=1751286820306000000, spanContext=ImmutableSpanContext{traceId=851242a63af3f5f1efbb46edc298aeb8, spanId=7dfe95deb6033ace, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, value=1}, ImmutableLongExemplarData{filteredAttributes={}, epochNanos=1751286841302000000, spanContext=ImmutableSpanContext{traceId=a17c084a4030670886353678277b7b8e, spanId=112e666a2505b5a4, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, value=1}]}, ImmutableLongPointData{startEpochNanos=1751285851603000000, epochNanos=1751286871612000000, attributes={dropped=false, processorType="BatchSpanProcessor"}, value=2292592, exemplars=[]}], monotonic=true, aggregationTemporality=CUMULATIVE}}
[6/30/25, 12:34:31:615 GMT] 0000003d io.opentelemetry.exporter.logging.LoggingMetricExporter I metric: ImmutableMetricData{resource=Resource{schemaUrl=null, attributes={host.arch="s390x", host.name="WINMVS22", os.description="z/OS 02.05.00", os.type="z_os", process.runtime.description="IBM Corporation IBM J9 VM ibm-jdk21-zOS-21.0.4-f45de8e9eb0", process.runtime.name="IBM Semeru Runtime Certified Edition for z/OS", process.runtime.version="21.0.4+7", service.instance.id="b102f051-ad26-49eb-8c57-7fe42a8e395b", service.name="ZCPerf", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.39.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=io.opentelemetry.sdk.trace, version=null, schemaUrl=null, attributes={}}, name=queueSize, description=The number of items queued, unit=1, type=LONG_GAUGE, data=ImmutableGaugeData{points=[ImmutableLongPointData{startEpochNanos=1751285851603000000, epochNanos=1751286871612000000, attributes={processorType="BatchSpanProcessor"}, value=988, exemplars=[]}]}}
// TPS has decreased. Spans are no longer being dropped (the dropped=true count is not increasing).
[6/30/25, 12:35:31:615 GMT] 0000003d io.opentelemetry.exporter.logging.LoggingMetricExporter I metric: ImmutableMetricData{resource=Resource{schemaUrl=null, attributes={host.arch="s390x", host.name="WINMVS22", os.description="z/OS 02.05.00", os.type="z_os", process.runtime.description="IBM Corporation IBM J9 VM ibm-jdk21-zOS-21.0.4-f45de8e9eb0", process.runtime.name="IBM Semeru Runtime Certified Edition for z/OS", process.runtime.version="21.0.4+7", service.instance.id="b102f051-ad26-49eb-8c57-7fe42a8e395b", service.name="ZCPerf", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.39.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=io.opentelemetry.sdk.trace, version=null, schemaUrl=null, attributes={}}, name=processedSpans, description=The number of spans processed by the BatchSpanProcessor. [dropped=true if they were dropped due to high throughput], unit=1, type=LONG_SUM, data=ImmutableSumData{points=[ImmutableLongPointData{startEpochNanos=1751285851603000000, epochNanos=1751286931612000000, attributes={dropped=true, processorType="BatchSpanProcessor"}, value=196064, exemplars=[]}, ImmutableLongPointData{startEpochNanos=1751285851603000000, epochNanos=1751286931612000000, attributes={dropped=false, processorType="BatchSpanProcessor"}, value=2435952, exemplars=[]}], monotonic=true, aggregationTemporality=CUMULATIVE}}
[6/30/25, 12:35:31:615 GMT] 0000003d io.opentelemetry.exporter.logging.LoggingMetricExporter I metric: ImmutableMetricData{resource=Resource{schemaUrl=null, attributes={host.arch="s390x", host.name="WINMVS22", os.description="z/OS 02.05.00", os.type="z_os", process.runtime.description="IBM Corporation IBM J9 VM ibm-jdk21-zOS-21.0.4-f45de8e9eb0", process.runtime.name="IBM Semeru Runtime Certified Edition for z/OS", process.runtime.version="21.0.4+7", service.instance.id="b102f051-ad26-49eb-8c57-7fe42a8e395b", service.name="ZCPerf", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.39.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=io.opentelemetry.sdk.trace, version=null, schemaUrl=null, attributes={}}, name=queueSize, description=The number of items queued, unit=1, type=LONG_GAUGE, data=ImmutableGaugeData{points=[ImmutableLongPointData{startEpochNanos=1751285851603000000, epochNanos=1751286931612000000, attributes={processorType="BatchSpanProcessor"}, value=780, exemplars=[]}]}}
[6/30/25, 12:36:31:615 GMT] 0000003d io.opentelemetry.exporter.logging.LoggingMetricExporter I metric: ImmutableMetricData{resource=Resource{schemaUrl=null, attributes={host.arch="s390x", host.name="WINMVS22", os.description="z/OS 02.05.00", os.type="z_os", process.runtime.description="IBM Corporation IBM J9 VM ibm-jdk21-zOS-21.0.4-f45de8e9eb0", process.runtime.name="IBM Semeru Runtime Certified Edition for z/OS", process.runtime.version="21.0.4+7", service.instance.id="b102f051-ad26-49eb-8c57-7fe42a8e395b", service.name="ZCPerf", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.39.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=io.opentelemetry.sdk.trace, version=null, schemaUrl=null, attributes={}}, name=processedSpans, description=The number of spans processed by the BatchSpanProcessor. [dropped=true if they were dropped due to high throughput], unit=1, type=LONG_SUM, data=ImmutableSumData{points=[ImmutableLongPointData{startEpochNanos=1751285851603000000, epochNanos=1751286991612000000, attributes={dropped=true, processorType="BatchSpanProcessor"}, value=196064, exemplars=[]}, ImmutableLongPointData{startEpochNanos=1751285851603000000, epochNanos=1751286991612000000, attributes={dropped=false, processorType="BatchSpanProcessor"}, value=2579312, exemplars=[]}], monotonic=true, aggregationTemporality=CUMULATIVE}}
[6/30/25, 12:36:31:615 GMT] 0000003d io.opentelemetry.exporter.logging.LoggingMetricExporter I metric: ImmutableMetricData{resource=Resource{schemaUrl=null, attributes={host.arch="s390x", host.name="WINMVS22", os.description="z/OS 02.05.00", os.type="z_os", process.runtime.description="IBM Corporation IBM J9 VM ibm-jdk21-zOS-21.0.4-f45de8e9eb0", process.runtime.name="IBM Semeru Runtime Certified Edition for z/OS", process.runtime.version="21.0.4+7", service.instance.id="b102f051-ad26-49eb-8c57-7fe42a8e395b", service.name="ZCPerf", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.39.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=io.opentelemetry.sdk.trace, version=null, schemaUrl=null, attributes={}}, name=queueSize, description=The number of items queued, unit=1, type=LONG_GAUGE, data=ImmutableGaugeData{points=[ImmutableLongPointData{startEpochNanos=1751285851603000000, epochNanos=1751286991612000000, attributes={processorType="BatchSpanProcessor"}, value=92, exemplars=[]}]}}
- A consistent workload (TPS) is used for the first 4 minutes (12:27-12:30).
- During minute 12:31, the TPS is increased, which is visible as a large increase in the exported spans count compared to the first 4 minutes.
- At minute 12:32, the TPS is increased further and spans cannot be exported fast enough. As a result, the maximum queue size of 10,000 is exceeded and the dropped spans count starts to increase.
- At minute 12:33, the dropped spans count is still increasing.
- During minute 12:34, the TPS is decreased and, as a result, the queue size falls below the maximum queue size. The dropped spans count still increased because part of this minute interval ran at the higher TPS. The smaller queue size suggests that this TPS allows spans to be exported without loss.
- At minutes 12:35 and 12:36, the dropped spans count does not change. The TPS is at a rate that allows all spans to be exported.
| Time | Exported spans count | Queue size | Dropped spans count |
|---|---|---|---|
| 12:27:31:618 | 447344 | 556 | 0 |
| 12:28:31:615 | 494448 | 1284 | 0 |
| 12:29:31:615 | 543600 | 244 | 0 |
| 12:30:31:615 | 727920 | 1181 | 0 |
| 12:31:31:615 | 1110896 | 632 | 0 |
| 12:32:31:617 | 1512304 | 14425 | 29703 (Note 1.) |
| 12:33:31:616 | 1940336 | 14357 | 128432 |
| 12:34:31:615 | 2292592 | 988 | 196064 |
| 12:35:31:615 | 2435952 | 780 | 196064 (Note 2.) |
| 12:36:31:615 | 2579312 | 92 | 196064 |
1. Once spans start to be dropped, the dropped spans count (`dropped=true`) is inserted before the exported spans number.
2. The TPS rate is no longer higher than the span export rate and spans are no longer being dropped. The dropped spans count is not increasing.
To determine the TPS at which your server starts to drop spans, you need to run a consistent workload at a known TPS.
1. Within each metric's output, there are two entries that use `InstrumentationScopeInfo{name=io.opentelemetry.sdk.trace`. One of these lines has `name=queueSize` and the other `name=processedSpans`.
2. The line with `name=queueSize` reports the number of items queued in the `value=<number>` field.
3. The line with `name=processedSpans` reports whether spans are being dropped, the number of dropped spans, and the number of exported spans.
4. Let the server run for several minutes at a consistent TPS. Then refresh the `messages.log` file and check that `dropped=false` is still present on each `name=processedSpans` line. This confirms that all spans are being exported and none are being dropped at this TPS.
5. Increase the TPS and repeat step 4 until you start to see `dropped=true`. This confirms that the current TPS is too high for all spans to be exported.
6. When you find a TPS at which spans are dropped, assess whether that TPS covers your maximum expected production workload. If the TPS is not yet high enough, increase one or both of `otel.bsp.max.export.batch.size` and `otel.bsp.max.queue.size`. It is recommended to change only one variable at a time and repeat the test. Increasing the queue size only delays the point at which spans are dropped, but a higher value can help absorb short-term peaks in workload.
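As a sketch, these batch span processor tuning properties can be set in `bootstrap.properties`; the values shown here are illustrative starting points, not recommendations:

```properties
# Illustrative values only - tune against your own workload test results
otel.bsp.max.export.batch.size=1024
otel.bsp.max.queue.size=4096
```

Following the naming convention described earlier, the equivalent environment variables in `server.env` or the `STDENV DD` statement would be `OTEL_BSP_MAX_EXPORT_BATCH_SIZE` and `OTEL_BSP_MAX_QUEUE_SIZE`.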
It is recommended that you ensure your server can handle, without dropping spans, a TPS at least 25% greater than your current expected maximum production TPS. This allows for future growth of your environment.
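Rather than reading the long metric lines by hand, you can extract the relevant numbers from `messages.log` with a small script. The following is a minimal sketch that assumes the `ImmutableMetricData` format shown in the sample output above; the `summarize` helper is our own name, not part of any OpenTelemetry API:

```python
import re

# One data point of the BatchSpanProcessor processedSpans counter, e.g.
# attributes={dropped=true, processorType="BatchSpanProcessor"}, value=196064
POINT = re.compile(
    r'dropped=(true|false), processorType="BatchSpanProcessor"\}, value=(\d+)'
)
# The queueSize gauge point, e.g.
# attributes={processorType="BatchSpanProcessor"}, value=988
QUEUE = re.compile(r'\{processorType="BatchSpanProcessor"\}, value=(\d+)')

def summarize(line: str) -> dict:
    """Extract exported/dropped span counts or the queue size from one metric log line."""
    if "name=processedSpans" in line:
        # Each processedSpans line can carry a dropped=true and a dropped=false point.
        return {
            ("dropped" if flag == "true" else "exported"): int(value)
            for flag, value in POINT.findall(line)
        }
    if "name=queueSize" in line:
        match = QUEUE.search(line)
        return {"queue": int(match.group(1))} if match else {}
    return {}
```

Running each line of `messages.log` through `summarize` and printing the non-empty results produces the same figures as the exported spans, queue size, and dropped spans columns in the table above.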