Optimize data ingest in Instana

In modern observability practices, managing data ingestion efficiently is essential for maintaining application performance and infrastructure reliability. You can optimize data ingestion in Instana across its monitoring sources, including tracing, infrastructure, End User Monitoring (EUM), synthetics, logging, and serverless environments. This topic describes practical techniques, such as instrumentation filtering, sensor configuration, polling rate adjustment, and selective monitoring, that reduce noise, improve performance, and keep observability efficient.

Manage data ingested from tracing

Efficient trace data management begins with thoughtful instrumentation. By selectively enabling, disabling, or filtering spans, you can reduce ingestion volume, minimize noise, and retain only relevant traces while keeping visibility into essential workflows.

Instrumentation configuration and filtering

You can use any of the following techniques to configure and filter tracing data.

Capture stack traces only for erroneous spans

Capturing stack traces only for erroneous spans is a recommended first optimization step. It significantly reduces trace payload size and overall data ingestion volume while preserving the diagnostic information that matters for failed spans. It also improves the signal-to-noise ratio by skipping stack traces for successful spans, which makes troubleshooting more efficient and focused.

For language-specific configuration details, see the following resources:
- Java
- Ruby
- Python
- Node.js

Disable stack trace capture

If limiting stack trace collection to erroneous spans does not sufficiently reduce data volume, stack trace capture can be disabled more broadly. Disabling stack trace collection for exit and intermediate spans further reduces trace payload size and data ingestion volume, may lower performance overhead, and is particularly useful for high-throughput applications or services with deep call stacks.

For language-specific configuration details, see the following resources:
- Java
- Ruby
- Python
- Node.js

Disable tracing by technology

You can completely disable tracing for technologies such as Redis, DynamoDB, and logging to prevent span creation and reduce data ingestion. The following example disables the Redis, DynamoDB, and Logging instrumentation of the Java tracer:
```
com.instana.plugin.javatrace:
  instrumentation:
    plugins:
      Redis: false
      DynamoDB: false
      Logging: false
```

Disable tracing by library

You can disable specific instrumentation libraries, such as Spring Couchbase, to stop them from generating spans and to reduce data ingestion. The following example disables the CouchbaseExit and Couchbase3Exit instrumentation:
```
com.instana.plugin.javatrace:
  instrumentation:
    plugins:
      CouchbaseExit: false
      Couchbase3Exit: false
```
Note: Restart the Java application to make sure that the instrumentation is disabled.

Endpoint filtering

Instana visualizes service interactions as endpoints, and filtering them helps suppress low-value operations and highlight business-critical traces. This reduces dashboard noise and allows precise control over which commands or actions are monitored. For more information, see Ignoring endpoints for host agents and Ignoring endpoints for Node.js.

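The following table lists the tracers and packages that support ignoring endpoints:

| Package | Node.js | Java | Go | PHP | Python | Ruby | .NET | NGINX |
|---|---|---|---|---|---|---|---|---|
| Redis | ✅ | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ |
| DynamoDB | ✅ | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ |
| Kafka | ✅ | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ |
| HTTP | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Java Database Connectivity (JDBC) | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |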

Example: Configuring endpoint exclusion

To ignore endpoints, list the endpoints to exclude from monitoring in the com.instana.tracing.ignore-endpoints section of your agent configuration file. The following example filters Redis, DynamoDB, and Kafka calls by method name:

```
com.instana.tracing:
  ignore-endpoints:
    # Filtering by method name
    redis:
      - 'get'
      - 'type'
    dynamodb:
      - 'query'
    kafka:
      - 'send'
```

For Kafka, you can also filter by method name and endpoint (topic). Because a YAML mapping cannot contain the same key twice, define the kafka entry only once, as in the following example:

```
com.instana.tracing:
  ignore-endpoints:
    # Filtering by method name and endpoint for Kafka
    kafka:
      - methods: ["consume"]
        endpoints: ["topic1", "topic2"] # Exclude consume calls for topic1 and topic2
      - methods: ["consume", "send"]
        endpoints: ["topic3"] # Exclude both consume and send calls for topic3
      - methods: ["*"]
        endpoints: ["topic4"] # Exclude all methods for topic4
      - methods: ["consume"]
        endpoints: ["*"] # Exclude consume method for all topics
```
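The following Python sketch models how the rules above could be evaluated for a single call, to make the wildcard semantics described in the comments concrete. It is a hypothetical illustration, not the Instana tracer's implementation: the `IGNORE_ENDPOINTS` dictionary and the `is_ignored` function are illustrative names, the Redis and DynamoDB rules come from the first example and the Kafka rules from the second, and exact, case-sensitive matching of method and topic names is an assumption.

```python
from typing import Optional, Union

# Hypothetical in-memory form of the ignore-endpoints rules.
# A plain string rule matches a method name for any endpoint; a dict rule
# matches combinations of methods and endpoints, where "*" is a wildcard.
Rule = Union[str, dict]

IGNORE_ENDPOINTS: dict[str, list[Rule]] = {
    "redis": ["get", "type"],
    "dynamodb": ["query"],
    "kafka": [
        {"methods": ["consume"], "endpoints": ["topic1", "topic2"]},
        {"methods": ["consume", "send"], "endpoints": ["topic3"]},
        {"methods": ["*"], "endpoints": ["topic4"]},
        {"methods": ["consume"], "endpoints": ["*"]},
    ],
}


def _matches(candidates: list[str], value: Optional[str]) -> bool:
    """True if the value is listed explicitly or the list contains the "*" wildcard."""
    # Assumption: exact, case-sensitive comparison.
    return "*" in candidates or value in candidates


def is_ignored(rules: dict[str, list[Rule]], technology: str,
               method: str, endpoint: Optional[str] = None) -> bool:
    """Return True if a call would be excluded under this sketch's reading of the rules."""
    for rule in rules.get(technology, []):
        if isinstance(rule, str):
            if rule == method:
                return True
        elif _matches(rule["methods"], method) and _matches(rule["endpoints"], endpoint):
            return True
    return False


print(is_ignored(IGNORE_ENDPOINTS, "kafka", "send", "topic3"))
```

In this model, a Kafka `send` call to `topic3` is ignored because the second Kafka rule lists both `send` and `topic3`, whereas a `send` call to any topic other than `topic3` or `topic4` is still traced.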

For more information about applying Fair Use Policy (FUP) principles across different runtimes and libraries, see the following blogs:

Span attributes-based filtering for Java applications

You can reduce trace data ingestion for Java applications by using the Java tracer's span filtering feature to exclude specific spans based on their attributes. This capability helps optimize data ingestion costs and keeps the focus on the traces that are most relevant to your monitoring needs.

For Java applications, the Java tracer supports filtering for HTTP and JDBC spans. You can configure filtering rules in the agent configuration.yaml file to exclude spans based on attributes such as:
- HTTP filtering: URLs, HTTP methods, status codes, headers, query parameters, and error messages
- JDBC filtering: SQL statements, database connection strings, and error messages

Example: Java Tracer HTTP and JDBC span filtering configuration
```
com.instana.tracing:
  filter:
    exclude:
      - name: exclude HTTP health check endpoints
        attributes:
          - key: http.url
            values: [/health, /ping, /ready]
            match_type: endswith
      - name: exclude JDBC queries for audit tables
        attributes:
          - key: jdbc.statement
            values: [audit_log, session_data]
            match_type: contains
```
When HTTP spans are excluded, all child spans and downstream tracing are automatically suppressed. JDBC span filtering excludes only the database spans without suppressing downstream operations.
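To illustrate the `endswith` and `contains` match types, the following Python sketch checks a span's attributes against the two exclusion rules from the example. The class and function names are hypothetical, only the two match types that appear in the example are modeled, and treating a rule as matching only when all of its listed attributes match is an assumption (each example rule lists a single attribute). See Configuring filtering for the supported match types and their exact semantics.

```python
from dataclasses import dataclass


@dataclass
class AttributeRule:
    key: str
    values: list[str]
    match_type: str  # only "endswith" and "contains" are modeled here


@dataclass
class ExcludeRule:
    name: str
    attributes: list[AttributeRule]


def attribute_matches(rule: AttributeRule, span: dict[str, str]) -> bool:
    """True if the span's attribute matches any of the rule's values."""
    actual = span.get(rule.key)
    if actual is None:
        return False
    if rule.match_type == "endswith":
        return any(actual.endswith(v) for v in rule.values)
    if rule.match_type == "contains":
        return any(v in actual for v in rule.values)
    raise ValueError(f"match_type not modeled in this sketch: {rule.match_type}")


def is_excluded(rules: list[ExcludeRule], span: dict[str, str]) -> bool:
    # Assumption: a rule applies when all of its attributes match.
    return any(all(attribute_matches(a, span) for a in r.attributes) for r in rules)


# The two rules from the YAML example, expressed as Python objects.
RULES = [
    ExcludeRule("exclude HTTP health check endpoints",
                [AttributeRule("http.url", ["/health", "/ping", "/ready"], "endswith")]),
    ExcludeRule("exclude JDBC queries for audit tables",
                [AttributeRule("jdbc.statement", ["audit_log", "session_data"], "contains")]),
]

print(is_excluded(RULES, {"http.url": "https://shop.example.com/api/health"}))
```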

For detailed information about configuring Java span filtering, see Configuring filtering.

For more information about span attributes-based filtering for Java applications, see the following blog:
- Optimizing JDBC and HC observability: Filtering out low-priority traces with Instana

Change host agent mode

Switching to Infrastructure mode disables all tracers and helps optimize data ingestion when necessary. For more information, see Changing agent modes.

Use of X-INSTANA-S header

You can include the X-INSTANA-S header in downstream application flows to selectively suppress specific traces. This approach gives fine-grained control over which calls are captured, but it currently must be implemented programmatically in application code.

Disable automatic trace instrumentation

For Java applications, you can disable tracing for a specific process by using process command line options or environment variables, allowing for targeted control over instrumentation. For more information, see Disabling Java Trace instrumentation.

Note: At the MVS level, you can also ignore processes to reduce data ingest.

Manage data ingested from infrastructure

By default, Instana enables all sensors through automatic discovery, but you can disable individual sensors in the agent configuration file. Most sensors, except Azure sensors, support a configurable polling rate that you can adjust in the same file to fine-tune data collection. Some larger sensors also support filtering to limit which entities are monitored, which helps reduce unnecessary data ingest. The following example shows the ibmmq plugin section of the agent configuration file:

```
com.instana.plugin.ibmmq:
  enabled: true
  poll_rate: 75
```

For more information, see Configuring the polling rate.

Sensors contributing to high data ingestion

The following table explains why certain types of sensors produce high ingestion:

| Sensor type | Examples | Reason for high ingestion |
|---|---|---|
| Messaging system sensors | IBM MQ, TibcoEMS | Generate substantial data because of their central role in the environment and the large number of entities (queues, topics, and so on) that they monitor. |
| Platform sensors | Kubernetes, vSphere, PCF | Frequent updates and detailed metrics contribute to significant data ingestion. |
| Prometheus sensors | Prometheus | Ingestion volume depends on the configured polling interval, which you can tune. For more information, see Configuration for Kubernetes environments. |

Reasons for high data ingestion from certain sensors

Messaging systems typically handle large volumes of transactions and entities, which naturally results in substantial data generation. Similarly, sensors that monitor a wide range of entities and diverse metrics contribute to increased ingestion. To mitigate this, some sensors support the use of regular expressions in configuration files, allowing teams to filter and limit the number of entities being monitored.

Optimization strategies

Entity filtering

You can use regular expressions in configuration files to monitor only a subset of entities.
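As an illustration of regular-expression entity filtering, the following Python sketch keeps only the queue names that match an include pattern. The queue names and the `select_entities` function are hypothetical. Whether a given sensor applies a full match or a partial match, and what its filter setting is called, varies by sensor, so check the sensor's configuration reference; this sketch assumes a full match.

```python
import re


def select_entities(names: list[str], include_pattern: str) -> list[str]:
    """Keep only entity names that fully match the include pattern."""
    pattern = re.compile(include_pattern)
    return [name for name in names if pattern.fullmatch(name)]


# Hypothetical queue names discovered by a messaging sensor.
queues = ["PROD.ORDERS", "DEV.ORDERS", "PROD.PAYMENTS", "SYSTEM.ADMIN.COMMAND.QUEUE"]
print(select_entities(queues, r"PROD\..*"))  # ['PROD.ORDERS', 'PROD.PAYMENTS']
```

Anchoring the pattern to the whole name avoids accidentally monitoring entities that only contain the prefix somewhere in the middle of their name.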
Polling rate adjustment

For sensors that support configurable polling intervals, you can adjust the frequency to reduce data volume. For more information about polling interval configuration, see the following configuration sections:
- For Kubernetes
- For Prometheus

Cloud managed services (AWS, Azure, GCP)

You can use cloud tags to selectively include or exclude managed service instances from monitoring. The following examples show the include_tags and exclude_tags settings for the Amazon RDS sensor:

```
com.instana.plugin.aws.rds:
  include_tags: # Comma-separated list of tags in key:value format (for example, env:prod,env:staging)
```

```
com.instana.plugin.aws.rds:
  exclude_tags: # Comma-separated list of tags in key:value format (for example, env:dev,env:test)
```
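The tag lists use a comma-separated `key:value` format. The following Python sketch shows one way such lists could be parsed and applied to an instance's tags. The function names are hypothetical, and two behaviors are assumptions rather than documented Instana semantics: an exclude match wins over an include match, and an empty include list selects every instance that is not excluded.

```python
def parse_tags(spec: str) -> set[tuple[str, str]]:
    """Parse 'env:prod,env:staging' into {('env', 'prod'), ('env', 'staging')}."""
    tags = set()
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty strings and trailing commas
        key, sep, value = item.partition(":")
        if not sep:
            raise ValueError(f"expected key:value, got {item!r}")
        tags.add((key.strip(), value.strip()))
    return tags


def is_selected(instance_tags: dict[str, str], include: str = "", exclude: str = "") -> bool:
    """Decide whether an instance is monitored (exclude-wins precedence is an assumption)."""
    tags = set(instance_tags.items())
    if tags & parse_tags(exclude):
        return False
    include_set = parse_tags(include)
    return not include_set or bool(tags & include_set)


print(is_selected({"env": "prod"}, include="env:prod,env:staging"))
```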

Selective monitoring

Instana supports opt-in and opt-out mechanisms to control which processes and containers are monitored. By default, the agent operates in opt-out mode. When INSTANA_SELECTIVE_MONITORING is not set or is set to OPT_OUT, the agent monitors all processes unless you explicitly exclude them.

Process and container-level control

You can use configuration settings to include or exclude specific processes or containers from monitoring. For more information, see Instana selective monitoring.

| Monitoring mode | Description | Agent environment | Process environment | Result |
|---|---|---|---|---|
| Opt-out | All processes monitored by default | Not set or `INSTANA_SELECTIVE_MONITORING=OPT_OUT` | Not set | The process is monitored |
| Opt-out | All processes monitored by default | Not set or `INSTANA_SELECTIVE_MONITORING=OPT_OUT` | `INSTANA_MONITORING=false` | The process is ignored |
| Opt-in | No process monitored by default | `INSTANA_SELECTIVE_MONITORING=OPT_IN` | Not set | The process is ignored |
| Opt-in | No process monitored by default | `INSTANA_SELECTIVE_MONITORING=OPT_IN` | `INSTANA_MONITORING=true` | The process is monitored |
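The table reduces to a small decision function. The following Python sketch encodes it; `is_monitored` is a hypothetical name, `None` stands for a variable that is not set, and treating any process value other than exactly `false` (in opt-out mode) or `true` (in opt-in mode) like an unset variable is an assumption, because the table covers only these values.

```python
from typing import Optional


def is_monitored(agent_selective_monitoring: Optional[str],
                 process_instana_monitoring: Optional[str]) -> bool:
    """Mirror the selective monitoring table.

    agent_selective_monitoring: INSTANA_SELECTIVE_MONITORING on the agent, or None if unset.
    process_instana_monitoring: INSTANA_MONITORING on the process, or None if unset.
    """
    mode = agent_selective_monitoring or "OPT_OUT"  # unset behaves like OPT_OUT
    if mode == "OPT_OUT":
        # Everything is monitored unless the process explicitly opts out.
        return process_instana_monitoring != "false"
    if mode == "OPT_IN":
        # Nothing is monitored unless the process explicitly opts in.
        return process_instana_monitoring == "true"
    raise ValueError(f"unknown selective monitoring mode: {mode}")


print(is_monitored("OPT_IN", "true"))
```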

Kubernetes namespace label for namespace opt-out and opt-in

Instana allows you to control workload monitoring at the Kubernetes namespace level by using the instana-workload-monitoring label. This label determines whether a namespace is included or excluded from monitoring:

- instana-workload-monitoring=false → Opt-out: Monitoring is disabled for all workloads in the namespace.
- instana-workload-monitoring=true → Opt-in: Monitoring is enabled for workloads in the namespace.

Manage data ingest from End User Monitoring (EUM)

Instana applies predefined limits to the amount of data collected per browser tab. These limits are Instana defaults and cannot be changed. To manage data ingest from EUM, you can enable or disable the following optional features:

- Custom events. For example, disable automatic conversion of User Timing API markers to custom events.
- View or page transitions
- User information
- Crashes

Manage data ingest from synthetics

To minimize data ingestion from synthetic tests in Instana, consider the following approaches:

- Keep video recording disabled in browser tests (disabled is the default setting).
- Limit screenshot capture. By default, screenshots are typically taken only when a test fails.
- Reduce how often tests run by increasing the scheduling interval (the frequency of scheduling, in minutes, in Instana), which decreases the number of executed tests.
- Stop test execution when synthetic monitoring is not required.
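The effect of the scheduling interval on volume is simple arithmetic: a day has 1,440 minutes, so a test that runs every minute executes 1,440 times per day at each location, while a 15-minute interval cuts that to 96. The following Python sketch computes this; the `runs_per_day` function is hypothetical, and it assumes that the test runs once per interval at each configured location.

```python
MINUTES_PER_DAY = 24 * 60


def runs_per_day(interval_minutes: int, locations: int = 1) -> float:
    """Approximate daily executions of one synthetic test (assumes one run per interval per location)."""
    if interval_minutes <= 0 or locations <= 0:
        raise ValueError("interval_minutes and locations must be positive")
    return MINUTES_PER_DAY / interval_minutes * locations


# Compare a few scheduling intervals for a test that runs from two locations.
for interval in (1, 5, 15, 60):
    print(f"every {interval:>2} min: {runs_per_day(interval, locations=2):.0f} runs/day")
```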

Manage data ingest from logging sources

Instana provides options to control data ingestion from logging sources:

OpenTelemetry logs

You can manage log volume by limiting the number of OpenTelemetry sources contributing to logging. This helps reduce unnecessary data ingest and improves overall efficiency.

Trace logs

Instana tracers let you disable logging instrumentation across supported technologies. This configuration is applied at the tracer level for Java, Node.js, Go, PHP, and Ruby. For the Java tracer, the Logging: false setting in the Disable tracing by technology example disables logging instrumentation.

Manage data ingested from serverless monitoring in Instana

Instana currently offers limited support for serverless technologies, with primary focus on AWS Lambda functions. To optimize data ingestion and manage associated costs effectively, you can consider the following strategies:

Manage Lambda function versions

By default, Instana monitors the five most recent versions of each Lambda function. You can reduce this number to lower monitoring costs. Instana supports monitoring of up to 20 versions, but fewer versions typically mean reduced data volume and cost. For more information, see Disabling retrieval of Lambda versions and metrics.

Adjust polling intervals

The default polling interval for Lambda metrics is 5 minutes (300 seconds). Increasing this interval reduces how often data is collected, which can help minimize ingestion costs. For more information, see Changing poll rate.
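The two settings multiply. The following Python sketch gives a rough, proportional estimate of how many per-version metric polls the defaults produce and how reducing versions or lengthening the interval changes that number. The function name is hypothetical, and modeling ingest as proportional to functions times versions times polls per day is a simplifying assumption, not a description of how Instana bills or batches Lambda metrics.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def lambda_metric_polls_per_day(functions: int, versions_per_function: int = 5,
                                poll_interval_seconds: int = 300) -> float:
    """Rough count of per-version metric polls per day (proportional model, not billing)."""
    if min(functions, versions_per_function, poll_interval_seconds) <= 0:
        raise ValueError("all arguments must be positive")
    return functions * versions_per_function * SECONDS_PER_DAY / poll_interval_seconds


# Defaults from this section: 5 versions per function, 300-second interval.
print(lambda_metric_polls_per_day(10))  # 14400.0
# Fewer versions and a longer interval.
print(lambda_metric_polls_per_day(10, versions_per_function=2, poll_interval_seconds=600))  # 2880.0
```

With the defaults, a single function yields 288 polls per version per day; keeping 2 versions and doubling the interval to 600 seconds reduces the modeled volume to one fifth.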

Conclusion

For users seeking full control over instrumentation, Instana offers the flexibility to use OpenTelemetry instead of its native instrumentation. This enables more customizable collection and management of observability data. For more information, see OpenTelemetry.

These strategies help you optimize data ingest in Instana, improving efficiency while maintaining a strong focus on observability.
