Optimize data ingest in Instana

In modern observability practice, managing data ingestion efficiently is essential for maintaining application performance and infrastructure reliability. You can optimize data ingestion in Instana across monitoring sources, including tracing, infrastructure, end-user monitoring, synthetics, logging, and serverless environments. This guide describes practical techniques, such as instrumentation filtering, sensor configuration, polling rate adjustments, and selective monitoring, that reduce noise, improve performance, and keep observability efficient.

Manage data ingested from tracing

Efficient trace data management begins with thoughtful instrumentation. By selectively enabling, disabling, or filtering spans, teams can reduce ingestion volume, minimize noise, and retain only relevant traces—while maintaining visibility into essential workflows.

Instrumentation configuration and filtering

You can use any of the following techniques to configure and filter tracing data.

  • Capture stack traces only for erroneous spans

    Capturing stack traces only for erroneous spans is a recommended first optimization step, as it significantly reduces trace payload size and overall data ingestion volume while still preserving critical diagnostic information for failed spans. This approach also improves the signal-to-noise ratio by avoiding unnecessary stack traces for successful spans, making troubleshooting more efficient and focused.

    For language-specific configuration details, see the following resources:

  • Disable stack trace capture

    If limiting stack trace collection to erroneous spans does not sufficiently reduce data volume, stack trace capture can be disabled more broadly. Disabling stack trace collection for exit and intermediate spans further reduces trace payload size and data ingestion volume, may lower performance overhead, and is particularly useful for high-throughput applications or services with deep call stacks.

    For language-specific configuration details, see the following resources:

  • Disable tracing by technology

    You can completely disable tracing for specific technologies, such as Redis, DynamoDB, and logging, to prevent span creation and reduce data ingestion. The following example disables tracing for these three technologies in the Java tracer:

    com.instana.plugin.javatrace:
      instrumentation:
        plugins:
          Redis: false
          DynamoDB: false
          Logging: false

  • Disable tracing by library

    You can disable specific instrumentation libraries, such as Spring Couchbase, to prevent them from generating spans and to reduce data ingestion. The following example disables the Couchbase exit instrumentation in the Java tracer:

    com.instana.plugin.javatrace:
      instrumentation:
        plugins:
          CouchbaseExit: false
          Couchbase3Exit: false

    Note: Restart the Java application to make sure that the instrumentation is disabled.

  • Endpoint filtering

    Instana visualizes service interactions as endpoints, and filtering them helps suppress low-value operations and highlight business-critical traces. This reduces dashboard noise and allows precise control over which commands or actions are monitored. For more information, see Ignoring endpoints for host agents and Ignoring endpoints for Node.js.

    The following table lists the tracers and packages that support ignoring endpoints:

    Supported packages Node.js Java Go PHP Python Ruby .NET NGINX
    Redis
    DynamoDB
    Kafka
    HTTP
    Java Database Connectivity (JDBC)

    Example: Configuring endpoint exclusion

    To ignore endpoints, list the endpoints to exclude from monitoring in the com.instana.tracing.ignore-endpoints section of the agent configuration file. You can filter by method name or, for Kafka, by method name and endpoint (topic). Because YAML keys must be unique within a mapping, define each technology key, such as kafka, only once in your configuration.

    Filtering by method name:

    com.instana.tracing:
      ignore-endpoints:
        redis:
          - 'get'
          - 'type'
        dynamodb:
          - 'query'
        kafka:
          - 'send'

    Filtering by method name and endpoint for Kafka:

    com.instana.tracing:
      ignore-endpoints:
        kafka:
          - methods: ["consume"]
            endpoints: ["topic1", "topic2"] # Exclude consume calls for topic1 and topic2
          - methods: ["consume", "send"]
            endpoints: ["topic3"] # Exclude both consume and send calls for topic3
          - methods: ["*"]
            endpoints: ["topic4"] # Exclude all methods for topic4
          - methods: ["consume"]
            endpoints: ["*"] # Exclude consume method for all topics

    For more information about applying Fair Use Policy (FUP) principles across different runtimes and libraries, see the following blogs:

  • Span attributes-based filtering for Java applications

    You can reduce trace data ingestion for Java applications by using the Java tracer's span filtering feature to exclude specific spans based on their attributes. This helps lower data ingestion costs and keeps the focus on the traces most relevant to your application monitoring.

    The Java tracer supports filtering for HTTP and JDBC spans. You can configure filtering rules in the agent configuration.yaml file to exclude spans based on attributes such as the following:

    • HTTP filtering: URLs, HTTP methods, status codes, headers, query parameters, and error messages
    • JDBC filtering: SQL statements, database connection strings, and error messages

    Example: Java Tracer HTTP and JDBC span filtering configuration

    com.instana.tracing:
      filter:
        exclude:
          - name: exclude HTTP health check endpoints
            attributes:
              - key: http.url
                values: [/health, /ping, /ready]
                match_type: endswith
          - name: exclude JDBC queries for audit tables
            attributes:
              - key: jdbc.statement
                values: [audit_log, session_data]
                match_type: contains

    When HTTP spans are excluded, all child spans and downstream tracing are automatically suppressed. JDBC span filtering excludes only the database spans without suppressing downstream operations.

    For detailed information about configuring Java span filtering, see Configuring filtering.

    For more information about span attributes-based filtering for Java applications, see the following blog:

  • Change host agent mode

    Switching to Infrastructure mode disables all tracers and helps optimize data ingestion when necessary. For more information, see Changing agent modes.

  • Use of X-INSTANA-S header

    You can include the X-INSTANA-S header in downstream application flows to selectively suppress specific traces. This configuration enables fine-grained control over which calls are captured, though it currently requires programmatic implementation.

  • Disable automatic trace instrumentation

    For Java applications, you can disable tracing for a specific process by using process command line options or environment variables, allowing for targeted control over instrumentation. For more information, see Disabling Java Trace instrumentation.
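To check how the ignore-endpoints rules under Endpoint filtering combine, the following Python sketch models their documented semantics: a bare method name ignores that method for every endpoint, and a methods/endpoints pair ignores a call only when both lists match, with "*" matching anything. It is an illustration under those assumptions, not Instana's tracer code; the is_ignored function and the dictionary layout are hypothetical.

```python
# Hypothetical model of the ignore-endpoints rules; not Instana's tracer code.
from typing import Dict, List, Union

# A rule is either a bare method name (for example, "get") or a dict with
# "methods" and "endpoints" lists, mirroring the two YAML forms.
Rule = Union[str, Dict[str, List[str]]]


def _matches(value: str, allowed: List[str]) -> bool:
    """True if value is listed explicitly or the list contains the "*" wildcard."""
    return "*" in allowed or value in allowed


def is_ignored(technology: str, method: str, endpoint: str,
               config: Dict[str, List[Rule]]) -> bool:
    """Return True if a call matches any ignore rule for its technology."""
    for rule in config.get(technology, []):
        if isinstance(rule, str):
            # Method-only rule: ignore this method for every endpoint.
            if rule == method:
                return True
        elif _matches(method, rule["methods"]) and _matches(endpoint, rule["endpoints"]):
            # Method-and-endpoint rule: both lists must match.
            return True
    return False


# Method-only rules, as in the "Filtering by method name" example.
redis_rules = {"redis": ["get", "type"]}

# Method-and-endpoint rules, as in the Kafka example.
kafka_rules = {
    "kafka": [
        {"methods": ["consume"], "endpoints": ["topic1", "topic2"]},
        {"methods": ["consume", "send"], "endpoints": ["topic3"]},
        {"methods": ["*"], "endpoints": ["topic4"]},
        {"methods": ["consume"], "endpoints": ["*"]},
    ]
}

print(is_ignored("redis", "get", "cart:42", redis_rules))     # True
print(is_ignored("kafka", "send", "topic1", kafka_rules))     # False
print(is_ignored("kafka", "send", "topic3", kafka_rules))     # True
print(is_ignored("kafka", "consume", "topic9", kafka_rules))  # True
```

Note that the last Kafka rule alone ignores every consume call, so the first rule becomes redundant once it is present.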

Note: You can also ignore processes at the MVS level to reduce data ingest.
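The Span attributes-based filtering for Java applications item states that excluding an HTTP span also suppresses its child spans, while excluding a JDBC span removes only that span. The following Python sketch models that behavior with the endswith and contains match types from the example configuration. It assumes one attribute per rule and a simple in-memory span tree; Span, Rule, and keep are hypothetical names, not part of the Java tracer.

```python
# Hypothetical model of attribute-based span exclusion; not the Java tracer's code.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Span:
    kind: str                                   # "http" or "jdbc" in this sketch
    attributes: Dict[str, str]
    children: List["Span"] = field(default_factory=list)


@dataclass
class Rule:
    key: str                                    # for example, "http.url"
    values: List[str]
    match_type: str                             # "endswith" or "contains"

    def matches(self, span: Span) -> bool:
        actual = span.attributes.get(self.key)
        if actual is None:
            return False
        if self.match_type == "endswith":
            return any(actual.endswith(v) for v in self.values)
        if self.match_type == "contains":
            return any(v in actual for v in self.values)
        raise ValueError(f"unsupported match_type: {self.match_type}")


def keep(span: Span, rules: List[Rule]) -> List[Span]:
    """Return the spans that survive filtering, in depth-first order."""
    excluded = any(rule.matches(span) for rule in rules)
    if excluded and span.kind == "http":
        return []                               # HTTP exclusion drops the whole subtree
    kept = [] if excluded else [span]           # JDBC exclusion drops only this span
    for child in span.children:
        kept.extend(keep(child, rules))
    return kept


rules = [
    Rule("http.url", ["/health", "/ping", "/ready"], "endswith"),
    Rule("jdbc.statement", ["audit_log", "session_data"], "contains"),
]

health = Span("http", {"http.url": "https://shop.example/health"},
              [Span("jdbc", {"jdbc.statement": "SELECT 1"})])
orders = Span("http", {"http.url": "https://shop.example/orders"},
              [Span("jdbc", {"jdbc.statement": "INSERT INTO audit_log VALUES (1)"}),
               Span("jdbc", {"jdbc.statement": "SELECT * FROM orders"})])

print(len(keep(health, rules)))  # 0
print(len(keep(orders, rules)))  # 2
```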

Manage data ingested from infrastructure

By default, Instana enables all sensors through automatic discovery, but you can disable individual sensors in the agent configuration file. Most sensors, except Azure sensors, support configurable polling rates, and some larger sensors also support filtering to limit which entities are monitored. Adjusting polling intervals and entity filters in the configuration file lets you fine-tune data collection and reduce unnecessary data ingest. The following example shows the ibmmq plugin section of the agent configuration file with the sensor enabled and a custom polling rate:

com.instana.plugin.ibmmq:
  enabled: true
  poll_rate: 75
 

For more information, see Configuring the polling rate.
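To estimate how a polling rate change affects volume, the following Python sketch counts metric snapshots per day. It assumes that poll_rate is expressed in seconds and that the sensor collects one snapshot per monitored entity on each poll; actual payload size varies by sensor. The 200 queues and the 60-second comparison rate are hypothetical values.

```python
# Rough volume estimate; assumes poll_rate is in seconds and one snapshot per
# monitored entity per poll. Payload size per snapshot varies by sensor.
SECONDS_PER_DAY = 24 * 60 * 60


def snapshots_per_day(entities: int, poll_rate_seconds: int) -> int:
    """Number of snapshots per day for `entities` polled every `poll_rate_seconds`."""
    if poll_rate_seconds <= 0:
        raise ValueError("poll_rate_seconds must be positive")
    return entities * (SECONDS_PER_DAY // poll_rate_seconds)


# Hypothetical: 200 monitored queues, comparing a 60-second poll with 75 seconds.
before = snapshots_per_day(200, 60)   # 288000
after = snapshots_per_day(200, 75)    # 230400
print(f"{1 - after / before:.0%} fewer snapshots")  # 20% fewer snapshots
```

Volume scales inversely with the polling rate, so reducing the number of monitored entities and lengthening the interval compound.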

Sensors contributing to high data ingestion

The following table explains why certain types of sensors generate high data ingestion:

Sensor type | Examples | Reason for high ingestion
Messaging system sensors | IBM MQ, TibcoEMS | Generate substantial data because of their central role in environments and the large number of entities (queues, topics, and so on) that they monitor.
Platform sensors | Kubernetes, vSphere, PCF | Frequent updates and detailed metrics contribute to significant data ingestion.
Prometheus sensors | Prometheus | Configurable polling intervals allow ingestion tuning. For more information, see Configuration for Kubernetes environments.

Reasons for high data ingestion from certain sensors

Messaging systems typically handle large volumes of transactions and entities, which naturally results in substantial data generation. Similarly, sensors that monitor a wide range of entities and diverse metrics contribute to increased ingestion. To mitigate this, some sensors support the use of regular expressions in configuration files, allowing teams to filter and limit the number of entities being monitored.
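The following Python sketch shows how an include pattern and an exclude pattern narrow a list of entities before monitoring. The queue names, the patterns, and the select_entities function are hypothetical, and the sketch assumes full-match semantics; each sensor documents its own configuration keys and matching rules.

```python
# Hypothetical illustration of regex-based entity filtering; the queue names,
# patterns, and full-match semantics are assumptions, not a sensor's behavior.
import re
from typing import Iterable, List, Optional


def select_entities(names: Iterable[str], include: Optional[str] = None,
                    exclude: Optional[str] = None) -> List[str]:
    """Keep names that fully match `include` (if set) and do not fully match `exclude`."""
    include_re = re.compile(include) if include else None
    exclude_re = re.compile(exclude) if exclude else None
    selected = []
    for name in names:
        if include_re and not include_re.fullmatch(name):
            continue
        if exclude_re and exclude_re.fullmatch(name):
            continue
        selected.append(name)
    return selected


queues = ["ORDERS.IN", "ORDERS.OUT", "SYSTEM.ADMIN.COMMAND.QUEUE", "DEV.TEST.Q1"]
print(select_entities(queues, include=r"ORDERS\..*"))
# ['ORDERS.IN', 'ORDERS.OUT']
print(select_entities(queues, exclude=r"SYSTEM\..*"))
# ['ORDERS.IN', 'ORDERS.OUT', 'DEV.TEST.Q1']
```

An include pattern is usually the stronger lever, because entities created later stay unmonitored unless they match it.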

Optimization strategies

  • Entity filtering

    You can use regular expressions in configuration files to monitor only a subset of entities.

  • Polling rate adjustment

    For sensors that support configurable polling intervals, you can adjust the frequency to reduce data volume. For more information about polling interval configuration, see the following configuration sections:

  • Cloud managed services (AWS, Azure, GCP)

    You can use cloud tags to selectively include or exclude managed service instances from monitoring. To monitor only instances that carry specific tags, use include_tags:

    com.instana.plugin.aws.rds:
      include_tags: # Comma-separated list of tags in key:value format (for example, env:prod,env:staging)

    To exclude instances that carry specific tags, use exclude_tags:

    com.instana.plugin.aws.rds:
      exclude_tags: # Comma-separated list of tags in key:value format (for example, env:dev,env:test)

  • Selective monitoring

    Instana supports opt-in and opt-out mechanisms to control which processes and containers are monitored. By default, the agent operates in opt-out mode. When INSTANA_SELECTIVE_MONITORING is not set or is set to OPT_OUT, the agent monitors all processes unless you explicitly exclude them.

    • Process and container-level control

    You can use configuration settings to include or exclude specific processes or containers from monitoring. For more information, see Instana selective monitoring.

    Monitoring mode | Description | Agent environment | Process environment | Result
    Opt-out | All processes monitored by default | Not set or INSTANA_SELECTIVE_MONITORING=OPT_OUT | Not set | The process is monitored
    Opt-out | All processes monitored by default | Not set or INSTANA_SELECTIVE_MONITORING=OPT_OUT | INSTANA_MONITORING=false | The process is ignored
    Opt-in | No process monitored by default | INSTANA_SELECTIVE_MONITORING=OPT_IN | Not set | The process is ignored
    Opt-in | No process monitored by default | INSTANA_SELECTIVE_MONITORING=OPT_IN | INSTANA_MONITORING=true | The process is monitored

    • Kubernetes namespace label for namespace opt-out and opt-in

    Instana allows you to control workload monitoring at the Kubernetes namespace level by using the instana-workload-monitoring label. This label determines whether a namespace is included or excluded from monitoring:

    • instana-workload-monitoring=false → Opt-out: Monitoring is disabled for all workloads in the namespace.
    • instana-workload-monitoring=true → Opt-in: Monitoring is enabled for workloads in the namespace.
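The opt-in and opt-out table under Process and container-level control reduces to a small decision function. The following Python sketch restates the four documented combinations; it assumes that any agent value other than OPT_IN behaves like opt-out, and is_monitored is a hypothetical helper, not an Instana API.

```python
# Restates the documented opt-in/opt-out table; is_monitored is a hypothetical
# helper. Any agent value other than OPT_IN is treated as opt-out (assumption).
from typing import Mapping


def is_monitored(agent_env: Mapping[str, str], process_env: Mapping[str, str]) -> bool:
    mode = agent_env.get("INSTANA_SELECTIVE_MONITORING", "OPT_OUT")
    flag = process_env.get("INSTANA_MONITORING")
    if mode == "OPT_IN":
        return flag == "true"    # opt-in: monitored only when explicitly enabled
    return flag != "false"       # opt-out: monitored unless explicitly disabled


# The four rows of the table, in order.
rows = [
    ({}, {}),
    ({"INSTANA_SELECTIVE_MONITORING": "OPT_OUT"}, {"INSTANA_MONITORING": "false"}),
    ({"INSTANA_SELECTIVE_MONITORING": "OPT_IN"}, {}),
    ({"INSTANA_SELECTIVE_MONITORING": "OPT_IN"}, {"INSTANA_MONITORING": "true"}),
]
for agent_env, process_env in rows:
    print("monitored" if is_monitored(agent_env, process_env) else "ignored")
# monitored, ignored, ignored, monitored
```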

Manage data ingest from synthetics

To minimize data ingestion from synthetic tests in Instana, consider the following approaches:

  • Keep video recording disabled in browser tests, which is the default setting
  • Limit screenshot capture; screenshots are typically taken only when a test fails
  • Reduce test frequency by increasing the scheduling interval (Frequency of scheduling (in minutes) in Instana), which decreases the number of executed tests
  • Stop test execution when synthetic monitoring is not required
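Because the scheduling interval determines how often each test runs, lengthening it reduces executions proportionally. The following Python sketch makes the arithmetic explicit. It assumes that each test runs once per interval at each assigned location; the location count is a hypothetical input.

```python
# Rough estimate; assumes one execution per scheduling interval per location.
MINUTES_PER_DAY = 24 * 60


def runs_per_day(frequency_minutes: int, locations: int = 1) -> int:
    """Test executions per day for a given scheduling frequency (in minutes)."""
    if frequency_minutes <= 0:
        raise ValueError("frequency_minutes must be positive")
    return (MINUTES_PER_DAY // frequency_minutes) * locations


# Hypothetical test assigned to three locations.
print(runs_per_day(1, locations=3))   # 4320
print(runs_per_day(15, locations=3))  # 288
```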

Manage data ingest from logging sources

Instana provides options to control data ingestion from logging sources:

  • OpenTelemetry logs

    You can manage log volume by limiting the number of OpenTelemetry sources that send logs to Instana. This reduces unnecessary data ingest and improves overall efficiency.

  • Trace logs

    Instana tracers let you disable logging instrumentation across supported technologies. This setting is applied at the tracer level for Java, Node.js, Go, PHP, and Ruby. For example, the Java tracer configuration under Disable tracing by technology sets Logging: false.

Manage data ingested from serverless monitoring in Instana

Instana currently offers limited support for serverless technologies, with primary focus on AWS Lambda functions. To optimize data ingestion and manage associated costs effectively, you can consider the following strategies:

  • Manage Lambda function versions

    By default, Instana monitors the five most recent versions of each Lambda function. You can reduce this number to lower monitoring costs: although Instana supports monitoring up to 20 versions, fewer versions typically mean less data and lower cost. For more information, see Disabling retrieval of Lambda versions and metrics.

  • Adjust polling intervals

    The default polling interval for Lambda metrics is 5 minutes (300 seconds). Increasing this interval reduces how often data is collected, which helps minimize ingestion costs. For more information, see Changing poll rate.
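The number of monitored versions and the polling interval multiply. The following Python sketch estimates Lambda metric polls per day under the assumption that Instana polls each monitored version once per interval; the actual request pattern may differ, and the ten-function workload is hypothetical.

```python
# Rough estimate; assumes one metric poll per monitored version per interval.
SECONDS_PER_DAY = 24 * 60 * 60
MAX_VERSIONS = 20  # upper limit stated in this section


def lambda_polls_per_day(functions: int, versions: int = 5,
                         poll_interval_seconds: int = 300) -> int:
    """Estimated polls per day; defaults mirror the documented 5 versions and 300 s."""
    if not 1 <= versions <= MAX_VERSIONS:
        raise ValueError(f"versions must be between 1 and {MAX_VERSIONS}")
    if poll_interval_seconds <= 0:
        raise ValueError("poll_interval_seconds must be positive")
    return functions * versions * (SECONDS_PER_DAY // poll_interval_seconds)


# Hypothetical workload of 10 Lambda functions.
print(lambda_polls_per_day(10))                                         # 14400
print(lambda_polls_per_day(10, versions=2, poll_interval_seconds=600))  # 2880
```

Under this model, monitoring two versions every 10 minutes instead of five versions every 5 minutes cuts polls by 80%.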

Conclusion

For users seeking full control over instrumentation, Instana offers the flexibility to use OpenTelemetry instead of its native instrumentation. This enables more customizable collection and management of observability data. For more information, see OpenTelemetry.

These strategies help organizations optimize data ingest in Instana and manage observability data more efficiently while preserving the visibility they need.