Troubleshooting

You might encounter the following issues with Generative AI Observability.

Connection errors

Issue: Application logs show connection errors to Instana.

Solution:

For agent mode:

  • Verify that the Instana agent is running.
  • Check agent logs for errors.
  • Ensure the agent host and port are correct.
  • Verify network connectivity between your application and the agent.

For agentless mode:

  • Verify that the backend endpoint URL is correct.
  • Check that port 4317 is accessible.
  • Verify that firewall rules allow outbound connections.
  • Ensure that the x-instana-key value is correct.

Check TLS settings:

  • For agent mode: usually OTEL_EXPORTER_OTLP_INSECURE=true
  • For agentless mode: usually OTEL_EXPORTER_OTLP_INSECURE=false
  • Verify that these settings match your Instana setup.
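To check the network connectivity and port accessibility items from the application host, you can try to open a plain TCP connection to the OTLP port. The following Python sketch uses only the standard library. The helper name can_reach is hypothetical, and a successful connection shows only that the port is open, not that the TLS settings or the x-instana-key value are correct.

```python
import socket


def can_reach(host: str, port: int = 4317, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds.

    This checks only that the port is open. It does not check TLS or authentication.
    """
    try:
        # create_connection resolves the host name and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and host name resolution failures.
        return False
```

For example, if can_reach("<instana-agent-host>") returns False when you run it on the application host, check the agent status, host, port, and firewall rules before you look at TLS or key settings.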

Cost metrics are not displayed

Issue: Traces and other metrics are displayed, but cost metrics are missing.

Solution:

  1. Verify that pricing is configured in the dashboard:

    • Go to GenAI observability > Pricing configuration.
    • Check that pricing is set for the models you're using.
    • Model IDs must match exactly (case-sensitive).
  2. Check model ID format:

    • Ensure that your application is reporting the correct model IDs.
    • Model IDs must match the format in the pricing configuration, for example, "gpt-4" not "GPT-4" or "gpt4".
  3. If you have set platform-specific pricing, check whether the platform matches.

    • Make sure that the pricing you set applies either to any platform or to the specific platform that your application uses.
  4. Wait for data to propagate:

    • After you configure pricing, it might take a few minutes for cost metrics to appear.
    • Generate some new requests to see updated metrics.
Note: Other metrics (latency, token counts, error rates) are visible even without a pricing configuration.
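Because model IDs must match exactly and case-sensitively, one way to spot a mismatch is to compare the model IDs that your application reports with the IDs in your pricing configuration. The following Python sketch is illustrative only: the function find_unpriced_models and its input lists are hypothetical, and you would copy the IDs by hand from your traces and from the pricing configuration. It flags reported IDs that have no exact match and suggests configured IDs that differ only by case or punctuation, such as "GPT-4" or "gpt4" for "gpt-4". It does not account for platform-specific pricing.

```python
import re


def _normalize(model_id: str) -> str:
    # Lowercase and keep only letters and digits, so that
    # "GPT-4", "gpt4", and "gpt-4" all normalize to "gpt4".
    return re.sub(r"[^a-z0-9]", "", model_id.lower())


def find_unpriced_models(reported: list[str], priced: list[str]) -> dict[str, list[str]]:
    """Map each reported model ID without an exact pricing match to near-miss candidates.

    An empty candidate list means that no configured ID resembles the reported ID.
    """
    priced_set = set(priced)
    result: dict[str, list[str]] = {}
    for model_id in reported:
        if model_id in priced_set:
            continue  # exact, case-sensitive match
        key = _normalize(model_id)
        result[model_id] = [p for p in priced if _normalize(p) == key]
    return result
```

For example, find_unpriced_models(["GPT-4", "claude-2"], ["gpt-4", "claude-2"]) returns {"GPT-4": ["gpt-4"]}, which points to a case mismatch in the reported model ID.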

Traces and metrics are not displayed

Issue: The traces or metrics are not displayed in the GenAI observability dashboard.

Solution:

  1. Check whether OTEL_RESOURCE_ATTRIBUTES="INSTANA_PLUGIN=genai" is set correctly.

    • A missing attribute is the most common cause of this issue. Without it, Instana does not recognize your data as generative AI telemetry.
    • Check your environment variables or configuration files.
    • Restart your application after adding this variable.
  2. Check the Instana endpoint configuration:

    • Check whether TRACELOOP_BASE_URL points to the correct Instana agent or backend.
    • For agent mode, make sure that the agent is running and accessible.
    • For agentless mode, verify that the backend endpoint URL is correct.
  3. Verify authentication:

    • Check whether x-instana-key in TRACELOOP_HEADERS is correct.
    • Make sure that the key has permissions for your environment.
  4. Review application logs:

    • Look for OpenTelemetry connection errors.
    • Check for authentication failures.
    • Verify that there are no network connectivity issues.
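The first three checks can be scripted against the environment of the application process. The following Python sketch is a hypothetical helper, not part of Traceloop or Instana. It parses OTEL_RESOURCE_ATTRIBUTES as comma-separated key=value pairs (the format that the OpenTelemetry specification defines for this variable) and TRACELOOP_HEADERS in the same key=value form that the examples on this page use, and it reports what is missing. It checks only that values are present, not that the key or endpoint is valid.

```python
import os
from typing import Mapping


def _parse_pairs(value: str) -> dict[str, str]:
    """Parse 'k1=v1,k2=v2' into a dict, ignoring empty or malformed items."""
    pairs: dict[str, str] = {}
    for item in value.split(","):
        key, sep, val = item.partition("=")
        if sep and key.strip():
            pairs[key.strip()] = val.strip()
    return pairs


def check_genai_env(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return a list of problems found in the GenAI observability settings."""
    problems: list[str] = []
    attrs = _parse_pairs(env.get("OTEL_RESOURCE_ATTRIBUTES", ""))
    if attrs.get("INSTANA_PLUGIN") != "genai":
        problems.append("OTEL_RESOURCE_ATTRIBUTES does not contain INSTANA_PLUGIN=genai")
    if not env.get("TRACELOOP_BASE_URL", "").strip():
        problems.append("TRACELOOP_BASE_URL is not set")
    headers = _parse_pairs(env.get("TRACELOOP_HEADERS", ""))
    if not headers.get("x-instana-key"):
        problems.append("TRACELOOP_HEADERS has no x-instana-key value")
    return problems


if __name__ == "__main__":
    # Check the environment of the current process and print each problem.
    for problem in check_genai_env() or ["No problems found in the environment"]:
        print(problem)
```

Run it with the same environment as your application, for example inside the same container, so that it sees the values that the application actually receives.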

Conflict with Instana Python sensor

Issue: When you use the Instana Python sensor and Traceloop together, you might encounter the following error:

AttributeError: 'SimpleSpanProcessor' object has no attribute 'record_span'

Possible cause: The Instana Python sensor and Traceloop use different instrumentation approaches, and running both at the same time causes instrumentation conflicts.

For Generative AI applications, use Traceloop only and do not enable the Instana Python sensor.

To fix this issue, try the following steps:

  1. Remove or comment out the Instana import.
    # import instana # Remove this line
  2. Keep only the Traceloop initialization.
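    from traceloop.sdk import Traceloop

    # Traceloop is the only instrumentation in this application.
    # disable_batch=True exports each span as soon as it ends instead of batching spans.
    Traceloop.init(
        disable_batch=True
    )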
  3. Restart your application.
Note: You must instrument Generative AI applications only with Traceloop. Do not enable Python tracing through the Instana sensor for these applications.
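If you are not sure whether another module in your application still imports the Instana sensor, you can check at startup before you call Traceloop.init. The following Python sketch is an assumption-based guard, not an Instana or Traceloop API: the function instana_sensor_loaded is hypothetical, and it looks in sys.modules for the top-level instana package (the name in the import shown in step 1). It detects only a sensor that is already imported in the current process.

```python
import sys
from typing import Mapping


def instana_sensor_loaded(modules: Mapping[str, object] = sys.modules) -> bool:
    """Return True if the top-level 'instana' package or any of its submodules is imported."""
    # Copy the keys first: sys.modules can change while another thread imports.
    names = list(modules)
    return any(name == "instana" or name.startswith("instana.") for name in names)


if __name__ == "__main__":
    if instana_sensor_loaded():
        # Stop early rather than run two conflicting instrumentations.
        raise SystemExit("The Instana Python sensor is loaded; remove it before you use Traceloop.")
    print("Instana Python sensor is not loaded")
```

Call the check in your entry point immediately before Traceloop.init(disable_batch=True).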

Python version is not compatible

Issue: When you install IBM watsonx packages (such as ibm-watsonx-ai or ibm-watson-machine-learning), you may encounter a compilation error with pandas:
error: too few arguments to function '_PyLong_AsByteArray'

Possible cause: Python 3.13 introduced breaking changes in the C API that are not yet compatible with pandas and related dependencies used by IBM watsonx packages. Some of these packages depend internally on a pandas version that works correctly only with Python 3.11 or 3.12.

To troubleshoot this issue, try the following steps:

  • Install Python 3.11 or 3.12
    • For Ubuntu/Debian:
      sudo apt update
      sudo apt install software-properties-common -y
      sudo add-apt-repository ppa:deadsnakes/ppa -y
      sudo apt update
      sudo apt install python3.11 python3.11-venv python3.11-dev -y
    • For Fedora/RHEL/CentOS (using dnf):
      sudo dnf install python3.11 python3.11-devel -y
    • For older systems (using yum):
      sudo yum install epel-release -y
      sudo yum install https://repo.ius.io/ius-release-el7.rpm -y
      sudo yum install python311 python311-devel -y
    • Create a virtual environment. After you install Python 3.11 or 3.12, create and activate a virtual environment, and then install the packages in it:
      # Create virtual environment with Python 3.11
      python3.11 -m venv venv
      
      # Activate the virtual environment
      source venv/bin/activate
      
      # Upgrade pip
      pip install --upgrade pip
      
      # Install the IBM watsonx packages, LangChain integration, and Traceloop SDK
      pip install ibm-watsonx-ai ibm-watson-machine-learning langchain-ibm traceloop-sdk
    • Verify the Python version. To confirm that you are using the correct Python version, run:
      python --version

      The Python version must be 3.11.x or 3.12.x.
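To fail early with a clear message instead of a pandas build error, you can add a version guard to a setup or startup script. A minimal Python sketch, assuming that the supported versions are exactly the ones stated above (3.11 and 3.12); the function name python_version_supported is hypothetical:

```python
import sys
from typing import Optional, Sequence

# Versions that this page recommends for the IBM watsonx packages.
SUPPORTED = {(3, 11), (3, 12)}


def python_version_supported(version: Optional[Sequence[int]] = None) -> bool:
    """Return True if version (default: the running interpreter) is 3.11.x or 3.12.x."""
    major_minor = tuple(sys.version_info[:2] if version is None else version[:2])
    return major_minor in SUPPORTED
```

For example, a script can call python_version_supported() before it runs pip install and stop with a message that names Python 3.11 or 3.12 when the check returns False.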

OTel Data Collector is still in use

The OTel Data Collector for generative AI (ODCG) is deprecated. ODCG is no longer required. To remove ODCG, complete the following steps:

  1. Update your application configuration.

    The key change is adding the INSTANA_PLUGIN=genai resource attribute and removing the separate metrics endpoint.

    • For agent mode (sending data through the Instana agent):
      The old configuration is shown in the following example.
      export TRACELOOP_BASE_URL=<instana-agent-host>:4317
      export TRACELOOP_HEADERS="x-instana-key=<agent-key>,x-instana-host=<host>"
      export TRACELOOP_METRICS_ENDPOINT=<odcg-host>:8000
      export TRACELOOP_METRICS_ENABLED=true
      export TRACELOOP_LOGGING_ENABLED=true
      export OTEL_EXPORTER_OTLP_INSECURE=true
      The new configuration is shown in the following example.
      export OTEL_RESOURCE_ATTRIBUTES="INSTANA_PLUGIN=genai"
      export TRACELOOP_BASE_URL=<instana-agent-host>:4317
      export TRACELOOP_HEADERS="x-instana-key=<agent-key>,x-instana-host=<host>"
      export TRACELOOP_METRICS_ENABLED=true
      export TRACELOOP_LOGGING_ENABLED=true
      export OTEL_EXPORTER_OTLP_INSECURE=true
      The following items are changed:
      • OTEL_RESOURCE_ATTRIBUTES="INSTANA_PLUGIN=genai" is added so that Instana processes your data as generative AI telemetry.

      • TRACELOOP_METRICS_ENDPOINT is removed so that metrics flow through the same endpoint as traces.

    • For agentless mode (sending data directly to the Instana backend):

      The old configuration is shown in the following example.
      export TRACELOOP_BASE_URL=<instana-otlp-endpoint>:4317 
      export TRACELOOP_HEADERS="x-instana-key=<agent-key>,x-instana-host=<host>" 
      export TRACELOOP_METRICS_ENDPOINT=<odcg-host>:8000 
      export TRACELOOP_METRICS_ENABLED=true
      export TRACELOOP_LOGGING_ENABLED=true
      export OTEL_EXPORTER_OTLP_INSECURE=false
      The new configuration is shown in the following example.
      export OTEL_RESOURCE_ATTRIBUTES="INSTANA_PLUGIN=genai" 
      export TRACELOOP_BASE_URL=<instana-otlp-endpoint>:4317 
      export TRACELOOP_HEADERS="x-instana-key=<agent-key>,x-instana-host=<host>" 
      export TRACELOOP_METRICS_ENABLED=true
      export TRACELOOP_LOGGING_ENABLED=true
      export OTEL_EXPORTER_OTLP_INSECURE=false
      The following items are changed:
      • OTEL_RESOURCE_ATTRIBUTES="INSTANA_PLUGIN=genai" is added so that Instana processes your data as generative AI telemetry.

      • TRACELOOP_METRICS_ENDPOINT is removed so that metrics flow through the same endpoint as traces.

  2. Migrate your pricing configuration. Model pricing is managed through the Instana UI instead of a configuration file.

    1. Locate your current pricing configuration.

      If you use the OTel Data Collector, your pricing is defined in a prices.properties file similar to the following example:

      openai.gpt-4.input=0.03 
      openai.gpt-4.output=0.06 
      openai.gpt-3.5-turbo.input=0.0015 
      openai.gpt-3.5-turbo.output=0.002 
      anthropic.claude-2.input=0.008 
      anthropic.claude-2.output=0.024
    2. Configure pricing in the Instana UI.

      1. Log on to the Instana UI.
      2. Navigate to the GenAI observability dashboard.
      3. Click the Configuration tab. A predefined list of common LLM models with default pricing is displayed.
      4. Update pricing for an existing model:
        1. Find the model in the list.
        2. Click the model name, and then click Edit.
        3. Update the input and output token prices.
        4. Click Save.
      5. Add a new model:
        1. Click Add model pricing.
        2. Enter the provider (for example, "openai", "anthropic").
        3. Enter the model ID (for example, "gpt-4", "claude-2").
        4. To set platform-specific pricing, enter the platform (for example, "bedrock", "langchain").
        5. Enter the input and output token prices.
        6. Click Add.
        Figure 1. LLM-pricing-configuration

        Benefits of dashboard-based pricing:

        • Changes take effect immediately, so no redeployment is needed
        • Easy to update as pricing changes
        • Centralized management across all your generative AI applications
        • Audit trail of pricing changes
        Note:

        Cost metrics will appear in your dashboards only after you configure pricing. Other metrics (latency, token counts, error rates) are visible regardless of pricing configuration.

  3. After updating the configuration, restart your generative AI application to apply the changes.

    • For Kubernetes deployments:
      kubectl rollout restart deployment/<your-app-deployment> -n <your-namespace>
    • For Red Hat OpenShift deployments:
      oc rollout restart deployment/<your-app-deployment> -n <your-namespace>
    • For standalone applications: restart the application process by using your usual procedure.
  4. Verify that data is flowing correctly. After you restart your application, verify that Instana is receiving data:

      1. Check traces:
        1. Go to the GenAI observability dashboard in Instana.
        2. Verify that new traces appear for your application.
        3. Verify that traces show LLM calls, token counts, and latency information.
      2. Check metrics:
        1. In the GenAI observability dashboard, check the metrics view.

          Check whether metrics appear for token usage (input and output), latency, and cost (if pricing is configured).

      3. Check logs:

        • If logging is enabled, verify that logs are ingested.
        • Look for any error messages related to OpenTelemetry or Traceloop.
      4. Review application logs:

        • Check your application logs for any OpenTelemetry errors.
        • Look for successful connection messages to the Instana endpoint.
  5. Optional: Remove the OTel Data Collector.

    After you verify that data is flowing correctly with the new configuration, you can safely remove the OTel Data Collector deployment.
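When you transfer entries from prices.properties to the Instana UI, note that some model IDs contain dots (for example, gpt-3.5-turbo), so a key cannot simply be split on every dot. The following Python sketch is one hypothetical way to turn the file into a per-model list that you can copy from when you click Add model pricing. It assumes that every key has the form <provider>.<model-id>.<input|output>, as in the example file, that the provider name contains no dots, and that lines use key=value syntax. It only reads the file; you still enter the pricing in the Instana UI.

```python
from collections import defaultdict


def parse_prices(text: str) -> dict[tuple[str, str], dict[str, float]]:
    """Group prices.properties entries by (provider, model ID).

    The provider is the text before the first dot, the direction is the text
    after the last dot, and everything in between is the model ID.
    """
    prices: dict[tuple[str, str], dict[str, float]] = defaultdict(dict)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line
        provider, _, rest = key.strip().partition(".")
        model_id, _, direction = rest.rpartition(".")
        if direction in ("input", "output") and model_id:
            prices[(provider, model_id)][direction] = float(value.strip())
    return dict(prices)
```

For the example file, parse_prices maps ("openai", "gpt-3.5-turbo") to {"input": 0.0015, "output": 0.002}; each key gives the provider and model ID to enter, and each value gives the input and output prices.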

If you encounter issues when you remove the OTel Data Collector for generative AI (ODCG), try the following steps:

Review your configuration:

  • Ensure that all environment variables are set correctly
  • Verify that the Instana endpoint is accessible
  • Check that pricing is configured in the dashboard

Contact support:

  • Provide your application logs
  • Include your configuration (with sensitive data redacted)
  • Describe what you've tried and the results

The application is still using the deprecated metrics endpoint

Issue: The OTel Data Collector for generative AI (ODCG) is deprecated, but your application still tries to connect to it.

Solution:

  1. Verify that you removed TRACELOOP_METRICS_ENDPOINT from your configuration.
  2. Check for environment variables set at different levels:
    • Container or pod level
    • Deployment/StatefulSet level
    • ConfigMap or Secret references
    • System-wide environment variables
  3. Restart your application after you remove the variable.
  4. Check the application logs to confirm that the application no longer tries to connect to port 8000.
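To scan a set of environment variables for leftovers from the ODCG setup, you can use a check like the following Python sketch. Both functions are hypothetical helpers: find_odcg_leftovers flags TRACELOOP_METRICS_ENDPOINT if it is present, plus any other TRACELOOP_ or OTEL_ variable whose value still points at port 8000 (the ODCG metrics port in the old configuration). Run it inside the container or pod so that it sees variables from every level listed in step 2, for example by passing the output of kubectl exec <pod> -- env to parse_env_output. The parser assumes single-line values.

```python
import os
import re
from typing import Mapping

# Matches ":8000" when it is not followed by another digit (so ":80001" is ignored).
_ODCG_PORT = re.compile(r":8000(?!\d)")


def find_odcg_leftovers(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names of variables that still point at the deprecated ODCG setup."""
    leftovers: list[str] = []
    for name, value in sorted(env.items()):
        if name == "TRACELOOP_METRICS_ENDPOINT":
            leftovers.append(name)  # must be removed entirely
        elif name.startswith(("TRACELOOP_", "OTEL_")) and _ODCG_PORT.search(value):
            leftovers.append(name)  # another setting still targets port 8000
    return leftovers


def parse_env_output(text: str) -> dict[str, str]:
    """Parse NAME=value lines, such as the output of `env`, into a dict."""
    env: dict[str, str] = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name:
            env[name] = value
    return env
```

An empty list means that none of the scanned variables still refers to ODCG; the application logs remain the final check.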