Monitoring large language models (LLMs)

Use Instana to monitor LLMs and gain detailed insights into application and model usage. You can identify abnormal trends in LLM responses, such as increased latency or failure rates, and track usage costs. After you install the Instana agent, install the OpenTelemetry (OTel) Data Collector and instrument your LLM applications to collect trace, metric, and log data from the LLMs. You can view this data in the Instana UI.

Installing OTel Data Collector for LLM (ODCL)

To collect OpenTelemetry metrics for various LLMs and LLM applications, you need to install ODCL. All implementations are based on predefined OpenTelemetry Semantic Conventions.

The following AI systems are supported:

  • IBM watsonx
  • OpenAI
  • Anthropic
  • Amazon Bedrock

For more information, see GitHub.

To collect OTel metrics from LLMs, complete the following steps:

  1. Verify that the following prerequisites are met:

    • Java SDK 11+ is installed. Run the following command to check the installed version:

      java -version
      
    • OpenTelemetry data ingestion is enabled. For more information, see Sending OpenTelemetry data to Instana.

  2. Install the collector:

    a. Download the installation package:

    wget https://github.com/instana/otel-dc/releases/download/v1.0.5/otel-dc-llm-1.0.5.tar
    

    b. Extract the package to the preferred deployment location:

    tar xf otel-dc-llm-1.0.5.tar
    
  3. Modify the configuration file:

    a. Open the config.yaml file:

    cd otel-dc-llm-1.0.5
    vi config/config.yaml
    

    b. Update the following fields in the config.yaml file:

    • otel.agentless.mode: The connection mode of the OTel Data Collector. The default mode is agentless.
    • otel.backend.url: The gRPC endpoint of the Instana backend (agentless mode) or of the Instana agent.
    • callback.interval: The time interval, in seconds, at which data is posted to the backend or agent.
    • otel.service.name: The Data Collector name, which can be any string that you choose.
    • otel.service.port: The listening port on which the Data Collector receives metrics data from the instrumented applications. The default port is 8000.

    If otel.agentless.mode is set to true, metrics data is sent directly to the Instana backend. In this case, otel.backend.url is the gRPC endpoint of the backend otlp-acceptor component, including the http or https scheme, for example http://<instana-otlp-endpoint>:4317. Use the https scheme if the endpoint of the Instana backend otlp-acceptor is TLS-enabled, as it is in SaaS environments. For more information about Instana SaaS environments, see Endpoints of the Instana backend otlp-acceptor.

    If otel.agentless.mode is set to false, metrics data is sent to the Instana agent. In this case, otel.backend.url is the gRPC endpoint of the Instana agent, for example http://<instana-agent-host>:4317.
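
    For example, a config.yaml for agentless mode might contain the following values. The values are illustrative only, and the exact layout of the file that ships with ODCL might differ:

    otel.agentless.mode: true
    otel.backend.url: https://<instana-otlp-endpoint>:4317
    callback.interval: 30
    otel.service.name: otel-dc-llm
    otel.service.port: 8000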

    c. Open the prices.properties file:

    vi config/prices.properties
    

    d. Customize pricing for models in use by using the following format in the prices.properties file:

    <aiSystem>.<modelId>.input=0.0
    <aiSystem>.<modelId>.output=0.0
    
    • The <aiSystem> is the LLM provider in use.
    • The <modelId> can be set to * to match all model IDs within the <aiSystem>. Wildcard entries have a lower priority than entries that specify a particular modelId.
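
    For example, the following entries define pricing for one specific OpenAI model and a wildcard entry for all IBM watsonx models. The aiSystem and modelId values shown here are examples only; replace the 0.0 placeholders with the prices that apply to your provider contracts:

    openai.gpt-4o.input=0.0
    openai.gpt-4o.output=0.0
    watsonx.*.input=0.0
    watsonx.*.output=0.0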

    e. Open the logging.properties file by using the following command:

    vi config/logging.properties
    

    Configure the Java logging settings in the logging.properties file according to your needs.

  4. Run the Data Collector with the following command:

    nohup ./bin/otel-dc-llm >/dev/null 2>&1 &
    

    You can also use tools like tmux or screen to run this program in the background.
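
    Alternatively, you can run the Data Collector as a system service. The following systemd unit is a minimal sketch rather than a shipped artifact; the installation path and unit name are assumptions that you need to adapt to your environment:

    [Unit]
    Description=OTel Data Collector for LLM (ODCL)
    After=network.target

    [Service]
    # Assumed installation directory; point it to where you extracted the package.
    WorkingDirectory=/opt/otel-dc-llm-1.0.5
    ExecStart=/opt/otel-dc-llm-1.0.5/bin/otel-dc-llm
    Restart=on-failure

    [Install]
    WantedBy=multi-user.target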

Instrumenting LLM applications

Instrumentation is the process of injecting code into LLM applications to get detailed information about the LLM API calls that the applications make. For more information about supported LLMs, see Instrumenting LLM applications.

The instrumentation collects both trace and metric data. The trace data is sent to the Instana agent directly. The metrics data is sent to the LLM Data Collector first for aggregation, and then the collector sends it to the Instana agent.

To instrument the LLM application, complete the following steps:

  1. Verify that Python 3.10+ is installed. Run the following command to check the installed version:

    python3 -V
    
  2. Optional: Create a virtual environment for your applications to keep your dependencies consistent and prevent conflicts with other applications.

    To create a virtual environment, run the following command:

    pip3 install virtualenv
    virtualenv traceloop
    source traceloop/bin/activate
    
  3. Install the Traceloop SDK and the dependencies.

    a. To install the SDK, run the following command:

    pip3 install traceloop-sdk==0.33.5
    

    b. To install the dependencies for the instrumentations that you need, run the applicable commands:

    • To install dependencies for IBM watsonx, run the following command:

      pip3 install ibm-watsonx-ai==1.1.20 ibm-watson-machine-learning==1.0.366 langchain-ibm==0.3.1
      
    • To install the dependency for OpenAI, run the following command in your terminal:

      pip3 install openai==1.52.2
      
    • To install the dependency for Anthropic, run the following command in your terminal:

      pip3 install anthropic==0.37.1
      
    • To install dependencies for Amazon Bedrock, run the following command in your terminal:

      pip3 install boto3==1.35.49
      
    • To install dependencies for LangChain, run the following command in your terminal:

      pip3 install langchain==0.3.4 langchain-community==0.3.3 langchain-core==0.3.13 langchain-text-splitters==0.3.0
      

    c. In your LLM app, initialize the Traceloop tracer by adding the following code:

    from traceloop.sdk import Traceloop
    Traceloop.init()
    

    If you’re running this locally, you can disable the batch sending to see the traces immediately:

    Traceloop.init(disable_batch=True)
    
  4. Annotate your workflows.

    If you have complex workflows or chains, you can annotate them to get a better understanding of what is happening. You can see the complete trace of your workflow on Traceloop or any other dashboard that you use. You can use decorators to make this easier. For example, if you have a function that renders a prompt and calls an LLM, add the @workflow decorator (or, for asynchronous methods, the @aworkflow decorator).

    If you use an LLM framework like Haystack, LangChain, or LlamaIndex, you do not need to add any annotations to your code.

    from traceloop.sdk.decorators import workflow

    @workflow(name="suggest_answers")
    def suggest_answers(question: str):
        # Render the prompt and call the LLM of your choice here.
        ...
    

    For more information, see annotations.
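
    The asynchronous variant looks similar. The following snippet is a minimal sketch, and the function name suggest_answers_async is a hypothetical example:

    from traceloop.sdk.decorators import aworkflow

    @aworkflow(name="suggest_answers_async")
    async def suggest_answers_async(question: str):
        # Render the prompt and await your asynchronous LLM call here.
        ...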

  5. Optional: Verify the installation and configuration. You can use the following code to create a sample application named sample-llm-app.py:

    • The following is a sample application for IBM watsonx:
      import os, types, time, random
      from ibm_watsonx_ai.metanames import GenTextParamsMetaNames
      from ibm_watsonx_ai.foundation_models import ModelInference
      from pprint import pprint
      from traceloop.sdk import Traceloop
      from traceloop.sdk.decorators import workflow
      from langchain_ibm import WatsonxLLM
      
      Traceloop.init(app_name="watsonx_chat_service")
      
      def watsonx_llm_init() -> WatsonxLLM:
          watsonx_llm_parameters = {
              GenTextParamsMetaNames.DECODING_METHOD: "sample",
              GenTextParamsMetaNames.MAX_NEW_TOKENS: 100,
              GenTextParamsMetaNames.MIN_NEW_TOKENS: 1,
              GenTextParamsMetaNames.TEMPERATURE: 0.5,
              GenTextParamsMetaNames.TOP_K: 50,
              GenTextParamsMetaNames.TOP_P: 1,
          }
          models = ['ibm/granite-13b-chat-v2', 'ibm/granite-13b-instruct-v2']
          model = random.choice(models)
          watsonx_llm = WatsonxLLM(
              model_id=model,
              url="https://us-south.ml.cloud.ibm.com",
              apikey=os.getenv("IAM_API_KEY"),
              project_id=os.getenv("PROJECT_ID"),
              params=watsonx_llm_parameters,
          )
          return watsonx_llm
      
      @workflow(name="watsonx_llm_langchain_question")
      def watsonx_llm_generate(question):
          watsonx_llm = watsonx_llm_init()
          return watsonx_llm.invoke(question)
      
      for i in range(10):
          question_multiple_responses = [ "What is AIOps?", "What is GitOps?"]
          question = random.choice(question_multiple_responses)
          response = watsonx_llm_generate(question)
          if isinstance(response, types.GeneratorType):
              for chunk in response:
                  print(chunk, end='')
          pprint(response)
          time.sleep(3)
      
    • The following is a sample application for OpenAI:
      import os, time, random
      from openai import OpenAI
      from traceloop.sdk import Traceloop
      from traceloop.sdk.decorators import workflow
      
      client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
      
      Traceloop.init(app_name="openai_chat_service", disable_batch=True)
      
      @workflow(name="streaming_ask")
      def ask_workflow():
          models = [ "gpt-3.5-turbo", "gpt-4-turbo-preview" ]
          mod = random.choice(models)
          questions = [ "What is AIOps?", "What is GitOps?" ]
          question = random.choice(questions)
          stream = client.chat.completions.create(
              model=mod,
              messages=[{"role": "user", "content": question}],
              stream=True,
          )
          for part in stream:
              print(part.choices[0].delta.content or "", end="")
      
      for i in range(10):
          ask_workflow()
          time.sleep(3)
      
    • The following is a sample application for Amazon Bedrock:
      import json, logging
      import boto3
      from traceloop.sdk import Traceloop
      from traceloop.sdk.decorators import task, workflow
      
      Traceloop.init(app_name="bedrock_chat_service")
      bedrock_runtime = boto3.client(service_name="bedrock-runtime", region_name="us-east-1")
      
      logger = logging.getLogger(__name__)
      @task(name="joke_creation")
      def create_joke():
          logger.warning("bedrock to create_joke")
          express_prompt = "Tell me a joke"
          body = json.dumps({
              "inputText": express_prompt,
              "textGenerationConfig":{
                  "maxTokenCount":128,
                  "stopSequences":[],
                  "temperature":0,
                  "topP":0.9
              }
          })
          response = bedrock_runtime.invoke_model(
              body=body,
              modelId="amazon.titan-text-express-v1",
              accept="application/json",
              contentType="application/json"
          )
          response_body = json.loads(response.get('body').read())
          outputText = response_body.get('results')[0].get('outputText')
          text = outputText[outputText.index('\n')+1:]
          about_lambda = text.strip()
          return about_lambda
      
      @workflow(name="joke_generator")
      def joke_workflow():
          print(create_joke())
      
      logger.warning("Call joke_workflow ...")
      joke_workflow()
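
    • The following is a minimal sketch of a sample application for Anthropic. The model name claude-3-haiku-20240307 is an assumption, so substitute a model that is available to your account:
      import os, time, random
      import anthropic
      from traceloop.sdk import Traceloop
      from traceloop.sdk.decorators import workflow

      # Initialize the Traceloop tracer before making any LLM calls.
      Traceloop.init(app_name="anthropic_chat_service", disable_batch=True)

      client = anthropic.Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

      @workflow(name="anthropic_ask")
      def ask_workflow():
          questions = ["What is AIOps?", "What is GitOps?"]
          question = random.choice(questions)
          message = client.messages.create(
              model="claude-3-haiku-20240307",  # assumed model name
              max_tokens=256,
              messages=[{"role": "user", "content": question}],
          )
          print(message.content[0].text)

      for i in range(10):
          ask_workflow()
          time.sleep(3)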
      
  6. Configure your environment to export traces and metrics, and set the credentials that are required to access the AI systems that you use. The following settings apply to Traceloop; for more information about other export options, see Exporting.

    • Required environment variables:

      • If you send OpenTelemetry trace and log data to the Instana backend, set the following variables:

        export TRACELOOP_BASE_URL=<instana-otlp-endpoint>:4317
        export TRACELOOP_HEADERS="x-instana-key=<agent-key>,x-instana-host=<instana-host>"
        export TRACELOOP_LOGGING_ENABLED=true
        export TRACELOOP_LOGGING_ENDPOINT=$TRACELOOP_BASE_URL
        export TRACELOOP_METRICS_ENABLED=true
        export TRACELOOP_METRICS_ENDPOINT=<otel-dc-llm-host>:8000
        export OTEL_EXPORTER_OTLP_METRICS_INSECURE=true
        export OTEL_METRIC_EXPORT_INTERVAL=10000
        
        • The <instana-otlp-endpoint> is the endpoint of the backend otlp-acceptor component without http/https scheme.
        • The x-instana-key is the agent key of the Instana agent for targeting the Instana backend.
        • The x-instana-host is the host ID if no host.id, faas.id, or device.id resource attribute is defined in your application or system.
        • The <otel-dc-llm-host> is the host on which the ODCL is installed.

        In SaaS environments, the endpoint of the Instana backend otlp-acceptor is TLS-enabled, so set the following variable. For more information about Instana SaaS environments, see Endpoints of the Instana backend otlp-acceptor.

        export OTEL_EXPORTER_OTLP_INSECURE=false
        

        In on-premises environments, the endpoint of the Instana backend otlp-acceptor is not TLS-enabled, so set the following variable:

        export OTEL_EXPORTER_OTLP_INSECURE=true
        
      • If you send OpenTelemetry trace and log data to the Instana agent, set the following variables:

        export TRACELOOP_BASE_URL=<instana-agent-host>:4317
        export TRACELOOP_HEADERS="api-key=DUMMY_KEY"
        export TRACELOOP_LOGGING_ENABLED=true
        export TRACELOOP_LOGGING_ENDPOINT=$TRACELOOP_BASE_URL
        export TRACELOOP_METRICS_ENABLED=true
        export TRACELOOP_METRICS_ENDPOINT=<otel-dc-llm-host>:8000
        export OTEL_EXPORTER_OTLP_METRICS_INSECURE=true
        export OTEL_METRIC_EXPORT_INTERVAL=10000
        export OTEL_EXPORTER_OTLP_INSECURE=true
        
        • The <instana-agent-host> is the host on which the Instana agent is installed.
        • The <otel-dc-llm-host> is the host on which the ODCL is installed.
    • The following credentials are required to access OpenAI, IBM watsonx, Anthropic, and Amazon Bedrock:

      • Only for OpenAI instrumentation:

        export OPENAI_API_KEY=<openai-api-key>
        

        To create an API key to access the OpenAI API or use the existing one, see OpenAI.

      • Only for IBM watsonx instrumentation:

        export IAM_API_KEY=<watsonx-iam-api-key>
        export PROJECT_ID=<watsonx-project-id>
        

        To create the Project ID and IAM API key to access IBM watsonx or to use the existing one, see IBM watsonx and IBM Cloud.

      • Only for Anthropic instrumentation:

        export ANTHROPIC_API_KEY=<anthropic-api-key>
        

        To create an API key to access the Anthropic API or use the existing one, see Anthropic.

      • Only for Amazon Bedrock instrumentation:

        export AWS_ACCESS_KEY_ID=<access-key-id>
        export AWS_SECRET_ACCESS_KEY=<access-key>
        

        To create an access key ID and secret access key to access the Amazon Bedrock API, or to use existing ones, see Amazon IAM.

  7. Optional: Run the sample application to verify installation and configuration.

    If you directly use the sample-llm-app.py sample application, run the following command in your terminal to verify the installation and configuration:

    python3 ./sample-llm-app.py
    
  8. Run the LLM application:

    If you instrumented your application, run the following command in your terminal to monitor your LLM application:

    python3 ./<instrumented-llm-app>.py
    

    <instrumented-llm-app> is the file name of your instrumented LLM application.

View metrics

After you install the OpenTelemetry (OTel) Data Collector and instrument the LLM applications, you can view the metrics in the Instana UI.

  1. Open the Instana UI, and click Infrastructure. Then, click Analyze Infrastructure.
  2. Select OTEL LLMonitor from the list of entity types.
  3. Click an entity instance of the OTEL LLMonitor entity type to open the associated dashboard.

You can view the following LLM observability metrics:

  • Total tokens
  • Total cost
  • Total requests
  • Average duration

View traces

To create an application perspective to view trace information that is gathered from the LLM application runtime, complete the following steps:

  1. Open the New Application Perspective wizard in one of the following ways:
    • On the Instana dashboard, go to the Applications section and click Add application.
    • From the navigation menu, click Applications to open the Applications dashboard. Then, click Add and select New Application Perspective.
  2. Select Services or Endpoints and click Next.
  3. Click Add filter and select a service name. You can select multiple services and endpoints by using OR conditions. The service name is specified by the app_name parameter in Traceloop.init(). For example, watsonx_chat_service.
  4. In the Application Perspective Name field, enter a name for the LLM application perspective. Then, click Create.

The steps to create a new application perspective are complete.

To view trace information, go to the navigation menu and click Analytics to open the Analytics dashboard. Here, you can analyze calls by application, service, and endpoint by breaking down the data that Instana presents by service, endpoint, and call name. You can filter and group traces or calls by using arbitrary tags, for example, by filtering on 'Trace->Service Name' equals watsonx_chat_service. You can view log information along with the trace information. For more information, see Analyzing traces and calls.

View logs

To view logging information, complete the following steps:

  1. From the Instana UI navigation menu, click Analytics.
  2. From the drop-down list, select Logs.
  3. To add the tag and its value as a filter, click Add filter. For example, 'Logs->Custom Tags->otelServiceName' equals watsonx_chat_service.

Troubleshooting

SSL issues

If your LLM applications encounter SSL handshake issues (or similar ones), such as the following error:

Handshake failed with fatal error SSL_ERROR_SSL: error:100000f7:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER.

Set OTEL_EXPORTER_OTLP_INSECURE=true so that your LLM applications export data to the gRPC endpoint without using TLS:
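
export OTEL_EXPORTER_OTLP_INSECURE=true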

Module issues

If your LLM applications encounter a module 'lib' issue, such as the following error:

AttributeError: module 'lib' has no attribute 'X509_V_FLAG_NOTIFY_POLICY'.

You need to install or upgrade the following Python packages:

pip3 install --upgrade cryptography pyopenssl

No OpenTelemetry logs

To resolve issues with missing OpenTelemetry logs, make sure that the following settings are configured:

  • The feature flag feature.logging.enabled is set to true.
  • The feature flag feature.logging.opentelemetry.integration.enabled is set to * or to a specific set of tenant-unit combinations, such as <tenant1>-<unit1>,<tenant2>-<unit2>.

After you change these feature flags, restart the otlp-acceptor component for the changes to take effect.