Monitoring large language models (LLMs)
With Instana, you can monitor LLMs to gain insights into application and model usage. You can identify abnormal trends in LLM responses (for example, increased latency or failures) and in usage costs. After you install the Instana agent, you need to install the OpenTelemetry (OTel) Data Collector and instrument the LLM applications to collect trace and metrics data from the LLMs. You can view this data in the Instana UI.
Installing OTel Data Collector for LLM (ODCL)
To collect OpenTelemetry metrics for various LLMs and LLM applications, you need to install ODCL. All implementations are based on predefined OpenTelemetry Semantic Conventions.
For more information, see GitHub.
To collect OTel metrics from LLMs, complete the following steps:
- Verify that the following prerequisites are met:
  - Java SDK 11+ is installed. Run the following command to check the installed version:

java -version

  - OpenTelemetry data ingestion is enabled. For more information, see Sending OpenTelemetry data to the Instana agent.
- Install the collector:
a. Download the installation package:
wget https://github.com/instana/otel-dc/releases/download/v1.0.0/otel-dc-llm-1.0.0.tar
b. Extract the package to the preferred deployment location:
tar xf otel-dc-llm-1.0.0.tar
- Modify the configuration file:
  a. Open the config.yaml file:

cd otel-dc-llm-1.0.0
vi config/config.yaml

  b. Update the following fields in the config.yaml file. A sample configuration follows this list.
    - otel.backend.url: The OTel gRPC address of the Instana agent, for example: http://<instana-agent-host>:4317.
    - otel.service.name: The Data Collector name, which can be any string that you choose.
    - otel.service.port: The port on which the Data Collector listens for metrics data from the instrumented applications. The default port is 8000.
    - <ai-system>.price.prompt.tokens.per.kilo: The unit price per thousand prompt tokens.
    - <ai-system>.price.complete.tokens.per.kilo: The unit price per thousand completion tokens.
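The following snippet is a minimal sketch of what these settings can look like. The hostname, service name, and prices are illustrative values, and watsonx stands in for <ai-system>; verify the exact layout against the config.yaml file that ships with your ODCL release.

# Illustrative values only; check the config.yaml shipped with your release.
otel.backend.url: http://my-instana-agent:4317
otel.service.name: my-llm-dc
otel.service.port: 8000
watsonx.price.prompt.tokens.per.kilo: 0.0006
watsonx.price.complete.tokens.per.kilo: 0.0018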
  c. Open the logging.properties file by using the following command:

vi config/logging.properties

Configure the Java logging settings in the logging.properties file according to your needs.
- Run the Data Collector with the following command according to your current system:
nohup ./bin/otel-dc-llm >/dev/null 2>&1 &
You can also use tools like tmux or screen to run this program in the background.
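Optionally, you can confirm that the Data Collector is up by checking its listen port. This is a generic check, not taken from the product documentation, and it assumes the default port 8000 (substitute your otel.service.port value if you changed it):

ss -ltn | grep 8000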
Instrumenting LLM applications
Instrumentation is the process of injecting code into LLM applications to get detailed information about the LLM API calls in the applications. For more information about supported LLMs, see Instrumenting LLM applications.
The instrumentation collects both trace and metric data. The trace data is sent to the Instana agent directly. The metrics data is sent to the LLM Data Collector first for aggregation, and then the collector sends it to the Instana agent.
To instrument the LLM application, complete the following steps:
- Verify that Python 3.10 or later is installed. Run the following command to check the installed version:
python3 -V
- Optional: Create a virtual environment for your applications to keep your dependencies consistent and prevent conflicts with other applications.
  To create and activate a virtual environment, run the following commands:

pip install virtualenv
virtualenv traceloop
source traceloop/bin/activate
- Install the Traceloop SDK and the dependencies.
a. To install the SDK, run the following command:
pip install traceloop-sdk==0.18.2
b. To install dependencies for the instrumentations that you need, run the applicable command:
  - To install dependencies for IBM watsonx, run the following command:

pip install ibm-watsonx-ai==1.0.5 ibm-watson-machine-learning==1.0.357 langchain-ibm==0.1.7

  - To install the dependency for OpenAI, run the following command in your terminal:

pip install openai==1.31.0

  - To install the dependency for Anthropic, run the following command in your terminal:

pip install anthropic==0.25.7

  - To install dependencies for LangChain, run the following command in your terminal:

pip install langchain==0.2.2 langchain-community==0.2.3 langchain-core==0.2.4 langchain-text-splitters==0.2.1
c. In your LLM app, initialize the Traceloop tracer by adding the following code:

from traceloop.sdk import Traceloop

Traceloop.init()
If you’re running this locally, you can disable the batch sending to see the traces immediately:
Traceloop.init(disable_batch=True)
- Annotate your workflows.
  If you have complex workflows or chains, you can annotate them to get a better understanding of what is going on. You can see the complete trace of your workflow on Traceloop or any other dashboard that you use. You can use decorators to make this easier. For example, if you have a function that renders a prompt and calls an LLM, add @workflow (or, for asynchronous methods, use @aworkflow). If you use an LLM framework like Haystack, LangChain, or LlamaIndex, you do not need to add any annotations to your code.

from traceloop.sdk.decorators import workflow

@workflow(name="suggest_answers")
def suggest_answers(question: str):
    ...
For more information, see annotations.
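The following is a minimal sketch of the asynchronous variant, assuming your application calls the LLM from an async function. The function name and body are illustrative:

from traceloop.sdk.decorators import aworkflow

@aworkflow(name="suggest_answers_async")
async def suggest_answers_async(question: str):
    # Render the prompt and await your LLM client's async API here.
    ...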
- Optional: Verify the installation and configuration. You can use the following code to create a sample application named sample-llm-app.py:
- The following is a sample application for IBM watsonx:
import os, types, time, random
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames
from ibm_watsonx_ai.foundation_models import ModelInference
from pprint import pprint
from traceloop.sdk import Traceloop
from traceloop.sdk.decorators import workflow
from langchain_ibm import WatsonxLLM

Traceloop.init(app_name="watsonx_llm_langchain_question")

def watsonx_llm_init() -> ModelInference:
    watsonx_llm_parameters = {
        GenTextParamsMetaNames.DECODING_METHOD: "sample",
        GenTextParamsMetaNames.MAX_NEW_TOKENS: 100,
        GenTextParamsMetaNames.MIN_NEW_TOKENS: 1,
        GenTextParamsMetaNames.TEMPERATURE: 0.5,
        GenTextParamsMetaNames.TOP_K: 50,
        GenTextParamsMetaNames.TOP_P: 1,
    }
    models = ['ibm/granite-13b-chat-v2', 'ibm/granite-13b-instruct-v2']
    model = random.choice(models)
    watsonx_llm = WatsonxLLM(
        model_id=model,
        url="https://us-south.ml.cloud.ibm.com",
        apikey=os.getenv("IAM_API_KEY"),
        project_id=os.getenv("PROJECT_ID"),
        params=watsonx_llm_parameters,
    )
    return watsonx_llm

@workflow(name="watsonx_llm_langchain_question")
def watsonx_llm_generate(question):
    watsonx_llm = watsonx_llm_init()
    return watsonx_llm.invoke(question)

for i in range(10):
    question_multiple_responses = ["What is AIOps?", "What is GitOps?"]
    question = random.choice(question_multiple_responses)
    response = watsonx_llm_generate(question)
    if isinstance(response, types.GeneratorType):
        for chunk in response:
            print(chunk, end='')
    pprint(response)
    time.sleep(3)
- The following is a sample application for OpenAI:
import os, time, random
from openai import OpenAI
from traceloop.sdk import Traceloop
from traceloop.sdk.decorators import workflow

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
Traceloop.init(app_name="openai_sample_service", disable_batch=True)

@workflow(name="streaming_ask")
def ask_workflow():
    models = ["gpt-3.5-turbo", "gpt-4-turbo-preview"]
    mod = random.choice(models)
    questions = ["What is AIOps?", "What is GitOps?"]
    question = random.choice(questions)
    stream = client.chat.completions.create(
        model=mod,
        messages=[{"role": "user", "content": question}],
        stream=True,
    )
    for part in stream:
        print(part.choices[0].delta.content or "", end="")

for i in range(10):
    ask_workflow()
    time.sleep(3)
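- The following is a sketch of a sample application for Anthropic that follows the same pattern as the preceding examples. The app name and model ID are illustrative, not taken from the product documentation:

import os, time, random
import anthropic
from traceloop.sdk import Traceloop
from traceloop.sdk.decorators import workflow

client = anthropic.Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))
Traceloop.init(app_name="anthropic_sample_service", disable_batch=True)

@workflow(name="anthropic_ask")
def ask_workflow():
    questions = ["What is AIOps?", "What is GitOps?"]
    question = random.choice(questions)
    # claude-3-haiku-20240307 is an example model ID; substitute one that your account can access.
    message = client.messages.create(
        model="claude-3-haiku-20240307",
        max_tokens=256,
        messages=[{"role": "user", "content": question}],
    )
    print(message.content[0].text)

for i in range(10):
    ask_workflow()
    time.sleep(3)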
- Configure the environment to export your traces and metrics, and set the credentials to access watsonx, OpenAI, or Anthropic. The following environment variables configure the Traceloop exporter. For more information about other options, see Exporting.
  - The following environment variables are required:

export TRACELOOP_BASE_URL=<instana-agent-host>:4317
export TRACELOOP_METRICS_ENABLED="true"
export TRACELOOP_METRICS_ENDPOINT=<otel-dc-llm-host>:8000
export TRACELOOP_HEADERS="api-key=DUMMY_KEY"
export OTEL_EXPORTER_OTLP_INSECURE=true
export OTEL_METRIC_EXPORT_INTERVAL=10000

    The hostname <instana-agent-host> is the host on which the Instana agent is installed. The hostname <otel-dc-llm-host> is the host on which ODCL is installed.
  - The following credentials are required to access OpenAI, IBM watsonx, and Anthropic:
    - Only for OpenAI instrumentation:

export OPENAI_API_KEY=<openai-api-key>

      To create an API key to access the OpenAI API or to use an existing one, see OpenAI.
    - Only for IBM watsonx instrumentation:

export IAM_API_KEY=<watsonx-iam-api-key>
export PROJECT_ID=<watsonx-project-id>

      To create the project ID and IAM API key to access IBM watsonx or to use existing ones, see IBM watsonx and IBM Cloud.
    - Only for Anthropic instrumentation:

export ANTHROPIC_API_KEY=<anthropic-api-key>

      To create an API key to access the Anthropic API or to use an existing one, see Anthropic.
- Optional: Run the sample application to verify the installation and configuration.
  If you directly use the sample-llm-app.py sample application, run the following command in your terminal to verify the installation and configuration:

python3 ./sample-llm-app.py
- Run the LLM application:
  If you instrumented your own application, run the following command in your terminal to monitor your LLM application:

python3 ./<instrumented-llm-app>.py

  <instrumented-llm-app> is the name of your instrumented LLM application.
Viewing metrics
After you install the OpenTelemetry (OTel) Data Collector and instrument your LLM applications, you can view the metrics in the Instana UI.
- Open the Instana UI, and click Infrastructure. Then, click Analyze Infrastructure.
- Select OTEL LLMonitor from the list of entity types.
- Click an entity instance of the OTEL LLMonitor entity type to open the associated dashboard.
You can view the following LLM observability metrics:
- Total Tokens
- Total Cost
- Total Request
- Average Duration