Monitoring LLMs

You can monitor LLM metrics and traces in the Instana UI.

LLM metrics

To view LLM metrics, click GenAI observability > LLM metrics.

LLM metrics with their cost, latency, and other performance details are displayed in the LLM metrics tab.

Parameter | Description
Input cost | The cost that is incurred for sending input tokens, that is, the prompt or query, to the LLMs.
Output cost | The cost that is incurred for the tokens generated by the LLMs in response to the input.
Total cost | The total cost for the input and output tokens. The cost is the sum of Input cost and Output cost.
Models | The number of LLMs used during the observed period. This value indicates the diversity of the language models employed.
GenAI services | The number of generative AI service providers or endpoints used. Each model might be hosted by a different service, so performance and pricing can vary.
LLM calls (total) | The number of times that the LLMs were invoked, that is, how many prompt-response interactions occurred. This metric shows the overall activity level of the models.
Token usage | The token usage over time. You can filter the usage by LLM or service to understand individual or collective consumption patterns.
Cost | The cost trends over time for the LLMs. You can filter the cost by LLM or service to analyze cost efficiency.
LLM calls | The number of LLM calls over time. You can view the frequency of model invocations by LLM or service to identify peak usage periods.
LLM latency | The latency of LLM calls over time. This metric measures the response time for each model, indicating how quickly the models generate outputs.
Model name | The LLM used.
Input tokens | The number of tokens in the prompts or queries that are sent to the LLM.
Output tokens | The number of tokens generated by the LLM as responses.
Total tokens | The sum of input tokens and output tokens, representing the overall token count for each interaction.
Total cost | The total cost of the input and output tokens processed by the LLM.
Latency (mean) | The average response time of the LLM across all calls.
Number of calls | The total number of times that the LLM was called or invoked.
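
The relationships between the per-model values in the table can be sketched in a few lines of Python. This is an illustrative sketch only: the LlmCall record, the model names, and the per-1,000-token prices are hypothetical, and the code does not describe how Instana computes or stores these values.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class LlmCall:
    """One prompt-response interaction (hypothetical record, not an Instana type)."""
    model: str
    input_tokens: int
    output_tokens: int
    latency_ms: float


# Hypothetical prices in USD per 1,000 tokens: (input price, output price).
PRICES_PER_1K = {
    "model-a": (0.50, 1.50),
    "model-b": (0.10, 0.40),
}


def summarize(calls):
    """Aggregate calls per model into the values described in the table."""
    by_model = {}
    for call in calls:
        by_model.setdefault(call.model, []).append(call)

    rows = {}
    for model, group in by_model.items():
        in_price, out_price = PRICES_PER_1K[model]
        input_tokens = sum(c.input_tokens for c in group)
        output_tokens = sum(c.output_tokens for c in group)
        input_cost = input_tokens / 1000 * in_price
        output_cost = output_tokens / 1000 * out_price
        rows[model] = {
            "input_tokens": input_tokens,
            "output_tokens": output_tokens,
            "total_tokens": input_tokens + output_tokens,  # input + output
            "total_cost": input_cost + output_cost,        # input cost + output cost
            "latency_mean_ms": mean(c.latency_ms for c in group),
            "number_of_calls": len(group),
        }
    return rows


calls = [
    LlmCall("model-a", 1000, 500, 800.0),
    LlmCall("model-a", 3000, 1500, 1200.0),
    LlmCall("model-b", 2000, 1000, 300.0),
]
summary = summarize(calls)
print(summary["model-a"]["total_tokens"])  # 6000
```

With these made-up numbers, model-a accounts for two calls, 6000 total tokens, and a mean latency of 1000 ms. Separate input and output prices are used because the table reports Input cost and Output cost as distinct values.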

LLM traces

To view LLM traces, click GenAI observability > Traces.

LLM traces, including their input prompts and output responses, are displayed in the Traces tab.

Parameter | Description
Trace ID | A unique identifier that is assigned to each trace or interaction with the generative AI system. It is useful for tracking and debugging specific requests.
Gen AI service | The generative AI service used to process the prompt.
Input prompt | The question or command that is submitted to the AI model.
Output response | The answer generated by the AI model in response to the input prompt.
Total tokens | The total number of tokens used in the interaction, including both input and output.
Duration | The time, in milliseconds, that the AI model takes to generate the response. This value helps you assess performance and latency.
Status | Indicates whether the interaction was successful. A green checkmark indicates that the response was generated without errors.
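
As a rough illustration of how the Duration and Status values can be used together for triage, the sketch below flags traces that failed or exceeded a latency threshold. The TraceRow record, its field names, and the sample values are hypothetical. They mirror the table columns and are not an Instana API or export format.

```python
from dataclasses import dataclass


@dataclass
class TraceRow:
    """One row of the Traces tab (hypothetical record for illustration)."""
    trace_id: str
    genai_service: str
    total_tokens: int
    duration_ms: float
    ok: bool  # True when the Status column would show a green checkmark


def needs_attention(rows, max_duration_ms):
    """Return failed or slow traces, slowest first."""
    flagged = [r for r in rows if not r.ok or r.duration_ms > max_duration_ms]
    return sorted(flagged, key=lambda r: r.duration_ms, reverse=True)


rows = [
    TraceRow("t1", "service-a", 350, 420.0, True),
    TraceRow("t2", "service-a", 120, 150.0, False),
    TraceRow("t3", "service-b", 4100, 2600.0, True),
]
flagged_ids = [r.trace_id for r in needs_attention(rows, max_duration_ms=2000.0)]
print(flagged_ids)  # ['t3', 't2']
```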

LLM Task view in Trace details

When you view a trace that contains LLM calls, you can access detailed information about the LLM interactions through the LLM Task view. For each LLM call in your application traces, this view shows the model, the prompt and response, token counts, costs, and the request parameters.

Accessing the LLM Task view

To access the LLM Task view:

  1. Navigate to GenAI observability > Traces.
  2. Click a trace that contains LLM calls.
  3. In the trace details view, locate and click an LLM call span. The LLM Task view displays detailed information about the selected LLM interaction.
    Note: A newly ingested trace appears on the Traces page, but the LLM Task view can take up to a minute longer to generate and populate its data.

LLM Task view information

The LLM Task view provides the following information:

Table 1. LLM Task view parameters
Parameter | Description
Model name | The specific LLM model used for the interaction, for example, gpt-4 or claude-3-opus.
Input prompt | The complete prompt or query sent to the LLM, including any system messages or context.
Output response | The full response generated by the LLM.
Input tokens | The number of tokens in the input prompt.
Output tokens | The number of tokens in the generated response.
Total tokens | The sum of input and output tokens.
Input cost | The cost incurred for processing the input tokens.
Output cost | The cost incurred for generating the output tokens.
Total cost | The total cost for the LLM interaction, which is the sum of input and output costs.
Duration | The time taken by the LLM to process the request and generate the response.
Temperature | The sampling temperature used for the LLM call, which controls randomness in the output.
Max tokens | The maximum number of tokens allowed in the response.
Top P | The nucleus sampling parameter used for controlling output diversity.
Frequency penalty | The penalty applied to reduce repetition of tokens in the output.
Presence penalty | The penalty applied to encourage the model to talk about new topics.
Figure 1. LLM Task view in Instana
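
To make the Temperature and Top P parameters more concrete, the following sketch shows the generic textbook form of temperature scaling and nucleus (top-p) sampling over a small list of token scores. It is an assumption-laden illustration: the function names and the example logits are hypothetical, and individual LLM providers may implement these parameters differently. The frequency and presence penalties are not covered here.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose mass reaches top_p, renormalized."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return {i: probs[i] / mass for i in kept}


def sample_token(logits, temperature=1.0, top_p=1.0, rng=random):
    """Sample one token index after applying temperature and top-p filtering."""
    probs = softmax_with_temperature(logits, temperature)
    nucleus = top_p_filter(probs, top_p)
    tokens = list(nucleus)
    weights = [nucleus[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens
warm = softmax_with_temperature(logits, 1.0)
cold = softmax_with_temperature(logits, 0.5)
print(cold[0] > warm[0])                # True: lower temperature favors the top token
print(sorted(top_p_filter(warm, 0.9)))  # [0, 1]: the least likely token is dropped
```

In this sketch, a lower temperature concentrates probability on the most likely token, and a lower Top P value removes low-probability tokens from consideration. Both changes make the output less varied.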

Benefits of the LLM Task view

The LLM Task view helps you:

  • Debug LLM interactions by examining the exact prompts and responses to identify issues or unexpected behavior.
  • Optimize costs by analyzing token usage and costs for individual LLM calls.
  • Monitor performance by tracking latency and response times for LLM operations within your application traces.
  • Understand model behavior by reviewing the parameters used for each LLM call to see how they affect the output.
  • See LLM interactions in the context of the complete application trace, including upstream and downstream calls.