Monitoring LLMs

LLM metrics

To view LLM metrics, click GenAI observability > LLM metrics.

The LLM metrics tab displays LLM metrics, including cost, latency, and other performance details.


Parameter	Description
Input cost	The cost incurred for sending input tokens (the prompt or query) to the LLMs.
Output cost	The cost that is incurred for the tokens generated by the LLMs in response to the input.
Total cost	The total cost of the input and output tokens for the LLM calls. The cost is the sum of Input cost and Output cost.
Models	The number of distinct LLMs used during the observed period. This value indicates how many different language models are in use.
GenAI services	The number of generative AI service providers or endpoints used. Each model might be hosted by a different service, reflecting potential variations in performance and pricing.
LLM calls (total)	The number of times that the LLMs were invoked, that is, how many prompt-response interactions occurred. This metric shows the overall activity level of the models.
Token usage	The token usage over time. You can filter the usage by LLM model or service to understand individual or collective consumption patterns.
Cost	The cost trends over time for the LLMs. You can filter the cost by LLM model or service to analyze cost efficiency.
LLM calls	The number of LLM calls over time. You can view the frequency of model invocations by LLM model or service to identify peak usage periods.
LLM latency	The latency of LLM calls over time. This metric measures response time for each model, indicating how quickly the models generate outputs.
Model name	The LLM model used.
Input tokens	The number of tokens in the prompts or queries that are sent to the LLM.
Output tokens	The number of tokens generated by the LLM as responses.
Total tokens	The sum of input tokens and output tokens, representing the overall token count for each interaction.
Total cost	The total cost of the input and output tokens for the model.
Latency (mean)	The average response time of the LLM across all calls.
Number of calls	The total count of times that the LLM was called or invoked.
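The cost and per-model columns above follow simple arithmetic: each call's total cost is its input cost plus its output cost, and each model's row sums tokens and costs and averages latency over that model's calls. The following Python sketch reproduces that arithmetic on made-up data. The model names, the per-1,000-token prices, and the `LLMCall` record are hypothetical; real prices come from your providers, and the product's own calculation may differ in details such as rounding.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean

# Hypothetical prices in USD per 1,000 tokens; real prices depend on provider and model.
PRICES_PER_1K = {
    "model-a": {"input": 0.01, "output": 0.03},
    "model-b": {"input": 0.0005, "output": 0.0015},
}

@dataclass
class LLMCall:
    # Hypothetical record of one LLM call; not a product API.
    model: str
    input_tokens: int
    output_tokens: int
    latency_ms: float

def call_cost(call: LLMCall) -> tuple[float, float, float]:
    """Return (input cost, output cost, total cost) for one call."""
    price = PRICES_PER_1K[call.model]
    input_cost = call.input_tokens / 1000 * price["input"]
    output_cost = call.output_tokens / 1000 * price["output"]
    return input_cost, output_cost, input_cost + output_cost

def summarize_by_model(calls: list[LLMCall]) -> dict[str, dict[str, float]]:
    """Aggregate calls into one row per model, like the per-model table."""
    groups = defaultdict(list)
    for call in calls:
        groups[call.model].append(call)
    rows = {}
    for model, group in groups.items():
        input_tokens = sum(c.input_tokens for c in group)
        output_tokens = sum(c.output_tokens for c in group)
        rows[model] = {
            "input_tokens": input_tokens,
            "output_tokens": output_tokens,
            "total_tokens": input_tokens + output_tokens,
            "total_cost": sum(call_cost(c)[2] for c in group),
            "latency_mean_ms": mean(c.latency_ms for c in group),
            "calls": len(group),
        }
    return rows

calls = [
    LLMCall("model-a", 1200, 300, 850.0),
    LLMCall("model-a", 800, 200, 650.0),
    LLMCall("model-b", 2000, 1000, 400.0),
]
for model, row in summarize_by_model(calls).items():
    print(model, row)
```

With these sample prices, the two `model-a` calls cost about 0.021 and 0.014 USD, so that model's total cost is about 0.035 USD and its mean latency is 750 ms.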

LLM traces


To view LLM traces, click GenAI observability > Traces.

The Traces tab displays LLM traces, including their input prompts and output responses.


Parameter	Description
Trace ID	A unique identifier that is assigned to each trace or interaction with the generative AI system. It is useful for tracking and debugging specific requests.
Gen AI service	The generative AI service used to process the prompt.
Input prompt	The question or command that is submitted to the AI model.
Output response	The answer generated by the AI model in response to the input prompt.
Total tokens	The total number of tokens used in the interaction, including both input and output.
Duration	The time, in milliseconds, that the AI model took to generate the response. This value helps you assess performance and latency.
Status	Indicates whether the interaction succeeded. A green checkmark indicates that the response was generated without errors.
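When you triage the Traces tab, failed traces and unusually slow ones are natural starting points. This Python sketch shows one way to rank rows exported from such a table. The `TraceRow` fields mirror the columns described above, but the structure is hypothetical rather than a product API, and the 2,000 ms threshold is an arbitrary example.

```python
from dataclasses import dataclass

@dataclass
class TraceRow:
    # Hypothetical record mirroring the Traces tab columns; not a product API.
    trace_id: str
    genai_service: str
    total_tokens: int
    duration_ms: float
    succeeded: bool

def traces_to_investigate(rows, slow_threshold_ms=2000.0):
    """Return failed traces first, then successful traces slower than the threshold, slowest first."""
    failed = [r for r in rows if not r.succeeded]
    slow = [r for r in rows if r.succeeded and r.duration_ms > slow_threshold_ms]
    slow.sort(key=lambda r: r.duration_ms, reverse=True)
    return failed + slow

rows = [
    TraceRow("t1", "service-x", 540, 820.0, True),
    TraceRow("t2", "service-x", 2100, 3400.0, True),
    TraceRow("t3", "service-y", 90, 150.0, False),
    TraceRow("t4", "service-y", 1800, 2500.0, True),
]
print([r.trace_id for r in traces_to_investigate(rows)])  # ['t3', 't2', 't4']
```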

LLM Task view in Trace details


When you view a trace that contains LLM calls, the LLM Task view shows the details of each LLM interaction: the model, the full prompt and response, token counts, costs, duration, and the sampling parameters used for the call.

Accessing the LLM Task view

To access the LLM Task view:

1. Navigate to GenAI observability > Traces.
2. Click a trace that contains LLM calls.
3. In the trace details view, click an LLM call span.
The LLM Task view displays detailed information about the selected LLM interaction.
Note: A newly ingested trace appears on the Traces page first, but the LLM Task view can take up to a minute afterward to generate and populate its data.

LLM Task view information

The LLM Task view provides the following information:

Table 1. LLM Task view parameters
Parameter	Description
Model name	The specific LLM model used for the interaction, for example, `gpt-4` or `claude-3-opus`.
Input prompt	The complete prompt or query sent to the LLM, including any system messages or context.
Output response	The full response generated by the LLM.
Input tokens	The number of tokens in the input prompt.
Output tokens	The number of tokens in the generated response.
Total tokens	The sum of input and output tokens.
Input cost	The cost incurred for processing the input tokens.
Output cost	The cost incurred for generating the output tokens.
Total cost	The total cost for the LLM interaction, which is the sum of input and output costs.
Duration	The time taken by the LLM to process the request and generate the response.
Temperature	The sampling temperature used for the LLM call, which controls randomness in the output.
Max tokens	The maximum number of tokens allowed in the response.
Top P	The nucleus sampling parameter used for controlling output diversity.
Frequency penalty	The penalty applied to reduce repetition in the output; it grows with the number of times a token has already appeared.
Presence penalty	The penalty applied to any token that has already appeared, which encourages the model to talk about new topics.
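Temperature, Top P, and the two penalties all act on the model's next-token scores before a token is drawn. The Python sketch below is a simplified, hypothetical sampler over a toy three-token vocabulary that shows how each parameter changes the distribution. It uses the penalty formula that OpenAI documents for its API; other providers might apply these parameters differently, and the function names and scores are illustrative only.

```python
import math
import random

def next_token_distribution(logits, counts, temperature=1.0, top_p=1.0,
                            frequency_penalty=0.0, presence_penalty=0.0):
    """Return the probabilities a sampler would draw the next token from.

    logits: dict mapping token -> raw model score (toy values).
    counts: dict mapping token -> times it already appears in the output.
    """
    if temperature <= 0:
        raise ValueError("this sketch requires temperature > 0")
    # Penalties (OpenAI-style formula): subtract count * frequency_penalty,
    # plus presence_penalty once if the token appeared at all.
    adjusted = {}
    for tok, score in logits.items():
        n = counts.get(tok, 0)
        adjusted[tok] = score - n * frequency_penalty - (presence_penalty if n > 0 else 0.0)
    # Temperature: scale scores before softmax; values below 1 sharpen the distribution.
    scaled = {tok: s / temperature for tok, s in adjusted.items()}
    peak = max(scaled.values())
    exps = {tok: math.exp(s - peak) for tok, s in scaled.items()}  # shift by max for stability
    total = sum(exps.values())
    ranked = sorted(((tok, e / total) for tok, e in exps.items()),
                    key=lambda kv: kv[1], reverse=True)
    # Top P (nucleus sampling): keep the smallest set of most likely tokens
    # whose cumulative probability reaches top_p, then renormalize.
    kept, cumulative = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break
    norm = sum(p for _, p in kept)
    return {tok: p / norm for tok, p in kept}

def sample_next_token(dist, rng=random):
    """Draw one token according to the distribution."""
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]

logits = {"the": 2.0, "a": 1.0, "cat": 0.5}
print(next_token_distribution(logits, {}, top_p=0.5))  # {'the': 1.0}
print(next_token_distribution(logits, {"the": 2}, frequency_penalty=1.0, presence_penalty=0.5))
print(sample_next_token(next_token_distribution(logits, {}, temperature=0.7)))
```

Max tokens does not appear in the sketch because it acts outside the sampler: it caps how many times a generation loop draws a new token.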

Benefits of the LLM Task view

The LLM Task view helps you:

Debug LLM interactions by examining the exact prompts and responses to identify issues or unexpected behavior.
Optimize costs by analyzing token usage and cost for individual LLM calls to find where spending can be reduced.
Monitor performance by tracking latency and response times for LLM operations within your application traces.
Understand model behavior by reviewing the parameters used for each LLM call and how they affect the output.
See LLM interactions in the context of the complete application trace, including upstream and downstream calls.
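Because each LLM call appears as a span inside the full application trace, you can relate its duration to the rest of the request. The Python sketch below uses a hypothetical `Span` structure (not the product's data model) to print a trace as an indented tree and to compute the share of the root span's duration spent in LLM calls. It assumes a single root span and LLM calls that run one after another rather than concurrently.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Span:
    # Hypothetical span shape for illustration only; not the product's data model.
    span_id: str
    parent_id: Optional[str]
    name: str
    duration_ms: float
    is_llm_call: bool = False

def print_tree(spans, parent_id=None, depth=0):
    """Print spans as an indented tree so each LLM call is shown among its neighbors."""
    for span in spans:
        if span.parent_id == parent_id:
            marker = "  [LLM]" if span.is_llm_call else ""
            print(f"{'  ' * depth}{span.name} ({span.duration_ms:.0f} ms){marker}")
            print_tree(spans, span.span_id, depth + 1)

def llm_time_share(spans):
    """Fraction of the root span's duration spent in LLM call spans.

    Assumes one root span and non-overlapping LLM calls;
    concurrent calls would make this overcount.
    """
    root = next(s for s in spans if s.parent_id is None)
    llm_ms = sum(s.duration_ms for s in spans if s.is_llm_call)
    return llm_ms / root.duration_ms

spans = [
    Span("1", None, "POST /answer", 2000.0),
    Span("2", "1", "vector search", 300.0),
    Span("3", "1", "chat completion", 1200.0, is_llm_call=True),
    Span("4", "1", "summarize", 300.0, is_llm_call=True),
]
print_tree(spans)
print(f"{llm_time_share(spans):.0%} of the request was spent in LLM calls")  # 75% of the request was spent in LLM calls
```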