The emergence of generative artificial intelligence (GenAI), powered by large language models (LLMs), has accelerated the widespread adoption of artificial intelligence. GenAI is proving effective at tackling a variety of complex use cases, with AI systems operating at levels comparable to humans. Organizations are quickly realizing the value of AI and its transformative potential for business, with estimates that it could add trillions of dollars to the economy. Given this emerging landscape, IBM Instana Observability is on a mission to enable the observability of GenAI-infused IT applications.

We’re excited to announce that it is now possible to observe GenAI-infused IT applications and platforms such as IBM watsonx.ai, Amazon Bedrock, HuggingFace, and more, all from within IBM Instana Observability. We have created a new Sensor for GenAI Runtimes in Instana that enables end-to-end tracing of requests. The Sensor for GenAI Runtimes leverages OpenTelemetry features and Traceloop’s OpenLLMetry to collect traces, metrics, and logs across the GenAI technology stack, which is shown in Figure 1.

The Sensor for GenAI Runtimes enhances Instana’s existing capabilities: observability for GPU-based infrastructure, automatic instrumentation, and automated incident detection and remediation. This integration provides comprehensive observability across AI-powered environments, so businesses can leverage generative AI technologies while maintaining high levels of performance and reliability.

Beyond the technical details, the telemetry and traces collected empower a broader suite of IBM Automation products: visualize business KPIs for GenAI applications with Apptio (an IBM company), optimize GPU resources with IBM Turbonomic, and more.

This comprehensive approach goes beyond mere monitoring. It unlocks the true efficiency and effectiveness of your GenAI-powered systems, ensuring you maximize their impact.

Figure 1 List of GenAI technologies supported by the newly announced IBM Instana Sensor for GenAI Runtimes, using OpenTelemetry and Traceloop’s OpenLLMetry

Instana’s existing OpenTelemetry support accelerated the development of this exciting new capability. Instana is both a user of and contributor to OpenTelemetry:

  • Instana already supports OpenTelemetry and can collect metrics, traces, and logs through it. Instana leverages the open-source OpenLLMetry instrumentation code to enhance the observability of GenAI-infused applications.
  • This functionality is available immediately to all Instana customers.
  • A draft proposal has been submitted to add OpenTelemetry semantic conventions for AI observability (“Add LLM semantic conventions”). IBM Instana can leverage these semantic conventions for data translation between OpenTelemetry and Instana. For a detailed example of this integration, please see https://github.com/traceloop/openllmetry/tree/main/packages.
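
To illustrate, here is a minimal sketch of instrumenting an application with OpenLLMetry so its telemetry flows to an OTLP-compatible endpoint such as the Instana agent. The endpoint address and service name are assumptions for illustration, not values prescribed by Instana; adjust them to your environment.

```python
# A minimal sketch: initialize OpenLLMetry (traceloop-sdk) with an OTLP exporter.
# Assumption: the Instana agent exposes an OTLP/gRPC endpoint on localhost:4317.
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from traceloop.sdk import Traceloop

Traceloop.init(
    app_name="genai-demo",  # hypothetical service name
    exporter=OTLPSpanExporter(endpoint="localhost:4317", insecure=True),
)

# From here on, calls made through supported LLM SDKs and frameworks
# (for example OpenAI, watsonx.ai, or LangChain) are traced automatically.
```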

Advantages of Using Instana for Observing GenAI

  • Custom application layer: This user-centric layer comprises applications that leverage AI models. Instana supports a wide range of automatic discovery and instrumentation capabilities, enabling end-to-end tracing and observability of business and user workflows.
  • Orchestration framework: Tools such as LangChain and LiteLLM facilitate the seamless integration of various AI application elements, encompassing data manipulation, model invocation, and post-processing.
  • Model layer: This layer contains the advanced AI models that drive predictive analytics and output generation. Notable models in content generation include IBM Granite, GPT-4, GPT-3, Llama 2, and models from Anthropic and Cohere.
  • Data storage and vector databases: Essential for AI applications, these data storage solutions handle the retention and retrieval of vast datasets. Vector databases are tailor-made for high-dimensional data management, a common requirement in AI applications.
  • Infrastructure layer: This layer serves as the foundational platform for AI development and implementation, including GPUs and CPUs for training and deploying AI models. Additionally, cloud computing services like AWS, Azure, and GCP offer scalable environments for rolling out AI applications.

Examples of how an SRE can use the observability data during troubleshooting

  • Performance monitoring: By observing metrics such as response times, throughput, and error rates, SREs can ensure that the LLM is performing as expected. Figure 2 below shows the LLM performance monitoring dashboard. This is crucial for maintaining the reliability and efficiency of services that depend on these models. Instana showcases these metrics in model dashboards.
Figure 2 Model Dashboard
  • Resource utilization: AI observability can provide insights into how well resources are being utilized by the LLM. SREs can monitor GPU, CPU, memory, and disk usage to optimize resource allocation and prevent bottlenecks. Instana showcases these metrics in infrastructure dashboards.
Figure 3 Infrastructure dashboard showing GPU metrics
  • Error and anomaly detection: By monitoring LLM traces, SREs can quickly detect and respond to errors or anomalies in the LLM’s operation. This includes identifying unusual patterns in model responses or technical issues within the underlying infrastructure. SREs can use Instana’s built-in AI-based smart alert features to flag anomalies or define their own patterns.
  • Scaling and load balancing: Observability data can inform SREs about the current load on the system and predict future demands. This helps when making decisions about scaling up or down and effectively distributing the load to maintain optimal performance. SREs can leverage Instana’s integration with IBM Turbonomic to automatically optimize application and system performance.
  • Incident response and recovery: In the event of an incident, detailed traces are invaluable for root cause identification. SREs can use this information to quickly diagnose and rectify issues, minimizing downtime and improving system resilience. SREs can leverage intelligent remediation, currently in beta, to automate root cause identification and remediation of application impacting problems.
  • Feedback loop for model improvement: Monitoring the performance and outputs of LLMs can provide valuable feedback for data scientists and engineers working on the model. Observability data can highlight areas where the model may need refinement or retraining.
  • Cost management: By monitoring how the LLM utilizes cloud or computational resources, SREs can identify cost-saving opportunities, such as optimizing queries or streamlining the model deployment architecture. (A simple example of turning token-usage telemetry into cost estimates is sketched after this list.)
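
As a concrete illustration of the cost-management point above, the sketch below estimates per-request cost from the token-usage metrics that the Sensor for GenAI Runtimes collects. The price table and model name are hypothetical placeholders; substitute your provider’s actual rates.

```python
# Hypothetical per-1,000-token prices; substitute your provider's actual rates.
PRICES_PER_1K = {
    "gpt-4": {"prompt": 0.03, "completion": 0.06},
}

def estimate_request_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the cost of one LLM request from token-usage telemetry."""
    price = PRICES_PER_1K[model]
    return (prompt_tokens / 1000) * price["prompt"] \
         + (completion_tokens / 1000) * price["completion"]

# Example: a request that used 1,200 prompt tokens and 350 completion tokens.
print(f"${estimate_request_cost('gpt-4', 1200, 350):.4f}")  # -> $0.0570
```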

Architecture Overview for Instana Observability with OpenTelemetry

Figure 4 Architecture Overview

Figure 4 illustrates how Instana uses OpenTelemetry instrumentation as a data source: the instrumented application sends its telemetry to the Instana agent, the AI sensor running on the agent node uses the AI semantic conventions to translate the payload from OpenTelemetry to Instana, and the Instana dashboards then display all of the AI observability data.
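
To make that translation step concrete, here is a minimal sketch of the kind of span attributes the draft GenAI semantic conventions define. The attribute names follow the draft conventions and may change between versions, and the span name and model identifier are hypothetical examples, not values prescribed by Instana.

```python
# A minimal sketch of a GenAI span carrying draft semantic-convention attributes.
# Note: spans are only exported if a tracer provider is configured (for example,
# by Traceloop.init); otherwise this runs as a no-op.
from opentelemetry import trace

tracer = trace.get_tracer("genai-demo")

with tracer.start_as_current_span("llm.completion") as span:
    # Attribute names per the draft GenAI semantic conventions (assumed).
    span.set_attribute("gen_ai.system", "watsonx")
    span.set_attribute("gen_ai.request.model", "ibm/granite-13b-chat-v2")  # hypothetical model id
    span.set_attribute("gen_ai.usage.prompt_tokens", 42)
    span.set_attribute("gen_ai.usage.completion_tokens", 128)
```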

Testing Instana AI Observability with LangChain

Configuring Instana

To configure Instana, follow the Instana Traceloop Integration steps and enable a gRPC connection for the Instana agent.
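
For reference, enabling OpenTelemetry ingestion over gRPC is typically a small change in the agent’s configuration.yaml. The snippet below is a sketch based on Instana’s documented OpenTelemetry agent settings; the file location and defaults may differ in your environment.

```yaml
# <agent-install-dir>/etc/instana/configuration.yaml (location may vary)
com.instana.plugin.opentelemetry:
  grpc:
    enabled: true  # the agent then accepts OTLP over gRPC (default port 4317)
```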

Running Test Code

You can get sample test code for different LLM providers here.

For example, here’s an environment where we ran langchain_app.py; a minimal sketch of what such an app might look like follows below.
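
The sketch below is a hypothetical langchain_app.py, not the repository’s actual test code. It assumes the langchain and traceloop-sdk packages and an OpenAI API key in the environment; LangChain’s APIs change across versions, so adjust the imports accordingly.

```python
# langchain_app.py -- a minimal, hypothetical LangChain app instrumented with OpenLLMetry.
from traceloop.sdk import Traceloop
from langchain.chains import LLMChain
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate

# Initialize OpenLLMetry; telemetry is exported to the configured OTLP endpoint.
# The app_name becomes the service name shown in Instana (assumed behavior).
Traceloop.init(app_name="langchain_example")

llm = OpenAI(temperature=0.7)  # assumes OPENAI_API_KEY is set in the environment
prompt = PromptTemplate(
    input_variables=["topic"],
    template="Write a haiku about {topic}.",
)
chain = LLMChain(llm=llm, prompt=prompt)

# Each step of the chain is traced end to end.
print(chain.run("observability"))
```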

Check Tracing and Metrics from the Instana Dashboard

After the test program for LangChain has finished, navigate to the Instana services dashboard via Applications -> All Services -> Services. This is where you will see the service that was just created via LangChain (shown in Figure 5).

Figure 5 Check tracing and metrics from the Instana Dashboard

If you click the service name langchain_example, you will be taken to the detailed tracing and metrics page for this LLM application.

As shown in Figure 6 below, the Sensor for GenAI Runtimes allows for end-to-end tracing by supporting a wide spectrum of GenAI services such as watsonx.ai, vector databases such as Pinecone, and GenAI frameworks such as LangChain. Such end-to-end tracing unlocks in-depth insights, including model parameters, performance metrics, and business KPIs. It is particularly effective for identifying and debugging issues such as increased latency, failures, or unusual usage costs by tracking the context of application and model usage.

Figure 6 End-to-end tracing of requests across LangChain
  • You can see the whole chain trace, with a clear call stack of your LangChain app.
  • You can check metrics for each span.

Learn more

In today’s fast-paced world of software development, staying ahead of service disruptions is crucial. The recent introduction of LLMs promises to revolutionize business innovation, efficiency, productivity, and user experience.

It is essential to recognize that these cutting-edge models contain billions of parameters, making them incredibly intricate and sometimes challenging to troubleshoot.

This is where enterprise observability comes into play. By providing comprehensive monitoring and visibility, observability solutions help businesses navigate the complexities of modern technology, including LLMs, to ensure smooth operations and optimal performance.

You can collect and analyze traces and metrics from Traceloop today. Reach out to your Instana account representative to get early access to our curated dashboard for GenAI Runtimes.

Experience Instana’s observability innovations first-hand. Sign up for a free trial today and spend less time debugging and more time building.

