AI gateway

Instana uses LLM gateways to connect to watsonx.ai or other external large language model (LLM) services that power its generative AI (gen AI) capabilities. By configuring LLM gateways, you control which connections and LLMs Instana's gen AI capabilities use, so that they align with your internal policies, performance requirements, and data governance standards. You can create, monitor, configure, and maintain all LLM gateways in your environment.

Deployment options

SaaS

On SaaS environments, Instana provides default LLM gateways that connect automatically to watsonx.ai, so you can use LLMs without additional configuration. You can also create an LLM gateway to connect to your own watsonx.ai runtime.

Self-hosted

On self-hosted environments (Standard or Custom edition), you can connect to your own watsonx.ai runtime or to a vLLM-based inference service.

Supported LLM services

You can create LLM gateways to the following LLM services:

  • watsonx.ai (IBM Cloud)
  • watsonx.ai (self-hosted on OpenShift with Cloud Pak for Data; Instana version 1.0.313 and later)
  • vLLM-compatible inference services, including Red Hat AI Inference Server (RHAIIS) (Public preview)

Generative AI capabilities

You can configure the following Instana gen AI capabilities to use an LLM gateway:

  • AI assistant
  • Incident summarization
  • Incident investigation
  • Kubernetes AI assistant
  • Manual action generation
  • Script generation
  • SLO assistant
  • GenAI evaluation

Managing user permissions

The following user permissions control access to the AI gateway and to gen AI capabilities in the Instana UI:

  • Access AI gateway: Enables read-only access to the AI gateway UI
  • Create, configure, and delete LLM gateways: Enables full access to the AI gateway UI
  • Access all Gen AI capabilities: Enables access to AI-powered features in the Instana UI

For more information, see Managing user access.

Connecting Instana to an LLM gateway

To set up and configure an LLM gateway, complete the following steps.

  1. Set up a connection.

    1. Click AI gateway in the navigation pane. The LLM gateways pane is displayed.
    2. Click Create LLM gateway. The Create an LLM gateway wizard is displayed.
    3. Select Set up a connection.
    4. In Connection settings, select the LLM service that the gateway connects to:
      • IBM watsonx: If you configure watsonx, enter the following details:
        • watsonx API key: The API key for authenticating access to the watsonx service.
        • watsonx project: The ID of your watsonx project.
        • watsonx URL: The URL for the watsonx service.
      • vLLM: If you configure an external service based on vLLM (an open-source LLM inference and serving engine), enter the following details:
        • Endpoint URL: The URL where the vLLM service is hosted.
        • Endpoint API key (Optional): The authentication key for accessing the endpoint securely.
    5. To verify the configuration, click Test Connection.
      • For IBM watsonx: If the connection fails, verify that the API key, project ID, and URL are correct and valid. A successful test confirms that Instana can communicate with the watsonx service.
      • For vLLM: If the connection fails, verify that the endpoint URL is accessible and the API key is correct. A successful test confirms that Instana can communicate with the vLLM service.
    6. Click Next.
    Note: The Kubernetes AI assistant capability is not supported for the vLLM service.
  2. Select capability and AI model.

    1. In the Select capability and AI model section, select one of the following capabilities:
      • AI assistant
      • Manual action generation
      • Incident summarization
      • Incident investigation
      • Script generation
      • SLO assistant
      • GenAI evaluation
    2. Select an AI model:
      • Granite: General-purpose model
      • Mistral (Medium): Balanced performance
      • Mistral (Large): High capacity for complex tasks
      • openai/gpt-oss-120b: For tasks that require analytics and insights. The SLO assistant requires this model.
    3. Click Next to continue.
  3. Configure model.

    1. Set the following parameters to fine-tune model behavior:
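      • Token limit (in thousands): Maximum number of tokens per request (for example, 100).
      • Max latency: Maximum response time (for example, 1 second).
      • Repetition penalty: Discourages repeated phrases (for example, 1). A value of 1 applies no penalty; higher values penalize repetition more strongly.
      • Temperature: Controls randomness (for example, 0.2 to 1). Lower values produce more focused, deterministic output; higher values produce more varied output.
      • top_k: The number of highest-probability tokens that the model samples from at each step (for example, 50).
      • top_p: Cumulative probability threshold (for example, 0.5). The model samples only from the smallest set of most probable tokens whose combined probability reaches this value.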
    2. Click Next to proceed or Back to revise.
  4. Enter gateway details.

    1. Name: Enter a unique name for the gateway.
    2. Description: Enter a description for the gateway.
  5. Click Create to finalize the setup.
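
The sampling parameters in step 3 (Repetition penalty, Temperature, top_k, and top_p) are standard LLM decoding controls. The following minimal Python sketch shows how they typically reshape a model's next-token probabilities, using a toy set of token scores (logits). It assumes the conventions common in open-source inference engines such as vLLM: the repetition penalty divides positive scores and multiplies negative ones, temperature scales scores before the softmax, and top_k is applied before top_p. The function name candidate_distribution is hypothetical, and watsonx.ai or your vLLM service may implement these steps differently.

```python
import math


def candidate_distribution(logits, previous_tokens=(), temperature=1.0,
                           repetition_penalty=1.0, top_k=0, top_p=1.0):
    """Return {token: probability} after applying common sampling controls.

    Hypothetical helper for illustration only.
    logits: dict mapping token -> raw model score.
    previous_tokens: tokens already generated (targets of the repetition penalty).
    temperature must be > 0; top_k=0 and top_p=1.0 disable those filters.
    """
    scores = dict(logits)

    # Repetition penalty: positive scores are divided and negative scores are
    # multiplied, so a penalty > 1 always lowers the score of a repeated token.
    # A penalty of 1 leaves every score unchanged.
    for token in set(previous_tokens):
        if token in scores:
            s = scores[token]
            scores[token] = s / repetition_penalty if s > 0 else s * repetition_penalty

    # Temperature: scale scores before the softmax. Lower values sharpen the
    # distribution; higher values flatten it. Subtracting the max score keeps
    # exp() numerically stable without changing the normalized result.
    max_score = max(scores.values())
    weights = {t: math.exp((s - max_score) / temperature) for t, s in scores.items()}
    total = sum(weights.values())
    probs = sorted(((t, w / total) for t, w in weights.items()),
                   key=lambda item: item[1], reverse=True)

    # top_k: keep only the k most probable tokens, then renormalize.
    if top_k > 0:
        probs = probs[:top_k]
        norm = sum(p for _, p in probs)
        probs = [(t, p / norm) for t, p in probs]

    # top_p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for token, p in probs:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalize the surviving candidates so their probabilities sum to 1.
    norm = sum(p for _, p in kept)
    return {token: p / norm for token, p in kept}


if __name__ == "__main__":
    toy_logits = {"error": 2.0, "warning": 1.0, "info": 0.5, "debug": -1.0}
    print(candidate_distribution(toy_logits, temperature=0.2, top_k=2))
```

For example, with the toy scores {"error": 2.0, "warning": 1.0, "info": 0.5, "debug": -1.0}, setting top_p to 0.5 leaves only "error", because that token alone already holds about 61% of the probability.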

Viewing LLM gateways

To see the LLM gateways that you've configured, click AI gateway in the navigation pane. The LLM gateways table is displayed and shows all configured gateways and their statuses. You can enable or disable them and view configuration details.

The LLM gateways table contains the following columns:

  • Name: The name of the gateway configuration. Each gateway serves a specific gen AI use case.
  • Capability: The gen AI capability that the gateway supports (for example, script generation, manual action generation, or incident summarization).
  • Type: Indicates whether the gateway is a default system configuration or user-defined.
  • Service used: The LLM service that powers the gateway (for example, IBM watsonx).
  • AI model: The model used for inference (for example, Granite or Mistral).
  • Status: Shows whether the gateway is active (Enabled) or inactive (Disabled).
  • Endpoint URL: The URL through which the gateway communicates with the AI service.

Testing existing gateways

After you create an LLM gateway, you can test the connection to verify proper operation.

To test an existing gateway, complete the following steps:

  1. From the navigation menu in the Instana UI, select AI gateway.
  2. In the LLM gateways table, click Test for the gateway you want to test. The test result shows whether the connection is working correctly.
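
For a vLLM service, a failed test usually means that the endpoint URL is unreachable or the API key is wrong. You can check the endpoint independently of Instana: vLLM's OpenAI-compatible server lists its served models at the /v1/models route and, when it is started with an API key, expects that key in an "Authorization: Bearer" header. The following Python sketch (standard library only) performs that check. It assumes that the Endpoint URL you entered is the server's base URL (for example, http://vllm.example.com:8000, a placeholder); the function names are hypothetical, and this is not the check that Instana itself runs.

```python
import json
import urllib.error
import urllib.request


def build_models_request(endpoint_url, api_key=None):
    """Build a GET request for the OpenAI-compatible /v1/models route.

    Hypothetical helper. Assumes endpoint_url is the server base URL;
    the URL format that Instana expects may differ.
    """
    url = endpoint_url.rstrip("/") + "/v1/models"
    headers = {"Accept": "application/json"}
    if api_key:
        # vLLM validates this header when the server is started with --api-key.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(url, headers=headers, method="GET")


def check_vllm_endpoint(endpoint_url, api_key=None, timeout=10):
    """Return (True, [model IDs]) on success, or (False, error message)."""
    request = build_models_request(endpoint_url, api_key)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            payload = json.load(response)
    except urllib.error.HTTPError as err:
        # The server answered with an error status; 401 usually means a
        # missing or incorrect API key.
        return False, f"HTTP {err.code}: {err.reason}"
    except (urllib.error.URLError, TimeoutError) as err:
        # DNS failures, refused connections, TLS errors, and timeouts end up here.
        return False, f"Endpoint not reachable: {err}"
    return True, [model["id"] for model in payload.get("data", [])]


if __name__ == "__main__":
    ok, detail = check_vllm_endpoint("http://vllm.example.com:8000", api_key="YOUR_KEY")
    print(ok, detail)
```

Run the check from a host that has the same network access as your Instana backend; otherwise a successful result might not reflect what Instana can reach.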