Monitoring vLLM
vLLM is an open source framework for optimizing and accelerating large language model (LLM) inference. Monitor your vLLM infrastructure with Instana by collecting performance metrics to gain visibility into resource utilization, throughput, latency, and overall system health of your LLM deployments.
Prerequisites
-
Python 3.10 or later
-
vLLM 0.6.3.post1 or later
-
Access to an Instana agent or the Instana backend
Configuring environment variables
Configure your environment to export traces to Instana using either agent mode or agentless mode.
Agent mode
Export traces through an Instana agent:
export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT="http://<instana-agent-host>:4317"
export OTEL_SERVICE_NAME="<your-service-name>"
Agentless mode
Export traces directly to the Instana backend:
export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT="https://<otlp-acceptor-domain>:4317"
export OTEL_EXPORTER_OTLP_HEADERS="x-instana-key=<agent-key>"
export OTEL_SERVICE_NAME="<your-service-name>"
For endpoints without TLS, set:
export OTEL_EXPORTER_OTLP_INSECURE=true
For Instana backend endpoints, see Endpoints of the Instana backend otlp-acceptor.
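The OTLP exporters read these variables themselves. As an optional pre-flight check, the following Python sketch reads the same variables and reports which mode they describe. It parses OTEL_EXPORTER_OTLP_HEADERS as the comma-separated, URL-encoded key=value list defined by the OpenTelemetry specification. It assumes that an x-instana-key header indicates agentless mode, because that header is set only in the agentless example above. The helpers parse_otlp_headers and describe_export_mode are hypothetical and not part of any Instana or OpenTelemetry API.

```python
from urllib.parse import unquote


def parse_otlp_headers(raw: str) -> dict[str, str]:
    """Parse an OTEL_EXPORTER_OTLP_HEADERS value such as 'k1=v1,k2=v2'."""
    headers: dict[str, str] = {}
    for item in raw.split(","):
        if not item.strip():
            continue  # tolerate empty entries, for example a trailing comma
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"malformed header entry: {item!r}")
        # Header values may be URL-encoded according to the OpenTelemetry spec.
        headers[key.strip()] = unquote(value.strip())
    return headers


def describe_export_mode(env: dict[str, str]) -> dict[str, object]:
    """Summarize the trace export settings (hypothetical helper)."""
    endpoint = env.get("OTEL_EXPORTER_OTLP_TRACES_ENDPOINT")
    if not endpoint:
        raise ValueError("OTEL_EXPORTER_OTLP_TRACES_ENDPOINT is not set")
    service_name = env.get("OTEL_SERVICE_NAME")
    if not service_name:
        raise ValueError("OTEL_SERVICE_NAME is not set")
    headers = parse_otlp_headers(env.get("OTEL_EXPORTER_OTLP_HEADERS", ""))
    # Assumption: only agentless mode sends the agent key as a header.
    mode = "agentless" if "x-instana-key" in headers else "agent"
    insecure = env.get("OTEL_EXPORTER_OTLP_INSECURE", "").strip().lower() == "true"
    return {
        "mode": mode,
        "endpoint": endpoint,
        "service_name": service_name,
        "insecure": insecure,
    }
```

For example, describe_export_mode(dict(os.environ)) raises a ValueError if a required variable is missing, so you can catch a misconfiguration before you start the client.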
Installing dependencies
-
Verify Python version:
python3 -V
-
(Optional) Create a virtual environment:
pip3 install virtualenv
virtualenv vllm-env
source vllm-env/bin/activate
-
Install vLLM:
pip3 install vllm==0.6.3.post1
For alternative installation methods, see the vLLM documentation.
-
Install OpenTelemetry packages:
pip3 install 'opentelemetry-sdk>=1.26.0,<1.27.0' 'opentelemetry-api>=1.26.0,<1.27.0' 'opentelemetry-exporter-otlp>=1.26.0,<1.27.0' 'opentelemetry-semantic-conventions-ai>=0.4.1,<0.5.0'
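To confirm that the active environment matches the requirements above, you can run a small check in the same interpreter. The following Python sketch uses importlib.metadata to compare installed versions with the pinned ranges copied from the pip3 command. It also checks for Python 3.10 or later. The names OTEL_PINS, release_tuple, and check_environment are hypothetical. The version comparison is a deliberate simplification of PEP 440: it keeps only major.minor.patch and ignores pre-release and post-release suffixes.

```python
import re
import sys
from importlib.metadata import PackageNotFoundError, version

# Ranges copied from the pip3 install command: (package, lower inclusive, upper exclusive).
OTEL_PINS = [
    ("opentelemetry-sdk", (1, 26, 0), (1, 27, 0)),
    ("opentelemetry-api", (1, 26, 0), (1, 27, 0)),
    ("opentelemetry-exporter-otlp", (1, 26, 0), (1, 27, 0)),
    ("opentelemetry-semantic-conventions-ai", (0, 4, 1), (0, 5, 0)),
]


def release_tuple(ver: str) -> tuple[int, ...]:
    """Reduce a version string to (major, minor, patch); a simplification of PEP 440."""
    parts: list[int] = []
    for piece in ver.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break  # for example, the 'post1' in '0.6.3.post1'
        parts.append(int(match.group()))
        if match.end() < len(piece) or len(parts) == 3:
            break  # stop at suffixes such as '0rc1', or after major.minor.patch
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def check_environment() -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append(f"Python 3.10 or later is required, found {sys.version.split()[0]}")
    for name, low, high in OTEL_PINS:
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if not low <= release_tuple(installed) < high:
            problems.append(f"{name} {installed} is outside the range [{low}, {high})")
    return problems
```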
Instrumenting vLLM applications
Starting the vLLM server
Deploy vLLM as a standalone server that exposes OpenAI-compatible API endpoints. To trace a client application as well, use the sample client implementation in the next section.
Start the vLLM server:
vllm serve ibm-granite/granite-3.0-2b-instruct --otlp-traces-endpoint="$OTEL_EXPORTER_OTLP_TRACES_ENDPOINT"
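Loading the model can take a while after vllm serve starts. Before you run a client, you can poll the server's /health endpoint, which returns HTTP 200 once the OpenAI-compatible server is up. The following Python sketch does this with the standard library only. It assumes the default address http://localhost:8000 used by the client below. The wait_for_vllm helper and its timeout values are hypothetical choices.

```python
import http.client
import time
import urllib.request


def wait_for_vllm(base_url: str = "http://localhost:8000",
                  timeout_s: float = 300.0, interval_s: float = 2.0) -> bool:
    """Poll <base_url>/health until it returns HTTP 200 or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=interval_s) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            pass  # server not accepting requests yet, for example while the model loads
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```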
Creating a client application
Create a client application to send requests to the vLLM server:
import requests
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter
from opentelemetry.trace import SpanKind, set_tracer_provider
from opentelemetry.trace.propagation.tracecontext import TraceContextTextMapPropagator

# Configure OpenTelemetry. TracerProvider() and OTLPSpanExporter() pick up the
# OTEL_SERVICE_NAME and OTEL_EXPORTER_OTLP_* variables that you set earlier.
trace_provider = TracerProvider()
set_tracer_provider(trace_provider)
# Export spans to Instana over OTLP/gRPC and also print them to the console.
trace_provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
trace_provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
tracer = trace_provider.get_tracer("vllm-client")

vllm_url = "http://localhost:8000/v1/completions"

with tracer.start_as_current_span("client-span", kind=SpanKind.CLIENT) as span:
    prompt = "San Francisco is a"
    span.set_attribute("prompt", prompt)

    # Inject the W3C trace context so that the vLLM server spans join this trace.
    headers = {}
    TraceContextTextMapPropagator().inject(headers)

    payload = {
        "model": "ibm-granite/granite-3.0-2b-instruct",
        "prompt": prompt,
        "max_tokens": 10,
        "n": 1,
        "best_of": 1,
        "use_beam_search": False,
        "temperature": 0.0,
    }
    response = requests.post(vllm_url, headers=headers, json=payload)
    response.raise_for_status()
    print(response.json())
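The call TraceContextTextMapPropagator().inject(headers) adds a W3C Trace Context traceparent header to headers. The vLLM server uses that header to attach its spans to the client's trace. To inspect what the client actually sends, for example by printing headers before the request, the following standalone Python sketch decodes a version-00 traceparent value. parse_traceparent and TraceParent are hypothetical helpers, not OpenTelemetry APIs. The sketch handles only the version-00 layout (version-trace_id-parent_id-flags in lowercase hex).

```python
import re
from typing import NamedTuple

# Version-00 layout: 2 hex version, 32 hex trace ID, 16 hex parent span ID, 2 hex flags.
_TRACEPARENT_RE = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(value: str) -> TraceParent:
    """Decode a traceparent header such as the one inject() adds to `headers`."""
    match = _TRACEPARENT_RE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"not a version-00 traceparent header: {value!r}")
    version, trace_id, parent_id, flags = match.groups()
    # The specification forbids version ff and all-zero trace or parent IDs.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        raise ValueError(f"invalid traceparent header: {value!r}")
    # Bit 0 of the flags byte is the "sampled" flag.
    return TraceParent(version, trace_id, parent_id, bool(int(flags, 16) & 0x01))
```

For example, parse_traceparent(headers["traceparent"]).trace_id is the trace ID shared by the client span and the correlated vLLM server spans.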
Viewing traces in Instana
After running your application, data will appear in the Instana Gen AI observability dashboard. You can analyze calls by application, service, and endpoint, and view logs alongside trace information.
For more details on viewing and analyzing traces, see Viewing telemetry data.
Collecting metrics from vLLM
You can collect metrics from vLLM using one of the following options:
Option 1: Using OTel data collector for vLLM (ODCV)
ODCV collects OpenTelemetry metrics from vLLM based on predefined semantic conventions.
Prerequisites
-
Java SDK 11 or later
Verify installation:
java -version
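The output of java -version uses two numbering schemes: Java 8 and earlier report versions such as "1.8.0_292", while Java 9 and later report "11.0.2" or "17". To check the "11 or later" requirement reliably, compare the major version rather than the first number in the string. The following Python sketch does that. java_major_version and installed_java_major are hypothetical helpers. Note that java -version writes to stderr, not stdout.

```python
import re
import subprocess


def java_major_version(version_output: str) -> int:
    """Extract the major Java version from `java -version` output."""
    match = re.search(r'version "([^"]+)"', version_output)
    if not match:
        raise ValueError("could not find a quoted version string")
    parts = match.group(1).split(".")
    # Legacy scheme (Java 8 and earlier): "1.8.0_292" means major version 8.
    if parts[0] == "1" and len(parts) > 1:
        return int(parts[1])
    # Modern scheme: "11.0.2", "17", or early-access builds such as "22-ea".
    leading = re.match(r"\d+", parts[0])
    if not leading:
        raise ValueError(f"unrecognized version: {match.group(1)!r}")
    return int(leading.group())


def installed_java_major() -> int:
    """Run `java -version` and return the major version (requires java on PATH)."""
    result = subprocess.run(["java", "-version"], capture_output=True, text=True, check=True)
    return java_major_version(result.stderr)
```

For example, installed_java_major() >= 11 tells you whether the installed Java meets the ODCV prerequisite.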
Procedure
-
Download the latest ODCV package:
wget https://github.com/instana/otel-dc/releases/download/v1.0.7/otel-dc-vllm-1.0.0.tar
-
Extract the package:
tar xf otel-dc-vllm-1.0.0.tar
cd otel-dc-vllm-1.0.0
-
Configure config/config.yaml:
vi config/config.yaml
Update these fields:
-
otel.agentless.mode: Connection mode (true for agentless, false for agent)
-
otel.backend.url: gRPC endpoint for the Instana backend or agent
-
callback.interval: Data posting interval in seconds
-
otel.service.name: Name for the Data Collector
-
otel.vllm.metrics.url: vLLM server metrics endpoint (use https if TLS is enabled)
For agentless mode (otel.agentless.mode: true):
-
otel.backend.url: Backend otlp-acceptor endpoint (for example, http://<instana-otlp-endpoint>:4317)
-
Use https:// for TLS-enabled SaaS endpoints
For agent mode (otel.agentless.mode: false):
-
otel.backend.url: Instana agent endpoint (for example, http://<instana-agent-host>:4317)
-
(Optional) Configure logging:
vi config/logging.properties
-
Run the Data Collector:
nohup ./bin/otel-dc-vllm >/dev/null 2>&1 &
Alternatively, use tmux or screen for background execution.
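If no vLLM metrics appear, first confirm that the URL in otel.vllm.metrics.url serves vLLM's Prometheus metrics. Those metric names use a vllm: prefix, for example vllm:num_requests_running. The following Python sketch fetches that URL and collects the vllm: metric names from the Prometheus text format. vllm_metric_names and fetch_vllm_metric_names are hypothetical helpers. The parser is a simplified reading of the text format that only looks at the metric name on each sample line.

```python
import urllib.request


def vllm_metric_names(exposition_text: str) -> set[str]:
    """Collect metric names with the 'vllm:' prefix from Prometheus text-format output."""
    names: set[str] = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the # HELP / # TYPE comments
        # A sample line looks like: name{label="value",...} value [timestamp]
        name = line.split("{", 1)[0].split(None, 1)[0]
        if name.startswith("vllm:"):
            names.add(name)
    return names


def fetch_vllm_metric_names(metrics_url: str, timeout_s: float = 5.0) -> set[str]:
    """Fetch the URL configured as otel.vllm.metrics.url and list its vLLM metrics."""
    with urllib.request.urlopen(metrics_url, timeout=timeout_s) as resp:
        return vllm_metric_names(resp.read().decode("utf-8"))
```

For example, an empty result from fetch_vllm_metric_names("http://localhost:8000/metrics") suggests that the URL does not point at the vLLM server.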
Option 2: Using open source OpenTelemetry collector
You can forward OpenTelemetry data from vLLM to an Instana agent or Instana backend by using the OpenTelemetry collector.
You can use one of the following OpenTelemetry collectors:
-
Instana distribution of OpenTelemetry collector (IDOT) - A preconfigured, optimized distribution that simplifies vLLM monitoring setup. For more information, see IDOT and OpenTelemetry project.
-
An open source OpenTelemetry collector - The core component of the OpenTelemetry ecosystem, offering vendor-neutral functions for telemetry data collection, processing, and export. Use it for custom deployments or in environments where IDOT is not available. For more information, see OpenTelemetry collector.
Instana supports the following models to collect data:
-
Agent model - Data is sent to the Instana agent first, and the agent aggregates the data and sends it to the Instana backend.
-
Agentless model - Data is sent to the Instana backend directly without going through the agent.
Forwarding telemetry data to an Instana agent (agent model)
The following snippet shows a typical configuration for the OpenTelemetry Collector to forward telemetry data to a local Instana host agent by using the OTLP/HTTP protocol. Create a YAML file, such as config.yaml, as follows:
receivers:
  otlp/receiver:
    protocols:
      grpc:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'prometheus-scrape'
          scrape_interval: 10s
          static_configs:
            - targets:
                - "$(PROMETHEUS_METRICS_ENDPOINT)"
          metrics_path: '/metrics'
processors:
  batch: {}
  resource:
    attributes:
      - key: service.name
        value: vllm
        action: insert
      - key: service.namespace
        value: genai
        action: insert
      - key: service.instance.id
        value: <vllm-host>
        action: insert
      - key: server.address
        value: <vllm-host-address>
        action: insert
      - key: server.port
        value: "8000"
        action: insert
      - key: vllm.entity.type
        value: vllm
        action: insert
      - key: INSTANA_PLUGIN
        value: vllm
        action: insert
exporters:
  otlphttp:
    endpoint: "http://$(INSTANA_AGENT_HOST):4318"
    tls:
      insecure: true
service:
  pipelines:
    metrics/prometheus:
      receivers: [prometheus]
      processors: [resource, batch]
      exporters: [otlphttp]
PROMETHEUS_METRICS_ENDPOINT is the host and port from which the Prometheus receiver scrapes vLLM metrics, for example, 9.60.233.77:8000.
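Every attribute in the resource processor above uses action: insert. This action adds an attribute only when the key is not already present on the resource. upsert also overwrites an existing value, and update changes only keys that already exist. If an earlier component has already set one of these keys, insert leaves the existing value in place. For example, the Prometheus receiver can derive some resource attributes from the scrape job and target. The following Python sketch models these actions on a plain dictionary to make the difference concrete. apply_resource_actions is a hypothetical helper that covers only a subset of the processor's actions (insert, update, upsert, delete); it is not collector code.

```python
def apply_resource_actions(resource: dict[str, str],
                           actions: list[dict[str, str]]) -> dict[str, str]:
    """Model the insert/update/upsert/delete actions of the collector's resource processor."""
    result = dict(resource)  # leave the caller's dictionary untouched
    for act in actions:
        key, op = act["key"], act["action"]
        if op == "insert":
            # Add the attribute only if the key is not present yet.
            result.setdefault(key, act["value"])
        elif op == "update":
            # Change the value only if the key already exists.
            if key in result:
                result[key] = act["value"]
        elif op == "upsert":
            # Insert or overwrite unconditionally.
            result[key] = act["value"]
        elif op == "delete":
            result.pop(key, None)
        else:
            raise ValueError(f"action not modeled in this sketch: {op}")
    return result
```

If a value from config.yaml must win over one that is already present, upsert is the action that guarantees it.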
To forward telemetry data to a local Instana host agent with the OTLP/gRPC protocol instead, replace the otlphttp exporter with the following otlp exporter, and set exporters: [otlp] in the metrics/prometheus pipeline.
exporters:
  otlp:
    endpoint: "$(INSTANA_AGENT_HOST):4317"
    tls:
      insecure: true
-
Set the PROMETHEUS_METRICS_ENDPOINT field to the vLLM metrics exporter endpoint.
-
Set the INSTANA_AGENT_HOST field to the IP address or hostname of the Instana agent to connect to.
-
Instana uses OTLP standard port numbers, such as 4317 for OTLP/gRPC and 4318 for OTLP/HTTP.
After you complete all configuration changes in the config.yaml file, run the following command to use the OpenTelemetry Collector:
docker run -d -p 4317:4317 -v $(pwd)/config.yaml:/etc/otelcol-contrib/config.yaml otel/opentelemetry-collector-contrib:latest
Forwarding telemetry data to the Instana backend (agentless model)
To forward OpenTelemetry data to the Instana backend by using the OpenTelemetry Collector, complete the following steps:
-
Create a YAML file, such as config.yaml, as in the preceding section, and change the exporter endpoint from the Instana agent endpoint to the Instana backend endpoint. OpenTelemetry data is sent to the dedicated endpoints of the backend otlp-acceptor component, and the Instana backend requires an Instana agent key for validation.
exporters:
  otlp:
    endpoint: INSTANA_OTLP_GRPC_BACKEND:4317
    headers:
      x-instana-key: xxxxxxx
      x-instana-host: xxxx
Note:
-
Set INSTANA_OTLP_GRPC_BACKEND to the domain name of the otlp-acceptor component of the Instana backend. For more information about the endpoint of the Instana backend otlp-acceptor, see Endpoints of Self-Hosted Instana backend otlp-acceptor or Endpoints of SaaS Instana backend otlp-acceptor.
-
Set the x-instana-key field to the agent key of the Instana agent for targeting the Instana backend. To find your agent key, click More > Agents in the navigation bar of the Instana UI, and then click Install Agents > Windows.
-
Set the x-instana-host field to the host ID if no host.id, faas.id, or device.id resource attribute is defined in your application or system.
-
Instana uses OTLP standard port numbers, such as 4317 for OTLP/gRPC and 4318 for OTLP/HTTP. Port 443 is also supported for OTLP/HTTP.
-
After you complete all configuration changes in the config.yaml file, run the following command to start the OpenTelemetry Collector:
docker run -d -p 4317:4317 -v $(pwd)/config.yaml:/etc/otelcol-contrib/config.yaml otel/opentelemetry-collector-contrib:latest
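The header rules in the note above can be expressed as a small function. x-instana-key is always required for the backend. x-instana-host is needed only when none of host.id, faas.id, or device.id is defined as a resource attribute. The following Python sketch applies those rules. It also renders the result in the comma-separated, URL-encoded form that OTEL_EXPORTER_OTLP_HEADERS uses. instana_backend_headers and to_otlp_headers_env are hypothetical helpers, and the resource attributes are passed in as a plain dictionary for illustration.

```python
from typing import Optional
from urllib.parse import quote

# Resource attributes that, per the note above, make x-instana-host unnecessary.
HOST_IDENTIFYING_ATTRIBUTES = ("host.id", "faas.id", "device.id")


def instana_backend_headers(agent_key: str, resource_attributes: dict[str, str],
                            host_id: Optional[str] = None) -> dict[str, str]:
    """Build the headers for sending OTLP data straight to the Instana backend."""
    if not agent_key:
        raise ValueError("an Instana agent key is required for the backend")
    headers = {"x-instana-key": agent_key}
    if not any(attr in resource_attributes for attr in HOST_IDENTIFYING_ATTRIBUTES):
        if not host_id:
            raise ValueError("no host.id, faas.id, or device.id attribute: pass host_id")
        headers["x-instana-host"] = host_id
    return headers


def to_otlp_headers_env(headers: dict[str, str]) -> str:
    """Render headers as an OTEL_EXPORTER_OTLP_HEADERS value ('k1=v1,k2=v2')."""
    return ",".join(f"{key}={quote(value, safe='')}" for key, value in headers.items())
```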
Viewing vLLM metrics in Instana
After configuring metrics collection using either option:
-
In Instana UI, go to Infrastructure > Analyze Infrastructure
-
Select OTel vLLMonitor from entity types
-
Click an OTel vLLMonitor instance to view its dashboard
Troubleshooting
This section describes how to troubleshoot SSL handshake errors.
SSL handshake errors
If you encounter SSL errors such as:
Handshake failed with fatal error SSL_ERROR_SSL: error:100000f7:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER.
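This error typically means that the exporter is attempting a TLS handshake with an endpoint that does not use TLS. To export data without TLS, set: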
export OTEL_EXPORTER_OTLP_INSECURE=true
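To check whether an endpoint expects TLS before you set OTEL_EXPORTER_OTLP_INSECURE, you can attempt a TLS handshake directly. The following Python sketch connects to host:port and classifies the result as "tls", "no-tls", or "unreachable". probe_tls is a hypothetical helper. It disables certificate verification because it only tests whether TLS is spoken. It also treats any handshake failure as a non-TLS endpoint. This is a simplification: a TLS endpoint with an incompatible configuration would be misclassified.

```python
import socket
import ssl


def probe_tls(host: str, port: int, timeout_s: float = 3.0) -> str:
    """Classify host:port as 'tls', 'no-tls', or 'unreachable'."""
    context = ssl.create_default_context()
    # Only the handshake matters here, so certificate checks are disabled.
    # Never reuse this context for real data export.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout_s) as sock:
            try:
                with context.wrap_socket(sock, server_hostname=host):
                    return "tls"
            except (ssl.SSLError, ConnectionError):
                # A non-TLS reply (for example WRONG_VERSION_NUMBER) or an abrupt
                # close during the handshake suggests a plain-text endpoint.
                return "no-tls"
    except OSError:
        # Connection refused, name resolution failure, or timeout.
        return "unreachable"
```

For example, if probe_tls("<instana-agent-host>", 4317) returns "no-tls", use an http:// endpoint together with OTEL_EXPORTER_OTLP_INSECURE=true.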
Next steps
-
Configure alerts for vLLM performance metrics
-
Explore trace correlation between client and server
-
Review OpenTelemetry for advanced configurations