Monitoring OpenAI models

OpenAI provides transformer-based language models that enable natural language understanding and generation. This guide shows you how to instrument an application using OpenAI models with OpenLLMetry to send telemetry data to Instana.

Prerequisites

Make sure that the following prerequisites are met:

Instrumenting your OpenAI application

  1. Install the required packages.

    pip install openai traceloop-sdk
  2. Export your OpenAI API key.

    export OPENAI_API_KEY="your-openai-api-key"
  3. Create your OpenAI application. Save the following code in a Python file named openai_app.py:

    import os
    from openai import OpenAI
    from traceloop.sdk import Traceloop
    from traceloop.sdk.decorators import workflow
    
    # Initialize OpenAI client
    client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
    
    # Initialize OpenLLMetry
    Traceloop.init(app_name="openai_chat_service", disable_batch=True)
    
    @workflow(name="openai_conversation")
    def ask_openai(question: str):
        """Send a question to OpenAI and get a response."""
    
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": question}]
        )
    
        return response.choices[0].message.content
    
    # Example usage
    if __name__ == "__main__":
        questions = [
            "What is AIOps and how does it help with IT operations?",
            "Explain the benefits of observability in modern applications."
        ]
    
        for question in questions:
            print(f"\nQuestion: {question}")
            answer = ask_openai(question)
            print(f"Answer: {answer}\n")
            print("-" * 80)
  4. Run your application.

    python3 openai_app.py

    The application sends each question to OpenAI and prints the response. OpenLLMetry automatically captures a trace for each API call and sends it to Instana.

  5. View data on Instana.

    After running your application, the following items are displayed on the Instana Gen AI observability dashboard:

    • Model used
    • Token usage (input and output tokens)
    • Response latency
    • Request and response content

Using streaming responses

To stream responses in real time, set stream=True on the chat completion call. The following function reuses the client and @workflow decorator from the previous example:

@workflow(name="openai_streaming")
def ask_openai_streaming(question: str):
    """Stream responses from OpenAI in real-time."""

    stream = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": question}],
        stream=True
    )

    for chunk in stream:
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)

    print()  # New line after streaming completes
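The loop above prints each delta as it arrives but discards the full text. If you also need the complete reply, for example to log it or return it from the workflow, accumulate the deltas while echoing them. The following is a small, SDK-agnostic sketch of that pattern; the accumulate_stream helper is illustrative and not part of the OpenAI SDK:

```python
def accumulate_stream(deltas):
    """Echo streamed text deltas as they arrive and return the full text."""
    parts = []
    for delta in deltas:
        if delta:  # skip empty or None deltas, as in the streaming loop above
            print(delta, end="", flush=True)
            parts.append(delta)
    print()  # final newline after the stream completes
    return "".join(parts)
```

Inside ask_openai_streaming you could then call accumulate_stream(chunk.choices[0].delta.content for chunk in stream) and return the result.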

Troubleshooting

For common issues such as traces not appearing or connection errors, see Troubleshooting.

Authentication errors

If you encounter authentication errors:

  1. Verify that the OPENAI_API_KEY environment variable is set correctly
  2. Check whether your API key is valid in the OpenAI Platform
  3. Make sure that your API key has not expired or been revoked
  4. Verify that your account has sufficient credits
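A quick way to rule out a missing key before any API call is made is to check the environment variable at startup. The check_api_key helper below is an illustrative sketch, not part of the OpenAI SDK:

```python
import os

def check_api_key():
    """Fail fast with a clear error if OPENAI_API_KEY is missing or empty."""
    key = os.getenv("OPENAI_API_KEY", "")
    if not key.strip():
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running the application."
        )
    return True
```

Call check_api_key() at the top of your application so that a missing key produces an immediate, readable error instead of an authentication failure deep inside the first API call.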

Rate limiting errors

If you encounter rate limit errors:

  1. Check your OpenAI account's rate limits
  2. Add delays between requests if making multiple calls
  3. Consider upgrading your OpenAI plan for higher limits
  4. Implement exponential backoff for retries
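The exponential backoff in step 4 can be sketched as follows. The helper names (backoff_delay, call_with_backoff) are illustrative, not part of the OpenAI SDK:

```python
import random
import time

def backoff_delay(attempt, base=1.0, cap=30.0):
    """Exponential backoff with full jitter: a random delay in
    [0, min(cap, base * 2**attempt)] seconds."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))

def call_with_backoff(fn, retries=5, base=1.0, cap=30.0,
                      is_retryable=lambda exc: True):
    """Call fn(), retrying retryable exceptions with exponential backoff.
    Re-raises the exception once retries are exhausted or it is not retryable."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception as exc:
            if attempt == retries - 1 or not is_retryable(exc):
                raise
            time.sleep(backoff_delay(attempt, base, cap))
```

With the OpenAI SDK you might wrap the chat call as call_with_backoff(lambda: client.chat.completions.create(...), is_retryable=lambda exc: isinstance(exc, RateLimitError)), where RateLimitError is imported from the openai package.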

Model not found errors

If you encounter model not found errors:

  1. Verify the model name is correct (for example, gpt-4o-mini, gpt-4o, gpt-3.5-turbo)
  2. Check whether your API key has access to the specified model
  3. Make sure the model is available in your region
  4. Refer to OpenAI's model documentation for available models
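To see which models your key can actually call, you can query the models endpoint. This sketch assumes the openai Python package is installed and OPENAI_API_KEY is exported; client.models.list() returns the models available to your account:

```python
import os

def list_available_models():
    """Return the IDs of the models the current API key can access."""
    from openai import OpenAI  # imported here so the sketch parses without the package

    client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
    return sorted(model.id for model in client.models.list())
```

For example, print("\n".join(list_available_models())) lists the model IDs; if the model you specified is not in this list, the request fails with a model not found error.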

Next steps