Configuring event-driven scaling for models
If the custom metrics autoscaler is configured for this instance of IBM Software Hub, you can create scaled objects to enable event-driven scaling of model replicas. Event-driven scaling enables the cluster to automatically scale model replicas on existing GPU nodes in response to inferencing requests.
- Permissions that you need for this task
  - You must be either:
    - A cluster administrator
    - An instance administrator
- When you need to complete this task
  - This task is optional. Complete this task only if you want to allow the cluster to scale model replicas in response to inferencing requests.
Before you begin
Ensure that you source the environment variables before you run the commands in this task.
About this task
If you use Inference foundation models to start and host models, you can configure event-driven automatic scaling for models that support business-critical tasks.
If you host multiple models, ensure that your scaled objects do not prevent other models from creating pods. If the cluster does not have enough GPU resources to support the maximum number of replicas, some pods remain pending until GPU resources become available.
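One way to reason about this constraint before you create scaled objects is to compare the worst-case GPU demand against the GPU capacity of the cluster. The following Python sketch is illustrative only: the model names, the GPUs per replica, the maximum replica counts, and the total GPU count are all hypothetical values, and the helper functions are not part of IBM Software Hub. Replace the values with the GPU requirements of your own models and the capacity of your existing GPU nodes.

```python
from dataclasses import dataclass


@dataclass
class ModelScaling:
    name: str               # hypothetical model name
    gpus_per_replica: int   # GPUs that one replica requests (assumed value)
    max_replicas: int       # maximum replica count that you plan to allow


def worst_case_gpu_demand(models):
    """Return the GPUs needed if every model scales to its maximum at once."""
    return sum(m.gpus_per_replica * m.max_replicas for m in models)


def gpu_shortfall(models, total_gpus):
    """Return how many GPUs are missing in the worst case (0 if capacity suffices)."""
    return max(0, worst_case_gpu_demand(models) - total_gpus)


# Illustrative values only; substitute your own models and cluster capacity.
models = [
    ModelScaling("example-model-a", gpus_per_replica=1, max_replicas=4),
    ModelScaling("example-model-b", gpus_per_replica=2, max_replicas=3),
]
total_gpus = 8

shortfall = gpu_shortfall(models, total_gpus)
if shortfall:
    print(f"Worst case needs {shortfall} more GPU(s); some replicas could stay pending.")
else:
    print("All models can reach their maximum replica counts at the same time.")
```

A nonzero shortfall does not mean that scaling fails. It means that, if every model reaches its maximum at the same time, some replicas wait until GPU resources are freed. Lowering the maximum replica count for less critical models is one way to leave room for the others.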
To scale a model, you must know the model name. For a complete list of models, see GPU requirements for models.