Enabling event-driven automatic scaling on GPUs for an instance of IBM Software Hub

If you install the Red Hat® OpenShift® Custom Metrics Autoscaler, you can enable event-driven automatic scaling based on inferencing requests. To enable event-driven scaling, you must configure the custom metrics autoscaler that is running in the instance project to query OpenShift Container Platform metrics.

Who needs to complete this task?

Cluster administrator: A cluster administrator must complete this task.

When do you need to complete this task?

This task is optional. Complete this task if the following statements are true:

You want to enable event-driven scaling on GPUs for this instance of IBM Software Hub.
You plan to install one or more of the following services in this instance of IBM Software Hub:
- IBM Knowledge Catalog Premium *
- IBM Knowledge Catalog Standard *
- Watson Speech services *
- watsonx.ai™
- watsonx Assistant *
- watsonx BI
- watsonx Code Assistant™
- watsonx Code Assistant for Red Hat Ansible® Lightspeed
- watsonx Code Assistant for Z Agentic
- watsonx Code Assistant for Z Understand
- watsonx.data™ Premium
- watsonx.data integration *
- watsonx.data intelligence *
- watsonx™ Orchestrate *
An asterisk (*) indicates that the service uses Inference foundation models in some situations.

Repeat as needed: Repeat this task for each instance of IBM Software Hub where the preceding statements are true.

Before you begin

If you want to enable event-driven scaling based on inference requests, you must install the Red Hat OpenShift Custom Metrics Autoscaler.

Best practice: You can run the commands in this task exactly as written if you set up environment variables. For instructions, see Setting up installation environment variables.

Ensure that you source the environment variables before you run the commands in this task.
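Every command in this task expands ${PROJECT_CPD_INST_OPERANDS}. If the variable is empty, the manifests are applied with a blank namespace. The following is a minimal sketch of a guard that you can run first. The require_var function name is a hypothetical helper, not part of the product.

```shell
# Hypothetical helper: succeed only if the named environment variable
# is set to a non-empty value.
require_var() {
  # $1 is the NAME of the variable to check (not its value).
  eval "_value=\${$1:-}"
  if [ -z "${_value}" ]; then
    echo "ERROR: $1 is not set; source your environment variables first." >&2
    return 1
  fi
  # Echo the resolved value so that you can confirm the target project.
  echo "$1=${_value}"
}

# Example call:
#   require_var PROJECT_CPD_INST_OPERANDS
```

If the call prints an error, source your environment variables script and run the check again before you continue.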

About this task

To configure the custom metrics autoscaler that is running in the instance project to query OpenShift Container Platform metrics, you must:

Create a service account
Create a role
Bind the service account to the role
Create a trigger authentication for the service account token

Important: To use event-driven automatic scaling with an instance of IBM Software Hub, you must also create a scaled object for each model for which you want to support automatic scaling. For more information, see Configuring event-driven scaling for models.

Procedure

To configure the custom metrics autoscaler to use OpenShift Container Platform metrics:

Create the keda-thanos-access service account in the instance project:

cat << EOF | oc apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  name: keda-thanos-access
  namespace: ${PROJECT_CPD_INST_OPERANDS}
EOF

Create the keda-thanos-role role for the service account:

cat << EOF | oc apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: keda-thanos-role
  namespace: ${PROJECT_CPD_INST_OPERANDS}
rules:
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - get
- apiGroups:
  - metrics.k8s.io
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
EOF

Bind the role to the keda-thanos-access service account:

cat << EOF | oc apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: keda-thanos-rolebinding
  namespace: ${PROJECT_CPD_INST_OPERANDS}
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: keda-thanos-role
subjects:
- kind: ServiceAccount
  name: keda-thanos-access
  namespace: ${PROJECT_CPD_INST_OPERANDS}
EOF
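Optionally, you can confirm that the role binding grants the expected permissions by impersonating the service account with oc auth can-i. The following sketch assumes that you are logged in as a user who is allowed to impersonate service accounts (a cluster administrator typically is). The sa_can function name is a hypothetical helper.

```shell
# Hypothetical helper: ask the API server whether the keda-thanos-access
# service account can perform a verb on a resource in the instance project.
sa_can() {
  # $1 = verb (for example, get); $2 = resource (for example, pods)
  oc auth can-i "$1" "$2" \
    --namespace "${PROJECT_CPD_INST_OPERANDS}" \
    --as "system:serviceaccount:${PROJECT_CPD_INST_OPERANDS}:keda-thanos-access"
}

# Example calls that mirror the rules in keda-thanos-role:
#   sa_can get pods
#   sa_can list pods.metrics.k8s.io
```

Each call is expected to print yes if the role and the role binding were created as shown. A no answer usually points to a mismatch in the project name or in the service account name.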

Create a trigger authentication for the keda-thanos-access service account token:

cat << EOF | oc apply -f -
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: keda-thanos-auth
  namespace: ${PROJECT_CPD_INST_OPERANDS}
spec:
  boundServiceAccountToken:
    - parameter: bearerToken
      serviceAccountName: keda-thanos-access
EOF
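At this point, the service account, role, role binding, and trigger authentication should all exist in the instance project. The following sketch is one way to confirm that. The check_keda_objects function name is a hypothetical helper, and the object names are the ones used in the preceding steps.

```shell
# Hypothetical helper: report whether each object created in this task
# exists in the instance project. Returns non-zero if any object is missing.
check_keda_objects() {
  ns="${PROJECT_CPD_INST_OPERANDS}"
  rc=0
  for target in \
    "serviceaccount/keda-thanos-access" \
    "role.rbac.authorization.k8s.io/keda-thanos-role" \
    "rolebinding.rbac.authorization.k8s.io/keda-thanos-rolebinding" \
    "triggerauthentication.keda.sh/keda-thanos-auth"
  do
    # Discard the oc output; only the exit status matters here.
    if oc get "${target}" --namespace "${ns}" >/dev/null 2>&1; then
      echo "found:   ${target}"
    else
      echo "missing: ${target}"
      rc=1
    fi
  done
  return "${rc}"
}

# Example call:
#   check_keda_objects
```

If an object is reported as missing, rerun the step that creates it and check that the environment variables are still set in the current shell.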

If your network policies prevent ingress to the instance project from Prometheus pods in the openshift-user-workload-monitoring project, create the following network policy:

cat << EOF | oc apply -f -
apiVersion: projectcalico.org/v3
kind: NetworkPolicy
metadata:
  name: allow-ingress-from-prometheus-to-router
  namespace: ${PROJECT_CPD_INST_OPERANDS}
spec:
  # Destination: all pods within namespace
  selector: projectcalico.org/namespace == '${PROJECT_CPD_INST_OPERANDS}'
  order: 10200
  ingress:
    - action: Allow
      protocol: TCP
      source:
        # Source: prometheus
        namespaceSelector: name == "openshift-user-workload-monitoring"
        selector: app.kubernetes.io/component == "prometheus"
      destination:
        ports:
        - 19092
EOF
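The network policy takes effect only if its source selectors match the labels that are actually present on your cluster. The following sketch lists the namespace labels and the Prometheus pods so that you can compare them with the selectors in the policy. The show_policy_sources function name is a hypothetical helper, and the label values are taken from the policy above, not verified against any specific cluster.

```shell
# Hypothetical helper: show the labels and pods that the policy's
# source selectors are expected to match.
show_policy_sources() {
  # The namespaceSelector in the policy expects the label
  # name=openshift-user-workload-monitoring on this namespace.
  oc get namespace openshift-user-workload-monitoring --show-labels
  # The source selector in the policy expects pods with this label.
  oc get pods --namespace openshift-user-workload-monitoring \
    --selector app.kubernetes.io/component=prometheus
}

# Example call:
#   show_policy_sources
```

If the second command returns no pods, or the namespace does not carry the expected label, the policy does not allow the Prometheus traffic and the selectors might need to be adjusted for your environment.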

What to do next

Now that you've enabled event-driven automatic scaling for the instance, you're ready to complete Creating secrets for services that use Multicloud Object Gateway.