Enabling event-driven automatic scaling on GPUs for an instance of IBM Software Hub

If you install the Red Hat® OpenShift® Custom Metrics Autoscaler, you can enable event-driven automatic scaling based on inferencing requests. To enable event-driven scaling, you must configure the custom metrics autoscaler that is running in the instance project to query OpenShift Container Platform metrics.

Installation phase
  • Setting up a client workstation
  • Setting up a cluster
  • Collecting required information
  • Preparing to run installs in a restricted network
  • Preparing to run installs from a private container registry
  • Preparing the cluster for IBM Software Hub
  • Preparing to install an instance of IBM Software Hub (you are here)
  • Installing an instance of IBM Software Hub
  • Setting up the control plane
  • Installing solutions and services
Who needs to complete this task?

Cluster administrator: A cluster administrator must complete this task.

When do you need to complete this task?

This task is optional. Complete this task if the following statements are true:

  • You want to enable event-driven scaling on GPUs for this instance of IBM Software Hub.
  • You plan to install one or more of the following services in this instance of IBM Software Hub:
    • IBM Knowledge Catalog Premium *
    • IBM Knowledge Catalog Standard *
    • Watson Speech services *
    • watsonx.ai™
    • watsonx Assistant *
    • watsonx BI
    • watsonx Code Assistant™
    • watsonx Code Assistant for Red Hat Ansible® Lightspeed
    • watsonx Code Assistant for Z Agentic
    • watsonx Code Assistant for Z Understand
    • watsonx.data™ Premium
    • watsonx.data integration *
    • watsonx.data intelligence *
    • watsonx™ Orchestrate *

    An asterisk (*) indicates that the service uses Inference foundation models in some situations.

Repeat as needed: Repeat this task for each instance of IBM Software Hub where the preceding statements are true.

Before you begin

If you want to enable event-driven scaling based on inference requests, you must install the Red Hat OpenShift Custom Metrics Autoscaler.
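
One way to confirm that the autoscaler is installed is to check for its custom resource definitions. The following is a minimal sketch, not an official check: the function name check_keda_crds is hypothetical, and the sketch assumes that the Custom Metrics Autoscaler registers the scaledobjects.keda.sh and triggerauthentications.keda.sh CRDs and that you are logged in to the cluster with oc.

```shell
# Hypothetical helper: report any KEDA CRD that is not registered on the cluster.
# Assumes the autoscaler provides the two CRDs listed below.
# Returns 0 when both CRDs exist, 1 otherwise.
check_keda_crds() {
  missing=0
  for crd in scaledobjects.keda.sh triggerauthentications.keda.sh; do
    if ! oc get crd "$crd" >/dev/null 2>&1; then
      echo "missing CRD: $crd"
      missing=1
    fi
  done
  return "$missing"
}
```

After you define the function, run check_keda_crds. If it prints a missing CRD, the autoscaler is probably not installed yet, or your user cannot read cluster-scoped resources.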

Best practice: You can run the commands in this task exactly as written if you set up environment variables. For instructions, see Setting up installation environment variables.

Ensure that you source the environment variables before you run the commands in this task.
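
The commands in this task expand ${PROJECT_CPD_INST_OPERANDS} inside the YAML that is piped to oc apply, so an unset variable produces manifests with an empty namespace. As a rough safeguard, you could check the variable before you start. The helper name require_vars is hypothetical and is not part of the product tooling.

```shell
# Hypothetical helper: verify that each named environment variable is set
# and non-empty. Prints a message to stderr for every variable that is not.
# Returns 0 when all variables are set, 1 otherwise.
require_vars() {
  status=0
  for name in "$@"; do
    # Look up the value of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Environment variable $name is not set" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, run require_vars PROJECT_CPD_INST_OPERANDS after you source your environment variables, and continue only if the command succeeds.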

About this task

To configure the custom metrics autoscaler that is running in the instance project to query OpenShift Container Platform metrics, you must:
  • Create a service account
  • Create a role
  • Bind the service account to the role
  • Create a trigger authentication for the service account token
Important: To use event-driven automatic scaling with an instance of IBM Software Hub, you must also create a scaled object for each model for which you want to support automatic scaling. For more information, see Configuring event-driven scaling for models.

Procedure

To configure the custom metrics autoscaler to use OpenShift Container Platform metrics:

  1. Create the keda-thanos-access service account in the instance project:
    cat << EOF | oc apply -f -
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: keda-thanos-access
      namespace: ${PROJECT_CPD_INST_OPERANDS}
    EOF
  2. Create the keda-thanos-role role for the service account:
    cat << EOF | oc apply -f -
    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      name: keda-thanos-role
      namespace: ${PROJECT_CPD_INST_OPERANDS}
    rules:
    - apiGroups:
      - ""
      resources:
      - pods
      verbs:
      - get
    - apiGroups:
      - metrics.k8s.io
      resources:
      - pods
      - nodes
      verbs:
      - get
      - list
      - watch
    EOF
  3. Bind the role to the keda-thanos-access service account:
    cat << EOF | oc apply -f -
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: keda-thanos-rolebinding
      namespace: ${PROJECT_CPD_INST_OPERANDS}
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: Role
      name: keda-thanos-role
    subjects:
    - kind: ServiceAccount
      name: keda-thanos-access
      namespace: ${PROJECT_CPD_INST_OPERANDS}
    EOF
  4. Create a trigger authentication for the keda-thanos-access service account token:
    cat << EOF | oc apply -f -
    apiVersion: keda.sh/v1alpha1
    kind: TriggerAuthentication
    metadata:
      name: keda-thanos-auth
      namespace: ${PROJECT_CPD_INST_OPERANDS}
    spec:
      boundServiceAccountToken:
        - parameter: bearerToken
          serviceAccountName: keda-thanos-access
    EOF
  5. If your network policies prevent ingress to the instance project from Prometheus pods in the openshift-user-workload-monitoring project, create the following network policy:
    cat << EOF | oc apply -f -
    apiVersion: projectcalico.org/v3
    kind: NetworkPolicy
    metadata:
      name: allow-ingress-from-prometheus-to-router
      namespace: ${PROJECT_CPD_INST_OPERANDS}
    spec:
      # Destination: all pods within namespace
      selector: projectcalico.org/namespace == '${PROJECT_CPD_INST_OPERANDS}'
      order: 10200
      ingress:
        - action: Allow
          protocol: TCP
          source:
            # Source: prometheus
            namespaceSelector: name == "openshift-user-workload-monitoring"
            selector: app.kubernetes.io/component == "prometheus"
          destination:
            ports:
            - 19092
    EOF
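
After you complete the procedure, you might want to confirm that the resources exist and that the service account can read pod metrics. The following sketch assumes the resource names used in the preceding steps. The function names verify_keda_thanos_setup and can_read_pod_metrics are hypothetical, and the permission check assumes that your user is allowed to impersonate service accounts, which is typically true for a cluster administrator.

```shell
# Hypothetical helper: check that the resources created in this task exist
# in the given project. Prints one line per resource and returns non-zero
# if any resource is missing.
verify_keda_thanos_setup() {
  ns="$1"
  failed=0
  for resource in \
    serviceaccount/keda-thanos-access \
    role/keda-thanos-role \
    rolebinding/keda-thanos-rolebinding \
    triggerauthentications.keda.sh/keda-thanos-auth
  do
    if oc get "$resource" -n "$ns" >/dev/null 2>&1; then
      echo "found: $resource"
    else
      echo "missing: $resource"
      failed=1
    fi
  done
  return "$failed"
}

# Hypothetical helper: ask the API server whether the service account
# is allowed to list pod metrics in the given project.
can_read_pod_metrics() {
  ns="$1"
  oc auth can-i list pods.metrics.k8s.io \
    --as="system:serviceaccount:${ns}:keda-thanos-access" -n "$ns"
}
```

For example, run verify_keda_thanos_setup "${PROJECT_CPD_INST_OPERANDS}" followed by can_read_pod_metrics "${PROJECT_CPD_INST_OPERANDS}". The second command prints yes or no. The sketch does not check the optional network policy from step 5.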

What to do next

Now that you've enabled event-driven automatic scaling for the instance, you're ready to complete Creating secrets for services that use Multicloud Object Gateway.