Installing the Watson Machine Learning Accelerator service
A project administrator can install Watson Machine Learning Accelerator on IBM® Cloud Pak for Data.
- Permissions you need for this task
- You must be an administrator of the OpenShift® project (Kubernetes namespace) where you will deploy Watson Machine Learning Accelerator.
- Information you need to complete this task
- Watson Machine Learning Accelerator needs only the restricted security context constraint (SCC).
- Watson Machine Learning Accelerator must be installed in a project that is tethered to the project where Cloud Pak for Data is installed.
- Watson Machine Learning Accelerator requires the Cloud Pak for Data common core services.
- Watson Machine Learning Accelerator uses the following storage classes. If you don't use these storage classes on your cluster, ensure that you have a storage class with an equivalent definition:
  - OpenShift Container Storage: ocs-storagecluster-cephfs
  - IBM Spectrum®: ibm-spectrum-scale-sc
  - NFS: managed-nfs-storage
  - Portworx: portworx-shared-gp3
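To see whether one of the storage classes listed above is present on a cluster, the output of oc get storageclass can be filtered. The following is a minimal sketch and not part of the documented procedure; the helper name pick_wmla_storage_class is hypothetical, and it reads names from standard input so that it can be tried without a cluster.

```shell
#!/bin/sh
# Storage class names taken from the list above.
WMLA_STORAGE_CLASSES="ocs-storagecluster-cephfs ibm-spectrum-scale-sc managed-nfs-storage portworx-shared-gp3"

# Hypothetical helper: reads storage class names (one per line) from stdin
# and prints the first input line that matches a class listed above.
# Returns non-zero when none of the listed classes is present.
pick_wmla_storage_class() {
  local name candidate
  while IFS= read -r name; do
    # "oc get storageclass -o name" prefixes each name with the resource type.
    name="${name#storageclass.storage.k8s.io/}"
    for candidate in $WMLA_STORAGE_CLASSES; do
      if [ "$name" = "$candidate" ]; then
        printf '%s\n' "$candidate"
        return 0
      fi
    done
  done
  return 1
}

# Usage against a live cluster (requires an authenticated oc session):
#   oc get storageclass -o name | pick_wmla_storage_class
```

If the helper prints nothing, the cluster needs a storage class with an equivalent definition, as described above.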
Before you begin
Ensure that the cluster meets the minimum requirements for installing Watson Machine Learning Accelerator. For details, see System requirements.
Additionally, ensure that a cluster administrator completed the required Pre-installation tasks for your environment. Specifically, verify that a cluster administrator completed the following tasks:
- Cloud Pak for Data is installed. For details, see Installing Cloud Pak for Data.
- The project where you plan to install Watson Machine Learning Accelerator exists.
- The prerequisite services are installed. See Prerequisite services.
- The prerequisite operators are installed. See Prerequisite operators.
- For environments that use a private container registry, such as air-gapped environments, the Watson Machine Learning Accelerator software images are mirrored to the private container registry.
For information about mirroring the software images to the private container registry, see Mirroring images to your container registry. After you set up the private container registry, you can set additional container registry rules for Watson Machine Learning Accelerator. For details, see Container mirror registry rules for Watson Machine Learning Accelerator.
- The cluster is configured to pull the Watson Machine Learning Accelerator software images. For details, see Configuring your cluster to pull images.
- The Watson Machine Learning Accelerator catalog source exists. For details, see Creating catalog sources.
- The Watson Machine Learning Accelerator operator subscription exists. For details, see Creating operator subscriptions.
- The node settings are adjusted for Watson Machine Learning Accelerator. For details, see Changing required node settings.
If these tasks are not complete, the Watson Machine Learning Accelerator installation will fail.
Prerequisite services
Before you install Watson Machine Learning Accelerator, ensure that the following services are installed and running:
- The scheduling service. See Installing the scheduling service.
Prerequisite operators
After you install the required services, you must install the GPU Operator:
- For a cluster connected to the internet, complete the Installing the NVIDIA GPU Operator on OpenShift instructions.
- For an air-gapped cluster, complete the Installing the NVIDIA GPU Operator instructions.
- For POWER hardware with Red Hat OpenShift 4.8, see GPU Operator for POWER.
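After the GPU Operator is installed, GPU nodes normally advertise an nvidia.com/gpu resource. The following is a minimal sketch for counting those GPUs, assuming that resource name applies to your environment; the helper name count_gpus is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads "NODE GPU_COUNT" lines from stdin and prints
# the total GPU count. Non-numeric counts (for example "<none>") are skipped.
count_gpus() {
  awk '$2 ~ /^[0-9]+$/ { total += $2 } END { print total + 0 }'
}

# Usage against a live cluster (requires an authenticated oc session):
#   oc get nodes --no-headers \
#     -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu' \
#     | count_gpus
```

A total of 0 suggests that no node currently exposes GPUs to the scheduler.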
Procedure
Complete the following tasks to install Watson Machine Learning Accelerator:
Creating the Watson Machine Learning Accelerator add-on custom resource
- Log in to Red Hat® OpenShift Container Platform as a user with sufficient permissions to complete the task:
oc login OpenShift_URL:port
- Create a Watson Machine Learning Accelerator add-on (Wmla-add-on) custom resource in the Cloud Pak for Data namespace:
cat << EOF | oc apply -f -
apiVersion: spectrumcomputing.ibm.com/v1
kind: Wmla-add-on
metadata:
  name: wmla
  namespace: cpd-instance  # Specify the name of your Cloud Pak for Data namespace
spec:
  version: "2.3.9"
  wmlaNamespace: wmla-ns  # Specify the name of your Watson Machine Learning Accelerator namespace
EOF
When you create the Wmla-add-on custom resource, the following configmap objects are created in the Cloud Pak for Data namespace:
  - wml-accelerator
  - wml-accelerator-connection-info-extension-xxxid
  - wmla-add-on-instance-name-wml-accelerator-instance-cm
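To confirm that the configmap objects were created, the configmap list of the Cloud Pak for Data namespace can be filtered for names that contain wml-accelerator. This is a minimal sketch; the helper name filter_wmla_configmaps is hypothetical, and cpd-instance is the example namespace used in this topic.

```shell
#!/bin/sh
# Hypothetical helper: reads "oc get configmap -o name" output from stdin and
# prints only the configmap names that contain "wml-accelerator".
filter_wmla_configmaps() {
  sed 's|^configmap/||' | grep 'wml-accelerator'
}

# Usage against a live cluster (requires an authenticated oc session):
#   oc get configmap -n cpd-instance -o name | filter_wmla_configmaps
```

Three matching names are expected, one for each configmap listed above.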
Setting up a tethered project
You must update the IBM Cloud Pak for Data NamespaceScope Operator and the Cloud Pak for Data control plane (ZenService) to watch the project where you plan to install Watson Machine Learning Accelerator.
- Permissions you need for this task
- You must be either:
  - A cluster administrator
  - An administrator of the following projects:
    - The project where the IBM Cloud Pak® for Data platform operator is installed, either ibm-common-services or cpd-operators
    - The project where Cloud Pak for Data is installed
    - The project where you plan to install Watson Machine Learning Accelerator
- When you need to complete this task
- You must complete this task if you plan to install Watson Machine Learning Accelerator in a tethered project, but the project is not yet tethered to the project where Cloud Pak for Data is installed.
To tether the project:
- Log in to Red Hat OpenShift Container Platform as a user with sufficient permissions to complete the task:
oc login OpenShift_URL:port
- Enable the IBM Cloud Pak for Data platform operator and the IBM Cloud Pak foundational services operator to watch the project where you will install Watson Machine Learning Accelerator.
  - Update the IBM NamespaceScope Operator in the IBM Cloud Pak for Data platform operator project to watch the project where you plan to install Watson Machine Learning Accelerator. Edit the namespaceMembers list to add the project where you plan to install Watson Machine Learning Accelerator.
cat <<EOF | oc apply -f -
apiVersion: operator.ibm.com/v1
kind: NamespaceScope
metadata:
  name: cpd-operators
  namespace: cpd-operators  # (Default) Replace with the Cloud Pak for Data platform operator project name
spec:
  namespaceMembers:
  - cpd-operators  # (Default) Replace with the Cloud Pak for Data platform operator project name
  - cpd-instance   # Replace with the project where Cloud Pak for Data is installed
  - wmla-ns        # Replace with the project where Watson Machine Learning Accelerator is installed
EOF
The above change adds the Watson Machine Learning Accelerator namespace wmla-ns to the namespace-scope configmap object in the cpd-operators namespace.
- Modify the ZenService custom resource to add a tetheredNamespaces entry that lists the project that you want to tether to the project where Cloud Pak for Data is installed:
  - Run the following command to get the name of the ZenService custom resource:
oc get ZenService -n cpd-instance
    By default, the custom resource name is lite-cr.
  - Run the following command to edit the ZenService custom resource:
oc edit ZenService custom-resource-name -n cpd-instance
  - Add the tetheredNamespaces entry to the custom resource:
apiVersion: zen.cpd.ibm.com/v1
kind: ZenService
metadata:
  name: lite-cr
  namespace: cpd-instance  # The project where Cloud Pak for Data is installed
spec:
  csNamespace: ibm-common-services
  version: 4.3.1
  storageClass: RWX-storage-class  # The RWX storage class you specified during installation
  zenCoreMetaDbStorageClass: RWO-storage-class  # The RWO storage class you specified during installation
  tetheredNamespaces:  # Add this entry
  - "wmla-ns"  # Add the name of the Watson Machine Learning Accelerator project to tether
  cloudpakfordata: true
  - Save your changes to the ZenService custom resource. For example, if you are using vi, press Esc and enter :wq
When you save the changes to the ZenService custom resource, the following actions occur:
  - The required secret and configmap objects are copied from the Cloud Pak for Data project to the tethered project. (Only objects that include the icpdata_tether_resource: true label are copied.)
  - The cpd-admin-role and cpd-viewer-role roles are copied to the tethered project.
  - The required rolebinding objects are created. The rolebinding objects grant the zen-admin-sa and zen-editor-sa service accounts in the Cloud Pak for Data project access to the cpd-admin-role in the tethered project.
Adding the Watson Machine Learning Accelerator namespace as a tethered namespace enables Cloud Pak for Data functions such as audit logging, monitoring, and diagnostics.
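The tethering steps above can also be sketched non-interactively. This is a minimal sketch under stated assumptions, not the documented method: the helper names tether_patch and namespace_is_watched are hypothetical, oc patch with a merge patch is used in place of oc edit, and the namespace-scope configmap is assumed to store its project list as a comma-separated value under data.namespaces.

```shell
#!/bin/sh
# Hypothetical helper: builds a JSON merge patch that sets
# spec.tetheredNamespaces to the project names passed as arguments.
# A merge patch replaces the whole list, so pass every tethered project.
tether_patch() {
  local items ns
  items=""
  for ns in "$@"; do
    items="${items:+${items},}\"${ns}\""
  done
  printf '{"spec":{"tetheredNamespaces":[%s]}}\n' "$items"
}

# Hypothetical helper: succeeds when project $1 appears in the
# comma-separated project list $2 (spaces in the list are ignored).
namespace_is_watched() {
  local target list
  target="$1"
  list="$(printf '%s' "$2" | tr -d ' ')"
  case ",${list}," in
    *",${target},"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage against a live cluster (requires an authenticated oc session):
#   oc patch ZenService lite-cr -n cpd-instance --type merge -p "$(tether_patch wmla-ns)"
#   watched="$(oc get configmap namespace-scope -n cpd-operators -o jsonpath='{.data.namespaces}')"
#   namespace_is_watched wmla-ns "$watched" && echo "wmla-ns is watched"
```

The exact-match comparison avoids treating wmla as present when only wmla-ns is listed.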
Installing the service
To install Watson Machine Learning Accelerator:
- Log in to Red Hat OpenShift Container Platform as a user with sufficient permissions to complete the task:
oc login OpenShift_URL:port
- Create a Wmla custom resource to install Watson Machine Learning Accelerator. Make sure to specify the storage class name. The recommended storage class names are described in Setting up shared persistent storage.
Important:
  - By creating a Wmla custom resource with spec.license.accept: true, you are accepting the license terms for Watson Machine Learning Accelerator. You can find links to the relevant licenses in IBM Cloud Pak for Data License Information.
  - Use a storage class with attributes similar to the storage class described in the Service persistent storage requirements section of Storage requirements.
cat <<EOF | oc apply -f -
apiVersion: spectrumcomputing.ibm.com/v1
kind: Wmla
metadata:
  name: wmla  # IMPORTANT: the instance name (for example: wmla) and the Cloud Pak for Data namespace must match those used by the add-on (Wmla-add-on) custom resource created in the previous steps
  namespace: wmla-ns
spec:
  license:  # Specify the license you purchased
    accept: true
    license: Standard  # Required. Specify "Standard" or "Enterprise"
  version: 2.3.9
  acceptRollback: true
  cpdServiceInstance: ""  # Specify the wmla add-on custom resource name here only if the wmla custom resource name is different from the wmla add-on custom resource name
  cpdNamespace: cpd-instance  # Specify the namespace where Cloud Pak for Data is installed
  podSchedulerNamespace: ibm-cpd-scheduling  # Specify the namespace where the scheduling service is installed
  storageClass: storage-class-name  # Specify the storage class name: portworx-shared-gp3, managed-nfs-storage or ocs-storagecluster-cephfs. See the guidance in "Information you need to complete this task"
EOF
Additional installation options
- Configure the serviceReplicas setting in the service CR by setting the replica value to 1 or greater. This controls the number of pods used by core services in Watson Machine Learning Accelerator.
  - To disable multiple service replicas, set this value to 1.
  - To enable multiple service replicas, set this value to 2 or greater. Higher values add more replicas.
spec:
  serviceReplicas: 1
- Configure the scaleConfig setting in the service CR. By default, the service uses a small deployment. This value can be set to small, medium, or large. For details, see Scaling services.
spec:
  scaleConfig: small
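These two options can also be applied to an existing Wmla custom resource with a merge patch. The following is a minimal sketch, not a documented procedure: the helper name wmla_options_patch is hypothetical, and the resource name wmla and project wmla-ns are the examples used in this topic.

```shell
#!/bin/sh
# Hypothetical helper: validates the two options described above and prints a
# JSON merge patch. $1 = serviceReplicas (integer, 1 or greater),
# $2 = scaleConfig (small, medium, or large).
wmla_options_patch() {
  local replicas scale
  replicas="$1"
  scale="$2"
  case "$replicas" in
    ''|*[!0-9]*|0*)
      echo "serviceReplicas must be an integer of 1 or greater" >&2
      return 1 ;;
  esac
  case "$scale" in
    small|medium|large) ;;
    *)
      echo "scaleConfig must be small, medium, or large" >&2
      return 1 ;;
  esac
  printf '{"spec":{"serviceReplicas":%s,"scaleConfig":"%s"}}\n' "$replicas" "$scale"
}

# Usage against a live cluster (requires an authenticated oc session):
#   oc patch Wmla wmla -n wmla-ns --type merge -p "$(wmla_options_patch 2 medium)"
```

Validating the values first means an invalid setting is rejected before anything is sent to the cluster.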
Verifying the installation
When you create the custom resource, the Watson Machine Learning Accelerator operator processes the contents of the custom resource and starts up the microservices that comprise Watson Machine Learning Accelerator, including Wmla. (The Wmla microservice is defined by the wmla custom resource.) Watson Machine Learning Accelerator is installed when the Wmla status is Completed.
To check the status of the installation:
- Change to the project where you installed Watson Machine Learning Accelerator:
oc project wmla-ns
- Get the status of Watson Machine Learning Accelerator (wmla):
oc get Wmla wmla -o jsonpath='{.status.wmlaStatus} {"\n"}'
Watson Machine Learning Accelerator is ready when the command returns Completed
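Because the installation can take a while, the status check above can be wrapped in a polling loop. This is a minimal sketch; the helper name wait_for_completed is hypothetical, and the attempt count and delay in the usage line are arbitrary example values.

```shell
#!/bin/sh
# Hypothetical helper: runs a status command repeatedly until it prints "Completed".
# $1 = maximum number of attempts, $2 = seconds to wait between attempts,
# remaining arguments = the command that prints the current status.
wait_for_completed() {
  local attempts delay status i
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Strip whitespace so a trailing space or newline does not break the comparison.
    status="$("$@" 2>/dev/null | tr -d '[:space:]' || true)"
    if [ "$status" = "Completed" ]; then
      echo "Completed after ${i} attempt(s)"
      return 0
    fi
    echo "Attempt ${i}/${attempts}: status is '${status:-unknown}'"
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Usage against a live cluster (requires an authenticated oc session):
#   wait_for_completed 60 30 oc get Wmla wmla -n wmla-ns -o jsonpath='{.status.wmlaStatus}'
```

The helper returns non-zero if the status never reaches Completed, so it can gate later steps in a script.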
What to do next
- To connect your Watson Machine Learning Accelerator service to the Watson Machine Learning service, see Connecting Watson Machine Learning Accelerator to Watson Machine Learning.
- To add users to the Watson Machine Learning Accelerator instance and give them access to the service console, see Add users to the Watson Machine Learning Accelerator instance.
- To configure resource usage monitoring for Watson Machine Learning Accelerator, see Resource usage.