Deploying IBM watsonx.data capabilities
The deployment of IBM watsonx.data On-premises is made easy by the combination of IBM Storage Fusion HCI System and IBM Storage Ceph, which together provide all of the infrastructure that you need for a stand-alone data lakehouse, including:
- Data sharing and self-service access.
- Data caching to accelerate IBM watsonx.data performance.
- A local S3 object store.
- IBM Storage Ceph provides an external S3 object store for IBM watsonx.data. This S3 object store can serve as the primary object store for IBM watsonx.data, or it can be used alongside other on-premises or public cloud object stores. The IBM Storage Ceph object storage interface (Ceph Object Gateway) is compatible with a large subset of the Amazon S3 RESTful API. For more information about IBM Storage Ceph, see https://www.ibm.com/docs/en/storage-ceph/6.
- The ability to include GPU servers in the watsonx solution.
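Because the Ceph Object Gateway implements the S3 REST API, an S3 client reaches it by pointing at the gateway's endpoint instead of at Amazon S3. The minimal Python sketch below shows path-style addressing, where the bucket name goes in the URL path rather than in the host name; on-premises gateways are often addressed this way because it avoids per-bucket DNS entries. The endpoint `https://rgw.example.com:8443` and the bucket and key names are hypothetical, and the sketch only builds the URL: a real request also needs S3 Signature Version 4 authentication, which an S3 SDK normally handles.

```python
from urllib.parse import quote, urlsplit


def path_style_object_url(endpoint: str, bucket: str, key: str) -> str:
    """Build a path-style S3 object URL of the form <endpoint>/<bucket>/<key>.

    Illustrative helper only (hypothetical name). It does not sign or send a
    request; it shows how an S3-compatible endpoint such as a Ceph Object
    Gateway is addressed instead of Amazon S3.
    """
    parts = urlsplit(endpoint)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"endpoint must be an http(s) URL, got {endpoint!r}")
    base = endpoint.rstrip("/")
    # Encode the bucket fully; keep "/" in the key so that prefixes stay readable.
    return f"{base}/{quote(bucket, safe='')}/{quote(key, safe='/')}"
```

For example, `path_style_object_url("https://rgw.example.com:8443", "lakehouse", "warehouse/sales/part 1.parquet")` yields `https://rgw.example.com:8443/lakehouse/warehouse/sales/part%201.parquet`.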
These high-level instructions show how to prepare IBM Storage Fusion HCI System so that you can install IBM watsonx.data and use buckets with storage acceleration. For detailed information about planning, implementation, monitoring, and backup and restore, see the Accelerating IBM watsonx.data with IBM Storage Fusion HCI System Redpaper.
There are three main steps in this process:
- Upgrade IBM Storage Fusion HCI System to OpenShift® Container Platform 4.12.
The 2.6.1 version of Fusion HCI System comes with OpenShift Container Platform version 4.10. However, you must upgrade OpenShift Container Platform to version 4.12 to make use of the storage acceleration feature.
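Before you enable storage acceleration, you can confirm that the upgrade took effect. The Python sketch below reads the version from the cluster's `ClusterVersion` resource (named `version`) through the `oc` CLI, which is assumed to be on `PATH` and logged in, and compares the major and minor parts against 4.12. Note that `status.desired.version` reports the release the cluster is moving to; while an upgrade is still in progress, also check `oc get clusterversion` for completion.

```python
import subprocess


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '4.12.33' into (4, 12, 33); drops any suffix such as '-rc.1'."""
    core = version.strip().split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def meets_minimum(version: str, minimum: tuple[int, int] = (4, 12)) -> bool:
    """True if the major.minor part of `version` is at least `minimum`."""
    return parse_version(version)[:2] >= minimum


def current_cluster_version() -> str:
    """Read the desired OpenShift version from the ClusterVersion resource.

    Assumes the `oc` CLI is installed and has an authenticated session.
    """
    result = subprocess.run(
        ["oc", "get", "clusterversion", "version",
         "-o", "jsonpath={.status.desired.version}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

Calling `meets_minimum(current_cluster_version())` returns `False` on the OpenShift Container Platform 4.10 release that ships with version 2.6.1 and `True` after the upgrade to 4.12 or later.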
- Configure the Multicloud Object Gateway that provides access to accelerated buckets.
The Multicloud Object Gateway (MCG) provides an object endpoint that IBM watsonx.data and other workloads can connect to in order to access multiple buckets, including storage acceleration buckets. The Multicloud Object Gateway is provided by the Red Hat® OpenShift Data Foundation operator.
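Clients such as IBM watsonx.data connect to the MCG through an S3 endpoint URL. The Python sketch below assumes that endpoint is exposed as an OpenShift route named `s3` in the `openshift-storage` namespace (the usual layout for an OpenShift Data Foundation deployment, but verify it on your cluster with `oc get routes -n openshift-storage`) and that the route terminates TLS. The helper names are hypothetical.

```python
import subprocess


def mcg_s3_endpoint(route_host: str, use_tls: bool = True) -> str:
    """Build the S3 endpoint URL that clients use to reach the MCG."""
    scheme = "https" if use_tls else "http"
    return f"{scheme}://{route_host.strip()}"


def lookup_mcg_route_host(namespace: str = "openshift-storage",
                          route: str = "s3") -> str:
    """Return the host name of the MCG S3 route.

    The route name "s3" and the namespace are assumptions; requires an
    authenticated `oc` session.
    """
    result = subprocess.run(
        ["oc", "get", "route", route, "-n", namespace,
         "-o", "jsonpath={.spec.host}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

`mcg_s3_endpoint(lookup_mcg_route_host())` then gives the value to enter as the endpoint when a client connects to MCG buckets.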
Install the Red Hat OpenShift Data Foundation operator into the Red Hat OpenShift Container Platform cluster:
- Search for Red Hat OpenShift Data Foundation in OperatorHub.
- Install the Red Hat OpenShift Data Foundation operator.
- Follow the instructions to Deploy standalone Multicloud Object Gateway. When you are prompted for a StorageClass in step 3 of section 3.3, use the ibm-storage-fusion-cp-sc StorageClass that IBM Storage Fusion HCI System configures by default.
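The console steps above can also be expressed as Kubernetes manifests. The Python sketch below generates them as JSON for `oc apply`. It assumes the Red Hat command-line installation pattern for OpenShift Data Foundation: the `openshift-storage` namespace, the `odf-operator` package from the `redhat-operators` catalog, a `stable-4.12` channel, and a `StorageCluster` whose `multiCloudGateway.reconcileStrategy` is `standalone`. Treat every one of those names as an assumption to verify against the OpenShift Data Foundation 4.12 documentation and the OperatorHub entry on your cluster.

```python
import json

NAMESPACE = "openshift-storage"  # assumed ODF namespace (Red Hat default)


def odf_operator_manifests(channel: str = "stable-4.12") -> list[dict]:
    """Namespace, OperatorGroup, and Subscription for the ODF operator.

    Package name, catalog source, and channel are assumptions.
    """
    namespace = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": NAMESPACE,
                     "labels": {"openshift.io/cluster-monitoring": "true"}},
    }
    operator_group = {
        "apiVersion": "operators.coreos.com/v1",
        "kind": "OperatorGroup",
        "metadata": {"name": "openshift-storage-operatorgroup",
                     "namespace": NAMESPACE},
        "spec": {"targetNamespaces": [NAMESPACE]},
    }
    subscription = {
        "apiVersion": "operators.coreos.com/v1alpha1",
        "kind": "Subscription",
        "metadata": {"name": "odf-operator", "namespace": NAMESPACE},
        "spec": {
            "channel": channel,
            "name": "odf-operator",
            "source": "redhat-operators",
            "sourceNamespace": "openshift-marketplace",
        },
    }
    return [namespace, operator_group, subscription]


def standalone_mcg_storagecluster(
        storage_class: str = "ibm-storage-fusion-cp-sc") -> dict:
    """StorageCluster that deploys only the Multicloud Object Gateway.

    dbStorageClassName is assumed to correspond to the StorageClass that the
    console flow prompts for (the MCG database volume is provisioned there).
    """
    return {
        "apiVersion": "ocs.openshift.io/v1",
        "kind": "StorageCluster",
        "metadata": {"name": "ocs-storagecluster", "namespace": NAMESPACE},
        "spec": {
            "multiCloudGateway": {
                "dbStorageClassName": storage_class,
                "reconcileStrategy": "standalone",
            },
        },
    }


def to_apply_json(manifests: list[dict]) -> str:
    """Wrap manifests in a v1 List so that `oc apply -f <file>` accepts them."""
    return json.dumps({"apiVersion": "v1", "kind": "List", "items": manifests},
                      indent=2)
```

For example, save `to_apply_json(odf_operator_manifests())` to a file and run `oc apply -f` on it, wait until the operator's ClusterServiceVersion reports `Succeeded`, and then apply `to_apply_json([standalone_mcg_storagecluster()])` the same way. The two stages matter because the `StorageCluster` kind exists only after the operator is installed.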
- Configure Advanced File Management (AFM) nodes in the HCI appliance to enable storage acceleration.
IBM Storage Fusion HCI System uses dedicated Advanced File Management (AFM) nodes to connect to external object storage and provide the data caching that storage acceleration relies on. The AFM nodes must already be installed in IBM Storage Fusion HCI System as part of setting up the physical appliance.
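Conceptually, the caching described above behaves like a read-through cache: the first read of an object is fetched from the external object store and kept on local storage, and later reads are served locally. The Python class below is a toy model of that idea only, not AFM's implementation; the fixed capacity and least-recently-used eviction are simplifications chosen for illustration, and `fetch_remote` is a hypothetical stand-in for a read from the external object store.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughCache:
    """Toy read-through cache in front of a slow remote store (illustration only)."""

    def __init__(self, fetch_remote: Callable[[str], bytes], capacity: int = 128):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._fetch_remote = fetch_remote
        self._capacity = capacity
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, key: str) -> bytes:
        if key in self._entries:
            # Fast path: served from the local copy.
            self.hits += 1
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        # Slow path: fetch from the remote store and keep a local copy.
        self.misses += 1
        data = self._fetch_remote(key)
        self._entries[key] = data
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return data
```

Reading the same object twice calls `fetch_remote` once and records one miss and one hit, which is the effect that storage acceleration aims for on repeated queries over the same data.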
To use the AFM nodes, follow the Upsizing nodes instructions to add them to the OpenShift cluster and configure them for storage.
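After the Upsizing nodes procedure, one way to confirm that the AFM nodes joined the cluster is to read the node list that `oc get nodes -o json` returns and check each node's `Ready` condition. The Python sketch below assumes that you can select the AFM nodes by a label; the label key `example.com/afm` used in the usage note is hypothetical, so substitute whatever label your nodes actually carry.

```python
import json
import subprocess


def is_ready(node: dict) -> bool:
    """True if the node reports a Ready condition with status "True"."""
    for condition in node.get("status", {}).get("conditions", []):
        if condition.get("type") == "Ready":
            return condition.get("status") == "True"
    return False


def node_readiness(node_list: dict, selector: dict[str, str]) -> dict[str, bool]:
    """Map each node whose labels match every selector entry to its readiness."""
    result = {}
    for node in node_list.get("items", []):
        metadata = node.get("metadata", {})
        labels = metadata.get("labels", {})
        if all(labels.get(key) == value for key, value in selector.items()):
            result[metadata["name"]] = is_ready(node)
    return result


def fetch_node_list() -> dict:
    """Return the parsed output of `oc get nodes -o json` (needs a logged-in oc)."""
    completed = subprocess.run(["oc", "get", "nodes", "-o", "json"],
                               capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)
```

For example, `node_readiness(fetch_node_list(), {"example.com/afm": "true"})` returns one entry per matching node. An empty result means that no node matched the selector, which suggests either that the nodes were not added or that they carry a different label.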
Attach a storage-accelerated bucket to a watsonx.data catalog:
- Create an AFM fileset and attach it to a bucket.
- Expose the accelerated bucket through the Multicloud Object Gateway.
- Connect the accelerated bucket to a catalog:
- Log in to the watsonx.data console.
- From the navigation menu, select Infrastructure Manager.
- To define and connect a bucket, click Add component and select Add bucket.
- In the Add bucket window, provide the details to connect to existing externally managed object storage.
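Before you fill in the Add bucket window, it can help to sanity-check the connection details. The Python sketch below assumes that the details include an S3 endpoint URL, a bucket name, and an access key pair, which any S3-compatible connection needs; the exact fields in the window may differ by watsonx.data release, and `BucketConnection` is a hypothetical name. The bucket-name checks follow the Amazon S3 naming rules (3 to 63 characters; lowercase letters, digits, periods, and hyphens; starting and ending with a letter or digit; no adjacent periods; not formatted as an IP address). Applying those rules to Ceph Object Gateway or MCG buckets is an assumption, because those gateways can be configured more leniently.

```python
import ipaddress
import re
from dataclasses import dataclass
from urllib.parse import urlsplit

# 3-63 characters: letter or digit, then 1-61 of [a-z0-9.-], then letter or digit.
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def bucket_name_problems(name: str) -> list[str]:
    """Return the S3 naming rules that `name` violates (empty list if none)."""
    problems = []
    if not _BUCKET_RE.fullmatch(name):
        problems.append("must be 3-63 characters of a-z, 0-9, '.', '-' "
                        "and start and end with a letter or digit")
    if ".." in name:
        problems.append("must not contain two adjacent periods")
    try:
        ipaddress.ip_address(name)
        problems.append("must not be formatted as an IP address")
    except ValueError:
        pass  # not an IP address, which is what we want
    return problems


@dataclass
class BucketConnection:
    """Hypothetical container for the details entered in the Add bucket window."""
    endpoint: str
    bucket: str
    access_key: str
    secret_key: str

    def problems(self) -> list[str]:
        issues = [f"bucket {p}" for p in bucket_name_problems(self.bucket)]
        parts = urlsplit(self.endpoint)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            issues.append("endpoint must be an http(s) URL such as https://host:port")
        if not self.access_key or not self.secret_key:
            issues.append("access key and secret key are both required")
        return issues
```

An empty list from `problems()` only means that the details are well formed; whether the endpoint is reachable and the credentials are valid is still decided when watsonx.data connects.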