Preparing the batch processing environment in IBM Analytics Engine

To configure Watson OpenScale to work in batch mode, ensure that you have all of the required services and components. IBM Analytics Engine has specific requirements.

Requirements for IBM Analytics Engine

In addition to the usual requirements for running Watson OpenScale, batch processing requires the following assets:

  • An instance of IBM Analytics Engine. When you install IBM Analytics Engine, update your custom resource (CR) definition to include the following lines:
    serviceConfig:
      sparkAdvEnabled: "true"
    
  • An additional volume that is different from the default volume
  • A Cloud Pak for Data API key that provides permissions to write to the volume (This API key is the platform API key that enables Watson OpenScale to submit jobs and not write files to the volume.)
  • An Apache Hive database (Apache Hive 2.3.7 is required)
  • Specialized notebooks to configure generation for Hive or for JDBC, which you run iteratively in tandem with Watson OpenScale
  • A feedback table, training data table, and a payload logging table that you create in the Hive database
  • A drifted_transaction table that stores the transactions that are used for post-processing analysis

Step 1: Create a Python archive of modules that are required by Watson OpenScale

Watson OpenScale jobs that run on the Hadoop ecosystem depend on several Python packages. Without these packages, the jobs fail. This section describes how to install these dependencies and upload them to a location in HDFS.

  1. Log in to a Linux operating system where Python is installed and check the version of Python by running the following command:

    python --version
    Python 3.7.9
    
  2. Install the python3-devel package, which installs the GCC libraries, by running the following command:

    yum install python3-devel
    
  3. Navigate to the directory where the Python virtual environment is created, such as the /opt folder:

    cd /opt

  4. Delete any previously created virtual environment and archive by running the following commands. (In the following example, the wos_env folder contains a Python virtual environment with the Watson OpenScale Spark job dependencies in it.)

    rm -fr wos_env
    rm -fr wos_env.zip
    
  5. Create a virtual environment. In the following example, the name of the environment is wos_env. After you create it, source the virtual environment.

    python -m venv wos_env
    source wos_env/bin/activate
    
  6. Upgrade pip by running the following command:

    pip install --upgrade pip

  7. To install all the dependencies, choose whether to install them individually or as part of a batch process in a file. They must be installed in the following order.

    • If you’re using Python 3.7, run the following commands in order to install the required files one at a time:

       python -m pip install numpy==1.20.2
       python -m pip install scipy==1.6.3
       python -m pip install pandas==1.2.4
       python -m pip install scikit-learn==0.24.2
       python -m pip install osqp==0.6.1
       python -m pip install cvxpy==1.0.25
       python -m pip install marshmallow==3.11.1
       python -m pip install requests==2.25.1
       python -m pip install jenkspy==0.2.0
       python -m pip install pyparsing==2.4.7
       python -m pip install tqdm==4.60.0
       python -m pip install more_itertools==8.7.0
       python -m pip install tabulate==0.8.9
       python -m pip install py4j==0.10.9.2
       python -m pip install pyarrow==4.0.0
       python -m pip install "ibm-wos-utils>4.0.0"
      
    • If you’re using Python 3.6, run the following commands in order to install the required files one at a time:

       python -m pip install numpy==1.19.5
       python -m pip install scipy==1.5.4
       python -m pip install pandas==1.1.5
       python -m pip install scikit-learn==0.24.2
       python -m pip install osqp==0.6.1
       python -m pip install cvxpy==1.0.25
       python -m pip install marshmallow==3.11.1
       python -m pip install requests==2.25.1
       python -m pip install jenkspy==0.2.0
       python -m pip install pyparsing==2.4.7
       python -m pip install tqdm==4.60.0
       python -m pip install more_itertools==8.7.0
       python -m pip install tabulate==0.8.9
       python -m pip install py4j==0.10.9.2
       python -m pip install pyarrow==4.0.0
       python -m pip install "ibm-wos-utils>4.0.0"
      

      Rather than running each command one by one, you can list all modules in a requirements.txt file and run a single install command.

    • If you’re using Python 3.7, create the requirements.txt file by adding the following lines to the file. Then, run the python -m pip install -r requirements.txt command:

       numpy==1.20.2
       scipy==1.6.3
       pandas==1.2.4
       scikit-learn==0.24.2
       osqp==0.6.1
       cvxpy==1.0.25
       marshmallow==3.11.1
       requests==2.25.1
       jenkspy==0.2.0
       pyparsing==2.4.7
       tqdm==4.60.0
       more_itertools==8.7.0
       tabulate==0.8.9
       py4j==0.10.9.2
       pyarrow==4.0.0
       ibm-wos-utils>4.0.0
      
    • If you’re using Python 3.6, create the requirements.txt file by adding the following lines to the file. Next, run the python -m pip install numpy==1.19.5 command and then run the python -m pip install -r requirements.txt command:

       scipy==1.5.4
       pandas==1.1.5
       scikit-learn==0.24.2
       osqp==0.6.1
       cvxpy==1.0.25
       marshmallow==3.11.1
       requests==2.25.1
       jenkspy==0.2.0
       pyparsing==2.4.7
       tqdm==4.60.0
       more_itertools==8.7.0
       tabulate==0.8.9
       py4j==0.10.9.2
       pyarrow==4.0.0
       ibm-wos-utils>4.0.0
      
  8. Deactivate the virtual environment by running the deactivate command.
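
The package lists in step 7 differ between Python 3.6 and Python 3.7 only in the numpy, scipy, and pandas versions. As a minimal sketch, the following shell function prints the pinned packages in installation order for a given Python version. The function name wos_requirements is hypothetical and is not part of Watson OpenScale.

```shell
#!/bin/sh
# Sketch only: wos_requirements is a hypothetical helper, not an IBM tool.
# It prints the pinned packages from step 7, in order, for Python 3.6 or 3.7.
wos_requirements() {
    case "$1" in
        3.7) printf '%s\n' 'numpy==1.20.2' 'scipy==1.6.3' 'pandas==1.2.4' ;;
        3.6) printf '%s\n' 'numpy==1.19.5' 'scipy==1.5.4' 'pandas==1.1.5' ;;
        *)   echo "unsupported Python version: $1" >&2; return 1 ;;
    esac
    # The remaining pins are identical for both Python versions.
    printf '%s\n' \
        'scikit-learn==0.24.2' 'osqp==0.6.1' 'cvxpy==1.0.25' \
        'marshmallow==3.11.1' 'requests==2.25.1' 'jenkspy==0.2.0' \
        'pyparsing==2.4.7' 'tqdm==4.60.0' 'more_itertools==8.7.0' \
        'tabulate==0.8.9' 'py4j==0.10.9.2' 'pyarrow==4.0.0' \
        'ibm-wos-utils>4.0.0'
}
```

Inside the activated virtual environment, the output can be installed one package at a time, which preserves the required order. Quoting "$pkg" keeps the shell from treating the > in ibm-wos-utils>4.0.0 as a redirection:

    wos_requirements 3.7 | while read -r pkg; do python -m pip install "$pkg"; done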

Step 2: Upload the archive

You must compress the virtual environment and upload the archive file to the file system or volume mount. For IBM Analytics Engine, you must build the wos_env.zip file by using Python 3.7.
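
Because the archive for IBM Analytics Engine must be built with Python 3.7, it can help to check the interpreter inside wos_env before you compress it. The following is a minimal sketch; is_python37 is a hypothetical helper name that inspects the text printed by python --version.

```shell
# Sketch only: is_python37 is a hypothetical helper name.
# Succeeds when the given version string reports Python 3.7.x.
is_python37() {
    case "$1" in
        "Python 3.7."*) return 0 ;;
        *)              return 1 ;;
    esac
}
```

For example, run the check from the directory that contains wos_env:

    is_python37 "$(wos_env/bin/python --version 2>&1)" || echo "Rebuild wos_env with Python 3.7" >&2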

  1. Create a zip file that contains the virtual environment by running the following command:

    zip -q -r wos_env.zip wos_env/
    ls -alt --block-size=M wos_env.zip
    
  2. Generate a platform token by running the following command:

    curl -k -u <user_name>:<password> https://<cluster_url>/v1/preauth/validateAuth
    
  3. Upload the archive to a volume by using the PUT /volumes API command:

    curl -k -i -X PUT 'https://<cluster_url>/zen-volumes/<volume_name>/v1/volumes/files/wos_packages' -H "Authorization: Bearer <token>"  -H 'cache-control: no-cache' -H 'content-type: multipart/form-data' -F  'upFile=@/<path_to_parent_dir>/wos_env.zip'
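
Steps 2 and 3 can be chained so that the token does not need to be copied by hand. The following sketch assumes that the validateAuth response is JSON with an accessToken field; this page does not state the response format, so verify it against your cluster. Both helper names, extract_token and volume_upload_url, are hypothetical.

```shell
# Sketch only: both helper names are hypothetical.
# Assumption: the validateAuth response is JSON with an "accessToken" field.
extract_token() {
    # Reads the JSON response on stdin and prints the token value.
    sed -n 's/.*"accessToken"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Builds the upload URL from step 3 for a cluster host and a volume name.
volume_upload_url() {
    printf 'https://%s/zen-volumes/%s/v1/volumes/files/wos_packages\n' "$1" "$2"
}
```

With these helpers defined, the token from step 2 can be captured in a variable and passed to the upload command in step 3 as "Authorization: Bearer $TOKEN":

    TOKEN=$(curl -k -u <user_name>:<password> https://<cluster_url>/v1/preauth/validateAuth | extract_token)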
    

Next steps

You are now ready to configure the batch processor. For more information, see Configuring the batch processor.