Getting started with RAPIDS cuDF and cuML

Find information about getting started with RAPIDS cuDF and cuML.

This release of WML CE includes cuDF 0.7.2 and cuML 0.7.0, provided as the conda packages cudf and cuml. Both packages are installed automatically when the pai4sk or powerai package is installed, and they are supported only on Python 3.6. More information about RAPIDS is available at https://rapids.ai/index.html.
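Because these packages are supported only on Python 3.6, it can help to confirm the environment before running anything. The following is a minimal sketch, using only the Python standard library, that reports whether the interpreter is Python 3.6 and whether the cudf and cuml modules can be found. It does not import them, so it also runs on a machine without a GPU. The function name check_rapids_environment is hypothetical and is not part of WML CE.

```python
import importlib.util
import sys

# Assumption: values taken from the release information above.
SUPPORTED_PYTHON = (3, 6)
RAPIDS_PACKAGES = ("cudf", "cuml")


def check_rapids_environment(packages=RAPIDS_PACKAGES, supported=SUPPORTED_PYTHON):
    """Return (python_ok, availability) for the current interpreter.

    python_ok is True when the running major.minor Python version equals
    `supported`; availability maps each package name to whether it can be
    found on sys.path (the package is located, not imported).
    """
    python_ok = tuple(sys.version_info[:2]) == tuple(supported)
    availability = {
        name: importlib.util.find_spec(name) is not None for name in packages
    }
    return python_ok, availability


if __name__ == "__main__":
    ok, found = check_rapids_environment()
    print("Python %d.%d supported: %s" % (sys.version_info[0], sys.version_info[1], ok))
    for name, present in found.items():
        print("%s installed: %s" % (name, present))
```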

CUDF

In this release, data held in a cuDF DataFrame can be passed to pai4sk APIs, so that both data preparation and model training run entirely on the GPU.

To use cuDF with pai4sk APIs, follow these steps:

import cudf
from cudf.dataframe import DataFrame

# pdf_trainX (features) and pdf_trainY (labels) are existing pandas DataFrames.
# Copy them into GPU memory as cuDF DataFrames.
df_trainX = DataFrame.from_pandas(pdf_trainX)
df_trainY = DataFrame.from_pandas(pdf_trainY)

# Data used for training:
# create C-contiguous DeviceNDArrays from the cuDF DataFrames
from pai4sk.sml_io import copy_as_gpu_cmatrix
X_train = copy_as_gpu_cmatrix(df_trainX)
y_train = copy_as_gpu_cmatrix(df_trainY)

# Train a logistic regression model on the GPU
from pai4sk import LogisticRegression
lr = LogisticRegression(use_gpu=True)

lr.fit(X_train, y_train)
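cuDF stores each column of a DataFrame in its own buffer (a columnar layout), whereas the steps above convert the data into a C-contiguous, that is row-major, matrix before training. The pure-Python sketch below illustrates only that layout change on the host. It is not how copy_as_gpu_cmatrix is implemented, and the function names columns_to_c_contiguous and element are hypothetical.

```python
def columns_to_c_contiguous(columns):
    """Interleave equal-length columns into one flat row-major buffer.

    In the returned buffer, element (i, j) of the n_rows x n_cols matrix
    sits at index i * n_cols + j; this is what "C-contiguous" means.
    """
    if not columns:
        return [], (0, 0)
    n_rows = len(columns[0])
    if any(len(col) != n_rows for col in columns):
        raise ValueError("all columns must have the same length")
    n_cols = len(columns)
    # Row index i varies slowest and column index j fastest (row-major order)
    buffer = [columns[j][i] for i in range(n_rows) for j in range(n_cols)]
    return buffer, (n_rows, n_cols)


def element(buffer, shape, i, j):
    """Read element (i, j) from a C-contiguous buffer with the given shape."""
    _, n_cols = shape
    return buffer[i * n_cols + j]


# Two feature columns with three rows each, stored column by column
feature_columns = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
buf, shape = columns_to_c_contiguous(feature_columns)
print(buf, shape)  # [1.0, 10.0, 2.0, 20.0, 3.0, 30.0] (3, 2)
```

Each row's values end up adjacent in memory, which is the layout a row-oriented training routine expects.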

Currently, pai4sk accepts a DeviceNDArray as input for the following APIs:

Example programs for each of these APIs are included in the conda package. For instructions on running them, see the README in $CONDA_PREFIX/pai4sk/local-examples/cudf-examples.
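The README location depends on the active conda environment. Below is a minimal sketch for listing it from Python, assuming CONDA_PREFIX is set (as it is in an activated conda environment) and that the file name starts with README. The helper cudf_example_readmes is hypothetical and is not part of pai4sk.

```python
import glob
import os


def cudf_example_readmes(conda_prefix=None):
    """List README files in the pai4sk cudf-examples directory.

    Uses $CONDA_PREFIX unless conda_prefix is given. Matching "README*"
    is an assumption, since the exact file name is not specified.
    """
    prefix = conda_prefix if conda_prefix is not None else os.environ.get("CONDA_PREFIX")
    if not prefix:
        raise RuntimeError("CONDA_PREFIX is not set; activate the conda environment first")
    examples_dir = os.path.join(prefix, "pai4sk", "local-examples", "cudf-examples")
    # Sorted so the result is deterministic when several files match
    return sorted(glob.glob(os.path.join(examples_dir, "README*")))
```

Calling cudf_example_readmes() inside the activated environment returns the matching paths; an empty list means the directory or file was not found.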

CUML

The cuML APIs can be used directly in a Python program.
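When a program calls cuML directly but should still run where cuml is not installed, one common pattern is to choose the implementation at run time and fall back to scikit-learn. The sketch below shows that idea in plain Python. It is not pai4sk's internal mechanism, and both choose_backend and the gpu_supported_params set are hypothetical: the parameters each cuML estimator actually supports must be taken from the cuML documentation for the installed version.

```python
import importlib.util


def is_installed(module_name):
    """True if module_name can be found on sys.path (it is not imported here)."""
    return importlib.util.find_spec(module_name) is not None


def choose_backend(requested_params, gpu_supported_params, gpu_module="cuml"):
    """Pick the GPU module only when it is installed and supports every
    requested parameter; otherwise fall back to scikit-learn.

    gpu_supported_params is a hypothetical, caller-supplied collection of
    parameter names; neither cuML nor pai4sk exposes one under this name.
    """
    if is_installed(gpu_module) and set(requested_params) <= set(gpu_supported_params):
        return gpu_module
    return "sklearn"


# Hypothetical usage: the result depends on whether cuml is installed
backend = choose_backend({"n_clusters": 8}, {"n_clusters", "max_iter"})
print("using", backend)
```

A caller would then import the estimator from the module whose name is returned. Checking with importlib.util.find_spec avoids importing a module just to test whether it is present.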

Some pai4sk APIs are modified to use cuML APIs when the cuml conda package is installed. These APIs automatically fall back to the original scikit-learn behavior when cuML does not provide the necessary support. The following APIs are modified in this way:

Example programs for each of these APIs are included in the conda package. For instructions on running them, see the README files in the subdirectories of $CONDA_PREFIX/pai4sk/local-examples/cuml-examples.
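Because the cuML examples are split into subdirectories, a small helper can collect the README from each one. This is a minimal sketch assuming that CONDA_PREFIX is set and that each README file name starts with README; cuml_example_readmes is a hypothetical name and is not part of pai4sk.

```python
import glob
import os


def cuml_example_readmes(conda_prefix=None):
    """Map each cuml-examples subdirectory name to the README files it contains.

    Uses $CONDA_PREFIX unless conda_prefix is given. Matching "README*"
    is an assumption, since the exact file name is not specified.
    """
    prefix = conda_prefix if conda_prefix is not None else os.environ.get("CONDA_PREFIX")
    if not prefix:
        raise RuntimeError("CONDA_PREFIX is not set; activate the conda environment first")
    base = os.path.join(prefix, "pai4sk", "local-examples", "cuml-examples")
    readmes = {}
    # Look exactly one level down: cuml-examples/<subdirectory>/README*
    for path in sorted(glob.glob(os.path.join(base, "*", "README*"))):
        subdir = os.path.basename(os.path.dirname(path))
        readmes.setdefault(subdir, []).append(path)
    return readmes
```

The returned dictionary is keyed by subdirectory name, so each example's instructions can be opened individually.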