WML for z/OS model utility API

WMLz provides two model utilities to support different needs: the Python model utility and the Java™ model utility.

1. Python Model Utility

You can use the WML for z/OS Python Model Utility API (MLModelUtil) to save a Spark, Scikit-learn, or XGBoost model that is created and trained locally on your distributed platform.

a. Class wmlz.ml_model_util.MLModelUtil

An MLModelUtil instance is a utility used to save a Spark, Scikit-learn, or XGBoost model to a local file system. See Importing a model from file.

Methods

  • def __init__(self)

    Initializes self.

    
    from wmlz.ml_model_util import MLModelUtil
    util = MLModelUtil()
  • save(self, model, base_name, pipeline=None, training_data=None, training_target=None, feature_names=None, label_column_names=None, dmatrix=None)

    Saves a model and returns the absolute path of the model file on a local file system.

    Where

    • model specifies the trained model to be saved on a local file system. Valid types include pyspark.ml.PipelineModel for a Spark model, xgboost.Booster for an XGBoost model, and a subclass of sklearn.base.BaseEstimator for a Scikit-learn model.
    • base_name is the file name (string) of the model to be saved. The model is saved as a tar.gz file; you can omit the extension from the name.
    • pipeline defines the pipeline (pyspark.ml.Pipeline) for creating and training a Spark model.
    • training_data specifies the subset of data used for training a Spark model (pyspark.sql.DataFrame) or a Scikit-learn model (pandas.DataFrame, numpy.ndarray, or list). This parameter is optional.
    • training_target is the target data used for training a model. This parameter is optional.
    • feature_names specifies the feature names (numpy.ndarray or list) for the training data if the training_data parameter is specified with a numpy.ndarray or list type.
    • label_column_names specifies the label column names (numpy.ndarray or list) for the training data if the training_target parameter is specified with a numpy.ndarray or list type. This parameter is optional.
    • dmatrix specifies the training data and target data (xgboost.DMatrix) for an XGBoost model.
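
The parameter descriptions above tie most optional keywords to one framework. The following is a minimal sketch of how a caller might keep only the keywords relevant to a given model type before calling save. The helper build_save_kwargs and the KEYWORDS_BY_FRAMEWORK table are hypothetical and not part of wmlz; the grouping is an assumption drawn from the examples on this page, not a rule that the library is known to enforce.

```python
from typing import Any, Dict

# Assumed grouping, taken from the examples on this page
# rather than from any documented validation rule.
KEYWORDS_BY_FRAMEWORK = {
    "spark": ("pipeline", "training_data"),
    "sklearn": (
        "training_data",
        "training_target",
        "feature_names",
        "label_column_names",
    ),
    "xgboost": ("dmatrix",),
}


def build_save_kwargs(framework: str, **candidates: Any) -> Dict[str, Any]:
    """Hypothetical helper: keep only the keywords listed for the framework.

    Keywords that are missing or set to None are dropped.
    """
    if framework not in KEYWORDS_BY_FRAMEWORK:
        raise ValueError(f"unknown framework: {framework!r}")
    allowed = KEYWORDS_BY_FRAMEWORK[framework]
    return {
        name: candidates[name]
        for name in allowed
        if candidates.get(name) is not None
    }
```

The resulting dictionary could then be unpacked into the call, for example util.save(model, base_name, **kwargs).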

Examples

The following examples show how to use the utility API to save a Spark, Scikit-learn, or XGBoost model on your local distributed system.

Example 1: Saving a locally trained Spark model:


import pandas as pd
import numpy as np

from sklearn import datasets
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.pipeline import Pipeline
from pyspark.sql import SparkSession
from wmlz.ml_model_util import MLModelUtil

spark = (
    SparkSession.builder.master("local[*]")
    .appName("PySpark to Mleap example")
    .getOrCreate()
)

# iris dataset
iris = datasets.load_iris()

# pandas dataframe
pdf = pd.DataFrame(
    data=np.c_[iris["data"], iris["target"]],
    columns=["SepalLength", "SepalWidth", "PetalLength", "PetalWidth", "label"],
)

# spark dataframe
df = spark.createDataFrame(pdf)

# create model
assembler = (
    VectorAssembler()
    .setInputCols(["SepalLength", "SepalWidth", "PetalLength", "PetalWidth"])
    .setOutputCol("features")
)

lr = LogisticRegression(maxIter=10, regParam=0.01, labelCol="label")

# pipeline
stages = [assembler, lr]
pipeline = Pipeline(stages=stages)

# train
model = pipeline.fit(df)

# save model
util = MLModelUtil()
model_path = util.save(
    model, "./pyspark_example_model", pipeline=pipeline, training_data=df,
)

Example 2: Saving a locally trained Scikit-learn model that uses a pandas DataFrame with column names as input features:


import pandas as pd
import numpy as np

from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from wmlz.ml_model_util import MLModelUtil

# iris dataset
iris = datasets.load_iris()

# pandas dataframe
pdf = pd.DataFrame(
    data=np.c_[iris["data"], iris["target"]],
    columns=["SepalLength", "SepalWidth", "PetalLength", "PetalWidth", "label"],
)

x_train, y_train = (
    pdf[["SepalLength", "SepalWidth", "PetalLength", "PetalWidth"]],
    pdf["label"],
)

# create model
lr = LogisticRegression()
pipeline = Pipeline([("lr", lr)])

# train
model = pipeline.fit(x_train, y_train)

# save model
util = MLModelUtil()
model_path = util.save(
    model, "./scikit_example_model", training_data=x_train, training_target=y_train
)

Example 3: Saving a locally trained Scikit-learn model that uses a NumPy ndarray, with f0, f1, f2, and f3 as the input features:


from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from wmlz.ml_model_util import MLModelUtil

# iris dataset
iris = datasets.load_iris()

# numpy array
x_train, y_train = iris["data"], iris["target"]

# create model
lr = LogisticRegression()
pipeline = Pipeline([("lr", lr)])

# train
model = pipeline.fit(x_train, y_train)

# save model
util = MLModelUtil()
model_path = util.save(
    model,
    "./scikit_example_model",
    training_data=x_train,
    training_target=y_train
)
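
The title of Example 3 suggests that when array input carries no names, the features are referred to as f0, f1, f2, and f3. The following is a minimal sketch of that naming convention, assuming a zero-based f<index> pattern. The function resolve_feature_names is hypothetical and is not part of wmlz; it only illustrates how default names and explicit names (the feature_names parameter) relate.

```python
from typing import List, Optional, Sequence


def resolve_feature_names(
    n_columns: int, feature_names: Optional[Sequence[str]] = None
) -> List[str]:
    """Hypothetical helper: return explicit names, or f0..f(n-1) defaults."""
    if feature_names is None:
        # Assumed default pattern, based on the names in the example title.
        return [f"f{i}" for i in range(n_columns)]
    names = list(feature_names)
    if len(names) != n_columns:
        raise ValueError(
            f"expected {n_columns} feature names, got {len(names)}"
        )
    return names
```

For the four-column iris array, calling resolve_feature_names(4) yields the four default names, while passing an explicit list returns that list unchanged after a length check.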

Example 4: Saving a locally trained Scikit-learn model that uses a NumPy array together with explicit feature names and label column names:


from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from wmlz.ml_model_util import MLModelUtil

# iris dataset
iris = datasets.load_iris()

# numpy array
x_train, y_train = iris["data"], iris["target"]
features = ["SepalLength", "SepalWidth", "PetalLength", "PetalWidth"]
label = ["label"]

# create model
lr = LogisticRegression()
pipeline = Pipeline([("lr", lr)])

# train
model = pipeline.fit(x_train, y_train)

# save model
util = MLModelUtil()
model_path = util.save(
    model,
    "./scikit_example_model",
    training_data=x_train,
    training_target=y_train,
    feature_names=features,
    label_column_names=label,
)

Example 5: Saving a locally trained XGBoost model:


import pandas as pd
import xgboost as xgb

from sklearn import datasets
from wmlz.ml_model_util import MLModelUtil

# iris dataset
iris = datasets.load_iris()

# pandas dataframe
data = pd.DataFrame(
    iris["data"], columns=["SepalLength", "SepalWidth", "PetalLength", "PetalWidth"]
)
label = pd.DataFrame(iris["target"])

# dmatrix
dtrain = xgb.DMatrix(data, label=label)

# create model & train
param = {"max_depth": 4, "eta": 1, "objective": "multi:softmax", "num_class": 3}
model = xgb.train(param, dtrain)

# save model
util = MLModelUtil()
model_path = util.save(
    model,
    "./xgboost_example_model",
    dmatrix=dtrain
)
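
Each example keeps the path that save returns in model_path. The following sketch shows one way to sanity-check that file before importing it, assuming the path points to a gzip-compressed tar archive, as the base_name description indicates. The function list_model_archive is hypothetical; the internal layout of the archive is not described on this page, so the sketch only lists member names.

```python
import tarfile
from typing import List


def list_model_archive(model_path: str) -> List[str]:
    """Hypothetical helper: list the member names of a saved tar.gz file."""
    if not tarfile.is_tarfile(model_path):
        raise ValueError(f"not a tar archive: {model_path}")
    # "r:gz" opens the file for reading with gzip decompression.
    with tarfile.open(model_path, "r:gz") as archive:
        return sorted(archive.getnames())
```

A call such as list_model_archive(model_path) raises an error if the file is missing or is not a tar archive, which gives an early signal that the save step did not produce the expected file.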

2. Java Model Utility

You can use the WML for z/OS Java Model Utility API (MLModelUtil) to save a Watson Core Time Series model that is created and trained locally on your distributed platform.

a. com.ibm.analytics.wmlz.model_util.input.WatforeModelInput

The WatforeModelInput class is used to define the basic information of a saved model. It is used with the MLModelUtil class to save a forecasting model trained using Watson Core Time Series into the standard "MLz Model" format, which can be imported into WMLz.

Constructors

WatforeModelInput(String author, String modelName, String outputPath, String[] columns, ForecastingModel[] models, int predictionHorizon):

Creates a new instance of WatforeModelInput with the specified parameters.
  • author: A string representing the author or owner of the model.
  • modelName: A string representing the name of the model.
  • outputPath: A string representing the output path where the model will be saved.
  • columns: An array of strings representing the names of the columns associated with each forecasting model.
  • models: An array of ForecastingModel objects representing the forecasting models to be saved.
  • predictionHorizon: An integer representing the prediction horizon of the models.

b. com.ibm.analytics.wmlz.model_util.MLModelUtil

The MLModelUtil class is a utility class used to save a forecasting model trained with Watson Core Time Series into the MLz Model format. The MLz Model format is a standard format that can be imported into WMLz for deployment.

Methods

  • save(BaseModelInput input):

    Saves the Watson Core Time Series model input as an MLz Model. This method is typically called with an instance of WatforeModelInput.

    Where

    • input specifies the WatforeModelInput instance that describes the model to be saved.

Examples

The following example shows how to use the utility API to save a watfore model on your local distributed system.

Example 1: Saving a locally trained watfore model:

package com.wmlz.watfore.model;

import com.ibm.analytics.wmlz.model_util.MLModelUtil;
import com.ibm.analytics.wmlz.model_util.input.WatforeModelInput;
import com.ibm.research.time_series.forecasting.PMException;
import com.ibm.research.time_series.forecasting.algorithms.BATS.BATSAlgorithm;
import com.ibm.research.time_series.forecasting.algorithms.BATS.BATSAlgorithmBuilder;
import com.ibm.research.time_series.forecasting.algorithms.IForecastingAlgorithm;
import com.ibm.research.time_series.forecasting.algorithms.arima.ARIMAAlgorithmBuilder;
import com.ibm.research.time_series.forecasting.models.ForecastingModel;
import com.ibm.research.time_series.forecasting.models.OnlineForecastingInterpolator;
import com.ibm.research.time_series.forecasting.timeseries.TimeUnits;
import com.ibm.research.time_series.forecasting.transformation.interpolate.IOnlineInterpolator;
import java.util.Random;

public class MLModelUtilSample {
    public static void main(String[] args) {
        try {

            // Create Model with ARIMAAlgorithmBuilder for memory
            IForecastingAlgorithm arimaAlg = new ARIMAAlgorithmBuilder()
                    .setMinAROrder(1)
                    .setMaxAROrder(3)
                    .setDifferenceOrder(1)
                    .setMinMAOrder(0)
                    .setMaxMAOrder(5)
                    .build();
            IOnlineInterpolator arimaInterp = new OnlineForecastingInterpolator(123);
            ForecastingModel fmARIMA = new ForecastingModel(arimaAlg, arimaInterp);

            // Create Model with BATSAlgorithmBuilder for CPU
            BATSAlgorithmBuilder batsBuilder = new BATSAlgorithmBuilder();
            BATSAlgorithm batsAlg = batsBuilder
                    .setInitializationVersion(BATSAlgorithm.InitializationVersion.Version2)
                    .setSamplesPerSeason(new int[] { 24 })
                    .setTrainingSampleCount(100)
                    .setBoxCox(true)
                    .build();
            IOnlineInterpolator batsInterp = new OnlineForecastingInterpolator(123);
            ForecastingModel fmBATS = new ForecastingModel(batsAlg, batsInterp);

            Random r = new Random();
            for (int i = 0; i < 100; i++) {
                // update the model once per iteration
                fmARIMA.updateModel(TimeUnits.Hours, i, r.nextDouble());
                fmBATS.updateModel(TimeUnits.Hours, i, r.nextDouble());
            }

            // Save the models in the MLz Model format.
            ForecastingModel[] models = new ForecastingModel[]{fmARIMA, fmBATS};
            String[] columns = new String[]{"COL1", "COL2"};
            int predictionHorizon = 24;

            String author = "Your Name";
            String modelName = "time_series_model";
            String outputPath = "./";

            WatforeModelInput watforeModelInput = new WatforeModelInput(author, modelName, outputPath, columns, models, predictionHorizon);
            MLModelUtil.save(watforeModelInput);

        } catch (PMException e) {
            throw new RuntimeException(e);
        }
    }
}