WML for z/OS model utility API
WMLz provides two model utilities to support different needs: the Python model utility and the Java™ model utility.
1. Python Model Utility
You can use the WML for z/OS Python Model Utility API (MLModelUtil) to save a Spark, Scikit-learn, or XGBoost model that is created and trained locally on your distributed platform.
a. Class wmlz.ml_model_util.MLModelUtil
An MLModelUtil instance is a utility used to save a Spark, Scikit-learn, or XGBoost model into a local file system. See Importing a model from file.
Methods
def __init__(self)
Initializes self.
from wmlz.ml_model_util import MLModelUtil
util = MLModelUtil()
save(self, model, base_name, pipeline=None, training_data=None, training_target=None, feature_names=None, label_column_names=None, dmatrix=None)
Saves a model and returns the absolute path of the model file on a local file system.
Where
- model specifies the trained model to be saved on a local file system. Valid types include pyspark.ml.PipelineModel for a Spark model, xgboost.Booster for an XGBoost model, and a subclass of sklearn.base.BaseEstimator for a Scikit-learn model.
- base_name is the file name (string) of the model to be saved. The model file must be a tar.gz type, but the extension itself can be omitted from base_name.
- pipeline defines the pipeline (pyspark.ml.Pipeline) for creating and training a Spark model.
- training_data specifies the subset of data used for training a Spark model (pyspark.sql.DataFrame) or a Scikit-learn model (pandas.DataFrame, numpy.ndarray, or list). This parameter is optional.
- training_target is the target data used for training a model. This parameter is optional.
- feature_names specifies the feature names (numpy.ndarray or list) for the training data if the training_data parameter is specified with a numpy.ndarray or list type.
- label_column_names specifies the label column names (numpy.ndarray or list) for the training data if the training_target parameter is specified with a numpy.ndarray or list type. This parameter is optional.
- dmatrix specifies the training data and target data (xgb.DMatrix) for an XGBoost model.
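The parameter list can be read as three groups of keyword arguments, one per framework. The following is a minimal sketch of that reading, not part of the wmlz package: the helper name build_save_kwargs and the grouping table are hypothetical, and the grouping is inferred from the descriptions above rather than confirmed by the API.

```python
# Hypothetical helper (an assumption, not a wmlz API): maps each framework to
# the save() keyword arguments that the parameter descriptions associate with it.
SAVE_KWARGS_BY_FRAMEWORK = {
    "spark": {"pipeline", "training_data", "training_target"},
    "sklearn": {
        "training_data",
        "training_target",
        "feature_names",
        "label_column_names",
    },
    "xgboost": {"dmatrix"},
}


def build_save_kwargs(framework, **candidates):
    """Return the non-None keyword arguments that apply to the framework."""
    try:
        allowed = SAVE_KWARGS_BY_FRAMEWORK[framework]
    except KeyError:
        raise ValueError(f"unsupported framework: {framework!r}") from None
    # Reject arguments that were supplied but belong to another framework.
    unexpected = {
        name
        for name, value in candidates.items()
        if value is not None and name not in allowed
    }
    if unexpected:
        raise ValueError(f"{sorted(unexpected)} not used for {framework} models")
    return {name: value for name, value in candidates.items() if value is not None}


# Placeholder value stands in for a real xgb.DMatrix.
kwargs = build_save_kwargs("xgboost", dmatrix="dtrain-placeholder")
print(kwargs)
```

With a real model, the resulting dictionary could then be passed as util.save(model, base_name, **kwargs).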
Examples
The following examples show how you can use the utility API to save a Spark, Scikit-learn, or XGBoost model on your local distributed platform.
Example 1: Saving a locally trained Spark model:
import pandas as pd
import numpy as np
from sklearn import datasets
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.pipeline import Pipeline
from pyspark.sql import SparkSession
from wmlz.ml_model_util import MLModelUtil
spark = (
    SparkSession.builder.master("local[*]")
    .appName("PySpark to Mleap example")
    .getOrCreate()
)
# iris dataset
iris = datasets.load_iris()
# pandas dataframe
pdf = pd.DataFrame(
    data=np.c_[iris["data"], iris["target"]],
    columns=["SepalLength", "SepalWidth", "PetalLength", "PetalWidth", "label"],
)
# spark dataframe
df = spark.createDataFrame(pdf)
# create model
assembler = (
    VectorAssembler()
    .setInputCols(["SepalLength", "SepalWidth", "PetalLength", "PetalWidth"])
    .setOutputCol("features")
)
lr = LogisticRegression(maxIter=10, regParam=0.01, labelCol="label")
# pipeline
stages = [assembler, lr]
pipeline = Pipeline(stages=stages)
# train
model = pipeline.fit(df)
# save model
util = MLModelUtil()
model_path = util.save(
    model, "./pyspark_example_model", pipeline=pipeline, training_data=df,
)
Example 2: Saving a locally trained Scikit-learn model that uses a pandas DataFrame with column names as input features:
import pandas as pd
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from wmlz.ml_model_util import MLModelUtil
# iris dataset
iris = datasets.load_iris()
# pandas dataframe
pdf = pd.DataFrame(
    data=np.c_[iris["data"], iris["target"]],
    columns=["SepalLength", "SepalWidth", "PetalLength", "PetalWidth", "label"],
)
x_train, y_train = (
    pdf[["SepalLength", "SepalWidth", "PetalLength", "PetalWidth"]],
    pdf["label"],
)
# create model
lr = LogisticRegression()
pipeline = Pipeline([("lr", lr)])
# train
model = pipeline.fit(x_train, y_train)
# save model
util = MLModelUtil()
model_path = util.save(
    model, "./scikit_example_model", training_data=x_train, training_target=y_train
)
Example 3: Saving a locally trained Scikit-learn model that uses ndarray with f0, f1, f2, and f3 as input features:
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from wmlz.ml_model_util import MLModelUtil
# iris dataset
iris = datasets.load_iris()
# numpy array
x_train, y_train = iris["data"], iris["target"]
# create model
lr = LogisticRegression()
pipeline = Pipeline([("lr", lr)])
# train
model = pipeline.fit(x_train, y_train)
# save model
util = MLModelUtil()
model_path = util.save(
    model,
    "./scikit_example_model",
    training_data=x_train,
    training_target=y_train
)
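Example 3 refers to its input features as f0, f1, f2, and f3 because no feature names are supplied. The following minimal sketch shows how such positional names can be generated for list-based training data. The helper default_feature_names is hypothetical, and the f<index> pattern is an assumption based only on the names quoted in the example title.

```python
def default_feature_names(rows):
    """Build positional names f0, f1, ... for a non-empty list of equal-length rows."""
    if not rows:
        raise ValueError("rows must contain at least one sample")
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all rows must have the same number of features")
    return [f"f{i}" for i in range(width)]


# Two iris-like samples with four features each.
sample = [[5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2]]
names = default_feature_names(sample)
print(names)  # ['f0', 'f1', 'f2', 'f3']
```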
Example 4: Saving a locally trained Scikit-learn model that uses a NumPy array together with feature names and label column names:
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from wmlz.ml_model_util import MLModelUtil
# iris dataset
iris = datasets.load_iris()
# numpy array
x_train, y_train = iris["data"], iris["target"]
features = ["SepalLength", "SepalWidth", "PetalLength", "PetalWidth"]
label = ["label"]
# create model
lr = LogisticRegression()
pipeline = Pipeline([("lr", lr)])
# train
model = pipeline.fit(x_train, y_train)
# save model
util = MLModelUtil()
model_path = util.save(
    model,
    "./scikit_example_model",
    training_data=x_train,
    training_target=y_train,
    feature_names=features,
    label_column_names=label,
)
Example 5: Saving a locally trained XGBoost model:
import pandas as pd
import xgboost as xgb
from sklearn import datasets
from wmlz.ml_model_util import MLModelUtil
# iris dataset
iris = datasets.load_iris()
# pandas dataframe
data = pd.DataFrame(
    iris["data"], columns=["SepalLength", "SepalWidth", "PetalLength", "PetalWidth"]
)
label = pd.DataFrame(iris["target"])
# dmatrix
dtrain = xgb.DMatrix(data, label=label)
# create model & train
param = {"max_depth": 4, "eta": 1, "objective": "multi:softmax", "num_class": 3}
model = xgb.train(param, dtrain)
# save model
util = MLModelUtil()
model_path = util.save(
    model,
    "./xgboost_example_model",
    dmatrix=dtrain
)
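Each example above ends with model_path holding the path that save returns. The following minimal sketch shows one way to sanity-check such a path before importing the model, assuming the file is a gzip-compressed tar archive as the base_name description states. The helper list_model_archive is hypothetical, and the sketch builds a stand-in archive so that it runs without wmlz installed. The actual contents of a saved model archive are not described here.

```python
import os
import tarfile
import tempfile


def list_model_archive(model_path):
    """Return the member names of a gzip-compressed tar archive."""
    if not tarfile.is_tarfile(model_path):
        raise ValueError(f"{model_path} is not a tar archive")
    with tarfile.open(model_path, mode="r:gz") as archive:
        return archive.getnames()


# Stand-in archive for demonstration only; in practice, pass the path
# returned by util.save() instead of demo_path.
with tempfile.TemporaryDirectory() as tmp:
    payload = os.path.join(tmp, "placeholder.txt")
    with open(payload, "w") as handle:
        handle.write("stand-in for model contents")
    demo_path = os.path.join(tmp, "demo_model.tar.gz")
    with tarfile.open(demo_path, mode="w:gz") as archive:
        archive.add(payload, arcname="placeholder.txt")
    members = list_model_archive(demo_path)

print(members)
```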
2. Java Model Utility
You can use the WML for z/OS Java Model Utility API (MLModelUtil) to save a Watson Core Time Series model that is created and trained locally on your distributed platform.
a. com.ibm.analytics.wmlz.model_util.input.WatforeModelInput
The WatforeModelInput class is used to define the basic information of a saved model. It is used with the MLModelUtil class to save a forecasting model trained using Watson Core Time Series into the standard "MLz Model" format, which can be imported into WMLz.
Constructors
WatforeModelInput(String author, String modelName,
String outputPath, String[] columns,
ForecastingModel[] models, int predictionHorizon):
Constructs a WatforeModelInput with the specified parameters.
- author: A string representing the author or owner of the model.
- modelName: A string representing the name of the model.
- outputPath: A string representing the output path where the model will be saved.
- columns: An array of strings representing the names of the columns associated with each forecasting model.
- models: An array of ForecastingModel objects representing the forecasting models to be saved.
- predictionHorizon: An integer representing the prediction horizon of the models.
b. com.ibm.analytics.wmlz.model_util.MLModelUtil
The MLModelUtil class is a utility class used to save a forecasting model trained with Watson Core Time Series into the MLz Model format. The MLz Model format is a standard format that can be imported into WMLz for further deployment.
Methods
save(BaseModelInput input):
Saves the Watson Core Time Series model input as an MLz Model. This method is typically called with an instance of WatforeModelInput.
Where
- input specifies the instance of WatforeModelInput.
Examples
The following example shows how you can use the utility API to save a Watson Core Time Series (watfore) model on your local distributed platform.
package com.wmlz.watfore.model;
import com.ibm.analytics.wmlz.model_util.MLModelUtil;
import com.ibm.analytics.wmlz.model_util.input.WatforeModelInput;
import com.ibm.research.time_series.forecasting.PMException;
import com.ibm.research.time_series.forecasting.algorithms.BATS.BATSAlgorithm;
import com.ibm.research.time_series.forecasting.algorithms.BATS.BATSAlgorithmBuilder;
import com.ibm.research.time_series.forecasting.algorithms.IForecastingAlgorithm;
import com.ibm.research.time_series.forecasting.algorithms.arima.ARIMAAlgorithmBuilder;
import com.ibm.research.time_series.forecasting.models.ForecastingModel;
import com.ibm.research.time_series.forecasting.models.OnlineForecastingInterpolator;
import com.ibm.research.time_series.forecasting.timeseries.TimeUnits;
import com.ibm.research.time_series.forecasting.transformation.interpolate.IOnlineInterpolator;
import java.util.Random;
public class MLModelUtilSample {
    public static void main(String[] args) {
        try {
            // Create Model with ARIMAAlgorithmBuilder for memory
            IForecastingAlgorithm arimaAlg = new ARIMAAlgorithmBuilder()
                    .setMinAROrder(1)
                    .setMaxAROrder(3)
                    .setDifferenceOrder(1)
                    .setMinMAOrder(0)
                    .setMaxMAOrder(5)
                    .build();
            IOnlineInterpolator arimaInterp = new OnlineForecastingInterpolator(123);
            ForecastingModel fmARIMA = new ForecastingModel(arimaAlg, arimaInterp);
            // Create Model with BATSAlgorithmBuilder for CPU
            BATSAlgorithmBuilder batsBuilder = new BATSAlgorithmBuilder();
            BATSAlgorithm batsAlg = batsBuilder
                    .setInitializationVersion(BATSAlgorithm.InitializationVersion.Version2)
                    .setSamplesPerSeason(new int[] { 24 })
                    .setTrainingSampleCount(100)
                    .setBoxCox(true)
                    .build();
            IOnlineInterpolator batsInterp = new OnlineForecastingInterpolator(123);
            ForecastingModel fmBATS = new ForecastingModel(batsAlg, batsInterp);
            // Feed both models 100 hourly samples of random data
            Random r = new Random();
            for (int i = 0; i < 100; i++) {
                // update the model once per iteration
                fmARIMA.updateModel(TimeUnits.Hours, i, r.nextDouble());
                fmBATS.updateModel(TimeUnits.Hours, i, r.nextDouble());
            }
            // Save the models in the MLz Model format.
            ForecastingModel[] models = new ForecastingModel[]{fmARIMA, fmBATS};
            // One column name per forecasting model, in the same order as models
            String[] columns = new String[]{"COL1", "COL2"};
            int predictionHorizon = 24;
            String author = "Your Name";
            String modelName = "time_series_model";
            String outputPath = "./";
            WatforeModelInput watforeModelInput = new WatforeModelInput(author, modelName, outputPath, columns, models, predictionHorizon);
            MLModelUtil.save(watforeModelInput);
        } catch (PMException e) {
            throw new RuntimeException(e);
        }
    }
}