Supported algorithms, data sources, data types, and model types
IBM Watson® Machine Learning for z/OS supports various machine learning algorithms, data sources, data types, and model types that you can use to create, train, and deploy models.
Algorithms
- All classification and regression algorithms that Apache Spark MLlib supports. See z/OS Spark MLLib – Classification and regression 3.2 for a list of the supported classification and regression algorithms.
- All PySpark classification and regression algorithms that Apache Spark MLlib supports.
- All clustering algorithms that Apache Spark MLlib supports. See z/OS Spark MLLib – Clustering for a list of the supported clustering algorithms.
- All PySpark clustering algorithms that Apache Spark MLlib supports.
- All Scikit-learn machine learning algorithms. See Scikit-learn machine learning algorithms for the list of supported algorithms.
- All machine learning algorithms that the XGBoost Python API supports, with the exception of GPU algorithms. See XGBoost Python Package for details about the supported XGBoost algorithms.
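For example, training any of these algorithms in a notebook follows the standard API of its library. The following minimal PySpark sketch, with made-up in-line data, trains an MLlib logistic regression model; in practice you would load training data from one of the sources described below:

    # Minimal sketch: train a Spark MLlib classifier in PySpark.
    # The in-line data values are illustrative only.
    from pyspark.sql import SparkSession
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.feature import VectorAssembler

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame(
        [(35, 1, 1.0), (22, 0, 0.0), (41, 1, 1.0), (19, 0, 0.0)],
        ["age", "member", "label"])

    # Assemble the raw columns into the single vector column MLlib expects
    assembler = VectorAssembler(inputCols=["age", "member"], outputCol="features")
    model = LogisticRegression(maxIter=10).fit(assembler.transform(df))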
Data sources
Data source support in WML for z/OS is determined by whether you use JDBC or IBM® Data Virtualization Manager for z/OS (DVM) as the data access method:
- With JDBC, WML for z/OS supports access to the following data source in Scala and Python:
- Db2® for z/OS
For example, you can use the following Scala code in the Notebook Editor to connect through JDBC to the TENTDATA table:
    val df = spark.read.format("jdbc").options(Map(
      "driver"   -> "com.ibm.db2.jcc.DB2Driver",
      "url"      -> "jdbc:db2://<url>:<port>/<location>",
      "user"     -> "<userid>",
      "password" -> "<password>",
      "dbtable"  -> "MLZ.TENTDATA")).load()
Where <url> and <port> are the IP address and port number of your Db2 host system, <location> is the location name of your Db2 installation, and <userid> and <password> are your Db2 authorization ID and password.
You can also use the following Python code to connect through PySpark to the TENTDATA table:
    # Import libraries required for reading data files
    import pandas
    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    # Initialize SparkSQL Context
    sc = SparkContext.getOrCreate()
    sqlContext = SQLContext(sc)

    df = sqlContext.read.format("jdbc").options(
        driver='com.ibm.db2.jcc.DB2Driver',
        url='jdbc:db2://<url>:<port>/<location>',
        user='<userid>',
        password='<password>',
        dbtable='MLZ.TENTDATA').load().toPandas()
    print(df.head(5))
Set <url>, <port>, <location>, <userid> and <password> to appropriate values based on the Db2 installation in your environment.
- With DVM, WML for z/OS supports access to the following data sources in Scala and Python:
- Db2 for z/OS
- IMS
- SMF
- VSAM data sets
For example, you can use the following Scala code in the Notebook Editor to connect through DVM to the TENTDATA table:
    val data = spark.read.format("jdbc").options(Map(
      "driver"   -> "com.rs.jdbc.dv.DvDriver",
      "url"      -> "jdbc:rs:dv://<url>:<port>",
      "user"     -> "<userid>",
      "password" -> "<password>",
      "dbty"     -> "DVS",
      "dbtable"  -> "MLZ.TENTDATA")).load()
You can also use the following Python code to connect through PySpark to the DVM data source:
    # Import libraries required for reading data files
    import pandas
    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    # Initialize SparkSQL Context
    sc = SparkContext.getOrCreate()
    sqlContext = SQLContext(sc)

    df = sqlContext.read.format("jdbc").options(
        driver='com.rs.jdbc.dv.DvDriver',
        url='jdbc:rs:dv://<url>:<port>',
        user='<userid>',
        password='<password>',
        dbty='DVS',
        dbtable='MLZ.TENTDATA').load().toPandas()
    print(df.head(5))
Finally, you can ingest data from Db2 for z/OS and DVM sources by adding data asset connections to your notebook.
See IBM Data Virtualization Manager for z/OS for more information about working with data sources through DVM.
Data types
WML for z/OS supports all the data types that IBM z/OS Platform for Apache Spark supports.
Model types
The type of a machine learning model is determined by the scoring engine used for processing the model. WML for z/OS supports the following model types:
- SparkML
- MLeap
- PMML
- Scikit-learn
- XGBoost
- ARIMA or Seasonal ARIMA
- ONNX
You can train, save, and deploy MLeap, SparkML, Scikit-learn, XGBoost, ARIMA, and Seasonal ARIMA models in WML for z/OS:
- If you create a model by using the Scala Notebook Editor, you can save the model as a SparkML, MLeap, or SparkML/MLeap model type. You can deploy the model as a SparkML or MLeap model.
- If you create a model by using the Python Notebook Editor, you can save and deploy the model as a Scikit-learn or XGBoost model. While a native XGBoost model is saved and deployed as an XGBoost model, an XGBoost Scikit-learn wrapper model is saved and deployed as a Scikit-learn model, as illustrated in the sketch after this list.
- If you create a time series model by using the Python Notebook Editor, you can save and deploy it as an ARIMA or Seasonal ARIMA model type. Make sure that you use the repository SAVE API to save the model and specify the corresponding time series scoring engine (Time Series-Arima or Time Series-Seasonal Arima) for deployment.
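To illustrate the native-versus-wrapper distinction from the second bullet above, the following sketch (synthetic data, arbitrary hyperparameters) trains the same task through both XGBoost APIs. Under the behavior described above, the first model would be saved and deployed as an XGBoost model type and the second as a Scikit-learn model type:

    # Illustrative sketch with synthetic data: one task, two XGBoost APIs.
    import numpy as np
    import xgboost as xgb

    X = np.random.rand(100, 4)
    y = (X[:, 0] > 0.5).astype(int)

    # Native XGBoost API: produces a Booster, handled as an XGBoost model type
    dtrain = xgb.DMatrix(X, label=y)
    booster = xgb.train({"objective": "binary:logistic"}, dtrain, num_boost_round=10)

    # Scikit-learn wrapper API: handled as a Scikit-learn model type
    clf = xgb.XGBClassifier(n_estimators=10, objective="binary:logistic")
    clf.fit(X, y)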
You can also use WML for z/OS to import a PMML, ONNX, or Watson Core Time Series model that is developed on another platform. You can save and deploy the imported model as a PMML, ONNX, or watfore model type.
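For context, such a model might be produced off-platform as follows. This is a minimal sketch that assumes scikit-learn and the skl2onnx converter as the external toolchain; the model choice and the model.onnx file name are placeholders:

    # Hypothetical off-platform export: convert a scikit-learn classifier
    # to ONNX so that it can be imported into WML for z/OS.
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from skl2onnx import convert_sklearn
    from skl2onnx.common.data_types import FloatTensorType

    X, y = load_iris(return_X_y=True)
    clf = LogisticRegression(max_iter=200).fit(X, y)

    # Declare the input signature (4 float features, variable batch size)
    onnx_model = convert_sklearn(
        clf, initial_types=[("input", FloatTensorType([None, 4]))])
    with open("model.onnx", "wb") as f:
        f.write(onnx_model.SerializeToString())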
The WML for z/OS support of the model types has certain limitations:
- Limitations in the models used for CICS and WOLA scoring:
- The field names in the model’s input and output must not exceed 16 characters in length because of limitations in COBOL and the COPYBOOK generator.
- Limitations in the MLeap and SparkML engine:
- The MLeap engine does not support the following z/OS Spark transformers or estimators:
- org.apache.spark.ml.feature.PolynomialExpansion
- org.apache.spark.ml.feature.OneHotEncoderModel
- org.apache.spark.ml.feature.OneHotEncoder (for Spark 2.3.0 or later)
- org.apache.spark.ml.feature.Normalizer
- org.apache.spark.ml.feature.RobustScalerModel
- org.apache.spark.ml.feature.SQLTransformer
- org.apache.spark.ml.feature.VectorSizeHint
- org.apache.spark.ml.feature.ImputerModel
- org.apache.spark.ml.feature.RFormulaModel
- org.apache.spark.ml.feature.UnivariateFeatureSelectorModel
- org.apache.spark.ml.feature.VarianceThresholdSelectorModel
- org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel
- org.apache.spark.ml.feature.MinHashLSHModel
- org.apache.spark.ml.classification.FMClassificationModel
- org.apache.spark.ml.regression.FMRegressionModel
- org.apache.spark.ml.clustering.LDAModel
- org.apache.spark.ml.clustering.PowerIterationClustering
- org.apache.spark.ml.classification.MultilayerPerceptronClassificationModel
- org.apache.spark.ml.tuning.CrossValidatorModel
- org.apache.spark.ml.tuning.TrainValidationSplitModel
If a pipeline contains any of these transformers or estimators, the pipeline will not be retained in the MLeap bundle, and the model will be saved as a SparkML type only.
- The MLeap engine does not support any customized z/OS Spark transformers or estimators except the CADS (Cognitive Assistant for Data Scientists) estimator.
- The MLeap engine does not support any model that has array, vector, map, or struct as a column data type; such a model will be saved as a SparkML type only.
- You cannot create a PMML model in WML for z/OS, but you can import and deploy a PMML model that you've already created elsewhere.
- Limitations in the Scikit-learn engine:
- The Scikit-learn engine does not support the feedback evaluation, retraining, and batch scoring of a model.
- The Scikit-learn engine does not support a model that does not contain a predict method, such as SpectralBiclustering and AgglomerativeClustering.
- Limitations in the XGBoost engine:
- The XGBoost engine does not support feedback evaluation and batch scoring of a model.
- Using an XGBoost model with the count:poisson objective for prediction may cause errors.
- WML for z/OS supports the conversion of an XGBoost model to PMML only if the XGBoost model has the following parameters:
  - booster
    - gbtree
  - objective
    - binary:logistic
    - count:poisson
    - multi:softmax
    - multi:softprob
    - reg:gamma
    - reg:logistic
    - reg:linear
    - reg:tweedie
- You can use the XGBoost Scikit-learn wrapper API within a Scikit-learn pipeline to train a model. If you want to convert the pipeline to PMML, all the operations in the pipeline must be supported by JPMML-SkLearn. See JPMML-SkLearn for details.
- Limitations in the ONNX engine:
- You cannot create an ONNX model in WML for z/OS.
- The ONNX engine does not support feedback evaluation and retraining of an ONNX model.
- The ONNX engine does not support batch scoring of an ONNX model that reads a batch of records from a database and writes a batch of predictions back to the database.
The ONNX engine supports ONNX version 1.13.1 for operations targeting up to opset 18. See Supported ONNX Operation for Target cpu and Supported ONNX Operation for Target NNPA for more information about the supported operators and limitations. A way to check the opsets that an ONNX file declares is sketched at the end of this section.
- Limitations in the watfore engine:
- Only Watson Core Time Series v2.14.2 is supported.
- The supported algorithms within the Watson Core Time Series forecasting model are limited to those that can be constructed using the ForecastingModel.
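Given the opset ceiling noted in the ONNX limitations above, it can be worth inspecting a candidate file before importing it. A minimal sketch using the onnx Python package follows; model.onnx is a placeholder path:

    # Sketch: list the operator-set versions an ONNX file declares.
    # "model.onnx" is a placeholder for your exported model file.
    import onnx

    model = onnx.load("model.onnx")
    for opset in model.opset_import:
        # An empty domain string denotes the default ai.onnx operator set,
        # which the ONNX engine accepts up to opset 18
        print(opset.domain or "ai.onnx", opset.version)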