Supported algorithms, data sources, data types, and model types

IBM Watson® Machine Learning for z/OS supports various machine learning algorithms, data sources, data types, and model types that you can use to create, train, and deploy models.

Algorithms

Algorithm support in WML for z/OS varies depending on whether you use the integrated Notebook Editor or SPSS® Modeler as the model development tool:

  • The integrated Notebook Editor supports the following model algorithms:
    • All classification and regression algorithms that the Apache Spark MLlib supports. See "z/OS Spark MLLib – Classification and regression 2.3 or 2.4" for a list of the supported classification and regression algorithms.
    • All PySpark classification and regression algorithms that the Apache Spark MLlib supports.
    • All clustering algorithms that the Apache Spark MLlib supports. See "z/OS Spark MLLib – Clustering 2.3 or 2.4" for a list of the supported clustering algorithms.
    • All PySpark clustering algorithms that the Apache Spark MLlib supports.
    • All Scikit-learn machine learning algorithms. See Scikit-learn machine learning algorithms for the list of supported Scikit-learn machine learning algorithms.
    • All machine learning algorithms that the XGBoost Python API supports, with the exception of the GPU algorithms in XGBoost. See XGBoost Python Package for details of the supported XGBoost algorithms.
  • The integrated SPSS Modeler supports the following model algorithms:
    • Anomaly
    • Apriori
    • Association Rules
    • Auto Classifier
    • Auto Numeric
    • Bayes Net
    • C5.0
    • C&R Tree
    • CHAID
    • Cox
    • Decision List
    • Discriminant
    • Feature Selection
    • GenLin
    • GLE
    • Isotonic-AS
    • K-Means
    • K-Means-AS
    • Kohonen
    • KNN
    • Linear
    • Linear-AS
    • Logistic
    • LSVM
    • Neural Net
    • One-Class SVM*
    • PCA/Factor
    • Quest
    • Random Forest*
    • Random Trees
    • Regression
    • Sequence
    • SVM
    • Time Series
    • Tree-AS
    • TwoStep
    • TwoStep-AS
    • XGBoost-AS
    • XGBoost Linear*
    • XGBoost Tree*

    * The modeler does not currently support the One-Class SVM, Random Forest, XGBoost Linear, or XGBoost Tree algorithms when running on Linux® on Z.

Data sources

Data source support in WML for z/OS is determined by whether you use JDBC or MDS as the data access method:

  • With JDBC, WML for z/OS supports access to the following data source in Scala and Python:
    • Db2® for z/OS

    For example, you can use the Scala code in the following example to connect through JDBC to the TENTDATA table in the Notebook Editor:

    
    val df = spark.read.format("jdbc").options(Map(
        "driver" -> "com.ibm.db2.jcc.DB2Driver",
        "url" -> "jdbc:db2://<url>:<port>/<location>",
        "user" -> "<userid>", 
        "password" -> "<password>", 
        "dbtable" -> "MLZ.TENTDATA")).load()
    

    Where url and port are the IP address and port number of your Db2 host system, location is the location of your Db2 installation, and userid and password are your Db2 authorization ID and password.

    You can also use the Python code in the following example to connect through PySpark to the TENTDATA table:

    
    import pandas
    # Import libraries required for reading data
    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext.getOrCreate()
    # Initialize the Spark SQL context
    sqlContext = SQLContext(sc)

    # Read the table through JDBC and convert the result to a pandas DataFrame
    df = sqlContext.read.format("jdbc").options(
        driver='com.ibm.db2.jcc.DB2Driver',
        url='jdbc:db2://<url>:<port>/<location>',
        user='<userid>',
        password='<password>',
        dbtable='MLZ.TENTDATA').load().toPandas()

    print(df.head(5))

    Set <url>, <port>, <location>, <userid> and <password> to appropriate values based on the Db2 installation in your environment.

  • With MDS (z/OS Data Service), WML for z/OS supports access to the following data sources in Scala and Python:
    • Db2 for z/OS
    • IMS
    • SMF
    • VSAM data sets

    For example, you can use the Scala code in the following example to connect through MDS to the TENTDATA table in the Notebook Editor:

    
    val data = spark.read.format("jdbc").options(Map(
         "driver" -> "com.rs.jdbc.dv.DvDriver",
         "url" -> "jdbc:rs:dv://<url>:<port>",
         "user" -> "<userid>",
         "password" -> "<password>", 
         "dbty" -> "DVS", "dbtable" -> "MLZ.TENTDATA")).load()

    You can also use the Python code in the following example to connect to an MDS data source:

    
    import dsdbc
    import pandas as pd
    conn = dsdbc.connect(SSID="<ssid>", 
         USER="<userid>", PASSWORD="<password>")
    sql = "SELECT * FROM AZKSQL.TENTDATA_SEQ"
    tentdata_df = pd.read_sql(sql, conn)
    tentdata_df.head(10)

    Finally, you can ingest data from Db2 for z/OS and MDS sources by adding data asset connections to your notebook. To add a data asset in the Notebook Editor, insert a project token, find and add data, choose a connection, and create a dataframe with project context.

    See The Data Service SQL solution and DS Studio Overview for more information about working with data sources through MDS.
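
As a minimal sketch, the JDBC options from the Db2 for z/OS example earlier in this section could be assembled by a small helper so that credentials are read from the environment instead of being typed into the notebook. The helper name db2_jdbc_options and the environment variable names DB2_USER and DB2_PASSWORD are assumptions made for illustration; they are not part of WML for z/OS.

```python
import os


def db2_jdbc_options(host, port, location, table, env=None):
    """Build the option dictionary for a Spark JDBC read from Db2 for z/OS.

    DB2_USER and DB2_PASSWORD are hypothetical variable names chosen for
    this sketch; use whatever your site provides for credentials.
    """
    env = os.environ if env is None else env
    missing = [name for name in ("DB2_USER", "DB2_PASSWORD") if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {
        "driver": "com.ibm.db2.jcc.DB2Driver",
        "url": "jdbc:db2://{}:{}/{}".format(host, port, location),
        "user": env["DB2_USER"],
        "password": env["DB2_PASSWORD"],
        "dbtable": table,
    }
```

In a notebook where a Spark session named spark is available, the resulting dictionary can be passed as keyword options, for example spark.read.format("jdbc").options(**opts).load().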

Data types

WML for z/OS supports all the data types that z/OS IzODA Spark supports. See SQL DMF supported data types for details.

Model types

The type of a machine learning model is determined by the scoring engine used for processing the model. WML for z/OS supports the following model types:

  • SparkML
  • MLeap
  • PMML
  • Scikit-learn
  • XGBoost
  • ARIMA or Seasonal ARIMA
  • ONNX

You can train, save, publish, and deploy MLeap, SparkML, Scikit-learn, XGBoost, ARIMA, and Seasonal ARIMA models in WML for z/OS:

  • If you create a model by using the Scala Notebook Editor, you can save the model as a SparkML, MLeap, or SparkML/MLeap model type. You can deploy the model as a SparkML or MLeap model.
  • If you create a model by using the Python Notebook Editor, you can save and deploy the model as a Scikit-learn or XGBoost model. A native XGBoost model is saved and deployed as an XGBoost model, whereas an XGBoost Scikit-learn wrapper model is saved and deployed as a Scikit-learn model.
  • If you create a time series model by using the Python Notebook Editor, you can save and deploy it as an ARIMA or Seasonal ARIMA model type. Make sure that you use the repository SAVE API to save the model and specify the corresponding time series scoring engine (Time Series-Arima or Time Series-Seasonal Arima) for deployment.

You can also use the WML for z/OS PMML support to import a model that is developed on another platform. You can save and deploy the imported model as a PMML model type.
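
The rule for Python models described above (native XGBoost models are XGBoost type, XGBoost Scikit-learn wrapper models are Scikit-learn type) can be sketched as a small check. The function name python_model_type is hypothetical, and the check assumes the module layout of the xgboost and scikit-learn packages (wrapper classes in xgboost.sklearn, estimators under sklearn); it illustrates the rule and is not how WML for z/OS itself determines the type.

```python
def python_model_type(model):
    """Return the model type that the rule above would assign to a Python model.

    Assumption: the class's module path identifies the library. XGBoost's
    Scikit-learn wrappers live in xgboost.sklearn, while the native Booster
    lives elsewhere in the xgboost package.
    """
    module = type(model).__module__
    if module == "xgboost.sklearn" or module.startswith("xgboost.sklearn."):
        # XGBoost Scikit-learn wrapper models are treated as Scikit-learn models
        return "Scikit-learn"
    if module == "xgboost" or module.startswith("xgboost."):
        # Native XGBoost models keep the XGBoost type
        return "XGBoost"
    if module == "sklearn" or module.startswith("sklearn."):
        return "Scikit-learn"
    raise ValueError("unrecognized model class: " + type(model).__name__)
```

For example, an xgboost.Booster instance would map to XGBoost, while an xgboost.XGBClassifier instance or a Scikit-learn pipeline that contains one would map to Scikit-learn.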

The WML for z/OS support of the model types has certain limitations:

  • Limitations in the MLeap and SparkML engines:
    • The MLeap engine does not support the following z/OS Spark transformers or estimators:
      • org.apache.spark.ml.feature.SQLTransformer
      • org.apache.spark.ml.tuning.CrossValidator
      • org.apache.spark.ml.tuning.TrainValidationSplit
      • org.apache.spark.ml.classification.MultilayerPerceptronClassifier
      • org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel
      • org.apache.spark.ml.feature.MinHashLSHModel
      • org.apache.spark.ml.feature.OneHotEncoderModel
      • org.apache.spark.ml.feature.OneHotEncoder (for Spark 2.3.0 or later)

      If a pipeline contains any of these transformers or estimators, the pipeline will not be retained in the MLeap bundle, and the model will be saved as a SparkML type only.

    • The MLeap engine does not support any customized z/OS Spark transformers or estimators except the CADS (Cognitive Assistant for Data Scientists) estimator.
    • The MLeap engine does not support any model that has array, vector, map, or struct as a column data type. The model will be saved as a SparkML type only.
    • You cannot create a PMML model in WML for z/OS, but you can import and deploy a PMML model that you've already created elsewhere.
  • Limitations in the Scikit-learn engine:
    • The Scikit-learn engine does not support feedback evaluation, retraining, or batch scoring of a model.
    • The Scikit-learn engine does not support models that lack a predict method, such as SpectralBiclustering and AgglomerativeClustering.
  • Limitations in the XGBoost engine:
    • The XGBoost engine does not support feedback evaluation or batch scoring of a model.
    • Using an XGBoost model with the objective count:poisson for prediction may cause errors.
    • WML for z/OS supports the conversion of an XGBoost model to PMML only if the XGBoost model uses the following parameter values:
      • booster
        • gbtree
      • objective
        • binary:logistic
        • count:poisson
        • multi:softmax
        • multi:softprob
        • reg:gamma
        • reg:logistic
        • reg:linear
        • reg:tweedie
    • You can use the XGBoost Scikit-learn wrapper API within a Scikit-learn pipeline to train a model. If you want to convert the pipeline to PMML, all the operations in the pipeline must be supported by JPMML-SkLearn. See JPMML-SkLearn for details.
  • Limitations in the ONNX engine:
    • You cannot create an ONNX model in WML for z/OS.
    • The WML for z/OS UI does not support direct import of ONNX models. However, you can download the supplied model conversion utility (Docker image and scripts, as described in Downloading WML for z/OS IDE installer), run it on a distributed platform such as Linux, and transfer existing ONNX models to the WMLz model repository on z/OS.
    • The ONNX engine does not support feedback evaluation, retraining, or batch scoring of an ONNX model.
    • The ONNX engine supports version 8 of the ONNX operator sets, with the exception of the parametric set. Specifically, it supports only the following operators:

      Abs
      Acos
      Acosh
      Add
      And
      Asin
      Asinh
      Atan
      Atanh
      AveragePool
      BatchNormalization
      Cast
      Ceil
      Concat
      ConstantFill
      Conv
      Cos
      Cosh

      Div
      Dropout
      Elu
      Erf
      Exp
      Flatten
      Floor
      GRU
      Gemm
      GlobalAveragePool
      HardSigmoid
      Identity
      LRN
      LSTM
      LeakyRelu
      Less
      Log
      LogSoftmax
      MatMul

      Max
      MaxPool
      Min
      Mul
      Neg
      PRelu
      Pow
      RNN
      Reciprocal
      ReduceSum
      Relu
      Reshape
      Scan
      Selu
      Sigmoid
      Sign
      Sin
      Sinh
      Slice

      Softmax
      Softplus
      Softsign
      Split
      Sqrt
      Squeeze
      Sub
      Sum
      Tan
      Tanh
      Tile
      Transpose
      Unsqueeze

    See ONNX operators for more information about these operators.
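
Before transferring an existing ONNX model, it may help to compare the operators it uses against the list above. The following is a minimal sketch: the function name unsupported_operators is hypothetical, and the operator set is copied from the list in this topic rather than queried from the product.

```python
# Operators listed in this topic as supported by the ONNX engine
SUPPORTED_ONNX_OPERATORS = frozenset("""
    Abs Acos Acosh Add And Asin Asinh Atan Atanh AveragePool
    BatchNormalization Cast Ceil Concat ConstantFill Conv Cos Cosh
    Div Dropout Elu Erf Exp Flatten Floor GRU Gemm GlobalAveragePool
    HardSigmoid Identity LRN LSTM LeakyRelu Less Log LogSoftmax MatMul
    Max MaxPool Min Mul Neg PRelu Pow RNN Reciprocal ReduceSum Relu
    Reshape Scan Selu Sigmoid Sign Sin Sinh Slice
    Softmax Softplus Softsign Split Sqrt Squeeze Sub Sum Tan Tanh
    Tile Transpose Unsqueeze
""".split())


def unsupported_operators(op_types):
    """Return the sorted, de-duplicated operator names that are not in the list."""
    return sorted(set(op_types) - SUPPORTED_ONNX_OPERATORS)
```

If the onnx Python package is available on the platform where the model was created, the operator names of the top-level graph can be collected with {node.op_type for node in onnx.load(path).graph.node} and passed to this function. Operators inside nested subgraphs, such as the body of a Scan node, are not visited by that expression and would need a separate pass.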