SPSS® Modeler offers nodes for using Spark native algorithms. The Spark tab on the Nodes Palette
contains the following nodes, which you can use to run Spark algorithms. These nodes are supported
on 64-bit Windows, Mac, and Linux. Note that these nodes do not support specifying an
integer or double column as Flag or Nominal when building a model. To use such a column as Flag or
Nominal, you must first convert its values to 0/1 or to 0, 1, 2, 3, 4...
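The recoding described above can be illustrated outside of SPSS Modeler with a minimal Python sketch. It assumes that 0/1 is the encoding intended for Flag fields and that consecutive codes 0, 1, 2, ... are intended for Nominal fields. The helper names (`encode_flag`, `encode_nominal`) and the sample columns are hypothetical and are not part of the product.

```python
def encode_flag(values, true_value):
    """Map a two-valued column to 0/1: 1 where the value equals true_value."""
    return [1 if v == true_value else 0 for v in values]


def encode_nominal(values):
    """Map each distinct value to a code 0, 1, 2, ... in order of first appearance."""
    codes = {}
    encoded = []
    for v in values:
        if v not in codes:
            codes[v] = len(codes)
        encoded.append(codes[v])
    return encoded, codes


# Hypothetical columns whose raw values are doubles or arbitrary integers.
churn = [2.0, 5.0, 5.0, 2.0]
region = [10, 30, 20, 10, 30]

churn_flag = encode_flag(churn, true_value=5.0)
region_codes, region_mapping = encode_nominal(region)
print(churn_flag)
print(region_codes, region_mapping)
```

Keeping the mapping returned by `encode_nominal` makes it possible to translate the codes back to the original values when interpreting a model.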

XGBoost© is an advanced implementation of a gradient boosting algorithm. Boosting algorithms
iteratively learn weak classifiers and then add them to a final strong classifier. XGBoost is very
flexible and provides many parameters, which can be overwhelming to most users, so the XGBoost-AS
node in SPSS Modeler exposes the core features and commonly used parameters. The XGBoost-AS node is
implemented in Spark.
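To make the boosting idea concrete, the following plain-Python sketch adds one weak learner per round, each fitted to the errors left by the learners before it. It is a simplified squared-error gradient boosting loop over one-split "stumps" on a single feature, not the XGBoost algorithm or the XGBoost-AS node itself, and it omits XGBoost's regularization and other refinements. All function names, the data, and the default `rounds` and `learning_rate` values are assumptions chosen for illustration.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump (a weak learner) to the residuals."""
    mean = sum(residuals) / len(residuals)
    best = (float("inf"), xs[0], mean, mean)  # fallback: constant prediction
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if err < best[0]:
            best = (err, t, lm, rm)
    return best[1:]  # (threshold, left value, right value)


def stump_value(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def boost(xs, ys, rounds=30, learning_rate=0.5):
    """Add stumps one at a time, each fitted to the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps, errors = [], []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        errors.append(sum(r * r for r in residuals))  # error before this round
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_value(stump, x) for p, x in zip(preds, xs)]
    return (base, learning_rate, stumps), errors


def predict(model, x):
    """Combine all weak learners and threshold the score at 0.5."""
    base, learning_rate, stumps = model
    score = base + learning_rate * sum(stump_value(s, x) for s in stumps)
    return 1 if score >= 0.5 else 0


# Hypothetical data that no single stump can separate.
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 1, 1, 0, 0]
model, errors = boost(xs, ys)
print(errors[0], errors[-1])
print([predict(model, x) for x in xs])
```

The `errors` list records the training squared error before each round, so comparing its first and last entries shows how the combined model improves as weak learners are added.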

K-Means is one of the most commonly used clustering algorithms. It partitions data points into a
predefined number of clusters. The K-Means-AS node in SPSS Modeler is implemented in Spark. For
details about K-Means algorithms, see https://spark.apache.org/docs/2.2.0/ml-clustering.html. Note
that the K-Means-AS node automatically performs one-hot encoding for categorical variables.
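As a rough illustration of what one-hot encoding followed by K-Means clustering involves, here is a minimal plain-Python sketch. It is not the Spark implementation used by the K-Means-AS node: the initialization simply takes the first `k` points as centers, which is a simplification, and the function names and sample data are hypothetical.

```python
def one_hot(values):
    """Turn a categorical column into 0/1 indicator columns, one per category."""
    categories = sorted(set(values))
    rows = [[1.0 if v == c else 0.0 for c in categories] for v in values]
    return rows, categories


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, iterations=10):
    """Alternate between assigning points to the nearest center and re-averaging."""
    centers = [list(p) for p in points[:k]]  # simplistic initialization (assumption)
    assignments = [0] * len(points)
    for _ in range(iterations):
        assignments = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centers


# Hypothetical records: one numeric field and one categorical field.
sizes = [1.0, 1.2, 0.9, 5.0, 5.1, 4.8]
colors = ["red", "red", "red", "blue", "blue", "blue"]

encoded, categories = one_hot(colors)
points = [[s] + row for s, row in zip(sizes, encoded)]
assignments, centers = k_means(points, k=2)
print(categories)
print(assignments)
```

Because distances are computed over the numeric field and the indicator columns together, records that share a category are pulled toward the same cluster, which is why categorical fields need an encoding step before clustering.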