Python nodes

SPSS® Modeler offers nodes that run native Python algorithms. The Python tab on the Nodes Palette contains the following nodes. These nodes are supported on 64-bit Windows, 64-bit Linux, and Mac.

The Synthetic Minority Over-sampling Technique (SMOTE) node provides an over-sampling algorithm to deal with imbalanced data sets. Rather than duplicating existing records, SMOTE balances the data by generating synthetic minority-class samples that are interpolated between existing minority samples and their nearest neighbors. The SMOTE process node in SPSS Modeler is implemented in Python and requires the imbalanced-learn© Python library.
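The interpolation idea behind SMOTE can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions, not the imbalanced-learn implementation that the node uses; the function name `smote_sketch` and its parameters are hypothetical.

```python
import random

def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (hypothetical helper)."""
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: sq_dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

# Four minority-class records with two numeric fields each (made-up values)
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority, n_new=6)
```

Because every synthetic point lies on a segment between two real minority samples, the new records stay inside the region that the minority class already occupies.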

XGBoost Linear© is an advanced implementation of a gradient boosting algorithm with a linear model as the base model. Boosting algorithms iteratively learn weak classifiers and then add them to a final strong classifier. The XGBoost Linear node in SPSS Modeler is implemented in Python.

XGBoost Tree© is an advanced implementation of a gradient boosting algorithm with a tree model as the base model. Boosting algorithms iteratively learn weak classifiers and then add them to a final strong classifier. XGBoost Tree is very flexible and provides many parameters that can be overwhelming to most users, so the XGBoost Tree node in SPSS Modeler exposes the core features and commonly used parameters. The node is implemented in Python.
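The boosting loop described above can be illustrated with a small sketch. It uses plain gradient boosting with squared loss and one-split regression trees (stumps) on a single numeric field; it is not XGBoost and omits regularization and the other features that XGBoost adds. The names `fit_stump` and `boost`, and the toy data, are assumptions made for illustration.

```python
def fit_stump(xs, residuals):
    """Find the threshold split on a 1-D feature that minimises squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the maximum so the right side is never empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def boost(xs, ys, n_rounds=50, learning_rate=0.3):
    """Each round fits a weak learner to the current residuals and adds it
    to the ensemble, scaled by the learning rate."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [x * x for x in xs]  # toy target
model = boost(xs, ys)

mean_y = sum(ys) / len(ys)
mse_baseline = sum((y - mean_y) ** 2 for y in ys) / len(ys)
mse_boosted = sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)
```

Comparing `mse_boosted` with `mse_baseline` shows how much the sum of many weak learners improves on a constant prediction for the training data.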

t-Distributed Stochastic Neighbor Embedding (t-SNE) is a tool for visualizing high-dimensional data. It converts affinities of data points to probabilities and then places the points in a low-dimensional space so that similar points stay close together. The t-SNE node in SPSS Modeler is implemented in Python and requires the scikit-learn© Python library.
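As a rough illustration of what "converting affinities to probabilities" means, the sketch below turns Gaussian affinities between points into conditional probabilities. It assumes a single fixed bandwidth `sigma` for every point, whereas t-SNE tunes a bandwidth per point from the perplexity setting, and it covers only this first step, not the embedding itself. The function name is hypothetical.

```python
import math

def conditional_probabilities(points, sigma=1.0):
    """Row i holds p(j|i): Gaussian affinities of point i to every other
    point, normalised so that each row sums to 1."""
    n = len(points)
    probs = []
    for i in range(n):
        affinities = []
        for j in range(n):
            if i == j:
                affinities.append(0.0)  # a point is not its own neighbour
            else:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                affinities.append(math.exp(-d2 / (2 * sigma ** 2)))
        total = sum(affinities)
        probs.append([a / total for a in affinities])
    return probs

# Two nearby points and one distant point (made-up values)
points = [(0.0, 0.0), (0.1, 0.0), (3.0, 3.0)]
P = conditional_probabilities(points)
```

In the resulting matrix, the first point assigns almost all of its probability to its close neighbor and very little to the distant point.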

A Gaussian Mixture© model is a probabilistic model that assumes all the data points are generated from a mixture of a finite number of Gaussian distributions with unknown parameters. One can think of mixture models as generalizing k-means clustering to incorporate information about the covariance structure of the data as well as the centers of the latent Gaussians. The Gaussian Mixture node in SPSS Modeler exposes the core features and commonly used parameters of the Gaussian Mixture library. The node is implemented in Python.
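To make the mixture idea concrete, here is a minimal sketch of expectation-maximization (EM) for a two-component mixture on one numeric field. It is an illustration under simplifying assumptions (one dimension, two components, a crude initialization), not the library implementation that the node exposes; the function names and the generated data are hypothetical.

```python
import math
import random

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_gmm_1d(data, n_iter=50):
    """EM for a two-component 1-D Gaussian mixture."""
    means = [min(data), max(data)]  # crude initialisation at the extremes
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point
        resp = []
        for x in data:
            likes = [w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(likes)
            resp.append([l / total for l in likes])
        # M-step: re-estimate weight, mean, and variance of each component
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var, 1e-6)  # guard against a collapsing component
    return weights, means, variances

# Synthetic data: two groups centred near 0 and near 5
rng = random.Random(42)
data = [rng.gauss(0, 1) for _ in range(200)] + [rng.gauss(5, 1) for _ in range(200)]
weights, means, variances = fit_gmm_1d(data)
```

Unlike k-means, each point receives a soft responsibility for every component, and each component learns its own variance in addition to its center.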

Kernel Density Estimation (KDE)© uses the Ball Tree or KD Tree algorithms for efficient queries, and combines concepts from unsupervised learning, feature engineering, and data modeling. Neighbor-based approaches such as KDE are some of the most popular and useful density estimation techniques. The KDE Modeling and KDE Simulation nodes in SPSS Modeler expose the core features and commonly used parameters of the KDE library. The nodes are implemented in Python.
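The estimate that KDE produces can be sketched directly: each sample contributes a small Gaussian bump, and the density at a point is the average of those bumps. The sketch below is a brute-force illustration on one numeric field and does not use the Ball Tree or KD Tree structures mentioned above; the function name and the sample values are assumptions.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density

samples = [1.0, 1.2, 0.9, 1.1, 4.0, 4.2]  # made-up observations
density = gaussian_kde(samples, bandwidth=0.3)

# Riemann sum over [-2, 7) to check that the estimate integrates to about 1
step = 0.01
area = sum(density(-2 + i * step) for i in range(900)) * step
```

The bandwidth controls the smoothness: a small value gives a spiky estimate that follows individual samples, and a large value blurs nearby groups together.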

The Random Forest node uses an advanced implementation of a bagging algorithm with a tree model as the base model. Bagging trains each tree on a bootstrap sample of the data and combines the predictions of all trees. The Random Forest modeling node in SPSS Modeler is implemented in Python and requires the scikit-learn© Python library.
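The bagging idea can be sketched with very small trees. The example below trains one-split trees (stumps) on bootstrap samples, picks a random field for each one, and predicts by majority vote. It is a simplified illustration, not the scikit-learn implementation that the node uses; all function names and data values are hypothetical.

```python
import random
from collections import Counter

def fit_stump(rows, labels, feature):
    """Best threshold on one feature, scored by training accuracy."""
    best = None
    for t in sorted({r[feature] for r in rows}):
        for left_label in (0, 1):
            preds = [left_label if r[feature] <= t else 1 - left_label for r in rows]
            acc = sum(p == y for p, y in zip(preds, labels))
            if best is None or acc > best[0]:
                best = (acc, t, left_label)
    _, t, left_label = best
    return lambda r: left_label if r[feature] <= t else 1 - left_label

def fit_bagged_stumps(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample (with replacement)
        feature = rng.randrange(n_features)         # random feature for this tree
        trees.append(
            fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature)
        )

    def predict(row):
        votes = Counter(tree(row) for tree in trees)
        return votes.most_common(1)[0][0]  # majority vote

    return predict

rows = [(1.0, 1.2), (1.5, 0.8), (2.0, 1.0), (1.2, 1.9),
        (6.0, 6.5), (6.5, 7.0), (7.0, 6.2), (6.8, 7.5)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
predict = fit_bagged_stumps(rows, labels)
```

Individual stumps trained on unlucky bootstrap samples can be wrong, but the vote across many of them is much more stable than any single one.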

Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN)© uses unsupervised learning to find clusters, or dense regions, of a data set. The HDBSCAN node in SPSS Modeler exposes the core features and commonly used parameters of the HDBSCAN library. The node is implemented in Python, and you can use it to cluster your data set into distinct groups when you don't know in advance what those groups are.

The One-Class SVM node uses an unsupervised learning algorithm and can be used for novelty detection. It learns a soft boundary around a given set of samples and then classifies new points as belonging to that set or not. The One-Class SVM modeling node in SPSS Modeler is implemented in Python and requires the scikit-learn© Python library.
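Outside of SPSS Modeler, the same novelty-detection behavior can be tried directly with scikit-learn's `OneClassSVM` class. The sketch below assumes scikit-learn is installed; the training data is randomly generated and the parameter values (`nu=0.05`, `gamma="scale"`) are illustrative choices, not the node's defaults.

```python
import random
from sklearn.svm import OneClassSVM

# Training set: 200 "normal" points scattered around the origin (synthetic data)
rng = random.Random(0)
train = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]

# nu is an upper bound on the fraction of training points treated as outliers
model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05)
model.fit(train)

# predict returns +1 for points inside the learned boundary and -1 for points outside
labels = model.predict([[0.0, 0.0], [8.0, 8.0]])
```

Here the point at the center of the training data is labeled as belonging to the set, while the point far from all training samples is labeled as a novelty.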