Python for Spark
IBM® SPSS® Modeler supports Python scripts for Apache Spark.
Note:
- Python nodes depend on the Spark environment.
- Python scripts must use the Spark API, because data is presented to the script as a Spark DataFrame.
- Nodes created in version 17.1 still run only against IBM SPSS Analytic Server, that is, when the data originates from an IBM SPSS Analytic Server source node and has not been extracted to IBM SPSS Modeler Server. New Python and Custom Dialog Builder nodes created in version 18.0 or later can run against IBM SPSS Modeler Server.
- When installing Python, make sure all users have permission to access the Python installation.
- If you want to use the Machine Learning Library (MLlib), you must install a version of Python that includes NumPy. Then you must configure the IBM SPSS Modeler Server (or the local server in IBM SPSS Modeler Client) to use your Python installation. For details, see Scripting with Python for Spark.
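Before pointing IBM SPSS Modeler Server at a Python installation, it can help to confirm which interpreter you are configuring and whether NumPy is importable from it. The following is a minimal sketch, assuming you run it with the same interpreter you intend to configure. The function name `check_python_for_mllib` is hypothetical and is not part of any IBM SPSS Modeler API. The script only inspects the interpreter. It does not configure the server; that step is described in Scripting with Python for Spark. Its permission check covers only the current user, not all users.

```python
import importlib.util
import os
import sys


def check_python_for_mllib():
    """Hypothetical helper: report the interpreter path, its version, and NumPy availability."""
    # find_spec returns None when NumPy cannot be imported by this interpreter.
    numpy_spec = importlib.util.find_spec("numpy")
    numpy_version = None
    if numpy_spec is not None:
        import numpy  # imported only when it is actually installed
        numpy_version = numpy.__version__
    return {
        "executable": sys.executable,
        "python_version": ".".join(str(part) for part in sys.version_info[:3]),
        # Checks execute permission for the CURRENT user only, not for all users.
        "executable_runnable": bool(sys.executable) and os.access(sys.executable, os.X_OK),
        "numpy_available": numpy_spec is not None,
        "numpy_version": numpy_version,
    }


if __name__ == "__main__":
    report = check_python_for_mllib()
    print("Python interpreter:", report["executable"])
    print("Python version:", report["python_version"])
    print("Runnable by current user:", report["executable_runnable"])
    if report["numpy_available"]:
        print("NumPy", report["numpy_version"], "found: the NumPy requirement for MLlib is met.")
    else:
        print("NumPy not found: install it before using MLlib.")
```

The path printed as the interpreter is the one you would supply when configuring the server to use your Python installation.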