Importing and exporting data using Python for Spark
Using the Custom Dialog Builder for Extensions, you can create custom nodes and write Python for Spark scripts to read data from wherever your data source is, and write data out to any data format supported by Apache Spark.
For example, suppose you want to write data to a database. Using the Custom Dialog Builder for Extensions and Python for Spark, you can create a custom JDBC export node and then run the model to write the data into the database. To read data from the database, you can also create a custom JDBC import node. The same method can be used to read data into SPSS® Modeler from other sources, such as a JSON file. After the data is read into SPSS Modeler, all available SPSS Modeler nodes can be used to work on the business problem.
Note: If you want to use JDBC with Python for Spark import and export functionality, you must copy
your JDBC driver file to the as/lib directory inside your IBM® SPSS Modeler installation directory.
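As an illustrative sketch only (not taken from the product documentation), a Python for Spark script for a custom JDBC export node might look like the following. It assumes the `spss.pyspark.runtime` context API is available inside the node; the JDBC URL, table name, driver class, and credentials are placeholders you would replace with your own values:

```python
# Sketch of a Python for Spark export script, assuming the
# spss.pyspark.runtime context API; the JDBC URL, table name,
# driver class, and credentials below are placeholders.
import spss.pyspark.runtime

ascontext = spss.pyspark.runtime.getContext()

if not ascontext.isComputeDataModelOnly():
    # The data arriving at the export node, as a Spark DataFrame
    df = ascontext.getSparkInputData()

    # Placeholder connection details -- replace with your own
    url = "jdbc:db2://dbserver:50000/SAMPLE"
    properties = {
        "user": "dbuser",
        "password": "dbpassword",
        "driver": "com.ibm.db2.jcc.DB2Driver",
    }

    # Append the rows to the target table through the JDBC driver
    # (the driver JAR must be in as/lib, as noted above)
    df.write.jdbc(url, "MYSCHEMA.MYTABLE", mode="append",
                  properties=properties)
```

Because the JDBC driver is loaded from the as/lib directory, the `driver` property must name a class contained in the JAR you copied there.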
To import or export data using Python for Spark:
- Open the Custom Dialog Builder for Extensions.
- Under Dialog Properties, select Python for Spark for the Script Type and select Import or Export for the Node Type.
- Enter other properties as desired, such as a Dialog Name.
- In the Script section, type or paste your Python for Spark script for importing or exporting data.
- Click Install to install the Python for Spark script. New custom import nodes will be added to the Sources palette, and new custom export nodes will be added to the Export palette.
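To make the steps concrete, here is a hedged sketch of an import script that reads a JSON file, matching the JSON example mentioned earlier. The file path is a placeholder, and the runtime context API is assumed to be available inside a custom import node:

```python
# Sketch of a Python for Spark import script that reads a JSON file;
# the file path is a placeholder, and the runtime context API is
# assumed to be available inside a custom import node.
import spss.pyspark.runtime

ascontext = spss.pyspark.runtime.getContext()
sqlContext = ascontext.getSparkSQLContext()

# Read the JSON file into a Spark DataFrame
df = sqlContext.read.json("/path/to/data.json")

# Hand the DataFrame back to SPSS Modeler as the node's output
ascontext.setSparkOutputData(df)
```

After installing the node, it appears on the Sources palette; when the stream runs, this script executes and the resulting DataFrame flows downstream to the rest of the SPSS Modeler stream.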