Extension Import node - Syntax tab
Select the type of syntax – R or Python for Spark – then enter or paste your custom script for importing data. When your syntax is ready, click Run to execute the Extension Import node.
R example
# Import the R demo data set cars into Modeler
modelerData <- cars
# Define the data model that matches the data
var1 <- c(fieldName="speed", fieldLabel="", fieldStorage="integer", fieldMeasure="", fieldFormat="", fieldRole="")
var2 <- c(fieldName="dist", fieldLabel="", fieldStorage="integer", fieldMeasure="", fieldFormat="", fieldRole="")
modelerDataModel <- data.frame(var1, var2)
Python for Spark example
import spss.pyspark.runtime
from pyspark.sql import SQLContext
from pyspark.sql.types import *
cxt = spss.pyspark.runtime.getContext()
if cxt.isComputeDataModelOnly():
    _schema = StructType([StructField("Age", LongType(), nullable=True),
                          StructField("Sex", StringType(), nullable=True),
                          StructField("BP", StringType(), nullable=True),
                          StructField("Cholesterol", StringType(), nullable=True),
                          StructField("Na", DoubleType(), nullable=True),
                          StructField("K", DoubleType(), nullable=True),
                          StructField("Drug", StringType(), nullable=True)])
    cxt.setSparkOutputSchema(_schema)
else:
    sqlContext = cxt.getSparkSQLContext()
    # The demo data is in the Modeler installation path
    df = sqlContext.read.option("inferSchema", "true").option("header", "true").csv("/opt/IBM/SPSS/ModelerServer/Cloud/demos/DRUG1n")
    cxt.setSparkOutputData(df)
    df.show()
    # print(df.dtypes[:])
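In the example above, the `header` option tells the Spark CSV reader to take field names from the first row, and `inferSchema` tells it to probe each column's values to choose a type. As a conceptual illustration only (plain Python, not the Modeler or Spark API), the idea behind that inference can be sketched roughly like this:

```python
# Conceptual sketch of header + type inference, similar in spirit to the
# Spark CSV reader's header/inferSchema options. Not the Modeler/Spark API.
import csv
import io

def infer_type(values):
    """Return 'long', 'double', or 'string' for a column of text values."""
    def all_cast(cast):
        try:
            for v in values:
                cast(v)
            return True
        except ValueError:
            return False
    if all_cast(int):
        return "long"     # every value parses as an integer
    if all_cast(float):
        return "double"   # every value parses as a floating-point number
    return "string"       # fallback: leave the column as text

def infer_schema(csv_text):
    """Read field names from the first row, then infer each column's type."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    columns = list(zip(*body))
    return [(name, infer_type(col)) for name, col in zip(header, columns)]

sample = "Age,Sex,Na\n23,F,0.79\n47,M,0.73\n"
print(infer_schema(sample))
# [('Age', 'long'), ('Sex', 'string'), ('Na', 'double')]
```

The hypothetical `sample` data mimics three of the DRUG1n fields; in the real node, Spark performs this inference and the resulting schema is what `cxt.setSparkOutputSchema` would otherwise declare explicitly.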