Extension Import node

With the Extension Import node, you can run R scripts or Python for Spark scripts to import data.

After adding the node to your canvas, double-click the node to open its properties.

Syntax tab

Select your type of syntax – R or Python for Spark. Then enter or paste your custom script for importing data. When your syntax is ready, you can run the node.
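At its core, an import script produces a table: a fixed set of field names and one record per row. The following is a minimal, product-independent Python sketch of that step. It assumes CSV input, and `SOURCE`, `CONVERTERS`, `import_records`, and the field values are hypothetical. It is plain Python rather than Spark code. It also omits the hand-off of the result to the node, which depends on the R or Python for Spark API that the product provides.

```python
import csv
import io
from typing import Callable, Dict, List, Tuple

# Hypothetical source data; a real script would read from a file, database, or service.
SOURCE = """id,Age,K
1,23,0.031
2,47,0.056
"""

# Hypothetical type conversions per field; csv yields every value as a string.
CONVERTERS: Dict[str, Callable[[str], object]] = {"id": int, "Age": int, "K": float}


def import_records(
    text: str, converters: Dict[str, Callable[[str], object]]
) -> Tuple[List[str], List[Dict[str, object]]]:
    """Parse CSV text into field names and typed records (illustrative only)."""
    reader = csv.DictReader(io.StringIO(text))
    fields = list(reader.fieldnames or [])
    # Convert each value with its field's converter; unknown fields stay strings.
    records = [
        {name: converters.get(name, str)(value) for name, value in row.items()}
        for row in reader
    ]
    return fields, records


fields, records = import_records(SOURCE, CONVERTERS)
print(fields)
print(records)
```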

Console Output tab

The Console Output tab shows any output received while the R script or Python for Spark script runs. For example, with an R script, it shows the output that the R console returns when it executes the script in the R Syntax field on the Syntax tab. This output can include R or Python error messages and warnings, so it is primarily useful for debugging the script. The Console Output tab also shows the script itself, as entered in the R Syntax or Python Syntax field.

Every time the Extension Import script runs, the content of the Console Output tab is overwritten with the output received from the R console or Python for Spark. You can't edit the output.
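The overwrite behavior can be modeled in a few lines of plain Python. This is a hypothetical sketch, not the product's implementation: `ConsoleOutput` stands in for the tab, and `exec` stands in for running the script. Each run captures standard output and errors, including tracebacks, and replaces whatever the previous run left. The text is exposed read-only to mirror the fact that the output can't be edited.

```python
import contextlib
import io
import traceback


class ConsoleOutput:
    """Hypothetical stand-in for the Console Output tab."""

    def __init__(self) -> None:
        self._text = ""

    @property
    def text(self) -> str:
        # Read-only: callers can view the output but not edit it.
        return self._text

    def run(self, script: str) -> None:
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer), contextlib.redirect_stderr(buffer):
            try:
                exec(script, {})
            except Exception:
                # The traceback goes to the redirected stderr, so errors appear in the output.
                traceback.print_exc()
        # Overwrite rather than append: only the latest run's script and output are kept.
        self._text = script + "\n--- output ---\n" + buffer.getvalue()


console = ConsoleOutput()
console.run("print('first run')")
console.run("print('second run')")
print(console.text)
```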

Filtering or renaming fields

You can rename or exclude fields at any point in a flow. For example, if you're a medical researcher who isn't concerned about the potassium level (field-level data) of patients (record-level data), you can filter out the K (potassium) field.

  • Using a Filter node, you can rename or filter fields at any point in the flow
  • You can use a Filter node to map fields from one import node to another
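The Filter node's two operations, excluding fields and renaming them, can be sketched in plain Python. This is a hypothetical illustration, not the node's implementation: records are dictionaries, and `filter_fields`, `patients`, and the field values are made up for the example. Renaming through a mapping is also the basic idea behind aligning one source's field names with another's.

```python
from typing import Dict, Iterable, List, Mapping, Optional

Record = Dict[str, object]


def filter_fields(
    records: Iterable[Record],
    exclude: Iterable[str] = (),
    rename: Optional[Mapping[str, str]] = None,
) -> List[Record]:
    """Drop excluded fields and rename the rest (sketch of Filter node behavior)."""
    dropped = set(exclude)
    names = dict(rename or {})
    # Build new records so the input data is left unchanged.
    # If two fields are renamed to the same name, the later one wins.
    return [
        {names.get(name, name): value for name, value in record.items() if name not in dropped}
        for record in records
    ]


# Hypothetical patient records.
patients = [
    {"Age": 23, "Sex": "F", "K": 0.031, "Na": 0.793},
    {"Age": 47, "Sex": "M", "K": 0.056, "Na": 0.739},
]

# Filter out the K (potassium) field and give Na a more descriptive name.
print(filter_fields(patients, exclude={"K"}, rename={"Na": "Sodium"}))
```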

Viewing and setting information about types

From the Type node, you can specify field metadata and properties that are invaluable to modeling and other work.
These properties include:
  • Specifying a usage type, such as range, set, ordered set, or flag, for each field in your data
  • Setting options for handling missing values and system nulls
  • Setting the role of a field for modeling purposes
  • Specifying values for a field and options used to automatically read values from your data
  • Specifying value labels
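The properties in this list can be pictured as one metadata record per field. The sketch below is hypothetical: `FieldType`, `Measurement`, `Role`, `read_values`, and the `BP` and `Age` data are illustrative names, only a subset of roles is shown, and the missing-value and value-reading rules are simplified assumptions rather than the Type node's actual behavior.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Measurement(Enum):
    # Usage types named in the list above.
    RANGE = "range"
    SET = "set"
    ORDERED_SET = "ordered set"
    FLAG = "flag"


class Role(Enum):
    # A subset of modeling roles, for illustration only.
    INPUT = "input"
    TARGET = "target"
    NONE = "none"


@dataclass
class FieldType:
    name: str
    measurement: Measurement
    role: Role = Role.INPUT
    values: List[object] = field(default_factory=list)
    value_labels: Dict[object, str] = field(default_factory=dict)
    missing_values: List[object] = field(default_factory=list)
    treat_nulls_as_missing: bool = True  # hypothetical option for system nulls

    def is_missing(self, value: object) -> bool:
        if value is None:
            return self.treat_nulls_as_missing
        return value in self.missing_values

    def label(self, value: object) -> str:
        # Fall back to the raw value when no label is defined.
        return self.value_labels.get(value, str(value))


def read_values(ftype: FieldType, column: List[object]) -> None:
    """Populate ftype.values from data, skipping missing entries (simplified)."""
    present = [v for v in column if not ftype.is_missing(v)]
    if not present:
        ftype.values = []
    elif ftype.measurement is Measurement.RANGE:
        ftype.values = [min(present), max(present)]
    else:
        # Keep distinct values in first-seen order.
        ftype.values = list(dict.fromkeys(present))


bp = FieldType(
    name="BP",
    measurement=Measurement.SET,
    missing_values=["UNKNOWN"],
    value_labels={"HIGH": "High blood pressure"},
)
read_values(bp, ["HIGH", "LOW", None, "HIGH", "UNKNOWN", "NORMAL"])
print(bp.values, bp.label("HIGH"))
```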