Extension Transform node - Syntax tab

Select your type of syntax – R or Python for Spark. See the following sections for more information. When your syntax is ready, you can click Run to execute the Extension Transform node.

R Syntax

R Syntax. You can enter or paste custom R scripting syntax for data analysis into this field.

Convert flag fields. Specifies how flag fields are treated. There are two options:
  • Strings to factor, Integers and Reals to double. Flag fields with string values are converted to R factors, and flag fields with integer or real values are converted to doubles. The original values are preserved.
  • Logical values (True, False). Flag fields are converted to R logical values. If you select this option, the original values of the flag fields are lost. For example, if a field has the values Male and Female, these are changed to True and False.
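The difference between the two options can be sketched in plain R; the field name Gender and its values are hypothetical:

```r
# Strings to factor: original values survive as factor levels
gender_factor <- factor(c("Male", "Female", "Male"))
levels(gender_factor)   # "Female" "Male" - values are still recoverable

# Logical values: original values are replaced outright
gender_logical <- c(TRUE, FALSE, TRUE)  # "Male"/"Female" are gone
```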

Convert missing values to the R 'not available' value (NA). When selected, any missing values are converted to the R NA value. The value NA is used by R to identify missing values. Some R functions that you use might have an argument that can be used to control how the function behaves when the data contain NA. For example, the function might allow you to choose to automatically exclude records that contain NA. If this option is not selected, any missing values are passed to R unchanged, and might cause errors when your R script is executed.
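For instance, once missing values arrive as NA, many base R functions accept an na.rm argument to exclude them, and na.omit() drops whole records. A minimal sketch, assuming a hypothetical numeric field named Age:

```r
# Data frame as it might arrive in the R script, with a
# missing value already converted to NA
df <- data.frame(Age = c(23, NA, 31))

mean(df$Age)                # NA - the missing value propagates
mean(df$Age, na.rm = TRUE)  # 27 - NA is excluded from the calculation
na.omit(df)                 # drops the record that contains NA
```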

Convert date/time fields to R classes with special control for time zones. When selected, variables with date or datetime formats are converted to R date/time objects. You must select one of the following options:
  • R POSIXct. Variables with date or datetime formats are converted to R POSIXct objects.
  • R POSIXlt (list). Variables with date or datetime formats are converted to R POSIXlt objects.
Note: The POSIX formats are advanced options. Use these options only if your R script specifies that datetime fields are treated in ways that require these formats. The POSIX formats do not apply to variables with time formats.
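As a sketch of the difference between the two classes: a POSIXct value is a single numeric timestamp (seconds since the epoch), while a POSIXlt value is a list of date/time components. The sample value below is illustrative:

```r
ts_ct <- as.POSIXct("2024-01-15 09:30:00", tz = "UTC")  # compact numeric form
ts_lt <- as.POSIXlt(ts_ct)                              # list of components

unclass(ts_ct)  # a single number: seconds since 1970-01-01 UTC
ts_lt$hour      # 9   - components are directly accessible
ts_lt$year      # 124 - years since 1900
```

POSIXct is the more efficient choice for storing and comparing timestamps; POSIXlt is convenient when the script needs to extract individual components such as the hour or weekday.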

Python Syntax

Python Syntax. You can enter or paste custom Python scripting syntax for data analysis into this field. For more information about Python for Spark, see Python for Spark and Scripting with Python for Spark.
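A minimal transform script typically reads the incoming data as a Spark DataFrame from the analytic context, transforms it, and writes the result back. A sketch, assuming the spss.pyspark.runtime context API and a hypothetical numeric field named Age:

```python
import spss.pyspark.runtime
from pyspark.sql.functions import col

# Get the analytic context supplied by SPSS Modeler
ascontext = spss.pyspark.runtime.getContext()

# Read the incoming data as a Spark DataFrame
indf = ascontext.getSparkInputData()

# Example transformation: keep only records where Age > 18
# (the field name is hypothetical; the schema is unchanged,
# so no output data model needs to be set)
outdf = indf.filter(col("Age") > 18)

# Pass the transformed DataFrame downstream
ascontext.setSparkOutputData(outdf)
```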