Extension model nugget - Model Options tab

The Model Options tab is always present in the Extension model nugget.

Read Data Options. These options apply only to R, not to Python for Spark. With these options, you can specify how missing values, flag fields, and variables with date or datetime formats are handled.

  • Read data in batches. If you are processing a large amount of data (for example, more than fits into the R engine's memory), use this option to break the data into batches that are sent and processed individually. Specify the maximum number of data records to include in each batch.

    For both the Extension Transform node and the Extension Scoring nugget, data passes through the R script in batches. For this reason, scripts for model scoring and process nodes that are run in either a Hadoop or Database environment should not include operations that span or combine rows in the data, such as sorting or aggregation. This limitation ensures that the data can be split up in a Hadoop environment and during in-database mining. It does not apply if the scripts for model scoring are run in SPSS® Modeler Server. Extension Output and Extension Model nodes do not have this limitation.

  • Convert flag fields. Specifies how flag fields are treated. There are two options: Strings to factor, Integers and Reals to double; and Logical values (True, False). If you select Logical values (True, False), the original values of the flag fields are lost. For example, if a field has the values Male and Female, these are changed to True and False.
  • Convert missing values to the R 'not available' value (NA). When selected, any missing values are converted to the R NA value. The value NA is used by R to identify missing values. Some R functions that you use might have an argument that can be used to control how the function behaves when the data contain NA. For example, the function might allow you to choose to automatically exclude records that contain NA. If this option is not selected, any missing values are passed to R unchanged, and might cause errors when your R script is executed.
  • Convert date/time fields to R classes with special control for time zones. When selected, variables with date or datetime formats are converted to R date/time objects. You must select one of the following options:
    • R POSIXct. Variables with date or datetime formats are converted to R POSIXct objects.
    • R POSIXlt (list). Variables with date or datetime formats are converted to R POSIXlt objects.
    Note: The POSIX formats are advanced options. Use these options only if your R script specifies that datetime fields are treated in ways that require these formats. The POSIX formats do not apply to variables with time formats.
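The conversion options above correspond to ordinary R types. The following is a minimal sketch in plain R, run outside SPSS Modeler, of what each choice means for a script. The field names and values are made up for illustration, and which flag value becomes TRUE is an assumption of this sketch rather than documented behavior.

```r
# Hypothetical flag field, using the values from the example above.
gender <- c("Male", "Female", "Female", NA)

# "Strings to factor": the original labels survive as factor levels.
as_factor <- factor(gender)
levels(as_factor)            # "Female" "Male"

# "Logical values (True, False)": only TRUE/FALSE remain; the labels are lost.
# Mapping "Male" to TRUE is an assumption made for this sketch.
as_logical <- gender == "Male"

# Missing values converted to NA: many functions need na.rm to skip them.
income <- c(52000, NA, 61000, 48000)
mean(income)                 # NA, because one value is missing
mean(income, na.rm = TRUE)   # mean of the three non-missing values

# POSIXct stores one number per value; POSIXlt stores a list of components.
stamp_ct <- as.POSIXct("2024-03-15 14:30:00", tz = "UTC")
stamp_lt <- as.POSIXlt(stamp_ct)
as.numeric(stamp_ct)         # seconds since 1970-01-01 UTC
stamp_lt$hour                # 14
stamp_lt$year + 1900         # 2024 (POSIXlt counts years from 1900)
```

POSIXct is usually the more convenient class for columns in a data frame, while POSIXlt is convenient when a script needs individual components such as the hour or the day of the week.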
The options that are selected for the Convert flag fields, Convert missing values to the R 'not available' value (NA), and Convert date/time fields to R classes with special control for time zones controls are not recognized when the Extension model nugget is run against a database. When the nugget is run against a database, the default values for these controls are used instead:
  • Convert flag fields is set to Strings to factor, Integers and Reals to double.
  • Convert missing values to the R 'not available' value (NA) is selected.
  • Convert date/time fields to R classes with special control for time zones is not selected.
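Whether the data is read in batches or split up for in-database mining, a scoring script sees only part of the data at a time, which is why operations that span rows should be avoided. The sketch below simulates this in plain R, outside SPSS Modeler. The data frame and the functions score_rowwise, score_centered, and run_in_batches are hypothetical names introduced for illustration; they are not part of any SPSS Modeler API.

```r
# Hypothetical input data; in SPSS Modeler the script receives its data differently.
records <- data.frame(id = 1:6, amount = c(10, 20, 30, 40, 50, 60))

# Row-wise scoring: each output row depends only on its own input row.
score_rowwise <- function(batch) {
  batch$score <- batch$amount * 2
  batch
}

# Row-spanning scoring: centering on the mean uses every row in the batch.
score_centered <- function(batch) {
  batch$score <- batch$amount - mean(batch$amount)
  batch
}

# Simulate batch reading with a maximum number of records per batch.
run_in_batches <- function(data, batch_size, score_fun) {
  groups <- ceiling(seq_len(nrow(data)) / batch_size)
  scored <- lapply(split(data, groups), score_fun)
  result <- do.call(rbind, scored)
  rownames(result) <- NULL
  result
}

rowwise_full  <- score_rowwise(records)
rowwise_batch <- run_in_batches(records, 3, score_rowwise)
identical(rowwise_full$score, rowwise_batch$score)     # TRUE

centered_full  <- score_centered(records)
centered_batch <- run_in_batches(records, 3, score_centered)
identical(centered_full$score, centered_batch$score)   # FALSE
```

The row-wise script gives the same scores however the data is split, whereas the centered scores change with the batch boundaries because each batch computes its own mean.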