Extension model nugget

The Extension model nugget is generated and placed on your flow canvas after you run the Extension Model node. The Extension Model node contains the R script or Python for Spark script that defines model building and model scoring.

By default, the Extension model nugget contains the script that's used for model scoring, options for reading the data, and any output from the R console or Python for Spark. Optionally, the Extension model nugget can also contain various other forms of model output, such as graphs and text output. After the Extension model nugget is generated and added to your flow canvas, an output node can be connected to it. The output node is then used in the usual way within your flow to obtain information about the data and models, and for exporting data in various formats.

Syntax tab

R model scoring syntax. If using R, the R script that's used for model scoring is displayed in this field. By default, this field is enabled but not editable. To edit the R model scoring script, click Edit.

Python model scoring syntax. If using Python for Spark, the Python script that's used for model scoring is displayed in this field. By default, this field is enabled but not editable. To edit the Python model scoring script, click Edit.

After you click Edit, the scoring syntax field becomes editable and you can change the model scoring script by typing in it. For example, you might want to edit the script if you find an error in it after you have run the Extension Model node to generate the Extension model nugget. Any changes you make to the model scoring script in the Extension model nugget are lost if you regenerate the model by running the Extension Model node again.

Model Options tab

Read Data Options. These options apply only to R, not to Python for Spark. With these options, you can specify how missing values, flag fields, and variables with date or datetime formats are handled.

  • Read data in batches. If you're processing a large amount of data (that's too big to fit into the R engine's memory, for example), use this option to break the data down into batches that can be sent and processed individually. Specify the maximum number of data records to include in each batch.

    For both the Extension Transform node and the Extension model nugget, data passes through the R script in batches. For this reason, scripts for model scoring and process nodes that run in either a Hadoop or database environment shouldn't include operations that span or combine rows in the data, such as sorting or aggregation. This limitation ensures that data can be split up in a Hadoop environment and during in-database mining. Extension Output and Extension Model nodes don't have this limitation.

  • Convert flag fields. Specifies how flag fields are treated. There are two options: "Strings to factor, Integers and Reals to double" and "Logical values (True, False)". If you select Logical values (True, False), the original values of the flag fields are lost. For example, if a field has the values Male and Female, these are changed to True and False.
  • Convert missing values to the R 'not available' value (NA). When selected, any missing values are converted to the R NA value. The value NA is used by R to identify missing values. Some R functions that you use might have an argument that can control how the function behaves when the data contains NA. For example, the function might allow you to choose to automatically exclude records that contain NA. If this option isn't selected, any missing values are passed to R unchanged, and might cause errors when your R script runs.
  • Convert date/time fields to R classes with special control for time zones. When selected, variables with date or datetime formats are converted to R date/time objects. You must select one of the following options:
    • R POSIXct. Variables with date or datetime formats are converted to R POSIXct objects.
    • R POSIXlt (list). Variables with date or datetime formats are converted to R POSIXlt objects.
    Note: The POSIX formats are advanced options. Use these options only if your R script specifies that datetime fields are treated in ways that require these formats. The POSIX formats don't apply to variables with time formats.
The options you select for the Convert flag fields, Convert missing values to the R 'not available' value (NA), and Convert date/time fields to R classes with special control for time zones controls aren't recognized when the Extension model nugget runs against a database. When the node runs against a database, the default values for these controls are used instead:
  • Convert flag fields is set to Strings to factor, Integers and Reals to double
  • Convert missing values to the R 'not available' value (NA) is selected
  • Convert date/time fields to R classes with special control for time zones is not selected
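
The restriction on row-spanning operations described under Read data in batches can be illustrated with a minimal sketch. It is written in plain Python purely to show the principle; the batching option itself applies to R scripts, and this is not how the Extension nodes split data internally. The names batches, score_row, and centre_on_mean, the field x, and the scoring rule are all hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batches(records: Iterable[dict], max_records: int) -> Iterator[List[dict]]:
    """Yield consecutive batches of at most max_records records."""
    it = iter(records)
    while True:
        batch = list(islice(it, max_records))
        if not batch:
            return
        yield batch


def score_row(row: dict) -> dict:
    """Row-wise scoring: depends only on the row itself (hypothetical rule)."""
    return {**row, "prediction": row["x"] > 2}


def centre_on_mean(batch: List[dict]) -> List[float]:
    """Row-spanning operation: subtracts the mean of whatever rows it is given."""
    mean = sum(r["x"] for r in batch) / len(batch)
    return [r["x"] - mean for r in batch]


data = [{"x": v} for v in (1, 2, 3, 4, 5, 6)]

# Row-wise scoring gives the same result whether or not the data is batched.
whole_scores = [score_row(r) for r in data]
batched_scores = [score_row(r) for b in batches(data, 4) for r in b]

# The mean is computed per batch, so the batched result differs from the
# result computed over the whole data set.
whole_centred = centre_on_mean(data)
batched_centred = [v for b in batches(data, 4) for v in centre_on_mean(b)]
```

In this sketch the row-wise scores are identical in both cases, while the centred values differ because each batch sees only its own mean. Sorting and other aggregations are affected in the same way.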

Console Output tab

The Console Output tab contains any output that's received when the R script or Python for Spark script on the Syntax tab runs. For example, if you use an R script, the tab shows the output received from the R console when the script in the R model scoring syntax field runs. This output includes any R or Python error messages or warnings that are produced when the script runs, and any text output from the R console. The output is used mainly to debug the script.

Every time the model scoring script runs, the content of the Console Output tab is overwritten with the output received from the R console or Python for Spark. You can't edit the console output.