Setting properties for flows
You can specify properties to apply to the current flow.
To set flow properties, click the Flow Properties icon.
You can configure the following properties.
Options
- General
- Maximum number of rows to show in Data Preview
- When you preview the data for a node, you can specify the number of rows to show.
- Limit members for nominal fields
- Nominal (set) fields become Typeless when their number of members exceeds the maximum that you set in Maximum members. This option is useful when you are working with large nominal fields. When the measurement level of a field is set to Typeless, its role is automatically set to None. Fields that are set to None aren't available for modeling.
- Date/Time
- Import date/time/timestamp as
- Select whether to use a date and time format for storing data in date and time fields or whether to import them as string variables.
- Use microseconds in timestamp fields
- If you have timestamp data that is measured in microseconds, you can enable this option to use the more precise data in your flows. To enable the option, select this checkbox and select String for the Import date/time/timestamp as setting. Important: This option works only for connectors that support SQL pushback. You also need to manually save each Data Asset node that imports data from one of these connectors. For information about these limitations, see Known issues and limitations for SPSS Modeler.
- Date format
- Select a date format to use for date storage fields or when strings are interpreted as dates by CLEM date functions.
- Time format
- Select a time format to use for time storage fields or when strings are interpreted as times by CLEM time functions.
- Rollover days/mins
- For time formats, select whether negative time differences are interpreted as referring to the previous day or hour.
- Date baseline (1st Jan)
- Select the baseline year (always 1 January) to be used by CLEM date functions that work with a single date.
- 2-digit dates start from
- Specify the cutoff year used to add century digits to years that are given with only 2 digits. For example, with 1930 as the cutoff year, 05/11/02 is assumed to be in 2002, and 2-digit years after 30 are assigned to the 20th century, so 05/11/73 is assumed to be in 1973.
- Time zone
- Select how the time zone is chosen for use with the datetime_now CLEM expression:
  - If you select Server, the time zone of the machine where the SPSS Modeler runtime is running is used (sometimes this is the same as the Client option). If your flow uses data from a database that supports SQL pushback, the datetime_now expression uses the time of the database.
  - If the current flow uses an Analytic Server data source, the datetime_now expression uses the time from the Analytic Server; by default, the server uses Coordinated Universal Time (UTC).
  - If you select Client, the time zone of the machine where SPSS Modeler is installed is used.
  - Alternatively, you can select any of the UTC values for the time zone.
- Number Formats
- You can specify the number of decimal places to use when SPSS Modeler displays real numbers in standard, scientific, or currency display formats.
- Optimization
- You can use these settings to optimize flow performance.
- Enable flow rewriting
- Flow rewriting reorders the nodes in a flow behind the scenes for more efficient operation, without altering flow semantics.
- Optimize CLEM expressions
- This option enables the optimizer to search for CLEM expressions that can be preprocessed before the flow runs to increase processing speed. For example, if you have an expression such as log(salary), the optimizer calculates the actual salary value and passes that on for processing. This option can improve both SQL pushback and SPSS Modeler performance.
- Optimize syntax execution
- This method of flow rewriting increases the efficiency of operations that have more than one node that contains SPSS Statistics syntax. Optimization is achieved by combining the syntax commands into a single operation, instead of running each as a separate operation.
- Optimize other execution
- This method of flow rewriting increases the efficiency of operations that can't be delegated to the database. Optimization is achieved by reducing the amount of data in the flow as early as possible. The flow is rewritten to push operations closer to the data source while maintaining data integrity. This change reduces data downstream for costly operations, such as joins.
- Enable parallel processing
- When running on a computer with multiple processors, this option allows the system to balance the load across those processors, which can result in faster performance. Flows that use multiple nodes, or any of the following individual nodes, can benefit from parallel processing: C5.0, Merge (by key), Sort, Bin (rank and tile methods), and Aggregate (using one or more key fields).
- Generate SQL
- This option pushes SQL processing back to the database. Turning this option on or off affects only the new flows that you create; you cannot switch the setting for an existing flow. For more information about using this option with flows, see SQL optimization.
- Database caching (SQL only). For flows that generate SQL to be run in the database, data can be cached mid-flow to a temporary table in the database rather than to the file system. When combined with SQL optimization, this option can result in significant gains in performance. For example, the output from a flow that merges multiple tables to create a data mining view can be cached and reused as needed. With database caching enabled, right-click any nonterminal node to cache data at that point, and the cache is automatically created directly in the database the next time the flow runs. This allows SQL to be generated for downstream nodes, further improving performance. Alternatively, this option can be disabled if needed, such as when policies or permissions preclude data being written to the database. If database caching or SQL optimization is not enabled, the cache is written to the file system instead.
- Use relaxed conversion (SQL only). This option enables the conversion of data from either strings to numbers, or numbers to strings, if stored in a suitable format. For example, if the data is kept in the database as a string, but actually contains a meaningful number, the data can be converted for use when the pushback occurs.
- Logging
- Display SQL in the messages log at run time
- Specifies whether SQL generated while running the flow is passed to the messages log.
- Display SQL generation in the message log during preparation
- During flow preview, specifies whether a preview of the SQL that would be generated is passed to the messages log.
- SQL format
- Specifies whether any SQL that's displayed in the log should contain native SQL functions or standard ODBC functions of the form {fn FUNC(…)}, as generated by SPSS Modeler. The former relies on ODBC driver functionality that may not be implemented.
- Reformat SQL for improved readability
- Specifies whether SQL displayed in the log should be formatted for readability.
- Show status for records
- Specifies how often status is reported as records arrive at terminal nodes. Enter a number N to update the status every N records.
- Analytic Server
- Maximum number of records to process outside of Analytic Server
- Specify the maximum number of records to be imported into SPSS Modeler from an Analytic Server data source connection.
- Notification when a node can’t be processed in Analytic Server
- This setting controls what happens when a flow that would be submitted to Analytic Server contains a node that can’t be processed by Analytic Server. Specify whether to issue a warning and continue running the flow, or throw an error and stop running.
- Split model storage settings
- Store split models by reference on Analytic Server when model size (MB) exceeds. Generated model nuggets are typically stored as part of the flow. Split models with many splits can produce large nuggets, and moving the nugget back and forth between the flow and the Analytic Server can degrade performance. As a solution, when a split model exceeds the specified size, it is stored on the Analytic Server, and the nugget in SPSS Modeler contains a reference to the model.
- Default folder to store models by reference on Analytic Server once execution is complete. Specify the default path where you want to store split models on Analytic Server. The path should start with a valid Analytic Server project name.
- Folder to store promoted models. Specify the default path where you want to store "promoted" models. A promoted model is not cleaned up when the SPSS Modeler session is over.
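The Limit members for nominal fields option under General applies a simple threshold rule. The following Python sketch models that rule only; it is a hypothetical illustration, not SPSS Modeler code. The function name `apply_member_limit` and the default role "Input" for fields within the limit are assumptions.

```python
from typing import Iterable, Tuple

def apply_member_limit(
    values: Iterable[str], max_members: int, current_role: str = "Input"
) -> Tuple[str, str]:
    """Return (measurement level, role) for a nominal field.

    Hypothetical sketch of the documented rule: if the number of distinct
    members exceeds max_members, the field becomes Typeless and its role
    is set to None, so it is no longer available for modeling.
    """
    distinct = set(values)
    if len(distinct) > max_members:
        return ("Typeless", "None")
    # Within the limit: the field stays nominal and keeps its current role
    # (the "Input" default is an assumption of this sketch).
    return ("Nominal", current_role)

print(apply_member_limit(["red", "green", "blue"], max_members=2))  # ('Typeless', 'None')
print(apply_member_limit(["red", "green", "red"], max_members=2))   # ('Nominal', 'Input')
```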
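The 2-digit dates start from setting under Date/Time amounts to a sliding 100-year window. The following Python sketch illustrates the arithmetic under an assumption that the text does not state exactly: each 2-digit year maps to the single year between the cutoff year and the cutoff year plus 99 that ends in those two digits (so with a cutoff of 1930, 30 becomes 1930 and 29 becomes 2029). The function name `expand_two_digit_year` is hypothetical and is not part of CLEM or SPSS Modeler.

```python
def expand_two_digit_year(yy: int, cutoff: int = 1930) -> int:
    """Expand a 2-digit year using a sliding cutoff window.

    Assumed rule: the result is the unique year in [cutoff, cutoff + 99]
    whose last two digits equal yy.
    """
    if not 0 <= yy <= 99:
        raise ValueError("yy must be between 0 and 99")
    year = (cutoff // 100) * 100 + yy  # same century as the cutoff year
    if year < cutoff:                  # falls before the cutoff: next century
        year += 100
    return year

print(expand_two_digit_year(2))   # 05/11/02 -> 2002
print(expand_two_digit_year(73))  # 05/11/73 -> 1973
```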
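The Optimize other execution option under Optimization reduces data as early as possible so that costly operations, such as joins, process fewer rows. As a language-neutral illustration (plain Python, not how SPSS Modeler implements flow rewriting), the sketch below applies a row filter before an inner join instead of after it. The data and the `hash_join` helper are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, object]

def hash_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Inner join on `key`: index the right input, then probe it with each left row."""
    index: Dict[object, List[Row]] = {}
    for rrow in right:
        index.setdefault(rrow[key], []).append(rrow)
    return [{**lrow, **rrow} for lrow in left for rrow in index.get(lrow[key], [])]

# Hypothetical data: 20 orders spread over 5 customers.
orders = [{"cust": i % 5, "amount": i * 10} for i in range(20)]
customers = [{"cust": c, "region": "EU" if c < 2 else "US"} for c in range(5)]

# Naive order: join all 20 orders, then filter the joined rows.
naive = [row for row in hash_join(orders, customers, "cust") if row["amount"] >= 150]

# Rewritten order: the filter only reads an `orders` column, so run it first.
filtered_orders = [o for o in orders if o["amount"] >= 150]
rewritten = hash_join(filtered_orders, customers, "cust")

print(len(orders), len(filtered_orders))  # 20 5  (rows entering the join)
print(naive == rewritten)                 # True (same result either way)
```

In this sketch the reordering is safe because the filter depends only on columns of one input, so moving it ahead of the join cannot change which joined rows survive.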
Parameters
Parameters are user-defined variables that are saved and persisted with the current flow or SuperNode. Parameters are often used in scripting to control the behavior of the script, and they can be accessed from the user interface as well. You can define parameters for use in CLEM expressions and in scripting. Parameters that are defined in the flow properties are available to all nodes in the flow. Parameters set for a SuperNode are not available outside of the SuperNode. If you save a flow, any parameters set for that flow are also saved.
For more information about parameters, see Flow and SuperNode parameters.
Click Add value and enter the following information for the new parameter:
- Name
- This name is how the parameter is referenced in expressions. For example, to create a parameter for a minimum temperature, you could enter minvalue. When parameters are used in CLEM expressions, they are placed within single quotation marks, for example, '$P-minvalue'. Do not enter the $P- prefix; it denotes a parameter in CLEM expressions.
- Label
- Lists a descriptive name for each parameter created.
- Storage
- Storage indicates how the data values are stored in the parameter. For example, if values have leading zeros that you want to preserve (such as 008), select String as the storage type. Otherwise, the zeros are stripped from the value.
- Value
- Lists the current value for each parameter, which you can change as needed. Values for date parameters must be specified in ISO standard notation (YYYY-MM-DD).
- Measure
- Select the measurement level, which is used to describe characteristics of the parameter. You can change this value to reflect the way that you intend to use the parameter. For example, Typeless indicates that the parameter can have any value compatible with its storage.
- Prompt?
- Select this option if you want users to be prompted to enter a value for this parameter at run time. This option is useful when you need to enter different values for the same parameter on different occasions.
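Two of the parameter fields above interact with how a value is later used: the Name determines the '$P-name' reference in CLEM, and the Storage determines whether leading zeros survive. The following Python sketch models one row of the parameters table to show both effects. It is a hypothetical illustration, not an SPSS Modeler API; the class `FlowParameter` and its members are assumptions, and only String and Integer storage are modeled.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class FlowParameter:
    """Hypothetical model of one row in the Parameters table (not an SPSS Modeler API)."""
    name: str       # entered without the $P- prefix, for example "minvalue"
    storage: str    # "String" or "Integer" in this sketch
    raw_value: str  # the text typed into the Value column

    def clem_reference(self) -> str:
        # CLEM expressions refer to the parameter as '$P-<name>' in single quotes.
        return f"'$P-{self.name}'"

    def value(self) -> Union[str, int]:
        # Integer storage parses the text, which strips leading zeros;
        # String storage keeps the text exactly as entered.
        if self.storage == "Integer":
            return int(self.raw_value)
        return self.raw_value

minvalue = FlowParameter("minvalue", "Integer", "5")
code_as_string = FlowParameter("code", "String", "008")
code_as_integer = FlowParameter("code", "Integer", "008")

print(minvalue.clem_reference())                        # '$P-minvalue'
print(code_as_string.value(), code_as_integer.value())  # 008 8
```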
Globals
In the Globals tab of the flow properties, you can view the global values set for the current flow. Global values are created using a Set Globals node to determine statistics such as mean, sum, or standard deviation for selected fields.
After a Set Globals node runs, these values become available for various uses in flow operations.
You can't edit global values in the table here in the flow properties, but you can clear all global values for a flow.
Annotations
If you need to describe a flow to others in your organization, you can attach explanatory comments to flows, nodes, and model nuggets. Others can then view these comments on-screen or even print an image of the flow that includes your comments.
Use the Annotations tab of the flow properties to add text annotations to your flow. These notes are visible only when the Annotations tab is open, except that flow annotations can also be shown as on-screen comments.