Distinct Optimization Settings

If the data on which you are working has only a small number of records, or has already been sorted, you can optimize the way in which it is handled to enable IBM® SPSS® Modeler to process the data more efficiently.

Note: If you either select Input dataset has a low number of distinct keys or use SQL generation for the node, any row within a distinct key value can be returned. To control which row is returned for each distinct key, specify a sort order in the Within groups, sort records by fields on the Settings tab. As long as you have specified a sort order on the Settings tab, the optimization options do not affect the results output by the Distinct node.
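The effect described in the note can be illustrated outside the product. The following is a minimal Python sketch, not SPSS Modeler code: the function name first_per_key, the field names, and the sample records are all hypothetical. It keeps one record per distinct key and shows that an explicit within-group sort order is what makes the retained record predictable.

```python
from operator import itemgetter

# Hypothetical sample data: two distinct key values, two records each.
records = [
    {"customer": "A", "amount": 30},
    {"customer": "B", "amount": 10},
    {"customer": "A", "amount": 50},
    {"customer": "B", "amount": 20},
]

def first_per_key(rows, key_field, sort_field=None, descending=False):
    """Keep the first record seen for each distinct key value.

    If sort_field is given, rows are sorted by it first, so the record
    kept for each key is determined by the sort order rather than by
    the order in which the rows happen to arrive.
    """
    if sort_field is not None:
        rows = sorted(rows, key=itemgetter(sort_field), reverse=descending)
    chosen = {}
    for row in rows:
        # setdefault stores the row only if the key has not been seen yet.
        chosen.setdefault(row[key_field], row)
    return list(chosen.values())

# Without a sort order, the result depends on arrival order.
unsorted_result = first_per_key(records, "customer")

# With a sort order, the largest amount per customer is always kept.
sorted_result = first_per_key(records, "customer", "amount", descending=True)
```

In this sketch, reordering the input changes unsorted_result but not sorted_result, which mirrors why specifying a sort order on the Settings tab makes the output independent of the optimization options.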

Input dataset has a low number of distinct keys. Select this option if you have a small number of records, or a small number of unique values of the key field(s), or both. Doing so can improve performance.

Input dataset is already ordered by grouping fields and sorting fields on the Settings tab. Select this option only if your data is already sorted by all of the fields listed under Within groups, sort records by on the Settings tab, and the ascending or descending order of the data matches the order specified there. Doing so can improve performance.
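Before selecting this option, it can help to confirm that the data really is in the expected order. The following Python sketch shows one way to check this on an exported sample. It is an illustration only, not part of SPSS Modeler: the function name is_sorted_by, the field names, and the sample rows are hypothetical, and it assumes that the grouping fields are listed first, followed by the sorting fields, each with its own direction.

```python
def is_sorted_by(rows, order):
    """Check that rows are ordered by the given fields.

    order is a list of (field, descending) pairs, most significant
    field first. Each adjacent pair of rows is compared on the first
    field where they differ.
    """
    for prev, curr in zip(rows, rows[1:]):
        for field, descending in order:
            if prev[field] == curr[field]:
                continue  # tie on this field, look at the next one
            if descending:
                in_order = prev[field] > curr[field]
            else:
                in_order = prev[field] < curr[field]
            if not in_order:
                return False
            break  # this pair is correctly ordered
    return True

# Hypothetical sample: grouped by customer, amounts descending within each group.
rows = [
    {"customer": "A", "amount": 50},
    {"customer": "A", "amount": 30},
    {"customer": "B", "amount": 20},
]

matches = is_sorted_by(rows, [("customer", False), ("amount", True)])
mismatch = is_sorted_by(rows, [("customer", False), ("amount", False)])
```

Here matches is True because the data follows the stated directions, while mismatch is False because the amounts are descending rather than ascending. The second case corresponds to the situation in which the option should not be selected.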

Disable SQL generation. Select this option to disable SQL generation for the node.