Partition Node Options

Partition field. Specifies the name of the field created by the node.

Partitions. You can partition the data into two samples (train and test) or three (train, test, and validation).

Partition size. Specifies the relative size of each partition. If the partition sizes sum to less than 100%, records not assigned to any partition are discarded. For example, if you have 10 million records and specify partition sizes of 5% for training and 10% for testing, running the node yields roughly 500,000 training records and one million testing records; the remaining 8.5 million records (about 85%) are discarded.
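The way relative sizes that sum to less than 100% lead to discarded records can be sketched as follows. This is a minimal illustration, not the node's actual implementation: it assumes each record gets one uniform random draw in [0, 100) that is compared against cumulative partition sizes, and the function name and the labels `training`/`testing` are hypothetical. The record count is scaled down from the 10-million example to keep it fast.

```python
import random
from collections import Counter

def assign_partitions(records, sizes, labels, rng):
    """Assign each record to a partition label, or drop it.

    `sizes` are percentages. If they sum to less than 100, a record whose
    random draw falls past the last partition is discarded.
    """
    if sum(sizes) > 100:
        raise ValueError("partition sizes must not sum to more than 100%")
    assigned = []
    for rec in records:
        draw = rng.random() * 100  # uniform draw in [0, 100)
        cumulative = 0.0
        for size, label in zip(sizes, labels):
            cumulative += size
            if draw < cumulative:
                assigned.append((rec, label))
                break
        # If no partition matched, nothing is appended: the record is discarded.
    return assigned

# Scaled-down version of the example: 5% training, 10% testing.
rng = random.Random(2024)
result = assign_partitions(range(100_000), [5, 10], ["training", "testing"], rng)
counts = Counter(label for _, label in result)
print(counts)                              # roughly 5,000 training and 10,000 testing
print(100_000 - len(result), "discarded")  # roughly 85,000
```

Because each record is assigned independently, the partition sizes are approximate rather than exact, which is why the example above speaks of "roughly" 500,000 and one million records.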

Values. Specifies the values used to represent each partition sample in the data.

Seed. Available only when Repeatable partition assignment is selected. When records are sampled or partitioned based on a random percentage, this option lets you reproduce the same results in another session. Because you specify the starting value used by the random number generator, the same records are assigned to the same partitions each time the node is executed. Enter the desired seed value, or click the Generate button to generate a random value automatically. If Repeatable partition assignment is not selected, a different sample is generated each time the node is executed.
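The effect of fixing the seed can be illustrated with Python's `random.Random`. This is only an analogy for the node's random number generator, not its actual algorithm; the function name and the 70/30 split are hypothetical.

```python
import random

def random_partition(n_records, seed=None, train_pct=70):
    """Return one label per record, drawn from a seeded random number generator.

    A fixed seed makes the generator produce the same sequence of numbers,
    so the same record positions receive the same labels on every run.
    With seed=None, Python seeds itself from system entropy, so runs differ.
    """
    rng = random.Random(seed)
    return ["training" if rng.random() * 100 < train_pct else "testing"
            for _ in range(n_records)]

run1 = random_partition(1_000, seed=12345)
run2 = random_partition(1_000, seed=12345)
print(run1 == run2)  # True: same seed, same input order, same assignment
```

Passing `seed=None` corresponds to leaving repeatable assignment off: each call then starts from a different internal state.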

Note: When using the Seed option with records read from a database, a Sort node may be required before sampling to ensure the same result each time the node is executed. This is because the random assignment depends on the order of the records as well as on the seed, and a relational database does not guarantee that records are returned in the same order. See the topic Sort Node for more information.
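Why record order matters, and why sorting helps, can be shown with a small sketch. It assumes one random number is drawn per record in arrival order, which is a simplification of whatever the node does internally; the function name and record IDs are hypothetical.

```python
import random

def partition_by_position(record_ids, seed, train_pct=50):
    """Draw one random number per record, in the order the records arrive.

    The i-th number always goes to whichever record is i-th in the input,
    so the outcome depends on record order as well as on the seed.
    """
    rng = random.Random(seed)
    return {rid: ("training" if rng.random() * 100 < train_pct else "testing")
            for rid in record_ids}

ids = list(range(1, 1_001))
reordered = ids[::-1]  # same records, returned in a different order

a = partition_by_position(ids, seed=7)
b = partition_by_position(reordered, seed=7)
changed = sum(a[rid] != b[rid] for rid in ids)
print(changed, "records changed partition despite the same seed")

# Sorting on a stable key first (the job of a Sort node) restores repeatability.
c = partition_by_position(sorted(reordered), seed=7)
print(a == c)  # True
```

The seed fixes the sequence of random numbers, but not which record each number is paired with; sorting fixes that pairing.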

Use unique field to assign partitions. Available only when Repeatable partition assignment is selected (Tier 1 databases only). Select this check box to use SQL pushback to assign records to partitions. From the drop-down list, choose a field with unique values (such as an ID field) so that records are assigned in a random but repeatable way.
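The documentation does not say which SQL the node pushes back to the database. The sketch below only illustrates the general idea of deriving a random but repeatable assignment from a unique field: the partition is computed from a hash of the seed and the ID, so it does not depend on row order. The use of SHA-256, the way the seed is mixed in, the 70/30 split, and the function name are all assumptions for illustration.

```python
import hashlib

def partition_from_id(record_id, seed, sizes=(70, 30), labels=("training", "testing")):
    """Assign a partition from a hash of (seed, unique ID).

    The result depends only on the seed and the ID value, never on the
    order in which rows arrive, so it is repeatable across queries.
    """
    digest = hashlib.sha256(f"{seed}:{record_id}".encode("utf-8")).digest()
    # Reduce the first 8 bytes to a bucket in [0, 100) with 0.01% resolution.
    bucket = int.from_bytes(digest[:8], "big") % 10_000 / 100
    cumulative = 0
    for size, label in zip(sizes, labels):
        cumulative += size
        if bucket < cumulative:
            return label
    return None  # outside every partition: the record would be discarded

rows = [101, 205, 309, 412, 518, 623]
forward = {rid: partition_from_id(rid, seed=42) for rid in rows}
backward = {rid: partition_from_id(rid, seed=42) for rid in reversed(rows)}
print(forward == backward)  # True: row order does not matter
```

This is also why the chosen field must have unique values: records that share an ID value would always land in the same partition.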

Database tiers are explained in the description of the Database source node. See the topic Database Source Node for more information.

Generating Select Nodes

Using the Generate menu in the Partition node, you can automatically generate a Select node for each partition. For example, you could select all records in the training partition for further evaluation or analysis using only that partition.
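Conceptually, one Select node per partition amounts to filtering records on the partition field's value. The following sketch mimics that in plain Python; the field name `Partition` and the values `1_Training`, `2_Testing`, and `3_Validation` are hypothetical sample data, not taken from this page.

```python
from collections import defaultdict

def split_by_partition(records, field="Partition"):
    """Group records by their partition value, one group per partition.

    Each group holds the records that a Select node for that
    partition value would pass through.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec[field]].append(rec)
    return dict(groups)

records = [
    {"id": 1, "Partition": "1_Training"},
    {"id": 2, "Partition": "2_Testing"},
    {"id": 3, "Partition": "1_Training"},
    {"id": 4, "Partition": "3_Validation"},
]
groups = split_by_partition(records)
training = groups["1_Training"]  # records in the training partition only
print([r["id"] for r in training])  # [1, 3]
```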