Partitioning tab
On the Partitioning tab you can specify details about how the incoming data is partitioned or collected before it is written to the Informix® database. You can also specify that the data should be sorted before it is written to the Informix database.
By default, Informix enterprise stage partitions data in Auto mode. Auto mode selects the best partitioning method based on the execution modes specified for the current and preceding stages and the number of nodes specified in the configuration file.
If the stage is operating in sequential mode, it first collects the data by using the default Auto collection method and then writes the data to the Informix database.
You can override this default behavior by using the Partitioning tab. The options that you set on this tab behave differently depending on whether the current stage and the preceding stage are set to run in parallel or sequential mode.
If the stage is set to run in parallel, then you can set a partitioning method by selecting from the Partition type drop-down list. The partition type that you select will override any current partitioning.
If the stage is set to run in sequential mode, but the preceding stage runs in parallel, then you can set a collection method from the Collector type drop-down list. The collection method that you select will override the default collection method.
The following partitioning methods are available:
- (Auto). By default, Informix enterprise stage partitions data in Auto mode. Auto mode provides the best partitioning method, depending on the execution modes of current and preceding stages and how many nodes are specified in the configuration file.
- Entire. Each partition receives the entire data set.
- Hash. The records are divided into partitions based on the value of key columns that are selected from the Available list.
- Modulus. The records are divided into partitions using a modulus function on the key column you select from the Available list. This method is commonly used to create partitions on tag columns.
- Random. The records are divided into partitions randomly, based on the output of a random number generator.
- Round Robin. The records are partitioned on a round robin basis as they enter the stage. The first record goes to the first partition, the second record to the second partition, and so on. After reaching the last partition, the assignment starts with the first partition again.
- Same. Preserves the current partitions.
- Range. The records in a data set are divided into approximately equal-size partitions based on one or more partitioning keys. Range partitioning is often a preprocessing step that is performed before a total sort on a data set. Additional properties must be set. Access these properties by clicking the properties button.
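The differences between several of these methods can be sketched in a few lines of Python. This is only an illustration of the general techniques, not the stage's actual implementation: the function names, the dictionary-based records, and the use of `crc32` as the hash function are all assumptions made for the example.

```python
from zlib import crc32


def hash_partition(records, key, n):
    """Keyed method: records with equal key values share a partition."""
    parts = [[] for _ in range(n)]
    for rec in records:
        # crc32 is a stand-in hash (an assumption), chosen because it
        # gives the same result on every run.
        parts[crc32(str(rec[key]).encode()) % n].append(rec)
    return parts


def modulus_partition(records, key, n):
    """Keyed method: partition number is the integer key value modulo n."""
    parts = [[] for _ in range(n)]
    for rec in records:
        parts[rec[key] % n].append(rec)
    return parts


def round_robin_partition(records, n):
    """Keyless method: records are dealt to partitions in turn."""
    parts = [[] for _ in range(n)]
    for i, rec in enumerate(records):
        parts[i % n].append(rec)
    return parts


def entire_partition(records, n):
    """Keyless method: every partition receives a full copy of the data."""
    return [list(records) for _ in range(n)]


# Hypothetical input: six records with an integer "tag" column.
rows = [{"tag": t} for t in (10, 11, 12, 13, 14, 15)]
by_modulus = modulus_partition(rows, "tag", 3)
```

With three partitions, `by_modulus` places tags 12 and 15 in the first partition, 10 and 13 in the second, and 11 and 14 in the third. Keyed methods such as Hash and Modulus keep records with the same key value together, which the keyless methods do not guarantee.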
The following collection methods are available:
- Auto. The default data collection method for Informix enterprise stage is Auto. You can use Auto mode if you want to read any row from any input partition as the row becomes available.
- Ordered. Reads all records from the first partition, then all records from the second partition, and so on.
- Round Robin. Reads a record from the first input partition, then from the second partition, and so on. After reaching the last partition, the read operation starts with the first partition again.
- Sort Merge. Reads records in an order that is based on one or more columns of the record. This method requires you to select a collecting key column from the Available list.
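As a rough illustration of how the Ordered, Round Robin, and Sort Merge collection methods differ, the following Python sketch combines a list of in-memory partitions into a single stream. The function names are hypothetical, the behavior of skipping exhausted partitions in the round robin case is an assumption, and the sort merge sketch assumes that each partition is already sorted on the collecting key.

```python
import heapq
from itertools import chain, zip_longest


def ordered_collect(partitions):
    """Read all of partition 0, then all of partition 1, and so on."""
    return list(chain.from_iterable(partitions))


def round_robin_collect(partitions):
    """Read one record from each partition in turn.

    Partitions that run out of records are skipped (an assumption).
    """
    missing = object()  # sentinel for exhausted partitions
    out = []
    for turn in zip_longest(*partitions, fillvalue=missing):
        out.extend(rec for rec in turn if rec is not missing)
    return out


def sort_merge_collect(partitions, key):
    """Merge partitions that are each already sorted on the key."""
    return list(heapq.merge(*partitions, key=key))


# Hypothetical partitions, each sorted on the record value itself.
parts = [[1, 5, 9], [2, 3], [4]]
print(ordered_collect(parts))                     # [1, 5, 9, 2, 3, 4]
print(round_robin_collect(parts))                 # [1, 2, 4, 5, 3, 9]
print(sort_merge_collect(parts, key=lambda r: r)) # [1, 2, 3, 4, 5, 9]
```

Only the sort merge sketch produces a globally ordered result, and only because its inputs are sorted within each partition beforehand.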
You can specify that data that arrives on the input link should be sorted before it is written to the database. The sort is always performed within data partitions. If the stage divides incoming data into partitions, the sort operation occurs after the data is partitioned. If the stage is collecting data, the sort operation occurs before the data is collected. The availability of sorting depends on the selected partitioning or collecting method. The following options are available:
- Perform Sort. Sorts the data that is arriving on the link. Select the columns to sort on from the Available list.
- Stable. Preserves previously sorted data sets. This option is the default.
- Unique. Retains only one record if multiple records have identical sorting key values. If the stable sort option is also set, the first record is retained.
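The interaction of these options can be sketched in Python, whose built-in `sorted` is a stable sort. The sketch below is an illustration under assumptions, not the stage's implementation: the function names and the tuple-based records are hypothetical. It sorts each partition independently and, when the unique option is requested, keeps the first record for each sorting key value.

```python
def sort_partition(rows, key, unique=False):
    """Stable sort of one partition, optionally keeping one row per key."""
    # sorted() is stable: rows with equal keys keep their arrival order.
    ordered = sorted(rows, key=key)
    if not unique:
        return ordered
    kept, seen = [], set()
    for row in ordered:
        k = key(row)
        if k not in seen:  # first row for this key value is retained
            seen.add(k)
            kept.append(row)
    return kept


def sort_within_partitions(partitions, key, unique=False):
    """Apply the sort to each partition separately, never across them."""
    return [sort_partition(p, key, unique) for p in partitions]


# Hypothetical partitions of (key, sequence) records.
parts = [[("b", 1), ("a", 2), ("a", 3)], [("c", 4), ("a", 5)]]
by_key = lambda rec: rec[0]

print(sort_within_partitions(parts, by_key))
# [[('a', 2), ('a', 3), ('b', 1)], [('a', 5), ('c', 4)]]
print(sort_within_partitions(parts, by_key, unique=True))
# [[('a', 2), ('b', 1)], [('a', 5), ('c', 4)]]
```

Because the sort runs within each partition, the key value "a" still appears in both partitions after the unique option is applied. Removing duplicates across the whole data set would require a keyed partitioning method that places equal key values in the same partition.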
If NLS is enabled, you can select a locale that specifies a collate convention for the sort operation. Click the NLS button to select the locale.
You can also specify sort direction, case sensitivity, whether to sort data as ASCII or EBCDIC, and whether null columns appear first or last for each column. If you are using a keyed partitioning method, you can also specify whether the column is used as a key for sorting, for partitioning, or for both. Select the column in the Selected list and right-click to open the shortcut menu.