An effectively partitioned database optimizes query performance and data handling. Optimal query and data handling facilitate database scalability and can allow for massive amounts of parallel processing.
In the Design Studio, you can specify database partitioning information at the data flow level, at the data flow operator level, or at both levels. If you specify partitioning information at both levels, the information that you specify for any operator in the data flow overrides the information that you specify for the data flow. Partitioning information at the operator level includes distribution keys that further fine tune and improve performance. You select the key columns from the input port schema for the operator.
Specifying partitioning information for a data flow generates partition-aware SQL code for that data flow at run time. You can specify partitioning information for tables that are created by the partition-aware operators. These tables include those that data station operators use and the internal staging tables that the data station, distinct, key lookup, splitter, and table source operators generate. You can also specify partition-related information for bulk load target and file import operators.
Because the internal staging tables are generated only when required, the partitioning information that you specify is used only when an internal staging table is generated. To verify whether the code generator used your partitioning information, you can open the Code Generation page and look for the table name.
You can specify partitioning information for data flows and data flow operators by: