Database partitioning is a technique for horizontally distributing rows in the database across many database instances that work together to form a single large database server. These instances can be located within a single server, across several physical machines, or a combination. In DB2 products, this is called the Database Partitioning Facility (DPF).
Database partitioning allows the DB2 database manager to scale to hundreds of instances that participate in the larger database system. The scalability of this design can approach near linear scaleout for many complex query workloads. As such, database partitioning has become extremely popular for data warehousing and BI workloads due to its near linear scaleout characteristics and its ability to scale to hundreds of terabytes of data and hundreds of CPUs. The architecture is less popular for OLTP processing due to the inter-instance communication incurred on each transaction, which though small, can still be very significant for short running transactions typically found in OLTP workloads. DPF might be used for OLTP applications that require a cluster of computers for throughput.
Shared-nothing hash partitioning hashes rows to logical data partitions. The primary design goal of hash distribution is to ensure the even distribution of data across all logical nodes (as range partitioning tends to skew data). These partitions might reside within a single server or be distributed across a set of physical machines, as shown in Figure 9:

Figure 9. Table hash-partitioning
The scalability of shared-nothing databases has proven to be nearly linear for a wide range of complex query workloads. Also, the modular nature of the design lends itself to linear scaleout as storage pressures, workload pressures, or both grow. As a result, shared-nothing architectures have dominated data warehousing for the past decade. Database partitioning is implemented without impact on existing application code, and is completely transparent. Partitioning strategies can be modified online with the redistribution utility without affecting application code.
The primary design choice is determining which columns to use to hash partition each table that comprises the database-partitioning key. The goals are twofold:
- Distribute data evenly across database partitions. This requires choosing partitioning columns that have a high cardinality of values to ensure an even distribution of rows across the logical partitions.
- Minimize shipping of data across database partitions during join processing. Collocation of rows being joined will occur (avoiding movement) if the partitioning key is included in the WHERE clause.
Another central problem in designing shared-nothing data warehouses is determining the best combinations of memory, CPUs, buses, storage capacity, storage bandwidth, and networks. How much or how many do you need of each of these?
To help solve this problem, IBM provides the IBM Balanced Warehouseâ„¢, which is based on DB2 database system's shared nothing architecture. It was developed through IBM best practices used for successful client implementations.