One of the techniques to improve database performance and achieve greater scalability is using table partitioning. According to live industry observations when you see the following symptoms you should consider table partitioning
Slow queries that return small amounts or no data
Slow queries that return large amounts of data
Blocking between readers and writers (inserts or updates)
As tables grow in size, it may be easier to manage your data in chunks or by limited ranges. Here are 7 things to remember while table partitioning. I will be using DB2 v 10.5.0.0 PureScale to elucidate these points.
Not all tables are meant to be partitioned. Choosing the right table is not a simple exercise. Several inputs must be considered before taking a decision.
- Only tables with higher reads should be partitioned as writing is a far more expensive process than reading. Assess the rows read by MON_GET_WORKLOAD table function I recommend shortlisting tables where the rows_read Vs rows_inserted/updated ratio is above 70:30. For eg: SELECT ROWS_READ, ROWS_RETURNED,ROWS_INSERTEDFROMTABLE(MON_GET_WORKLOAD('',-2));
- Another important consideration is the size of the table or to be precise relative size of the table. In any application, some tables are significantly larger than others. There is less benefit in partitioning smaller tables. Shortlist the top 20 percent of tables according table size.
- Different members in a DB2® pureScale® instance might require access to a page of data that another member is already using. The process whereby one member requests and is granted a page being used by another member is known as page reclaiming. RECLAIM WAIT TIME is the amount of time spent waiting on page locks, where the lock request caused a page to be reclaimed.
Pick tables with a high reclaim wait time basically a relative one. Partitioning such tables would distribute rows in different pages, thus might help in reducing reclaim wait. This in turn improves the performance of the Purescale database. This can be monitored using MON_GET_PAGE_ACCESS_INFO table function
SELECT SUBSTR(TABNAME,1,8) AS NAME,
SUBSTR(OBJTYPE,1,5) AS TYPE,
RECLAIM_WAIT_TIME AS RECLAIM_WAIT_TIME
FROM TABLE( MON_GET_PAGE_ACCESS_INFO('', NULL, NULL)
ORDER BY RECLAIM_WAIT_TIME;
- Another important question to answer is how many tables need to be partitioned. All tables shortlisted based on the above 3 criteria can partitioned. Some experimentation would be required to determine the ideal number of tables to be partitioned.
- Once the table is identified, the next step is to identify the column of the table i.e. partition key column(s). The most common partition key column is the unique identifier (ID) or the primary key. Other possibilities include the date, object names. The final decision is often dependent on the partitioning strategy.
- There are several partitioning strategies used. A commonly used partitioning scheme is the date, where you may decide to clump together data in data partitions such as by year or month. You could also have numeric attributes for partitioning, for instance, records with IDs from one up to 1 million are be stored in one data partition, IDs from 1 million to 2 million in another data partition, and so on. Or for example, you could have records for customers with names starting with A-C in one data partition, D-M in the second data partition, N-Q in a third data partition, and R-Z in the last data partition.
Note: Although you have the option of referring to data partitions by names or numbers (useful for data partition operations), they can be completely transparent to applications. That is, applications can continue to access data by specifying column and table names, and do not need to worry about which data partition(s) the data resides in.
- Based on the size of the table, number of rows read and partitioning strategy, the number of partitions can be arrived at. Should there be 2 partitions or 10? What happens if the number of partitions increase? If the read write ratio is greater than 7:1, then one can consider multiple partitions. Anything less, the ideal number of partitions should be two. It is important to calculate the read write ratio of the partitions once completed. Anything less, the ideal number of partitions should be two. We can run experiments to identify the ideal number of partitions suitable for the db2 performance.
Create Range partition table syntax :
CREATE TABLE lineitem (
l_orderkey DECIMAL(10,0) NOT NULL,
l_year_month INT GENERATED ALWAYS AS (YEAR(l_shipdate)*100 + MONTH(l_shipdate)))
PARTITION BY RANGE(l_shipdate)
(STARTING ('1/1/1992') ENDING ('12/31/1992') EVERY 1 MONTH);
One of the Business partner for whom we executed a successful performance benchmarking exercise achieving 60 million transaction per hour. They have recommended partitioning and this is what they said:
Table Partitioning is the best way to improve database performance where each
and every query is setup to take advantage of the partitioning column. Partitioning didn't change the behavior of isolation levels.
-Senior Project Lead Business Partner
To learn more from a live case study where we used PureData for Transactions machine (PDTX) with DB2 pureScale at its heart, refer here
<link to the white paper>. There are several other optimization techniques discussed in detail.
mailto :- ramyeles@in[dot]ibm[dot]com