Determining if data redistribution is needed

Determining the current data distribution for a database partition group or table can be helpful in determining if data redistribution is required. Details about the current data distribution can also be used to create a custom distribution map that specifies how to distribute data.

About this task

If a new database partition is added to a database partition group, or an existing database partition is dropped from a database partition group, perform data redistribution to balance data among all the database partitions.

If no database partitions have been added or dropped from a database partition group, then data redistribution is usually only indicated when there is an unequal distribution of data among the database partitions of the database partition group. Note that in some cases an unequal distribution of data can be desirable. For example, if some database partitions reside on a powerful machine, then it might be beneficial for those database partitions to contain larger volumes of data than other partitions.

Procedure

To determine if data redistribution is needed:

  1. Get information about the current distribution of data among database partitions in the database partition group.
    Run the following query on the largest table (alternatively, a representative table) in the database partition group:
    SELECT DBPARTITIONNUM(column_name), COUNT(*) FROM table_name
          GROUP BY DBPARTITIONNUM(column_name)
          ORDER BY DBPARTITIONNUM(column_name) DESC
    Here, column_name is the name of the distribution key for table table_name.

    The output of this query shows how many records from table_name reside on each database partition. If the distribution of data among database partitions is not as desired, then proceed to the next step.

  2. Get information about the distribution of data across hash partitions.
    Run the following query with the same column_name and table_name that were used in the previous step:
    SELECT HASHEDVALUE(column_name), COUNT(*) FROM table_name
          GROUP BY HASHEDVALUE(column_name)
          ORDER BY HASHEDVALUE(column_name) DESC

    The output of this query can easily be used to construct the distribution file needed when the USING DISTFILE parameter in the REDISTRIBUTE DATABASE PARTITION GROUP command is specified. Refer to the REDISTRIBUTE DATABASE PARTITION GROUP command reference for a description of the format of the distribution file.

  3. Optional: If the data requires redistribution, you can plan to do this operation during a system maintenance opportunity.

    When the USING DISTFILE parameter is specified, the REDISTRIBUTE DATABASE PARTITION GROUP command uses the information in the file to generate a new partition map for the database partition group. This operation results in a uniform distribution of data among database partitions.

    If a uniform distribution is not desired, you can construct your own target partition map for the redistribution operation. The target partition map can be specified by using the USING TARGETMAP parameter in the REDISTRIBUTE DATABASE PARTITION GROUP command.

Results

After doing this investigation, you will know if your data is uniformly distributed or not or if data redistribution is required.