Determining the current data distribution
for a database partition group or table can be helpful in determining
if data redistribution is required. Details about the current data
distribution can also be used to create a custom distribution map
that specifies how to distribute data.
About this task
If a new database partition is added to a database partition
group, or an existing database partition is dropped from a database
partition group, perform data redistribution to balance data among
all the database partitions.
If no database partitions have
been added or dropped from a database partition group, then data redistribution
is usually only indicated when there is an unequal distribution of
data among the database partitions of the database partition group.
Note that in some cases an unequal distribution of data can be desirable.
For example, if some database partitions reside on a powerful machine,
then it might be beneficial for those database partitions to contain
larger volumes of data than other partitions.
Procedure
To determine if data redistribution is needed:
- Get information about the current distribution of data
among database partitions in the database partition group.
Run
the following query on the largest table (alternatively, a representative
table) in the database partition group:
SELECT DBPARTITIONNUM(column_name), COUNT(*) FROM table_name
GROUP BY DBPARTITIONNUM(column_name)
ORDER BY DBPARTITIONNUM(column_name) DESC
Here,
column_name is
the name of the distribution key for table
table_name.
The output of this query shows how
many records from table_name reside on each database
partition. If the distribution of data among database partitions is
not as desired, then proceed to the next step.
- Get information about the distribution of data across hash
partitions.
Run the following query with the same
column_name and
table_name that
were used in the previous step:
SELECT HASHEDVALUE(column_name), COUNT(*) FROM table_name
GROUP BY HASHEDVALUE(column_name)
ORDER BY HASHEDVALUE(column_name) DESC
The output of this query can easily be used to construct
the distribution file needed when the USING DISTFILE parameter
in the REDISTRIBUTE DATABASE PARTITION GROUP command
is specified. Refer to the REDISTRIBUTE DATABASE PARTITION
GROUP command reference for a description of the format
of the distribution file.
- Optional: If the data requires redistribution,
you can plan to do this operation during a system maintenance opportunity.
When the USING DISTFILE parameter is
specified, the REDISTRIBUTE DATABASE PARTITION GROUP command
uses the information in the file to generate a new partition map for
the database partition group. This operation results in a uniform
distribution of data among database partitions.
If a uniform
distribution is not desired, you can construct your own target partition
map for the redistribution operation. The target partition map can
be specified by using the USING TARGETMAP parameter
in the REDISTRIBUTE DATABASE PARTITION GROUP command.
Results
After doing this investigation, you will know if your data
is uniformly distributed or not or if data redistribution is required.