Example: Computing averages and a standard deviation from histograms in a Db2 workload management configuration
One use for histograms is for obtaining the standard deviation for activity lifetimes. The example in this topic shows how bins are used for the calculation of this statistic.
A calculation of the average lifetime for each activity is a useful piece of information. However, the average alone does not accurately describe the user experience. If the variability in activity lifetime is large, the users whom you are supporting might see queries run fast at some times (which is fine) and slow at others (which might not be acceptable). When you define a goal for activity lifetimes, not only is the average lifetime of the activities important but also the standard deviation of the activity lifetime. You need to both understand and control variability to ensure that your users actually experience the observed average.
In a Db2 workload management configuration, statistics are collected on each member. The following example shows how to obtain the average activity lifetime for a single member.
Bin 1 - 0 to 2 seconds
Bin 2 - 2 to 4 seconds
Bin 3 - 4 to 8 seconds
Bin 4 - 8 to 16 seconds
Bin 5 - 16 to 32 seconds
Bin 6 - 32 to 64 seconds
Bin 7 - 64 to 128 seconds
Bin 8 - 128 seconds to infinity
Bin 1 average lifetime = (0+2)/2 = 1
Bin 2 average lifetime = (2+4)/2 = 3
Bin 3 average lifetime = (4+8)/2 = 6
Bin 4 average lifetime = (8+16)/2 = 12
Bin 5 average lifetime = (16+32)/2 = 24
Bin 6 average lifetime = (32+64)/2 = 48
Bin 7 average lifetime = (64+128)/2 = 96
Bin 1 Bin 2 Bin 3 Bin 4 Bin 5 Bin 6 Bin 7 Bin 8
count count count count count count count count
20 30 80 10 5 3 2 0
To calculate average lifetime, bin 8 must be empty. Bin 8 only exists to let you know when you need to change the upper boundary of your range. For this reason, you must specify the upper bound for the range.
average lifetime = (20 x 1 + 30 x 3 + 80 x 6 + 10 x 12 + 5 x 24 + 3 x 48 + 2 x 96) / 150
= (20 + 90 + 480 + 120 + 120 + 144 + 192) / 150
= 1166 / 150
= 7.77 seconds
Standard deviation = [(20 x (1 - 7.77)2 + 30 x (3 - 7.77)2 + ... ) / 150]1/2
For multimember database environments, averages and standard deviations can be computed by first computing a combined histogram across all database members by adding the counts of each bin across the database members.
Database Bin 1 Bin 2 Bin 3 Bin 4 Bin 5 Bin 6 Bin 7 Bin 8
member count count count count count count count count
1 20 30 80 10 5 3 2 0
2 1 5 20 20 4 0 0 0
Bin 1 Bin 2 Bin 3 Bin 4 Bin 5 Bin 6 Bin 7 Bin 8
count count count count count count count count
21 35 100 30 9 3 2 0
Average lifetime = (21 x 1 + 35 x 3 + 100 x 6 + 30 x 12 + 9 x 24 + 3 x 48 + 2 x 96) / 200
= (21 + 105 + 600 + 360 + 216 + 144 + 192) / 200
= 1638 / 200
= 8.19 seconds
Standard deviation = [(21 x (1 - 8.19)2 + 35 x (3 - 7.77)2 + ... ) / 200]1/2