Pruning historical data from the repository database

IBM® Data Server Manager (DSM) uses data thinning, or resolution reduction pruning, to control how long historical data is retained in the repository database.

The Resolution Reduction Pruning schedule

Unlike roll-up style pruning, which aggregates finer-grained data into coarser grain sizes, the data thinning method used by DSM simply deletes some of the existing data points. The deleted data points are interleaved with retained data points over time and are chosen so that the remaining data points match a target temporal resolution as closely as possible.

For example, if data is currently at 15-minute resolution (one data point persisted every 15 minutes per monitoring metric) and the new target is 60-minute resolution, roughly three out of every four data points will be deleted. This is analogous to reducing the resolution of an image by replacing every 2x2 square of pixels in the original image with the pixel in the upper-left corner, whereas a "roll-up" style of pruning would be analogous to replacing each 2x2 square of pixels with the average of those four pixels. DSM uses resolution reduction instead of roll-up pruning because it is more I/O efficient and, as a result, can scale to a higher number of monitored databases sharing one repository database.
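The difference between the two pruning styles can be illustrated with a small Python sketch. This is not DSM code: the sample values and the helper names thin and roll_up are hypothetical, and the sketch assumes perfectly regular 15-minute samples being reduced to 60-minute resolution.

```python
from statistics import mean

# Hypothetical metric samples at 15-minute resolution: (minutes since start, value).
samples = [(0, 10.0), (15, 14.0), (30, 12.0), (45, 20.0),
           (60, 30.0), (75, 34.0), (90, 32.0), (105, 40.0)]

def thin(points, factor):
    """Data thinning: keep every `factor`-th point and drop the rest."""
    return points[::factor]

def roll_up(points, factor):
    """Roll-up: replace each group of `factor` points with their average value."""
    groups = [points[i:i + factor] for i in range(0, len(points), factor)]
    return [(group[0][0], mean(value for _, value in group)) for group in groups]

thinned = thin(samples, 4)     # [(0, 10.0), (60, 30.0)]
rolled = roll_up(samples, 4)   # [(0, 14.0), (60, 34.0)]
```

Thinning only needs to know which rows to delete, whereas a roll-up has to read every value and write new aggregated rows, which is consistent with the I/O argument above.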

How DSM uses data thinning to prune historical data

DSM uses an underlying matrix of target resolutions when performing resolution reduction pruning:
Target Resolution   Time Period         Age of Data          Width of Time Period
original            week 1              0-1 week             1 week
15 minutes          weeks 2-3           1-3 weeks            2 weeks
30 minutes          weeks 4-7           3-7 weeks            4 weeks
1 hour              weeks 8-15          7-15 weeks           8 weeks
2 hours             weeks 16-infinity   15 weeks and older   unlimited
These target resolutions interact with the Monitoring Profile parameters as follows:
  • The schedule is not applied until the number of weeks specified by the Allow reduction in data resolution after property has passed.
  • After that period has passed, the schedule corresponding to the actual data age is applied.
  • If the data is already of equal or lower resolution than the applicable target resolution for the data age, no pruning is done until the data ages to a point where the target resolution is lower than that of the data itself.
For example, suppose the default Persist data every setting of 1 hour, set in the monitoring profile, is used. The original data resolution is then already equal to or lower than the target resolution in the first four rows of the schedule, so this data will not undergo pruning until it reaches at least 15 weeks of age (the start of the 16th week). At that point it will be thinned to two-hour resolution by deleting every other data point over time.

If the Persist data every setting is changed to five (5) minutes and the Allow reduction in data resolution after setting in the monitoring profile is changed to two weeks, then the data will age for two weeks without any pruning, due to the latter parameter setting, even though the pruning schedule would normally reduce its resolution to 15 minutes after it has become at least one week old.

Once the two weeks specified by the Allow reduction in data resolution after setting have passed, the data is two weeks old and in the middle of the 15-minute target resolution window that spans data aged 1-3 weeks. Therefore, at that time, the data will be pruned to 15-minute resolution by deleting two out of every three data points. After the data has aged further and becomes more than three weeks old, it enters the next-lower target resolution on the pruning schedule and will be reduced to 30-minute resolution by deleting every other data point.
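The interaction between the schedule and the two monitoring profile settings can be sketched as a lookup function. This is a minimal illustration, not DSM's implementation: the name target_resolution, its parameters, and the treatment of exact boundary ages (for example, exactly 3 weeks) are assumptions, and the allow-after value of 1 week used in the one-hour examples is hypothetical.

```python
# Schedule rows from the table above, oldest first:
# (minimum data age in weeks, target resolution in minutes).
SCHEDULE = [(15, 120), (7, 60), (3, 30), (1, 15)]

def target_resolution(age_weeks, current_minutes, allow_after_weeks):
    """Return the resolution (in minutes) to thin to, or None if no pruning applies."""
    # No pruning until the "Allow reduction in data resolution after" period has passed.
    if age_weeks < allow_after_weeks:
        return None
    for min_age, target in SCHEDULE:
        if age_weeks >= min_age:
            # Prune only if the data is finer-grained than the target for its age.
            return target if current_minutes < target else None
    return None  # younger than one week: original resolution is kept

# Data persisted every hour (allow-after of 1 week is a hypothetical value):
hourly_at_10_weeks = target_resolution(10, 60, 1)   # None, already at 1-hour resolution
hourly_at_15_weeks = target_resolution(15, 60, 1)   # 120

# Data persisted every 5 minutes, reduction allowed after 2 weeks:
five_min_at_1_5_weeks = target_resolution(1.5, 5, 2)   # None, still protected
five_min_at_2_weeks = target_resolution(2, 5, 2)       # 15
thinned_at_3_5_weeks = target_resolution(3.5, 15, 2)   # 30
```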

The resolution reduction pruning algorithm does not blindly delete data points based on any assumed current data resolution. Instead, it looks at the actual timestamps of the data points present, chooses a set of to-be-retained data points that matches the target pruning resolution as closely as possible, and then deletes all the data points at timestamps in between those to be retained. Thus, changes in the Persist data every setting, or reboots of DSM, do not cause any problems for the resolution reduction pruning algorithm, even though the timestamps of the data points before pruning may be irregularly spaced or may not be a perfect multiple of the target resolution.
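One way to picture timestamp-driven thinning is the greedy sketch below. The selection rule DSM actually uses is not documented here, so this is an assumed stand-in: it keeps the first data point and then keeps each later point that lies at least one target interval after the previously retained one. The function name select_retained and the sample timestamps are hypothetical.

```python
def select_retained(timestamps, target_minutes):
    """Greedily pick timestamps spaced at least `target_minutes` apart."""
    retained = []
    for ts in sorted(timestamps):
        # Keep the first point, then any point far enough from the last kept one.
        if not retained or ts - retained[-1] >= target_minutes:
            retained.append(ts)
    return retained

# Irregularly spaced timestamps, in minutes since an arbitrary start,
# as might result from a changed setting or a restart.
timestamps = [0, 5, 10, 16, 21, 30, 34, 47, 60]

retained = select_retained(timestamps, 15)              # [0, 16, 34, 60]
deleted = [ts for ts in timestamps if ts not in retained]  # [5, 10, 21, 30, 47]
```

Because the decision depends only on the timestamps that are actually present, the same routine works whether the input was collected every 5 minutes, every 15 minutes, or at uneven intervals.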
Note: The settings above apply only to general monitoring and do NOT apply to the following:
  • alerts
  • SQL statement execution data
  • track configuration changes
  • storage access patterns
  • storage savings
  • locking event monitoring data
  • threshold violations event monitoring
  • utility event monitoring