General approach to estimating storage
Estimating the space requirements for Db2 objects is easier if you collect and maintain a statistical history of those objects.
The accuracy of your estimates depends on the currentness of the statistical data. To ensure that the statistics history is current, use the MODIFY STATISTICS utility to delete outdated statistical data from the catalog history tables.
The amount of disk space you need for your data is not just the number of bytes of data; the true number is some multiple of that. That is,
The multiplier M depends on your circumstances. It includes factors that are common to all data sets on disk, as well as others that are particular to Db2. It can vary significantly, from a low of about 1.25 to 4.0 or more. For a first approximation, set M=2.
Whether you use extended address volumes (EAV) is also a factor in estimating storage. Although, the EAV factor is not a multiplier, you need to add 10 cylinders for each object in the cylinder-managed space of an EAV. Db2 data sets might take more space or grow faster on EAV compared to non-extended address volumes. The reason is that the allocation unit in the extended addressing space (EAS) of EAV is a multiple of 21 cylinders, and every allocation is rounded up to this multiple. If you use EAV, the data set space estimation for an installation must take this factor into account. The effect is more pronounced for smaller data sets.
For more accuracy, you can calculate M as the product of the following factors:
- Record overhead
- Allows for eight bytes of record header and control data, plus space wasted for records that do not fit exactly into a Db2 page. The factor can range from about 1.01 (for a careful space-saving design) to as great as 4.0. A typical value is about 1.10.
- Free space
- Allows for space intentionally left empty to allow for inserts and updates. You can specify this factor on the CREATE TABLESPACE statement. The factor can range from 1.0 (for no free space) to 200 (99% of each page used left free, and a free page following each used page). With default values, the factor is about 1.05.
- Unusable space
- Track lengths in excess of the nearest multiple of page lengths.
The following table shows the track size, number of pages per track,
and the value of the unusable-space factor for several different device
types.
Table 1. Unusable space factor by device type Device type Track size Pages per track Factor value 3380 47476 10 1.16 3390 56664 12 1.15 - Data set excess
- Allows for unused space within allocated data sets, occurring as unused tracks or part of a track at the end of any data set. The amount of unused space depends upon the volatility of the data, the amount of space management done, and the size of the data set. Generally, large data sets can be managed more closely, and those that do not change in size are easier to manage. The factor can range without limit above 1.02. A typical value is 1.10.
- Indexes
- Allows for storage for indexes to data. For data with no indexes, the factor is 1.0. For a single index on a short column, the factor is 1.01. If every column is indexed, the factor can be greater than 2.0. A typical value is 1.20.
The following table shows calculations of the multiplier M for three different database designs:
- The tight design is carefully chosen to save space and allows only one index on a single, short field.
- The loose design allows a large value for every factor, but still well short of the maximum. Free space adds 30% to the estimate, and indexes add 40%.
- The medium design has values between the other two. You might want to use these values in an early stage of database design.
In each design, the device type is assumed to be a 3390. Therefore, the unusable-space factor is 1.15. M is always the product of the five factors.
Factor | Tight design | Medium design | Loose design |
---|---|---|---|
Record overhead × | 1.02 | 1.10 | 1.30 |
Free space × | 1.00 | 1.05 | 1.30 |
Unusable space × | 1.15 | 1.15 | 1.15 |
Data set excess × | 1.02 | 1.10 | 1.30 |
Indexes = | 1.02 | 1.20 | 1.40 |
Multiplier M | 1.22 | 1.75 | 3.54 |
- Image copies of data sets, which can be on tape
- System libraries, system databases, and the system log
- Temporary work files for utility and sort jobs
Also, you need to add the EAV factor.