Windows: Estimating database space requirements based on the number of files

If you can estimate the maximum number of files that might be in server storage at a time, you can use that number to estimate space requirements for the database.

About this task

To estimate space requirements based on the maximum number of files in server storage, use the following guidelines:
  • 600 - 1000 bytes for each stored version of a file, including image backups.
    Restriction: The guideline does not include space that is used during data deduplication.
  • 100 - 200 bytes for each cached file, copy storage pool file, active-data pool file, and deduplicated file.
  • Additional space is required for database optimization to support varying data-access patterns and to support server back-end processing of the data. The amount of extra space is equal to 50% of the estimate for the total number of bytes for file objects.

In the following example for a single client, the calculations are based on the maximum values in the preceding guidelines. The example does not take into account that you might use file aggregation. In general, aggregating small files reduces the amount of required database space. File aggregation does not affect space-managed files.
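The guidelines give a range rather than a single figure, so it can be useful to compute both ends of that range. The following Python sketch is one way to do that. The function name, the constant names, and the sample file counts are illustrative assumptions and are not part of the product.

```python
# Bytes-per-file ranges taken from the guidelines above (low end, high end).
PER_VERSION_BYTES = (600, 1000)  # each stored version of a file
PER_COPY_BYTES = (100, 200)      # each cached, copy-pool, active-data pool, or deduplicated file


def estimate_range(file_versions, secondary_files):
    """Return (low, high) database-space estimates in bytes.

    Each bound is the space for file objects plus the extra 50%
    that the guidelines call for to support database optimization.
    """
    bounds = []
    for per_version, per_copy in zip(PER_VERSION_BYTES, PER_COPY_BYTES):
        file_objects = file_versions * per_version + secondary_files * per_copy
        bounds.append(file_objects + file_objects // 2)  # add 50% for optimization
    return tuple(bounds)


# Hypothetical input: 1,000,000 file versions and 500,000 secondary files.
low, high = estimate_range(1_000_000, 500_000)
print(f"{low:,} to {high:,} bytes")
```

Planning with the high end of the range, as the example in the following procedure does, gives the more conservative estimate.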

Procedure

  1. Calculate the number of file versions. Add each of the following values to obtain the number of file versions:
    1. Calculate the number of backed-up files. For example, as many as 500,000 client files might be backed up at a time. In this example, storage policies are set to keep up to three copies of backed-up files:
      500,000 files * 3 copies = 1,500,000 files
    2. Calculate the number of archive files. For example, as many as 100,000 client files might be archived copies.
    3. Calculate the number of space-managed files. For example, as many as 200,000 client files might be migrated from client workstations.
    Using 1000 bytes per file, the total amount of database space that is required for the files that belong to the client is 1.8 GB:
    (1,500,000 + 100,000 + 200,000) * 1000 bytes = 1.8 GB
  2. Calculate the number of cached files, copy storage-pool files, active-data pool files, and deduplicated files:
    1. Calculate the number of cached copies. For example, caching is enabled in a 5 GB disk storage pool. The high migration threshold of the pool is 90% and the low migration threshold of the pool is 70%. Thus, 20% of the disk pool, or 1 GB, is occupied by cached files.
      If the average file size is about 10 KB, approximately 100,000 files are in cache at any one time:
      100,000 files * 200 bytes = 19 MB
    2. Calculate the number of copy storage-pool files. All primary storage pools are backed up to the copy storage pool:
      (1,500,000 + 100,000 + 200,000) * 200 bytes = 343 MB
    3. Calculate the number of active-data pool files. All the active client-backup data in primary storage pools is copied to the active-data pool. Assume that 500,000 versions of the 1,500,000 backup files in the primary storage pool are active:
      500,000 * 200 bytes = 95 MB
    4. Calculate the number of deduplicated files. Assume that a deduplicated storage pool contains 50,000 files:
      50,000 * 200 bytes = 10 MB
    Based on the preceding calculations, about 0.5 GB of extra database space is required for the client’s cached files, copy storage-pool files, active-data pool files, and deduplicated files.
  3. Calculate the amount of extra space that is required for database optimization. To provide optimal data access and management by the server, extra database space is required. The amount of extra database space is equal to 50% of the total space requirements for file objects.
    (1.8 GB + 0.5 GB) * 50% = 1.15 GB, or approximately 1.2 GB
  4. Calculate the total amount of database space that is required for the client. The total is approximately 3.5 GB:
    1.8 + 0.5 + 1.2 = 3.5 GB
  5. Calculate the total amount of database space that is required for all clients. If the client that was used in the preceding calculations is typical and you have 500 clients, for example, you can use the following calculation to estimate the total amount of database space that is required for all clients:
    500 * 3.5 GB = 1,750 GB, or approximately 1.7 TB
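
The procedure can be sketched in Python to make the arithmetic easy to rerun with your own counts. This is a minimal sketch, not a product tool: the variable names are illustrative assumptions, and the input values are the ones from the example. Because the script does not round the intermediate results, it yields about 3.4 GB per client rather than the 3.5 GB that results from adding the rounded figures. The total for 500 clients is still about 1.7 TB.

```python
# Step 1: number of file versions, at 1000 bytes each.
backup_versions = 500_000 * 3       # 500,000 files, up to three copies each
archive_files = 100_000
space_managed_files = 200_000
file_versions = backup_versions + archive_files + space_managed_files
version_bytes = file_versions * 1000

# Step 2: cached, copy storage-pool, active-data pool, and deduplicated files,
# at 200 bytes each.
cached_files = 100_000              # about 1 GB of cache / 10 KB average file size
copy_pool_files = file_versions     # all primary storage pools are backed up
active_pool_files = 500_000         # active versions of the backup files
dedup_files = 50_000
secondary_bytes = (
    cached_files + copy_pool_files + active_pool_files + dedup_files
) * 200

# Step 3: extra 50% of the file-object space for database optimization.
optimization_bytes = (version_bytes + secondary_bytes) // 2

# Steps 4 and 5: total for one client, then for 500 similar clients.
per_client_bytes = version_bytes + secondary_bytes + optimization_bytes
all_clients_bytes = per_client_bytes * 500

print(f"File versions:   {version_bytes:,} bytes")
print(f"Secondary files: {secondary_bytes:,} bytes")
print(f"Optimization:    {optimization_bytes:,} bytes")
print(f"Per client:      {per_client_bytes:,} bytes")
print(f"500 clients:     {all_clients_bytes:,} bytes")
```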

Results

Tip: In the preceding examples, the results are estimates. The actual size of the database might differ from the estimate because of factors such as the number of directories and the length of the path and file names. Periodically monitor your database and adjust its size as necessary.

What to do next

During normal operations, the IBM Spectrum Protect™ server might require temporary database space. This space is needed for the following reasons:
  • To hold the results of sorting or ordering that are not already being kept and optimized in the database directly. The results are temporarily held in the database for processing.
  • To give administrative access to the database through one of the following methods:
    • A DB2® open database connectivity (ODBC) client
    • An Oracle Java™ database connectivity (JDBC) client
    • Structured Query Language (SQL) to the server from an administrative-client command line

Consider using an extra 50 GB of temporary space for every 500 GB of space for file objects and optimization. See the guidelines in the following table. In the preceding example, a total of 1.7 TB of database space is required for file objects and optimization for 500 clients. Based on that calculation, 200 GB is required for temporary space. The total amount of required database space is 1.9 TB.

Database size            Minimum temporary-space requirement
< 500 GB                 50 GB
≥ 500 GB and < 1 TB      100 GB
≥ 1 TB and < 1.5 TB      150 GB
≥ 1.5 TB and < 2 TB      200 GB
≥ 2 TB and < 3 TB        250 - 300 GB
≥ 3 TB and < 4 TB        350 - 400 GB
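
The table follows a simple pattern: 50 GB of temporary space for each 500 GB band that the database size falls into. The following Python sketch expresses that pattern. It is an illustration only: the function name is hypothetical, 1 TB is assumed to equal 1000 GB, and results for sizes of 4 TB or more extrapolate beyond the table.

```python
def temporary_space_gb(database_gb):
    """Return the minimum temporary space, in GB, for a database size in GB.

    Each 500 GB band adds 50 GB: sizes under 500 GB need 50 GB,
    sizes from 500 GB up to (but not including) 1000 GB need 100 GB,
    and so on.
    """
    if database_gb < 0:
        raise ValueError("database size cannot be negative")
    bands = int(database_gb // 500) + 1
    return bands * 50


# The 500-client example: 1.7 TB for file objects and optimization.
database_gb = 1700
temp_gb = temporary_space_gb(database_gb)
print(temp_gb)                 # 200
print(database_gb + temp_gb)   # 1900, that is, 1.9 TB in total
```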