Block size

Choose the file system block size based on the projected workload of the file system and the type of storage that it uses.

General

In a file system, a block is the largest contiguous amount of disk space that can be allocated to a file and also the largest amount of data that can be transferred in a single I/O operation. The block size determines the maximum size of a read request or write request that a file system sends to the I/O device driver. Blocks are composed of an integral number of subblocks, which are the smallest unit of contiguous disk space that can be allocated to a file. Files larger than one block are stored in some number of full blocks plus any subblocks that might be required after the last block to hold the remaining data. Files smaller than one block size are stored in one or more subblocks.

For information about setting block size and subblock size for a file system, see the descriptions of the -B BlockSize parameter and the --metadata-block-size parameter in the help topic mmcrfs command. Here are some general facts from those descriptions:
  • The block size, subblock size, and number of subblocks per block of a file system are set when the file system is created and cannot be changed later.
  • All the data blocks in a file system have the same block size and the same subblock size. Data blocks and subblocks in the system storage pool and those in user storage pools have the same sizes. An example of a valid block size and subblock size is a 4 MiB block with an 8 KiB subblock.
  • All the metadata blocks in a file system have the same block size and the same subblock size. The metadata blocks and subblocks are set to the same sizes as data blocks and subblocks, unless the --metadata-block-size parameter is specified.
    Note: The --metadata-block-size parameter, which specifies a metadata block size that differs from the data block size, is being deprecated. For file systems with file system format 5.0.0 or later, this option is no longer needed to improve performance, and it will be removed in a future release.
  • If the system storage pool contains only metadataOnly NSDs, the metadata block can be set to a different size than the data block size with the --metadata-block-size parameter.
    Note: This setting can result in a change in the data subblock size and in the number of subblocks in a data block, if the block size (-B parameter) is different from the --metadata-block-size. For an example, see Scenario 3 in a later bullet in this list.
  • The data blocks and metadata blocks must have the same number of subblocks, even when the data block size and the metadata block size are different. See Scenario 3 in the next bullet.
  • The number of subblocks per block is derived from the smallest block size of any storage pool in the file system, including the system metadata pool. Consider the following example scenarios:
    Note: For a table of the valid block sizes and subblock sizes, see Table 1 in mmcrfs command.
    • Scenario 1: The file system is composed of a single system storage pool with all the NSD usage configured as dataAndMetadata. The file system block size is set with the -B parameter to 16 MiB. As a result, the block size for both metadata and data blocks is 16 MiB. The metadata and data subblock size is 16 KiB.
    • Scenario 2: The file system is composed of multiple storage pools with system storage pool NSD usage configured as metadataOnly and user storage pool NSD usage configured as dataOnly. The file system block size is set (-B parameter) to 16 MiB. The --metadata-block-size is also set to 16 MiB. As a result, the metadata and data block size is 16 MiB. The metadata and data subblock size is 16 KiB.
    • Scenario 3: The file system is composed of multiple storage pools with the system storage pool NSD usage configured as metadataOnly and the user storage pool NSD usage configured as dataOnly. The file system block size is set (-B parameter) to 16 MiB, which has a subblock size of 16 KiB, but the --metadata-block-size is set to 1 MiB, which has a subblock size of 8 KiB. The number of subblocks per block must be the same across all the storage pools of a file system, and it is calculated from the storage pool with the smallest block size. In this case, the system pool has the smallest block size (1 MiB). The number of subblocks per block in the system storage pool is 128 (1 MiB block size / 8 KiB subblock size = 128 subblocks per block). The other storage pools inherit the 128-subblocks-per-block setting, and their subblock size is recalculated based on 128 subblocks per block. In this case, the subblock size of the user storage pool is recalculated as 128 KiB (16 MiB / 128 subblocks per block = 128 KiB subblock size).
  • The block size cannot exceed the value of the cluster attribute maxblocksize, which can be set by the mmchconfig command.
Select a file system block size based on the workload and the type of storage. For a list of supported block sizes with their subblock sizes, see the description of the -B BlockSize parameter in the help topic mmcrfs command.
Attention: In IBM Storage Scale, the default block size of 4 MiB with an 8 KiB subblock size provides good sequential performance, makes efficient use of disk space, and provides small-file performance that is good, or similar to that of the previous default block size of 256 KiB with 32 subblocks per block. It works well for the widest variety of workloads.
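
The subblock arithmetic in Scenarios 1 and 3 can be sketched in a few lines of Python. This is an illustrative calculation only, not an IBM Storage Scale tool. The function name effective_layout and the DEFAULT_SUBBLOCK table are hypothetical, and the table contains only the block size and subblock size pairs that are quoted in this topic. For the complete list, see Table 1 in mmcrfs command.

```python
KIB = 1024
MIB = 1024 * KIB

# Assumption: only the pairs quoted in this topic are listed here.
# The authoritative list is Table 1 in the mmcrfs command description.
DEFAULT_SUBBLOCK = {
    1 * MIB: 8 * KIB,
    4 * MIB: 8 * KIB,
    16 * MIB: 16 * KIB,
}


def effective_layout(data_block, metadata_block=None):
    """Return subblocks per block and the resulting subblock sizes.

    The number of subblocks per block is derived from the smallest
    block size; every pool then uses that same count.
    """
    if metadata_block is None:
        # Without --metadata-block-size, metadata uses the data block size.
        metadata_block = data_block
    smallest = min(data_block, metadata_block)
    subblocks_per_block = smallest // DEFAULT_SUBBLOCK[smallest]
    return {
        "subblocks_per_block": subblocks_per_block,
        "data_subblock": data_block // subblocks_per_block,
        "metadata_subblock": metadata_block // subblocks_per_block,
    }


# Scenario 1: -B 16 MiB, no separate metadata block size.
scenario1 = effective_layout(16 * MIB)

# Scenario 3: -B 16 MiB with --metadata-block-size 1 MiB.
scenario3 = effective_layout(16 * MIB, 1 * MIB)
print(scenario3["subblocks_per_block"])        # 128
print(scenario3["data_subblock"] // KIB)       # 128 (KiB)
```

In Scenario 3 the data subblock grows from 16 KiB to 128 KiB, which is the side effect that the notes about --metadata-block-size describe.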

Test actual performance with different block sizes

The ideal file system block size can be determined by running performance tests with different file system block sizes using actual workloads or representative benchmarks that match the file sizes that you expect to use in production.

Factors that can affect performance

For more performance information, see the IBM Storage Scale white papers in the Techdocs Library (www.ibm.com/support/techdocs/atsmastr.nsf/Web/WhitePapers).

RAID stripe size
The RAID stripe size is the size of the sequential block of data that a disk array writes to or reads from each storage volume (the block device corresponding to an NSD). For better performance, it is a good idea to set the file system block size to the same value as either the RAID stripe size or a multiple of the RAID stripe size. If the block size is not equal to or a multiple of the RAID stripe size, then the file system performance can be severely degraded, especially for write requests, because of the increase in read-modify-write operations that occur in the underlying hardware RAID controllers.
Note: The block size for IBM Storage Scale RAID that is implemented with vdisk is specifically designed for optimal behavior. For IBM Storage Scale RAID, the block size must be equal to the vdisk track size. For more information, see the IBM Storage Scale RAID documentation.
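
As a quick sanity check for hardware RAID, the alignment rule can be expressed as a divisibility test. The following Python sketch is illustrative only. The function name is_stripe_aligned and the stripe sizes in the examples are hypothetical, and the check does not apply to IBM Storage Scale RAID, where the block size must equal the vdisk track size.

```python
KIB = 1024
MIB = 1024 * KIB


def is_stripe_aligned(block_size, raid_stripe_size):
    """Return True if the block size equals, or is a multiple of,
    the RAID stripe size (both values in bytes)."""
    if block_size <= 0 or raid_stripe_size <= 0:
        raise ValueError("sizes must be positive")
    return block_size % raid_stripe_size == 0


# Hypothetical stripe sizes, checked against a 4 MiB block size.
print(is_stripe_aligned(4 * MIB, 1 * MIB))    # True: 4 stripes per block
print(is_stripe_aligned(4 * MIB, 768 * KIB))  # False: not a whole multiple
print(is_stripe_aligned(256 * KIB, 1 * MIB))  # False: block smaller than stripe
```

A False result indicates a combination that is likely to cause read-modify-write operations in the RAID controller for write requests.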

File system size
For file systems larger than 100 TiB, it is a good idea to set the block size to at least 256 KiB. The default block size is 4 MiB in IBM Storage Scale. Generally, larger block sizes provide better performance.

Large block size and page pool
For block sizes larger than the default size of 4 MiB, it is a good idea to increase the page pool size in proportion to the block size. The reason is that the efficiency of internal optimizations that rely on caching file data in the GPFS page pool depends more on the number of blocks that are cached than on the amount of data that is cached. A larger block size results in fewer cached blocks.
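
The proportional scaling can be illustrated with a short Python sketch. The function names and the 1 GiB starting page pool are hypothetical values chosen for the example, and the calculation assumes, as a simplification, that the entire page pool is available for caching file data blocks.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB
DEFAULT_BLOCK = 4 * MIB


def cached_blocks(page_pool, block_size):
    """Number of whole blocks that fit in the page pool."""
    return page_pool // block_size


def scaled_page_pool(current_page_pool, block_size, baseline_block=DEFAULT_BLOCK):
    """Scale the page pool in proportion to the block size so that the
    number of cached blocks stays the same as with the baseline."""
    if block_size <= baseline_block:
        return current_page_pool
    return current_page_pool * block_size // baseline_block


page_pool = 1 * GIB  # hypothetical current setting
print(cached_blocks(page_pool, 4 * MIB))    # 256 blocks
print(cached_blocks(page_pool, 16 * MIB))   # 64 blocks

bigger = scaled_page_pool(page_pool, 16 * MIB)
print(bigger // GIB)                        # 4 (GiB)
print(cached_blocks(bigger, 16 * MIB))      # 256 blocks again
```

Moving from a 4 MiB to a 16 MiB block size with an unchanged page pool cuts the number of cached blocks to a quarter. Scaling the page pool by the same factor of four restores the original count.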

Variation in file size
For a file system that contains files of many different sizes, a larger block size, 4 MiB or greater, delivers better overall performance than a smaller one. It is true that with a larger block size some space is wasted when a small file is written into a large subblock, because the unused space in the subblock cannot hold data from another file unless the block is freed.

However, the amount of waste in the general case is likely to be insignificant overall, because the smaller files occupy a smaller percentage of the storage space in the file system compared to the space occupied by the larger files (files on the order of GiBs).
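
To get a feel for the amount of waste, the following Python sketch rounds each file up to a whole number of subblocks and reports the unused fraction of the allocated space. It is a rough estimate under stated assumptions: the function names and the file mix are hypothetical, and the model ignores metadata, replication, and any other factor that affects real space usage.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def allocated_bytes(file_size, subblock_size):
    """Round a file size up to a whole number of subblocks.

    Because a block is an integral number of subblocks, this also
    covers files that span full blocks plus trailing subblocks.
    """
    return -(-file_size // subblock_size) * subblock_size


def waste_fraction(file_sizes, subblock_size):
    """Fraction of the allocated space that holds no file data."""
    used = sum(file_sizes)
    allocated = sum(allocated_bytes(size, subblock_size) for size in file_sizes)
    return (allocated - used) / allocated


# Hypothetical mix: many tiny files and a few multi-GiB files.
files = [1 * KIB] * 10_000 + [2 * GIB] * 10

# 4 MiB block with the 8 KiB subblock size mentioned in this topic.
print(f"{waste_fraction(files, 8 * KIB):.2%}")
```

For this mix the wasted share stays well under one percent with an 8 KiB subblock, because the large files dominate the total. A larger subblock size, such as the 128 KiB value from Scenario 3, increases the share.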

Application I/O patterns
The effect of block size on file system performance greatly depends on the application I/O pattern:
  • A larger block size is often beneficial for large sequential read and write workloads.
  • A smaller block size can offer better performance for applications that do small random writes to sparse files or small random writes to large files that are subject to frequent snapshots.

Metadata performance
The choice of block size affects the performance of certain metadata operations, in particular, block allocation performance. The IBM Storage Scale block allocation map is stored in blocks, similar to regular files. When the block size is small:
  • More blocks are required to store the same amount of data, which results in more work to allocate those blocks.
  • One block of allocation map data contains less information.

Metadata-only system pool
The --metadata-block-size option on the mmcrfs command allows a different block size to be specified for the system storage pool, provided its usage is set to metadataOnly. Valid values are the same as the ones that are listed for the -B parameter.
Note: Setting the metadata block size to a different value than the data block size can change the data subblock size and the number of subblocks per data block. For more information, see Scenario 3 earlier in this help topic.