Use of disk storage and file structure within a GPFS file system

A file system (or stripe group) consists of a set of disks that store file data, file metadata, and supporting entities, such as quota files and recovery logs.

When a disk is assigned to a file system, a file system descriptor is written on each disk. The file system descriptor is written at a fixed position on each of the disks in the file system and is used by GPFS to identify this disk and its place in a file system. The file system descriptor contains file system specifications and information about the state of the file system.

Within each file system, files are written to disk as in other UNIX file systems, using inodes, indirect blocks, and data blocks. Inodes and indirect blocks are considered metadata, as distinguished from data, or actual file content. You can control which disks GPFS uses for storing metadata when you create the file system with the mmcrfs command or when you modify the file system with the mmchdisk command.

The metadata for each file is stored in the inode and contains information such as file size and time of last modification. The inode also sets aside space to track the location of the data of the file. On file systems that are created in IBM Storage Scale, if the file is small enough that its data can fit within this space, the data can be stored in the inode itself. This method is called data-in-inode and improves the performance and space utilization of workloads that use many small files.

Otherwise, the data of the file must be placed in data blocks, and the inode is used to find the location of these blocks. The location-tracking space of the inode is then used to store the addresses of these data blocks. If the file is large enough, the addresses of all of its data blocks cannot be stored in the inode itself, and the inode points instead to one or more levels of indirect blocks. These trees of additional metadata space for a file can hold all of the data block addresses for large files. The number of levels that are required to store the addresses of the data blocks is referred to as the indirection level of the file.

To summarize, on file systems that are created in IBM Storage Scale, a file typically starts out with data-in-inode. When it outgrows this stage, the inode stores direct pointers to data blocks; this arrangement is considered a zero level of indirection. When more data blocks are needed, the indirection level is increased by adding an indirect block and moving the direct pointers there; the inode then points to this indirect block. Subsequent levels of indirect blocks are added as the file grows. The dynamic nature of the indirect block structure allows file sizes to grow up to the file system size.

For security reasons, encrypted files skip the data-in-inode stage. They always begin at indirection level zero.
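
The progression described above can be sketched in a few lines of Python. This is an illustrative model only, not GPFS code: the Layout class, the placement function, and every numeric value (block size, inline capacity, pointer counts) are hypothetical assumptions chosen for readability. The sketch also ignores fragments and subblocks and assumes that, at each added level, every pointer slot fans out to one full indirect block.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    """Hypothetical layout parameters; real values depend on the file system."""
    block_size: int = 4 * 1024 * 1024  # bytes per data block (assumed)
    inline_capacity: int = 3000        # bytes of file data that fit in the inode (assumed)
    inode_pointers: int = 16           # block addresses that fit in the inode (assumed)
    indirect_pointers: int = 1024      # block addresses per indirect block (assumed)


def placement(file_size, layout=Layout(), encrypted=False):
    """Return "data-in-inode" or the indirection level (an int) for a file."""
    # Small, unencrypted files keep their data in the inode itself.
    if not encrypted and file_size <= layout.inline_capacity:
        return "data-in-inode"

    # Number of data blocks needed, rounded up (fragments are ignored here).
    blocks = -(-file_size // layout.block_size)

    # Level 0: the inode holds direct pointers to data blocks.
    level = 0
    reachable = layout.inode_pointers
    # Each added level multiplies the number of addressable data blocks.
    while blocks > reachable:
        level += 1
        reachable *= layout.indirect_pointers
    return level
```

Under these assumed values, a 1000-byte file stays in the inode, the same file with encryption enabled starts at indirection level zero, and a file that needs 17 data blocks moves to indirection level one.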

Figure 1. GPFS files have a typical UNIX structure
This graphic depicts the structure of a GPFS file that is typical for most UNIX-based file systems. At the root is the inode for the file. This particular file has too many data blocks for the inode to directly point to. Therefore, the inode points to two second-level indirect pointer blocks. Each of those indirect blocks in turn points to two additional, first-level indirect blocks. Three of those indirect blocks each point to two full data blocks. The fourth indirect block points to a single fragment.

File system limitations:
  1. The maximum number of mounted file systems within a GPFS cluster is 256.
  2. The supported file system size depends on the version of GPFS that is installed.
  3. The maximum number of files within a file system cannot exceed the architectural limit.

For the latest information on these file system limitations, see the IBM Storage Scale FAQ in IBM® Documentation.

GPFS uses the file system descriptor to find all of the disks that make up the file system's stripe group, including their size and order. Once the file system descriptor is processed, it is possible to address any block in the file system. In particular, it is possible to find the first inode, which describes the inode file, and a small number of inodes that contain the rest of the file system information. The inode file is a collection of fixed-length records, each of which represents a single file, directory, or link. The unit of locking is the single inode. Specifically, there are fixed inodes within the inode file for the following components:
  • Root directory of the file system.
  • Block allocation map, which is a collection of bits that represent the availability of disk space within the disks of the file system. One unit in the allocation map represents a subblock. A subblock is the smallest unit of contiguous disk space that can be allocated to a file. Block size, subblock size, and the number of subblocks per block are set when the file system is created and cannot be changed afterward. For more information, see mmcrfs command. The allocation map is broken into regions that reside on disk sector boundaries. The number of regions is set at file system creation time by the parameter that specifies how many nodes access this file system. The regions are locked separately. As a result, different nodes can allocate or deallocate space that is represented by different regions independently and concurrently.
  • Inode allocation map, which represents the availability of inodes within the inode file. The inode allocation map is located in the inode allocation file, and represents all the files, directories, and links that can be created. The mmchfs command can be used to change the maximum number of files that can be created in the file system up to the architectural limit.

The data contents of each of these files are taken from the data space on the disks. These files are considered metadata and are allocated only on disks where metadata is allowed. For more information, see mmcrfs command.
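
The region-based locking of the block allocation map can be illustrated with a small single-process model. This is a sketch under stated assumptions, not the GPFS implementation: the AllocationMap class and its methods are hypothetical, threads stand in for cluster nodes, an in-memory list stands in for the on-disk bitmap, and the search strategy (first free run inside one region) is chosen only for simplicity.

```python
import threading


class AllocationMap:
    """Toy bitmap: one entry per subblock, split into separately locked regions."""

    def __init__(self, total_subblocks, num_regions):
        self.free = [True] * total_subblocks
        # Subblocks per region, rounded up so every subblock belongs to a region.
        self.region_size = -(-total_subblocks // num_regions)
        self.locks = [threading.Lock() for _ in range(num_regions)]

    def allocate(self, region, count):
        """Claim `count` contiguous free subblocks inside one region.

        Returns the index of the first subblock, or None if the region
        has no free run of that length.
        """
        if count < 1:
            raise ValueError("count must be at least 1")
        start = region * self.region_size
        end = min(start + self.region_size, len(self.free))
        # Only this region is locked; other regions stay available.
        with self.locks[region]:
            run = 0
            for i in range(start, end):
                run = run + 1 if self.free[i] else 0
                if run == count:
                    first = i - count + 1
                    for j in range(first, i + 1):
                        self.free[j] = False
                    return first
        return None

    def release(self, first, count):
        """Mark a run previously returned by allocate() as free again."""
        region = first // self.region_size
        with self.locks[region]:
            for j in range(first, first + count):
                self.free[j] = True
```

In this model, two callers that work in different regions never contend for the same lock, which mirrors the independent and concurrent allocation that the separately locked regions are described as enabling.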