Use of disk storage and file structure within a GPFS file system

A file system (or stripe group) consists of a set of disks that are used to store file metadata as well as data and structures used by GPFS™, including quota files and GPFS recovery logs.

When a disk is assigned to a file system, GPFS writes a file system descriptor at a fixed position on that disk. GPFS uses the descriptor to identify the disk and its place in the file system. The file system descriptor contains file system specifications and information about the state of the file system.

Within each file system, files are written to disk as in other UNIX file systems, using inodes, indirect blocks, and data blocks. Inodes and indirect blocks are considered metadata, as distinguished from data, or actual file content. You can control which disks GPFS uses for storing metadata when you create the file system with the mmcrfs command, or at a later time by issuing the mmchdisk command.

Each file has an inode containing information such as file size, time of last modification, and extended attributes. The inodes of very small files and directories hold the data itself, which improves performance for those files and directories. The inodes of small files also contain the addresses of all disk blocks that comprise the file data. A large file typically requires too many data blocks for its inode to address directly. In that case, the inode points instead to one or more levels of indirect blocks. These trees of additional metadata space are deep enough to hold all of the data block addresses, even for very large files. The number of levels required to store the data block addresses is referred to as the indirection level of the file.

A file starts out with direct pointers to data blocks in the inode; this is considered a zero level of indirection. As the file increases in size to the point where the inode cannot hold enough direct pointers, the indirection level is increased by adding an indirect block and moving the direct pointers there. Subsequent levels of indirect blocks are added as the file grows. The dynamic nature of the indirect block structure allows file sizes to grow up to the file system size.
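The growth of the indirection level described above can be sketched as a small calculation. This is a simplified model, not the GPFS on-disk layout: the function names and the parameters `direct_slots` (address slots in the inode) and `pointers_per_indirect_block` are hypothetical, and the model ignores fragments and data stored in the inode.

```python
def blocks_needed(file_size: int, block_size: int) -> int:
    """Number of full-size data blocks required to hold file_size bytes."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    return -(-file_size // block_size)  # ceiling division


def indirection_level(num_data_blocks: int,
                      direct_slots: int,
                      pointers_per_indirect_block: int) -> int:
    """Levels of indirect blocks needed to address num_data_blocks.

    Assumed model: the inode has `direct_slots` address slots. At level 0
    they point at data blocks. Each added level makes every slot reach
    `pointers_per_indirect_block` times as many data blocks.
    """
    if num_data_blocks < 0:
        raise ValueError("num_data_blocks must be >= 0")
    if direct_slots < 1 or pointers_per_indirect_block < 2:
        raise ValueError("need at least 1 slot and 2 pointers per indirect block")

    level = 0
    capacity = direct_slots  # data blocks addressable at the current level
    while num_data_blocks > capacity:
        level += 1
        capacity *= pointers_per_indirect_block
    return level


if __name__ == "__main__":
    # Illustrative values only; real slot and pointer counts depend on the
    # inode size and block size of the file system.
    block_size = 256 * 1024
    for size in (0, 1_000_000, 10**9, 10**12):
        n = blocks_needed(size, block_size)
        print(size, n, indirection_level(n, 16, 512))
```

Because each added level multiplies the addressable capacity, a few levels are enough to cover very large files in this model.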

Figure 1. GPFS files have a typical UNIX structure

This graphic depicts the structure of a GPFS file that is typical for most UNIX-based file systems. At the root is the inode for the file. This particular file has too many data blocks for the inode to directly point to. Therefore the inode points to two second-level indirect pointer blocks. Each of those indirect blocks in turn points to two additional, first-level indirect blocks. Three of those indirect blocks each point to two full data blocks. The fourth indirect block points to a single fragment.

File system limitations:
  1. The maximum number of mounted file systems within a GPFS cluster is 256.
  2. The supported file system size depends on the version of GPFS that is installed.
  3. The maximum number of files within a file system cannot exceed the architectural limit.

For the latest information on these file system limitations, see the IBM Spectrum Scale™ FAQ in IBM® Knowledge Center.

GPFS uses the file system descriptor to find all of the disks that make up the file system's stripe group, including their size and order. Once the file system descriptor is processed, it is possible to address any block in the file system. In particular, it is possible to find the first inode, which describes the inode file, and a small number of inodes that contain the rest of the file system information. The inode file is a collection of fixed-length records, each of which represents a single file, directory, or link. The unit of locking is the single inode. Specifically, there are fixed inodes within the inode file for the following:
  • Root directory of the file system
  • Block allocation map, which is a collection of bits that represent the availability of disk space within the disks of the file system. One unit in the allocation map represents a subblock, or 1/32 of the block size of the file system. The allocation map is broken into regions that reside on disk sector boundaries. The number of regions is set at file system creation time by the parameter that specifies how many nodes will access this file system. The regions are locked separately, so different nodes can allocate or deallocate space represented by different regions independently and concurrently.
  • Inode allocation map, which represents the availability of inodes within the inode file. The inode allocation map is located in the inode allocation file, and represents all the files, directories, and links that can be created. The mmchfs command can be used to change the maximum number of files that can be created in the file system up to the architectural limit.

The data contents of each of these files are taken from the data space on the disks. These files are considered metadata and are allocated only on disks where metadata is allowed.
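The region-based locking of the block allocation map can be illustrated with a minimal in-memory sketch. It assumes one map unit per subblock (1/32 of a block, as stated above) and an even split of units across regions. The class and method names are hypothetical, and the sketch does not model how GPFS places regions on disk or derives the region count from the node-count parameter.

```python
import threading

SUBBLOCKS_PER_BLOCK = 32  # from the text: one map unit is 1/32 of a block


def subblock_size(block_size: int) -> int:
    """Size in bytes of the space represented by one allocation map unit."""
    if block_size <= 0 or block_size % SUBBLOCKS_PER_BLOCK:
        raise ValueError("block_size must be a positive multiple of 32")
    return block_size // SUBBLOCKS_PER_BLOCK


class RegionedAllocationMap:
    """Toy allocation map split into independently locked regions."""

    def __init__(self, total_blocks: int, num_regions: int) -> None:
        if total_blocks <= 0 or num_regions <= 0:
            raise ValueError("total_blocks and num_regions must be positive")
        total_units = total_blocks * SUBBLOCKS_PER_BLOCK
        base, extra = divmod(total_units, num_regions)
        # Each region is a list of flags: True means the subblock is in use.
        self._regions = [
            [False] * (base + (1 if i < extra else 0))
            for i in range(num_regions)
        ]
        # One lock per region, so work on different regions never contends.
        self._locks = [threading.Lock() for _ in range(num_regions)]

    def allocate(self, region: int):
        """Claim the first free unit in a region; return its index or None."""
        with self._locks[region]:
            units = self._regions[region]
            for index, in_use in enumerate(units):
                if not in_use:
                    units[index] = True
                    return index
            return None

    def free(self, region: int, index: int) -> None:
        """Mark a previously allocated unit in a region as available."""
        with self._locks[region]:
            self._regions[region][index] = False

    def free_units(self, region: int) -> int:
        """Count the available units in a region."""
        with self._locks[region]:
            return self._regions[region].count(False)
```

In this sketch, a node that works in region 0 never waits on a node that works in region 1, which is the property the text attributes to separately locked regions.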