|
The GPFS blocksize setting is the amount of data written to each disk in a file system before moving on to the next disk. There are three important characteristics to understand when considering the appropriate blocksize value:
- The blocksize is the largest size IO that GPFS can issue to the underlying device
- A subblock is 1/32nd of blocksize. This is the smallest allocation to a single file
- A sector is 512 bytes. This is the smallest IO request size GPFS issues to the underlying device
This means that, for example, if you use a blocksize of 1MB, each file will use at least 32KB (1024KB / 32 = 32KB).
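The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the 1/32 rule stated in the list; the function name `subblock_size` and the set of blocksizes shown are chosen for the example and are not part of any GPFS tool.

```python
# Illustrative only: applies the rule from the text that a subblock
# (the smallest allocation to a single file) is 1/32 of the blocksize.
KIB = 1024
SUBBLOCKS_PER_BLOCK = 32


def subblock_size(blocksize_bytes: int) -> int:
    """Return the subblock size in bytes for a given blocksize (hypothetical helper)."""
    return blocksize_bytes // SUBBLOCKS_PER_BLOCK


if __name__ == "__main__":
    # Example blocksizes picked for illustration.
    for blocksize_kib in (256, 512, 1024):
        minimum_kib = subblock_size(blocksize_kib * KIB) // KIB
        print(f"blocksize {blocksize_kib} KB -> minimum allocation {minimum_kib} KB")
```

With a 1MB blocksize this gives the 32KB minimum from the example above, and with 256KB it gives 8KB.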
What if I do not know my application IO profile?
Often you do not have good information on the nature of the IO profile, or the applications are so diverse that it is difficult to optimize for one or the other. There are generally two approaches to designing for this type of situation: separation or compromise.
Separation
In this model you create two file systems, one with a large blocksize for sequential applications and one with a smaller blocksize for small file applications. You can gain benefits from having file systems with two different blocksizes even on a single type of storage, or you can use a different type of storage for each file system to further optimize for the workload. In either case the idea is that you provide two file systems to your end users, for scratch space on a compute cluster for example. The end users can then run tests themselves by pointing the application at one file system or the other and determining by direct testing which is best for their workload. In this situation you may have one file system optimized for sequential IO with a 1MB blocksize and one for more random workloads with a 256KB blocksize.
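To get a rough feel for the space side of this trade-off, the sketch below estimates how much space a set of small files would occupy at the two blocksizes mentioned above. It is a simplified model and not how GPFS reports usage: it assumes every file is rounded up to a whole number of subblocks (1/32 of the blocksize) with a minimum of one subblock, and it ignores metadata and any other allocation details. The function name `allocated_bytes` and the sample file sizes are hypothetical.

```python
# Simplified model (an assumption for illustration): each file occupies a
# whole number of subblocks, with a minimum of one subblock per file.
KIB = 1024
SUBBLOCKS_PER_BLOCK = 32


def allocated_bytes(file_size: int, blocksize: int) -> int:
    """Estimate the space one file occupies under the simplified model."""
    subblock = blocksize // SUBBLOCKS_PER_BLOCK
    subblocks_needed = max(1, -(-file_size // subblock))  # ceiling division
    return subblocks_needed * subblock


if __name__ == "__main__":
    # Hypothetical small-file workload: file sizes in bytes.
    file_sizes = [4 * KIB, 10 * KIB, 40 * KIB]
    for blocksize_kib in (256, 1024):
        total = sum(allocated_bytes(size, blocksize_kib * KIB) for size in file_sizes)
        print(f"blocksize {blocksize_kib} KB: about {total // KIB} KB allocated")
```

An estimate like this covers space utilization only; throughput still has to be measured by running the application against each file system, as described above.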
Compromise
In this situation you either do not have sufficient information on workloads (for example, end users do not think about IO performance) or do not have enough storage for multiple file systems. In this case it is generally recommended to go with a blocksize of 256KB or 512KB, depending on the general workloads and the storage model used. With a 256KB blocksize you will still get good sequential performance (though not necessarily peak marketing numbers), and you will get good performance and space utilization with small files (a 256KB blocksize has a minimum allocation of 8KB per file). This is a good configuration for multi-purpose research workloads where the application developers are focusing on their algorithms more than on IO optimization. |