File System Planning
Added by ScottGPFS, last edited by ScottGPFS on Jun 23, 2011

File system planning and performance tuning are the basis of designing a GPFS solution.

File System Blocksize

When creating a file system there are two types of parameters: those that can be changed dynamically and those that cannot. The key parameter that must be determined at file system creation is the file system block size. Once set, the only way to change the block size is to recreate the file system, so it is recommended to test the blocksize on a development system before production deployment.

GPFS supports block sizes from 16KB to 4MB, with a default of 256KB. The blocksize of a file system is set at creation using the -B parameter of the mmcrfs command. To create a file system with a blocksize greater than 1MB, you must also increase the value of maxblocksize in addition to using the -B parameter. The default value of maxblocksize is 1M and the allowable range is 16K to 16M. For block sizes larger than 1MB it is recommended that maxblocksize match the value of blocksize.
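
The limits above can be checked before running mmcrfs. The following is a minimal sketch in Python, assuming only the ranges and defaults quoted in this section; the function name check_blocksize and the constants are hypothetical helpers for illustration, not part of GPFS, and nothing here queries a live cluster.

```python
KIB = 1024
MIB = 1024 * KIB

# Limits as quoted in the text above (assumed, not read from a cluster).
MIN_BLOCKSIZE = 16 * KIB
MAX_BLOCKSIZE = 4 * MIB
DEFAULT_MAXBLOCKSIZE = 1 * MIB


def check_blocksize(blocksize, maxblocksize=DEFAULT_MAXBLOCKSIZE):
    """Return a list of problems with the planned blocksize; empty means OK."""
    problems = []
    if not MIN_BLOCKSIZE <= blocksize <= MAX_BLOCKSIZE:
        problems.append("blocksize is outside the supported 16KB to 4MB range")
    if blocksize > maxblocksize:
        problems.append("maxblocksize must be raised to at least the blocksize")
    return problems


# A 4MB file system with the default maxblocksize of 1M is flagged.
for problem in check_blocksize(4 * MIB):
    print(problem)
```

Passing maxblocksize=4 * MIB for the same 4MB blocksize returns an empty list, which matches the recommendation that maxblocksize match the blocksize for block sizes larger than 1MB.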

So how do you choose a blocksize for your file system? It is best to test the impact of various blocksize settings with your application. If you cannot test various values of blocksize, or you are just looking for a starting point, Table 1 provides a rough guideline for what file system block sizes may be appropriate for various types of applications.


IO Type                Application Examples                                                 Blocksize
Large Sequential IO    Scientific Computing, Digital Media                                  1MB to 4MB
Relational Database    DB2, Oracle                                                          512KB
Small I/O Sequential   General File Service, File based Analytics, Email, Web Applications  256KB
Special*               Special                                                              16KB-64KB

*Since GPFS 3.3 there are very few workloads that benefit from a file system blocksize of 16KB or 64KB. If you do not have a chance to test your application performance with various file system blocksize settings you should use the default of 256KB.

The GPFS blocksize setting is the amount of data written to each disk in a file system before moving on to the next disk. There are three important characteristics to understand when considering the appropriate value of blocksize:

  • The blocksize is the largest size IO that GPFS can issue to the underlying device.
  • A subblock is 1/32nd of the blocksize. This is the smallest allocation to a single file.
  • A sector is 512 bytes. This is the smallest IO request size GPFS issues to the underlying device.

This means that, for example, if you use a blocksize of 1MB, each file will use at least 32KB (1024KB / 32 = 32KB).
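
The subblock arithmetic can be written out as a short sketch. It assumes a simplified model in which a file's data allocation is rounded up to whole subblocks and, as in the text, every file uses at least one subblock; metadata, replication, and any other storage details are ignored. The function names are hypothetical.

```python
SUBBLOCKS_PER_BLOCK = 32  # a subblock is 1/32nd of the blocksize


def subblock_size(blocksize):
    """Smallest allocation to a single file, in bytes."""
    return blocksize // SUBBLOCKS_PER_BLOCK


def allocated_bytes(file_size, blocksize):
    """Space consumed by a file, rounded up to whole subblocks.

    Simplified model: at least one subblock is charged per file.
    """
    sub = subblock_size(blocksize)
    subblocks = max(1, -(-file_size // sub))  # ceiling division
    return subblocks * sub


KIB = 1024
print(subblock_size(1024 * KIB) // KIB)              # 32 (KB) for a 1MB blocksize
print(allocated_bytes(40 * KIB, 1024 * KIB) // KIB)  # 64 (KB) for a 40KB file
```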

What if I do not know my application IO profile?

Often you do not have good information on the nature of the IO profile, or the applications are so diverse that it is difficult to optimize for one or the other. There are generally two approaches to designing for this type of situation: separation or compromise.

Separation

In this model you create two file systems, one with a large file system blocksize for sequential applications and one with a smaller blocksize for small file applications. You can gain benefits from having file systems of two different block sizes even on a single type of storage, or you can use different types of storage for each file system to further optimize for the workload. In either case the idea is that you provide two file systems to your end users, for example as scratch space on a compute cluster. The end users can then run tests themselves by pointing the application to one file system or the other and determining by direct testing which is best for their workload. In this situation you may have one file system optimized for sequential IO with a 1MB blocksize and one for more random workloads with a 256KB blocksize.

Compromise

In this situation you lack either sufficient information on workloads (i.e. end users won't think about IO performance) or enough storage for multiple file systems. In this case it is generally recommended to go with a blocksize of 256KB or 512KB, depending on the general workloads and storage model used. With a 256KB blocksize you will still get good sequential performance (though not necessarily peak marketing numbers), and you will get good performance and space utilization with small files (a 256KB blocksize has a minimum allocation of 8KB per file). This is a good configuration for multi-purpose research workloads where the application developers are focusing on their algorithms more than on IO optimization.
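
To make the space utilization point concrete, the sketch below compares the unused allocated space for a handful of small files at 256KB and 1MB block sizes. The file sizes are made-up sample values, the helper names are hypothetical, and the model is a simplification: allocation is rounded up to whole subblocks (1/32nd of the blocksize) and nothing else is accounted for.

```python
KIB = 1024


def allocated_bytes(file_size, blocksize, subblocks_per_block=32):
    """Allocation rounded up to whole subblocks, with a one-subblock minimum."""
    sub = blocksize // subblocks_per_block
    return max(1, -(-file_size // sub)) * sub


def overhead_bytes(file_sizes, blocksize):
    """Allocated space minus actual data for a collection of files."""
    allocated = sum(allocated_bytes(size, blocksize) for size in file_sizes)
    return allocated - sum(file_sizes)


# Hypothetical small-file sample: 2KB, 10KB, and 100KB files.
sample = [2 * KIB, 10 * KIB, 100 * KIB]

for blocksize in (256 * KIB, 1024 * KIB):
    wasted = overhead_bytes(sample, blocksize) // KIB
    print(f"{blocksize // KIB}KB blocksize: {wasted}KB unused")
```

For this sample the 256KB blocksize leaves 16KB unused and the 1MB blocksize leaves 80KB unused. Real results depend on the actual file size distribution, so this is only a way to reason about the trade-off, not a substitute for testing.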