How many file systems should I use?
When setting up applications to use local file systems, there are rules of thumb developed by system administrators over many long nights spent in computer rooms everywhere. Rules regarding the maximum size of a file system, the number of concurrent processes accessing a single file system, and backup performance have all been well defined. These rules were built on sound logic and experience. When architecting a solution with GPFS, however, the rules change. Optimizing your infrastructure to take advantage of the strengths of GPFS requires a new approach. The IBM General Parallel File System (GPFS) is not your typical file system.
To be clear, there is no need to change how your application interacts with file data. Your application can continue to create, access and modify files using standard interfaces. Along the same lines, you can continue to lay out file systems as usual, but you won't fully realize the benefits of a GPFS solution. Though the outside appearance may seem familiar, what is inside is a whole new world of opportunities.
Developing your file system strategy with GPFS is not difficult; in fact, in many cases it is more straightforward to manage than using local file systems. The key idea to remember is "one." One is the answer to the question most often heard when designing a GPFS solution: "How many file systems should I use?"
Does this mean that all GPFS solutions have only one file system? Of course not, but with the scalability, storage management and enhanced concurrency characteristics of GPFS, a single GPFS file system can handle more than most single file systems. So as a general rule, start with one file system, and only add file systems if:
- The total number of files required exceeds 2 billion
- Different workloads require the use of different GPFS file system block sizes
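The two conditions above can be written down as a quick sanity check. This is only a sketch: the helper name `file_systems_needed` and the block size labels are hypothetical, the 2 billion threshold is the figure quoted in this article (not a value queried from GPFS), and the result is a rough lower bound, not a sizing tool.

```python
# Threshold taken from the rule of thumb in this article, not read from GPFS.
MAX_FILES_PER_FILE_SYSTEM = 2_000_000_000


def file_systems_needed(total_files, required_block_sizes):
    """Return a rough lower bound on the number of file systems (hypothetical helper).

    total_files          -- expected number of files across all workloads
    required_block_sizes -- block size each workload needs (labels are illustrative)
    """
    # Ceiling division: one file system per 2 billion files, at least one.
    by_file_count = max(1, -(-total_files // MAX_FILES_PER_FILE_SYSTEM))
    # One file system per distinct block size, at least one.
    by_block_size = max(1, len(set(required_block_sizes)))
    return max(by_file_count, by_block_size)
```

For example, 500 million files that all use the same block size still fit the "one" answer, while two workloads that need different block sizes push the count to two.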
A small disclaimer here: these rules are based purely on technology, even though it is common knowledge that technical design decisions in IT infrastructure projects are often driven more by corporate policy or other political forces. Focusing on the technical aspects is an attempt to give you the reasons behind the recommended design approach, and some backing material to support you in your quest to make your job easier. The following sections cover these technical areas in more detail. They are addressed to system administrators who are well versed in local file systems and are now tasked with implementing a solution based on GPFS.
The discussion is broken down into three areas, and compares and contrasts how each applies to a local file system and to GPFS.
- Performance
- Storage management
- Space Management
Performance
I start with performance because it is the reason most often cited for design decisions. There are multiple dimensions to performance when discussing file systems. You can look at IO operations per second (IOPS) or at throughput in MB per second. Here the discussion covers both. Traditionally, when planning a file system layout strategy, much consideration is given to the relationship among the number of IO processes, the number of file systems and the number of physical disks per file system.
One common rule of thumb when designing local file system layouts is to keep each application, or set of application processes, on its own distinct set of disks. The argument is that keeping the applications on separate disks reduces "head thrashing" on the storage and gives your application better performance. Head thrashing occurs when a disk spends so much time servicing head seeks that overall throughput is hindered. Head thrashing is certainly real, but its impact is often extended into other arenas.
Where this theory breaks down is when it is applied to determining the performance impact of combining file systems. Combining file systems does not in itself induce head thrashing if you use the same number of physical disks in both cases. For example, if you have 20 disks and the application is generating 200 IOs/sec, that is 10 IOs/sec/disk. It does not matter whether these IOs are spread over one or two file systems; the total IOPS per disk remains the same. In fact, in many cases we have found that the overall solution performance of grid workloads improves by using a common GPFS file system. This is often because "jobs" that typically run on separate file systems complete at different rates. When one file system is idle and another is active, there are wasted disk cycles. GPFS spreads file system data over all the available disks, so no matter what portion of a workload is executing it can utilize all available bandwidth. On the other hand, when all application processes are active, GPFS is very good at providing fair access to the storage.
Since disks are not the issue, if there are concerns over the number of concurrent threads accessing a file system, the focus should be on areas like lock management, space allocation and metadata management, not disk head thrashing. So let's take a look at how GPFS manages these areas of performance.
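The arithmetic in the 20-disk example can be checked with a few lines of Python. This is a minimal sketch, not a performance model: the function names are hypothetical, the even split of 100 IOs/sec per workload is an assumption, and the per-disk capability of 150 IOPS is an assumed figure used only to illustrate the idle-file-system case.

```python
def per_disk_load(workload_iops, disks):
    """Average IOPS each disk must service for the given workloads."""
    return sum(workload_iops) / disks


# One shared file system: both workloads striped over all 20 disks.
shared = per_disk_load([100, 100], disks=20)

# Two file systems: each workload confined to its own 10 disks.
split = [per_disk_load([100], disks=10), per_disk_load([100], disks=10)]

print(shared, split)  # 10.0 [10.0, 10.0]


def peak_iops_available(disks_reachable, iops_per_disk):
    """Upper bound on the IOPS a single active job can draw from its disks."""
    return disks_reachable * iops_per_disk


# ASSUMPTION: 150 IOPS per disk is an illustrative figure, not from the article.
ASSUMED_DISK_IOPS = 150
alone_on_split = peak_iops_available(10, ASSUMED_DISK_IOPS)   # other 10 disks sit idle
alone_on_shared = peak_iops_available(20, ASSUMED_DISK_IOPS)  # all 20 disks reachable
```

The per-disk load is identical in both layouts, but when only one job is running, the job on the shared file system can reach twice as many disks as the job confined to its own file system.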
Locking
GPFS was designed from the ground up to handle parallel IO requests from thousands of application processes. One feature that makes this possible is the scalable token management system in GPFS. Tokens (locks) are managed by multiple token manager servers. The number of token servers, and which systems participate in token management, is automatically determined and managed by the GPFS daemon. The system administrator has the option to direct which nodes are available for operations like token management, but GPFS manages the token servers and automatically fails over this operation in the event of a node failure. Once a token is granted, the metanode handles the metadata management for a particular file.
Metadata
The metanode in a GPFS cluster is the node responsible for maintaining the metadata for a given file. The metanode is determined dynamically for each file and can reside on any node in the cluster. When a file is opened, a metanode is assigned to manage the metadata for that file. Typically the node that first opened the file becomes the metanode. This node performs all metadata operations for that file until the file is closed or some other event triggers the reassignment of the metanode. This allows for extreme metadata scalability because all nodes in the cluster can participate in metadata management.
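To make the "first opener becomes the metanode" idea concrete, here is a toy model in Python. It is an illustration only: the class `MetanodeTable`, its methods and the node names are all hypothetical, it is not how GPFS is implemented, and it ignores the reassignment events mentioned above (the metanode role is simply released when the last node closes the file).

```python
class MetanodeTable:
    """Toy model mapping each open file to the node acting as its metanode."""

    def __init__(self):
        self._metanode = {}  # path -> node currently acting as metanode
        self._openers = {}   # path -> set of nodes holding the file open

    def open(self, path, node):
        """Record that `node` opened `path`; return the file's metanode."""
        self._openers.setdefault(path, set()).add(node)
        # The first node to open the file becomes its metanode.
        return self._metanode.setdefault(path, node)

    def close(self, path, node):
        """Record that `node` closed `path`; drop the metanode on last close."""
        openers = self._openers.get(path, set())
        openers.discard(node)
        if not openers:
            self._metanode.pop(path, None)
            self._openers.pop(path, None)

    def metanode_of(self, path):
        """Return the current metanode for `path`, or None if it is not open."""
        return self._metanode.get(path)
```

Because each file gets its own entry, files first opened on different nodes end up with different metanodes, which is the property that lets metadata work spread across the cluster.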
Space Allocation
A common concern with local file systems is supporting concurrent sequential threads writing to a single file system, because the file system tends to become highly fragmented. This problem is avoided in GPFS because the space allocation algorithm, originally developed for use in supercomputers, is designed to gracefully handle thousands of concurrent threads allocating simultaneously in a single file system. Instead of allocating block 1, then block 2, then block 3 and so on, space is allocated from all over each disk so that multiple IO threads are not competing for space allocations. On the surface this may appear to create a "fragmented" allocation of space on disk, but this is not the case. GPFS has another set of very sophisticated algorithms that are capable of reading the data very efficiently from the disk. So even when 1,000 large files are written concurrently to a single file system, read performance for any one of those files is the same as if they had been written one at a time. Some other file systems do not handle this situation so cleanly.
All of these features allow you to efficiently place more data in a single file system. This ability provides great benefits beyond overall performance: it allows you to manage available storage more efficiently.
Storage Management
Another reason often stated for partitioning data onto many small file systems is to give the system administrator better control over space and storage utilization. The argument goes something like this: "If I provide only the space required for the project, I can more effectively control usage and growth." This may be true at some level, but achieving control by creating many small file systems comes at a cost. Providing several small file systems requires that you manage several file systems and the associated LUNs. Administration of these file systems includes maintaining complex SAN zoning, manually providing tiered storage management and backing each one up independently. GPFS provides many features that allow you to efficiently manage larger data sets.
Quotas
Quotas are supported per user, group or fileset, which allows you to control space usage at multiple levels. A fileset is a sub-tree of a file system, so a fileset quota limits the data in that sub-tree regardless of which user creates it.
Storage Pools
GPFS supports multiple pools of storage in each file system and a policy engine to manage data placement. This allows you to easily move file data from pool to pool based on business rules you define. You create a policy rule that implements the business rule: "if this file has not been accessed in 30 days, move it to the less expensive storage." This is done without stubbing (placing a metadata file in the file system in place of the actual inode), and it can be done while the data remains online. The file does not move in the namespace. In addition to disk-to-disk movement, data can be migrated from disk to tape storage.
Storage pools do more than provide storage tiers: IO performance is determined at the pool level. This means that even within a single file system you can provide multiple levels of IO performance. For example, performance-critical data can be placed on fast storage while less critical data is stored on more economical storage, even if these files sit next to each other in the namespace.
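The 30-day business rule boils down to a simple predicate on each file's last access time. The sketch below expresses that selection logic in plain Python so the rule is easy to reason about. It is not the GPFS policy engine or its rule syntax: the function `files_to_migrate` is hypothetical, it only lists candidates and moves nothing, and it walks the directory tree, whereas GPFS evaluates such rules against file system metadata.

```python
import os
import time

SECONDS_PER_DAY = 86_400


def files_to_migrate(root, max_idle_days=30, now=None):
    """List files under `root` not accessed within `max_idle_days` (hypothetical helper)."""
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * SECONDS_PER_DAY
    candidates = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # st_atime is the last access time in seconds since the epoch.
            if os.stat(path).st_atime < cutoff:
                candidates.append(path)
    return sorted(candidates)
```

The returned paths stay exactly where they are in the namespace; in the scenario described above, only the pool holding their data would change.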
Metadata Scanning
The ability to use fewer file systems allows you to make better use of available storage. A common concern with larger file systems is the ability to manage the file data. GPFS provides tools that make large volumes of data more manageable with the high performance metadata scanning engine.
GPFS can scan inode information, when running a delta backup for example, at a rate of approximately 1 million inodes per second. (This rate was seen using 8 xSeries servers and a single DS4800.) The ability to "query" the file system at this rate allows you to manage much more data in a single file system.
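To put that rate in perspective, a back-of-the-envelope calculation helps. The sketch below assumes the scan rate stays constant at the quoted figure, which is an idealization, since the measured rate applies to the specific hardware mentioned above. The function name is hypothetical.

```python
def scan_time_seconds(inode_count, inodes_per_second=1_000_000):
    """Estimated time to scan `inode_count` inodes at a constant rate."""
    return inode_count / inodes_per_second


# A file system at the 2 billion file mark used earlier as a rule of thumb.
seconds = scan_time_seconds(2_000_000_000)
minutes = seconds / 60
```

At that rate, scanning 2 billion inodes takes 2,000 seconds, or roughly 33 minutes.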
Conclusion
I hope this information helps you plan your GPFS solution and optimize your file systems for space usage, ease of administration and overall efficiency. Take advantage of your clustered file systems and enjoy the benefits that come with clustered data access, like taking back your weekends.