Data and Metadata - Separate or mixed
When creating a GPFS file system you have the option of storing the file metadata (inode information) inline with the data or on separate disks. So when should you create a file system with inline metadata and when should you place the file system metadata on a separate disk?
Whether to split data and metadata in a file system depends on several factors. In general there are three cases where you would split data and metadata:
- Tiered Storage
- SATA drives
- Optimize Metadata Performance
For further information on tuning storage for use with GPFS, see "Tuning Storage for Use with GPFS."
Tiered storage
When you plan to use multiple user pools (for file data), it is a good idea to create the system pool as metadataOnly. This lets you manage metadata space, availability, and performance separately from the data pools.
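As a rough illustration of the layout described above, the following Python sketch renders NSD stanza text for a metadataOnly system pool and a dataOnly user pool. It assumes a GPFS release that accepts NSD stanza files; the device paths, NSD names, server names, and the pool name `data1` are hypothetical placeholders, and the helper `nsd_stanza` is not part of GPFS. Check the stanza keys against the documentation for your release before using the output.

```python
# Sketch only: renders NSD stanza text for a split data/metadata layout.
# All device, NSD, server, and pool names below are hypothetical.

METADATA_USAGES = {"metadataOnly", "dataAndMetadata"}
ALLOWED_USAGES = METADATA_USAGES | {"dataOnly"}


def nsd_stanza(device, nsd, servers, usage, pool, failure_group):
    """Return one NSD stanza as a string."""
    if usage not in ALLOWED_USAGES:
        raise ValueError(f"unsupported usage: {usage}")
    # GPFS keeps file system metadata in the system pool only.
    if usage in METADATA_USAGES and pool != "system":
        raise ValueError("metadata disks must belong to the system pool")
    return "\n".join([
        f"%nsd: device={device}",
        f"  nsd={nsd}",
        f"  servers={','.join(servers)}",
        f"  usage={usage}",
        f"  failureGroup={failure_group}",
        f"  pool={pool}",
    ])


# One fast disk for metadata in the system pool, one disk for file data
# in a user pool.
example = "\n\n".join([
    nsd_stanza("/dev/sdb", "meta01", ["nodeA", "nodeB"], "metadataOnly", "system", 1),
    nsd_stanza("/dev/sdc", "data01", ["nodeA", "nodeB"], "dataOnly", "data1", 1),
])
print(example)
```

The check inside `nsd_stanza` encodes the one hard rule in this layout: disks that hold metadata must be in the system pool, while user pools hold file data only.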
SATA Drives
Metadata I/O operations are often small (4 KB and 8 KB) random reads and writes, and SATA drives perform poorly on small random I/O. It is therefore common to optimize storage cost by placing file metadata on Fibre Channel or SAS disks and file data on SATA disks. Relatively few metadata disks are needed compared with the number of data drives to realize the improvement in metadata performance (typically about 1% of data storage). The exact ratio of data space to metadata space depends on average file size and other data characteristics.
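To see how average file size moves the data-to-metadata ratio, here is a back-of-the-envelope sketch in Python. It counts only one inode per file and ignores directories, indirect blocks, and other metadata structures, so it gives a lower bound rather than a sizing rule. The 4096-byte inode size and the function names are assumptions made for illustration; substitute the inode size your file system actually uses.

```python
# Sketch only: lower-bound estimate of metadata space from inode count.
# The inode size default below is an assumed placeholder value.

def estimate_metadata_bytes(data_bytes, avg_file_bytes, inode_bytes=4096, replicas=1):
    """Estimate inode space needed for data_bytes of files of a given average size."""
    if avg_file_bytes <= 0:
        raise ValueError("avg_file_bytes must be positive")
    # Round the file count up so a partial file still gets an inode.
    n_files = -(-data_bytes // avg_file_bytes)
    return n_files * inode_bytes * replicas


def metadata_fraction(data_bytes, avg_file_bytes, inode_bytes=4096, replicas=1):
    """Return estimated metadata space as a fraction of data space."""
    meta = estimate_metadata_bytes(data_bytes, avg_file_bytes, inode_bytes, replicas)
    return meta / data_bytes


TIB = 2**40
for avg in (64 * 2**10, 2**20, 16 * 2**20):  # 64 KiB, 1 MiB, 16 MiB files
    frac = metadata_fraction(100 * TIB, avg)
    print(f"avg file {avg // 2**10:>6} KiB -> metadata about {frac:.3%} of data space")
```

Under these assumptions the fraction reduces to inode size divided by average file size, which is why a file system full of small files can need far more than 1% of its capacity for metadata while one holding large files needs much less.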
When your file system consists entirely of SATA drives, it is best to spread the metadata over all of the LUNs, or at least across many of them. In addition, you should enable both read and write caching at the storage server for each LUN containing metadata. With GPFS you should disable any read prefetch mechanisms in the storage server. Read prefetch at the storage level needs to be disabled because the GPFS mechanism for allocating space, although it performs very well in high-concurrency environments, is difficult for storage algorithms to predict. GPFS compensates by prefetching data itself.
Metadata Performance
In a configuration supporting small-file I/O, or any workload that generates many non-sequential metadata operations, placing the metadata on a separate storage controller may be a good idea. Putting metadata on separate arrays allows optimal use of the storage server cache, because all of that cache can then be used to support metadata operations.