I'm interested in getting some feedback on different approaches to the GPFS architecture.
The cluster will have about 25~50 nodes initially (up to 1000 CPU-cores),
expected to grow to about 50~80 nodes.
The jobs are primarily independent and single-threaded, with a mixture of
small- to medium-sized I/O and a lot of random access. It is very common
to have hundreds of jobs, each accessing the same directories, often with
overlap in the same data files.
For example, many jobs on different nodes will use the same executable
and the same baseline data models, but will differ in the individual data
files they compare against the model.
My goal is to ensure reasonable performance, particularly when there's
a lot of contention from multiple jobs accessing the same metadata and
some of the same data files.
My question here is about a choice between two GPFS architecture designs.
The storage array configurations, drive types, RAID types, etc. are
also being examined separately. I'd really like to hear any suggestions
about these (or other) configurations:
.h2  Large GPFS servers
- About 5 GPFS servers with significant RAM. Each GPFS server would be connected to the storage arrays via an 8Gb/s Fibre Channel SAN (multiple paths).
- Each GPFS server would serve NSDs over 10Gb/s Ethernet (1Gb/s for legacy servers) to the GPFS clients (the compute nodes).
- Since the GPFS clients would not be SAN-attached with direct access to block storage, and many clients (~50) will access similar data (and the same directories) for many jobs, it seems like it would make sense to do a lot of caching on the GPFS servers. Multiple clients would benefit from reading the same cached data on the servers.
- I'm thinking of sizing the caches to handle 1~2GB per compute-node core, divided across the GPFS servers. With up to 1000 cores and 5 servers, that works out to roughly 200GB+ of cache (pagepool, plus correspondingly large maxFilesToCache and maxStatCache settings) on each GPFS server; see the configuration sketch after the questions below.
Is there any way to configure GPFS so that the GPFS servers can do a large amount of caching without requiring the same resources on the GPFS clients?
Is there any way to configure the GPFS clients so that their RAM can be used primarily for computational jobs?
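For context on what I have in mind: my understanding is that GPFS cache parameters can be set per node, or per user-defined node class where the release supports it (otherwise -N takes an explicit node list), rather than only cluster-wide, so the servers and clients would not need identical settings. A rough sketch, with made-up node class and host names and the cache sizes estimated above:

    # Hypothetical node classes; class names and host lists are placeholders.
    mmcrnodeclass nsdservers   -N gpfs01,gpfs02,gpfs03,gpfs04,gpfs05
    mmcrnodeclass computenodes -N compute001,compute002   # ...and so on

    # Large caches on the NSD servers only:
    # 1000 cores x 1~2GB, divided across 5 servers, is about 200~400GB each.
    mmchconfig pagepool=200G,maxFilesToCache=1000000,maxStatCache=4000000 -N nsdservers

    # Keep the client-side footprint small so compute-node RAM stays free for jobs.
    mmchconfig pagepool=1G,maxFilesToCache=4000,maxStatCache=16000 -N computenodes

    # Check the resulting values.
    mmlsconfig pagepool

Treat the exact values as illustrative; I know the parameter limits, and whether a pagepool change needs a GPFS restart, vary by release.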
.h2  Direct-attached GPFS clients
- About 3~5 GPFS servers with modest resources (8 CPU-cores, ~60GB RAM).
- Each GPFS server and client (HPC compute node) would be directly connected to the SAN (8Gb/s Fibre Channel, iSCSI over 10Gb/s Ethernet, or FCoE over 10Gb/s Ethernet).
- Either 10Gb/s or 1Gb/s Ethernet for communication between GPFS nodes.
- Since this is a relatively small cluster in terms of total node count, the added cost of HBAs, switches, and cabling to direct-connect all nodes to the storage shouldn't be excessive. (A sketch of how the NSDs might be defined for this layout follows below.)
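For reference, here is roughly how I picture the NSDs being defined in the direct-attached case; the device paths, NSD names, and server names below are placeholders. As I understand it, SAN-attached nodes do block I/O directly whenever they can see the LUN, and fall back to the NSD servers listed in the stanza otherwise:

    # nsd_stanzas.txt -- illustrative only; one stanza per LUN.
    %nsd: nsd=nsd001
      device=/dev/mapper/lun001
      servers=gpfs01,gpfs02,gpfs03
      usage=dataAndMetadata
    %nsd: nsd=nsd002
      device=/dev/mapper/lun002
      servers=gpfs02,gpfs03,gpfs01
      usage=dataAndMetadata

    # Create the NSDs; the same (rewritten) stanza file then feeds mmcrfs.
    mmcrnsd -F nsd_stanzas.txt

    # Confirm which nodes see each LUN directly vs. through an NSD server.
    mmlsnsd -M

If I have that right, the NSD servers in this design mostly act as a fallback path (and serve any legacy nodes that are not SAN-attached), which is why relatively modest server hardware seems sufficient here.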