IBM Support

IBM Spectrum Scale: GPUDirect Storage (GDS) with accelerated writes is available as a Technical Preview feature in Spectrum Scale 5.1.5.

Flashes (Alerts)


Abstract

Accelerated writes with GPUDirect Storage (GDS) are available as a Technical Preview feature in Spectrum Scale 5.1.5. Accelerated GDS writes can be tested on nonproduction systems. Support for accelerated GDS reads and for GDS writes in compatibility mode remains unchanged from previous releases, including their use in production environments. The interface used by applications (cuFileWrite()) remains unchanged; only the internal data transfer changes after the client has been configured to use GDS accelerated writes.

Content

This web page describes the use of accelerated writes with GPUDirect Storage (GDS) in Spectrum Scale version 5.1.5. In this release, accelerated writes with GDS are provided as a Technical Preview only. The succeeding version, Spectrum Scale 5.1.6, was released in December 2022 and includes the supported version of accelerated writes with GPUDirect Storage. Check https://www.ibm.com/docs/en/spectrum-scale/5.1.6?topic=architecture-gpudirect-storage-support-spectrum-scale for details. IBM Spectrum Scale 5.1.6 is the recommended version for GPUDirect Storage. Upgrade to this level if you are using GPUDirect Storage.
Description:
In Spectrum Scale, GDS reads and writes are RDMA operations executed by an NSD server that transfer data directly between GPU buffers and the storage servers. The GDS APIs used by applications are cuFileRead() and cuFileWrite(). In previous releases, Spectrum Scale supported cuFileRead() as an accelerated GDS operation and cuFileWrite() in compatibility mode. In compatibility mode, the data is first copied from the GPU buffer to client system memory (by the CUDA GDS library) and then transferred to the storage servers by using Direct IO. This Technical Preview introduces accelerated GDS writes, where data is transferred directly from the GPU buffer to the storage servers, avoiding the additional copy on the client.
Requirements:
A working GDS setup with Spectrum Scale is required. Follow the product documentation to set it up.
For writes, the following additional requirements and components are needed:
  • Spectrum Scale 5.1.5 on the client (tech preview, non-production use)
  • Spectrum Scale 5.1.5 on the NSD server (tech preview, non-production use)
  • CUDA 11.8 (*), see the footnote below
  • MOFED 5.4-3.0.1.0 or 5.6-2.0.9.0
  • The cufile.json config file needs to be changed: In the filesystem-specific section ("fs": "gpfs"), the following key/value pair has to be added:
        "gpfs": {
            "gds_write_support": true
        },
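The configuration change above can be sketched in Python as follows. This is a minimal illustration, not an IBM or NVIDIA tool: the function name enable_gpfs_gds_writes and the sample dictionary are hypothetical, and the sketch works on an already-parsed configuration because a cufile.json file may contain comments that Python's json module does not accept.

```python
import json


def enable_gpfs_gds_writes(config: dict) -> dict:
    """Return a copy of a parsed cufile.json with fs.gpfs.gds_write_support set to true.

    Hypothetical helper: it only edits the in-memory dictionary and leaves
    reading and writing the actual file to the caller.
    """
    # Deep copy through a JSON round trip so the caller's dictionary is untouched.
    updated = json.loads(json.dumps(config))
    # Create the "fs" and "gpfs" sections if they are missing.
    gpfs_section = updated.setdefault("fs", {}).setdefault("gpfs", {})
    gpfs_section["gds_write_support"] = True
    return updated


if __name__ == "__main__":
    # Sample input for illustration only; a real cufile.json has many more keys.
    sample = {"fs": {"lustre": {"posix_gds_min_kb": 0}}}
    print(json.dumps(enable_gpfs_gds_writes(sample), indent=4))
```

Existing sections are preserved; only the "gpfs" entry under "fs" is added or updated.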
All other diagnostic means, such as counters (mmdiag --gds, RDMA counters, etc.) and log files (mmfs.log), work for writes in the same way as for reads.
Diagnostics:
The platform check (“gdscheck -p”) should show the following key/value pairs:
CUFILE CONFIGURATION:
properties.use_compat_mode : false      <--- !!
properties.force_compat_mode : false
properties.gds_rdma_write_support : false     <--- !!
properties.use_poll_mode : false
properties.poll_mode_max_size_kb : 4
properties.max_batch_io_size : 128
properties.max_batch_io_timeout_msecs : 5
properties.max_direct_io_size_kb : 16384
properties.max_device_cache_size_kb : 131072
properties.max_device_pinned_mem_size_kb : 33554432
properties.posix_pool_slab_size_kb : 4 1024 16384
properties.posix_pool_slab_count : 128 64 32
properties.rdma_peer_affinity_policy : RoundRobin
properties.rdma_dynamic_routing : 0
fs.generic.posix_unaligned_writes : false
fs.lustre.posix_gds_min_kb: 0
fs.beegfs.posix_gds_min_kb: 0
fs.weka.rdma_write_support: false
fs.gpfs.gds_write_support: true     <--- !!
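A check of the three marked lines could be automated along these lines. This is a sketch under stated assumptions: it expects the "key : value" text layout shown in the listing above, the expected values are copied from the marked lines of that listing, and the names parse_cufile_config, find_mismatches, and EXPECTED are hypothetical. Capturing the output of gdscheck -p is left to the caller.

```python
# Expected values, copied from the lines marked "<--- !!" in the listing above.
EXPECTED = {
    "properties.use_compat_mode": "false",
    "properties.gds_rdma_write_support": "false",
    "fs.gpfs.gds_write_support": "true",
}


def parse_cufile_config(text: str) -> dict:
    """Parse 'key : value' lines into a dictionary of strings."""
    settings = {}
    for line in text.splitlines():
        # Drop trailing annotations such as '<--- !!'.
        line = line.split("<---")[0].strip()
        key, sep, value = line.partition(":")
        key, value = key.strip(), value.strip()
        # Skip headers like 'CUFILE CONFIGURATION:' that carry no value.
        if sep and key and value:
            settings[key] = value
    return settings


def find_mismatches(settings: dict, expected: dict = EXPECTED) -> dict:
    """Return {key: (expected, actual)} for every key that differs or is missing."""
    return {
        key: (want, settings.get(key))
        for key, want in expected.items()
        if settings.get(key) != want
    }


if __name__ == "__main__":
    sample_output = """CUFILE CONFIGURATION:
properties.use_compat_mode : false      <--- !!
properties.gds_rdma_write_support : false     <--- !!
fs.gpfs.gds_write_support: true     <--- !!
"""
    problems = find_mismatches(parse_cufile_config(sample_output))
    print("mismatches:", problems)
```

An empty result means the marked settings match the listing; any other result names the keys to review in cufile.json.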
Limitations:
Files with multiple data replicas are handled in compatibility mode.
Note:
GDS writes follow Direct IO semantics. As such, Spectrum Scale does not serialize concurrent GDS and/or Direct IO writes to overlapping regions of a file. The application must coordinate such writes itself.
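One way an application might guard against this is to check a planned batch of writes for overlapping byte ranges before issuing them concurrently. The following is a minimal sketch of that idea only: find_overlaps is a hypothetical helper, writes are modeled as (offset, length) pairs, and no GDS or Spectrum Scale API is called.

```python
def find_overlaps(writes):
    """Return pairs of (offset, length) ranges that overlap within one file.

    Ranges are sorted by offset and each one is compared with the earlier
    range that extends furthest into the file, so every range that collides
    with an earlier one is reported at least once.
    """
    overlaps = []
    widest = None  # earlier range with the largest end offset seen so far
    for offset, length in sorted(writes):
        if length <= 0:
            continue  # empty writes cannot overlap anything
        if widest is not None and offset < widest[0] + widest[1]:
            overlaps.append((widest, (offset, length)))
        if widest is None or offset + length > widest[0] + widest[1]:
            widest = (offset, length)
    return overlaps


if __name__ == "__main__":
    planned = [(0, 8192), (4096, 4096), (16384, 4096)]
    for first, second in find_overlaps(planned):
        print("overlapping writes:", first, second)
```

Writes reported as overlapping would then need to be ordered or merged by the application before they are submitted.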
(*) CUDA 11.8 will not be officially available at the time of the Scale 5.1.5 release. Contact Nvidia GDS Product Manager Maitree Kanungo (mkanungo@nvidia.com) for a preview version to allow for GDS accelerated writes with Spectrum Scale.


Document Information

Modified date:
06 December 2022

UID

ibm16613023