Installing GPUDirect Storage for IBM Storage Scale

IBM Storage Scale support for GPUDirect Storage (GDS) enables a direct path between GPU memory and storage.

For information about the prerequisites, see Planning for GPUDirect Storage.

Perform the following steps to install GDS:

  1. Install IBM Storage Scale with NSD servers for your file system. For more information, see Installing IBM Storage Scale on Linux nodes and deploying protocols.
  2. Set up the InfiniBand fabric (the steps for a RoCE fabric are analogous). For installation instructions, see Mellanox OFED installation. Complete the following steps to configure the InfiniBand fabric:
    1. Use the latest supported MOFED driver. For details about the supported driver versions, see Components required for GDS in IBM Storage Scale FAQ in IBM® Documentation.
    2. Set up IP over InfiniBand. For more information, see Configuring InfiniBand and RDMA networks.
    3. If your NSD servers are part of ESS, then reinstall the MOFED driver with the --upstream-libs flag. Use the MOFED version that ships with the ESS.
  3. Install CUDA on the NSD clients running the GPUs. For more information about the supported version of CUDA, see Components required for GDS in IBM Storage Scale FAQ in IBM Documentation.
  4. Configure IBM Storage Scale. For more information, see Configuring the GPFS cluster.
  5. Configure the NVIDIA components. For more information, see Configuring CUDA.
  6. Start IBM Storage Scale by using the mmstartup -a command.
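The two steps above that name concrete commands (the ESS-only MOFED reinstall and the cluster start) can be sketched as a small shell helper. This is only an illustration under stated assumptions: the function names, the `DRY_RUN` switch, and the MOFED directory argument are hypothetical, and the `mmgetstate -a` call is an addition that is not part of the steps above. Commands are only printed unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Sketch only: wraps the commands named in the installation steps.
# DRY_RUN defaults to 1, so commands are printed but not executed.

run() {
    # Print the command line; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# Step 2.3 (ESS only): reinstall MOFED with the --upstream-libs flag.
# Assumption: $1 is the directory where the MOFED bundle that ships
# with the ESS has been unpacked (hypothetical argument).
reinstall_mofed_for_ess() {
    local mofed_dir=$1
    run "$mofed_dir/mlnxofedinstall" --upstream-libs
}

# Step 6: start IBM Storage Scale on all nodes.
start_scale() {
    run mmstartup -a
    # Assumption: mmgetstate -a is added here only to display the
    # resulting node states; it is not one of the documented steps.
    run mmgetstate -a
}
```

Calling `start_scale` with the default settings prints the commands it would run; `DRY_RUN=0 start_scale` executes them on a node where the IBM Storage Scale commands are in the PATH.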

Checking the installation

Perform the following steps to check the installation of GDS:

  1. Before you run GDS workloads, run the NVIDIA GDS utility gdscheck -p to verify the environment. Python 3 must be installed on the node to run this utility.
  2. Verify the status of PCIe Access Control Services (ACS) and PCIe Input/Output Memory Management Unit (IOMMU), as these components affect GDS function and performance. The output of the gdscheck -p utility must display the following status for IOMMU and ACS components:
    IOMMU disabled
    ACS disabled
  3. Check for IBM Storage Scale support in the output of gdscheck -p as shown in the following example:
    # gdscheck -p | grep "Spectrum Scale"
    
     IBM Spectrum Scale : Supported
  4. After you start IBM Storage Scale, check the GPFS log (mmfs log). GDS support is successfully enabled if the log contains both of the following messages:
    [I] VERBS DC API loaded.
    [I] VERBS DC API initialized.
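
The checks in the steps above can be scripted. The following is a minimal sketch, not an official tool: the function names are hypothetical, the patterns assume that each status appears on one line in the form shown above (for example, IOMMU disabled or IOMMU: disabled), and the default log path /var/adm/ras/mmfs.log.latest is an assumption about where the GPFS log resides on the node. Adjust the patterns if your gdscheck release formats its output differently.

```shell
#!/usr/bin/env bash
# Sketch only: automates the verification steps listed above.

# Reads `gdscheck -p` output on stdin, prints one line per check, and
# returns non-zero if any expected status is missing.
check_gds_report() {
    local report rc=0 spec label pattern
    report=$(cat)
    # Each entry is "label|extended regex" (assumed output layout).
    for spec in \
        'IOMMU disabled|IOMMU[[:space:]:]*disabled' \
        'ACS disabled|ACS[[:space:]:]*disabled' \
        'Spectrum Scale supported|Spectrum Scale[[:space:]]*:[[:space:]]*Supported'
    do
        label=${spec%%|*}
        pattern=${spec#*|}
        if printf '%s\n' "$report" | grep -Eiq "$pattern"; then
            echo "ok:      $label"
        else
            echo "MISSING: $label"
            rc=1
        fi
    done
    return "$rc"
}

# Returns 0 only if both VERBS DC API messages are in the log file.
# Assumption: the default path is the usual GPFS log location.
check_gds_log() {
    local log=${1:-/var/adm/ras/mmfs.log.latest}
    grep -Fq '[I] VERBS DC API loaded.' "$log" &&
        grep -Fq '[I] VERBS DC API initialized.' "$log"
}
```

A possible invocation on a GPU client node is `gdscheck -p | check_gds_report && check_gds_log`. Both functions only read their input, so they can be run repeatedly without changing the system.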