Suboptimal performance due to failover of NSDs to the secondary server after an NSD server failure

In a shared storage configuration, the failure of an NSD server can cause its NSDs to fail over to the secondary server, if the secondary server is active. This reduces the number of NSD servers actively serving the file system, which in turn degrades file system performance.
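To see which NSD servers are defined for each NSD, and which node is currently performing the I/O for each disk, commands such as the following can be used. This is only a sketch: the file system device name gpfs0 is an example and must be replaced with your own file system name, and mmlsdisk -m reports the I/O path as seen from the node on which the command is run.

# mmlsnsd
# mmlsdisk gpfs0 -m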

Problem identification

In IBM Storage Scale, the system-defined node class "nsdnodes" contains all the NSD server nodes in the IBM Storage Scale cluster. Issue the mmgetstate -N nsdnodes command to verify the state of the GPFS daemon. The GPFS file system performance might degrade if one or more NSD servers are in the down, arbitrating, or unknown state.

The following example displays two nodes: one in the active state and the other in the down state:

# mmgetstate -N nsdnodes
 Node number  Node name        GPFS state
------------------------------------------
       1      c25m3n07-ib      active
       2      c25m3n08-ib      down
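
If a node is reported as down, the system health monitor can provide more detail about the affected components. This is a sketch; the exact components and output format depend on the IBM Storage Scale release.

# mmhealth node show -N nsdnodes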

Problem resolution and verification

Resolve any system-level or software issues that exist. For example, confirm that the NSD server has no network connectivity problems, or that the GPFS portability modules are correctly built for the kernel that is running. Also, perform the necessary low-level tests to ensure that both the NSD server and the communication to the node are healthy and stable.
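For example, network reachability of the failed node and the portability layer build can be checked with commands similar to the following. This is only a sketch: the node name c25m3n08-ib is taken from the preceding example, the available mmnetverify operations depend on your release, and mmbuildgpl must be run on the node whose kernel was updated.

# mmnetverify ping --target-nodes c25m3n08-ib
# mmbuildgpl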

After verifying that no system or software issues remain, start GPFS on the NSD server by using the mmstartup -N <NSD_server_to_revive> command. Then use the mmgetstate -N nsdnodes command to verify that the GPFS daemon is in the active state, as shown in the following example:

# mmgetstate -N nsdnodes
 Node number  Node name        GPFS state
------------------------------------------
       1      c25m3n07-ib      active
       2      c25m3n08-ib      active
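
Optionally, confirm that I/O for the NSDs is again being served by the intended NSD servers, for example by running mmlsdisk from an NSD client node. This is a sketch; gpfs0 is an example file system name.

# mmlsdisk gpfs0 -m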