Stopping the disk failure auto recovery operation

The auto recovery operation can degrade I/O performance across the cluster. To avoid this problem, you can stop auto recovery manually and restart it later when the cluster is less busy. Disks that are not functioning must still be recovered to protect your data.

Run the mmlsdisk command with the -e option against the file system to list the disks that do not have an availability of up and a status of ready. If all the disks in the file system are functioning correctly, the system displays the following message: 6027-623 All disks up and ready.
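
As a minimal sketch, the check described above can be scripted by looking for message 6027-623 in the command output. The function name all_disks_ready is hypothetical and is not part of IBM Storage Scale; the sketch assumes only that the message number appears in the output when every disk is up and ready.

```shell
# Hypothetical helper (not an IBM Storage Scale command).
# Reads mmlsdisk -e output on standard input and exits with status 0
# only if message 6027-623 ("All disks up and ready") is present.
all_disks_ready() {
    grep -q '6027-623'
}

# Possible usage on a cluster node, with fpofs as the file system name:
#   mmlsdisk fpofs -e | all_disks_ready && echo "No disks need recovery"
```
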
  1. To stop the auto recovery process, you must stop the tschdisk and tsrestripefs processes on the file system manager node. Log in to the IBM Storage Scale file system manager node and retrieve the process IDs of the tschdisk and tsrestripefs commands by running the ps -elf | grep -e tschdisk -e tsrestripefs command.

    Alternatively, check the IBM Storage Scale log (/var/adm/ras/mmfs.log.latest) on the file system manager node to see whether a tschdisk command is still running. When the restripefs command is invoked by the auto recovery and is still running, its log messages are redirected to /var/adm/ras/restripefsOnDiskFailure.log.<timestamp> (IBM Storage Scale 4.1 and IBM Storage Scale 4.1.1) or /var/adm/ras/autorecovery.log.<timestamp> (IBM Storage Scale 4.1.1 PTF1 and later).

  2. Take the following steps to stop the tschdisk and tsrestripefs command processes:
    1. Make a list of the file system manager nodes in the cluster. The list must include the file system manager node of each file system that is affected. To list the file system manager nodes, go to a file system manager node and issue the following command:
      mmlsmgr
      This command is in the directory /usr/lpp/mmfs/bin.

    2. Do the following actions for the tschdisk and tsrestripefs processes on each of the file system manager nodes in your list:
      1. If you are not connected to a file system manager node, connect to it with ssh.
      2. Issue the following command to list the back-end processes that are running and their command IDs:
        mmfsadm command list all
        In the following example, the tsrestripefs process is running in the back end (line 6) and its command ID is #92 (line 5):
        # mmfsadm command list all
        CrHashTable 0x7F7E64001A08 n 4
        cmd sock 75 cookie 3489916426 owner 12912 id 0x2D7ADC0785000064(#100) uses 1 type 14 start 1531294737.470181
        flags 0x106 SG none line 'command list all'
        cmd sock 70 cookie 2102087586 owner 4450 id 0x2D7ADC078500005C(#92) uses 1 type 13 start 1531294660.218091
        flags 0x117 SG fpofs line 'tsrestripefs /dev/fpofs -r'
        hold PIT/repair waitTime 6.082489
      3. If a back-end process is running, issue the following command to stop it:
        mmfsadm command stop <commandID>
        where <commandID> is the command ID of the back-end process from the previous step. The following example uses command ID 92 from the example in the previous step:
        mmfsadm command stop 92
      4. Run the mmfsadm command again to verify that the process is no longer running:
        mmfsadm command list all
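
The lookup of command IDs in the steps above can be sketched as a small shell function. This is an illustration only: the function name recovery_command_ids is hypothetical, and the parsing assumes that the output always follows the layout of the example above, that is, a "cmd sock" line that carries the ID in the form (#92), followed by a "flags" line that carries the command line. Verify the extracted IDs before you stop anything.

```shell
# Hypothetical helper (not an IBM Storage Scale command).
# Reads the output of "mmfsadm command list all" on standard input and
# prints the command ID of each tschdisk or tsrestripefs back-end command.
recovery_command_ids() {
    awk '
        /^[ \t]*cmd sock/ {
            id = ""
            # Extract the decimal ID from a token such as "(#92)".
            if (match($0, /\(#[0-9]+\)/))
                id = substr($0, RSTART + 2, RLENGTH - 3)
        }
        # The "." stands for the quote that opens the command line text.
        /^[ \t]*flags/ && /line .(tschdisk|tsrestripefs)/ {
            if (id != "") print id
        }
    '
}

# Possible usage on a file system manager node:
#   mmfsadm command list all | recovery_command_ids |
#       while read -r id; do mmfsadm command stop "$id"; done
```

With the example output shown earlier, the function prints 92 and ignores command #100, which is the list command itself.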