Topic
  • 6 replies
  • Latest Post - 2019-01-08T03:14:28Z by RobLogie
RobLogie
6 Posts

Pinned topic Cleanly shutting down a GPFS cluster

2018-12-24T03:46:35Z | gpfs spectrumscale

Hi All

 

Is there a recommended technique for cleanly and completely shutting down a GPFS cluster so that maintenance of the underlying O/S can be performed?

I have tried the technique recommended in the GPFS documentation under "Use the following information to shut down an IBM Spectrum Scale™ cluster in an emergency situation". However, I am finding that when I restart the cluster, occasionally an NSD will be in the "down" state and an "mmchdisk start" is required to bring the disk back into the cluster. This is especially a problem when it affects the cesroot disks, because it prevents the SMB service from starting. Also, for some of the large file systems (10 TB) in the cluster I manage, running the mmchdisk command can take well over an hour.

(I am using GPFS 4.2.3.10 on Red Hat 7.5.)
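For reference, the recovery step described above (finding NSDs left in the "down" state after a restart and bringing them back) can be sketched as a dry-run shell script. This is only an illustration: the run() wrapper echoes each command instead of executing it, and gpfs01 is a placeholder file system name.

```shell
#!/bin/sh
# Dry-run sketch: detect NSDs left "down" after a cluster restart and
# restart them. run() only echoes, so this is safe to execute anywhere;
# remove the wrapper to run the commands for real.
run() { echo "+ $*"; }

FS=gpfs01   # placeholder file system name

# mmlsdisk -e lists only the disks that are not in a normal state
run mmlsdisk "$FS" -e

# mmchdisk ... start -a attempts to start every down disk in the file system
run mmchdisk "$FS" start -a
```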

 

Thanks in Advance

 

Rob

Updated on 2018-12-28T04:24:18Z by RobLogie
  • oester
    259 Posts

    Re: Cleanly shutting down a GPFS cluster

    2018-12-24T16:41:09Z

    Hi Rob

    I run into this on occasion as well. The problem occurs when one or more clients have open files or I/O in flight to the file system when the NSD server node is shut down (mmshutdown). You can usually avoid this by making sure that the file systems unmount cleanly on the client nodes. Before unmounting the file systems, you can run "lsof" to see whether any processes still have files open anywhere in GPFS. If you can kill these before you issue the unmount, the file system should unmount cleanly, and a shutdown of the NSD servers should then not leave any disks offline. The other thing to check: run "mmlsmount all -L" to see whether any of the file systems are still mounted, and on which nodes. If you see any mounted before you issue the mmshutdown on the NSD servers, investigate those.
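Bob's checklist could be sketched as a dry-run script like the one below. Again, run() only echoes, so nothing touches a real cluster; /gpfs/gpfs01 is a placeholder mount point.

```shell
#!/bin/sh
# Dry-run sketch of the pre-shutdown checklist: check for open files,
# check remaining mounts, then unmount and shut down. run() only echoes.
run() { echo "+ $*"; }

MOUNTPOINT=/gpfs/gpfs01   # placeholder mount point

# 1. Look for processes that still hold files open under the GPFS mount
run lsof +D "$MOUNTPOINT"

# 2. See which nodes still have GPFS file systems mounted
run mmlsmount all -L

# 3. Only then unmount everywhere and stop the daemons cluster-wide
run mmumount all -a
run mmshutdown -a
```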

     

    Bob Oesterlin

    Nuance Communications

  • RobLogie
    6 Posts

    Re: Cleanly shutting down a GPFS cluster

    2018-12-26T21:13:15Z
    • oester
    • 2018-12-24T16:41:09Z


    Hi Bob

    Thanks for the suggestions !

     

    Cheers

     

    Rob

  • RobLogie
    6 Posts

    Re: Cleanly shutting down a GPFS cluster

    2018-12-28T03:48:47Z
    • oester
    • 2018-12-24T16:41:09Z


    Hi All

     

    I have done some more investigation into what is going on.

     

    The file systems are definitely not mounted on any of the nodes in the cluster when the mmshutdown -a command is issued. Also, lsof does not show any files open on the file systems before I attempt the mmumount.

     

    Of interest: it sometimes takes a while for some of the nodes to unmount, with mmlsmount showing the file system in the "(internal mount)" state.

    However, if I wait long enough (anything from a couple of minutes to around 30 minutes, which is not good when you are doing maintenance at 1 am), the file system eventually unmounts.
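The waiting step described above could be automated with a polling loop along these lines. This is a sketch: it assumes that mmlsmount -L prints a line containing "internal mount" while a node still holds an internal mount, so adjust the grep pattern to match the output on your release. gpfs01 is a placeholder file system name.

```shell
#!/bin/sh
# Sketch: after mmumount, poll mmlsmount until no node reports an
# "(internal mount)" for the file system, or give up after a timeout.
FS=gpfs01          # placeholder file system name
TIMEOUT=1800       # give up after 30 minutes
waited=0

while mmlsmount "$FS" -L 2>/dev/null | grep -q "internal mount"; do
    if [ "$waited" -ge "$TIMEOUT" ]; then
        echo "timed out waiting for $FS to unmount"
        exit 1
    fi
    sleep 30
    waited=$((waited + 30))
done

echo "$FS fully unmounted"
```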

     

    I then do an mmshutdown -a. After a subsequent mmstartup -a, I am still finding that the disks-down problem happens on random file systems. Again, it does not happen every time the cluster restarts; sometimes it comes back cleanly.

     

    Thanks !

     

    Rob


     

  • oester
    259 Posts

    Re: Cleanly shutting down a GPFS cluster

    2019-01-02T13:28:18Z

    On the nodes where you do the mmumount, do all of them unmount "cleanly", meaning you don't see any "file system busy" errors? Looking through the mmfslog on the nodes may provide some clues.

     

    Bob

  • sxiao
    61 Posts

    Re: Cleanly shutting down a GPFS cluster

    2019-01-02T18:12:47Z
    • oester
    • 2019-01-02T13:28:18Z


    Make sure you don't have any scripts that issue GPFS commands to check on the file systems. Many mm commands cause the file system to be internally mounted, which can cause disks to be marked down if the NSD servers have been shut down.

    On the 5.0.1 or later releases, you can use the file system maintenance mode feature to cleanly shut down a GPFS cluster. I believe the feature is fully documented in the 5.0.2 release, but it is mostly functional in 5.0.1.
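A maintenance-mode shutdown along the lines sxiao describes might look roughly like the dry-run sketch below. Note the hedging: the --maintenance-mode option of mmchfs is as I recall it from the 5.0.2 documentation, so verify it exists on your release before relying on it; run() only echoes, and gpfs01 is a placeholder.

```shell
#!/bin/sh
# Dry-run sketch of a maintenance-mode shutdown on 5.0.1+/5.0.2.
# run() only echoes each command; check the mmchfs --maintenance-mode
# option against your release's documentation before using this for real.
run() { echo "+ $*"; }
FS=gpfs01   # placeholder file system name

run mmumount "$FS" -a                    # unmount everywhere first
run mmchfs "$FS" --maintenance-mode yes  # block new mounts during maintenance
run mmshutdown -a                        # stop the daemons cluster-wide

# ...perform the O/S maintenance, then bring the cluster back:
run mmstartup -a
run mmchfs "$FS" --maintenance-mode no
run mmmount "$FS" -a
```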

     

  • RobLogie
    6 Posts

    Re: Cleanly shutting down a GPFS cluster

    2019-01-08T03:14:28Z
    • sxiao
    • 2019-01-02T18:12:47Z


     

    I have scripted some of the shutdown, including waiting for the file systems to be unmounted (internal mounts included). I have noticed that you can sometimes wait a long time for the internal mounts to be resolved.

     

    As for 5.0.2: glad you mentioned it. I have just started a project with the customer to upgrade to that version, and your information adds further justification for the upgrade.