2 replies Latest Post - ‏2013-01-09T21:10:48Z by SystemAdmin
SystemAdmin

Pinned topic: Reinstallation of an NSD server

2013-01-09T13:34:33Z
I have to install a new OS on an NSD server in my cluster, so I have to scratch it and do a fresh installation.
All of its disks are also served by three other servers, so I'm hoping to do this without unmounting the file system.
Unfortunately I cannot remove it from the list of disk servers with mmchnsd, because that doesn't work while the file system is mounted, so I cannot use mmdelnode/mmaddnode either.
I was planning to save the ssh keys, scratch and reinstall the node, and use mmsdrrestore to recover the GPFS configuration.
This works on regular nodes, but I'm not sure it's fine for an NSD server too.
Do you see any trouble with this plan?
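The backup half of that plan might look roughly like this. A sketch only, not a tested procedure: the /backup path is an invented example, and the OpenSSH paths are the usual defaults, so adjust for your distribution.

```shell
# Sketch: save the host keys and root's ssh material before scratching
# the node, so the reinstalled OS can be dropped back into the cluster.
mkdir -p /backup/nsd-node
cp -a /etc/ssh /backup/nsd-node/etc-ssh        # host keys + sshd config
cp -a /root/.ssh /backup/nsd-node/root-ssh     # root's keys/authorized_keys
# The GPFS configuration lives in /var/mmfs/gen/mmsdrfs on typical installs
# and can be recovered later with mmsdrrestore, but a copy costs nothing:
cp /var/mmfs/gen/mmsdrfs /backup/nsd-node/mmsdrfs
```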

Thanks,
David
Updated on 2013-01-09T21:10:48Z by SystemAdmin
  • SystemAdmin

    Re: Reinstallation of an NSD server

    2013-01-09T19:21:11Z, in response to SystemAdmin
    As far as GPFS proper goes, the procedure looks OK. However, you want to be careful about the OS installer. We've seen quite a few cases of OS install scripts treating all visible disks as fair game for reformatting, and scribbling over GPFS NSDs (it's not clear whether human input was a factor in each specific case). So it would be a sound precaution to change zoning/LUN mapping/cabling so that GPFS LUNs are not visible on that node during the OS install.
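    One way to double-check that the GPFS LUNs really are hidden before kicking off the installer. A hedged sketch: device naming and available tooling vary by distribution, and mmlsnsd only works on a node that still has GPFS installed.

```shell
# Sketch: after changing zoning/LUN mapping, rescan and confirm that no
# GPFS NSD LUNs are still visible from the node about to be reinstalled.
rescan-scsi-bus.sh        # from sg3_utils, if installed
lsscsi                    # list what the node can still see
multipath -ll             # should show no GPFS data LUNs
# On a node that still has GPFS, mmlsnsd shows which local devices back
# which NSDs, which helps identify anything left visible:
mmlsnsd -m
```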

    yuri
    • SystemAdmin

      Re: Reinstallation of an NSD server

      2013-01-09T21:10:48Z, in response to SystemAdmin
      I architected the complete reinstall of several 50-node production clusters that way. You need to restore ssh (keys in /etc/ssh and /root/.ssh) after the reinstall, and can then reintegrate the node into the cluster with mmsdrrestore and the mmsdrfs file from the primary node. We have failover at the application level (SAP BWA) and were running production on the backup nodes while upgrading the primaries (and vice versa).
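      The restore side of that procedure might be sketched like this. The node name "primary-node" and the /backup paths are illustrative, not real ones from this cluster; the mmsdrrestore flags shown (-p for the node to pull configuration from, -R for the remote copy command) are the documented ones.

```shell
# Sketch: after the OS reinstall, put the saved ssh material back and
# re-integrate the node into the GPFS cluster.
cp -a /backup/nsd-node/etc-ssh/. /etc/ssh/
cp -a /backup/nsd-node/root-ssh/. /root/.ssh/
systemctl restart sshd
# Install the matching GPFS packages, then recover the configuration
# from the primary configuration server:
mmsdrrestore -p primary-node -R /usr/bin/scp
# Bring GPFS up on this node and verify it rejoined the cluster:
mmstartup -N "$(hostname)"
mmgetstate -a
```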

      We did unmap the GPFS LUNs on the storage unit, so they were unavailable to the nodes being upgraded.

      Eighteen months earlier we found that the standard SuSE installer, when there is a problem with the first (boot) LUN, happily recovers by installing onto the next available LUN. That was a GPFS LUN, and we lost the file system. Restoring took almost a week. The customer was not happy.

      So I would say unmapping (or unplugging) the GPFS LUNs is a critical safety precaution.

      Markus