I have to install a new OS on an NSD server in my cluster, so I have to wipe it and do a fresh installation.
All of its disks are also served by three other servers, so I'm hoping to do this without unmounting the file system.
Unfortunately, I cannot remove it from the disks' server lists with mmchnsd, because mmchnsd doesn't work while the file system is mounted, so I cannot use mmdelnode/mmaddnode either.
I was planning to save the ssh keys, wipe and reinstall the node, and then use mmsdrrestore to recover the GPFS configuration.
This works on regular nodes, but I'm not sure that it's fine for an NSD server too.
Do you see any trouble with this plan?
Reinstallation of a NSD server
Re: Reinstallation of a NSD server
2013-01-09T19:21:11Z, in response to SystemAdmin
As far as GPFS proper goes, the procedure looks OK. However, you want to be careful about the OS installer. We've seen quite a few cases of OS install scripts treating all visible disks as fair game for reformatting, and scribbling over GPFS NSDs (it's not clear whether human input was a factor in each specific case). So it would be a sound precaution to change zoning/LUN mapping/cabling so that GPFS LUNs are not visible on that node during the OS install.
Re: Reinstallation of a NSD server
2013-01-09T21:10:48Z, in response to SystemAdmin
I architected the complete reinstall of several 50-node production clusters that way. After the reinstall you need to restore ssh (the keys in /etc/ssh and /root/.ssh), and you can then reintegrate the node into the cluster with mmsdrrestore and the mmsdrfs file from the primary node. We have failover at the application level (SAP BWA) and were running production on the backup nodes while upgrading the primaries (and vice versa).
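For anyone following along, the key save/restore step could be sketched roughly as below. This is only an illustration: the two function names and the archive location are made up for the example, and the mmsdrrestore line is shown as a comment with a placeholder node name. Check the mmsdrrestore man page for your GPFS release before relying on the options.

```shell
# Sketch only: helper names and archive paths are hypothetical.

# Archive the ssh host keys and root's ssh directory found under a given
# root prefix (normally "/"). Paths are stored relative to that prefix.
backup_ssh_keys() {
    prefix=$1     # filesystem root to read from, e.g. /
    archive=$2    # where to write the tarball, ideally on another machine
    tar -czf "$archive" -C "$prefix" etc/ssh root/.ssh
}

# Unpack the archive under a root prefix, preserving permissions.
restore_ssh_keys() {
    prefix=$1
    archive=$2
    tar -xzpf "$archive" -C "$prefix"
}

# Before the reinstall (copy the archive off the node afterwards):
#   backup_ssh_keys / /tmp/nsd-ssh-keys.tar.gz
# After the reinstall, with GPFS packages installed again:
#   restore_ssh_keys / /tmp/nsd-ssh-keys.tar.gz
#   /usr/lpp/mmfs/bin/mmsdrrestore -p <primary-config-server>
```

The root prefix is a parameter so the functions can be tried against a scratch directory first, without touching the real keys.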
We did unmap the GPFS LUNs on the storage unit, so they were unavailable to the nodes being upgraded.
Eighteen months earlier we found that the standard SuSE installer, when there is a problem with the first (boot) LUN, happily recovers by installing over the next available LUN. In our case that was a GPFS LUN, and we lost the filesystem. Restoring took almost a week. The customer was not happy.
So I would say unmapping (or unplugging) the GPFS LUNs is a critical safety precaution.
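One way to double-check the unmapping before starting the installer is to compare the disks the node can still see against the short list of disks it is supposed to see. The sketch below is an assumption-laden example, not part of any GPFS tooling: the function name is made up, the allowed list ("sda") is a placeholder for your boot disk, and it assumes lsblk from util-linux is available on the node.

```shell
# Sketch only: unexpected_disks is a hypothetical helper.
# Prints every disk name in $1 that does not appear in $2.
# $1 may be separated by spaces or newlines; $2 must be space-separated.
unexpected_disks() {
    visible=$1
    allowed=$2
    for d in $visible; do
        case " $allowed " in
            *" $d "*) ;;                  # expected disk, ignore
            *) printf '%s\n' "$d" ;;      # still visible but not expected
        esac
    done
}

# Example use on the node, assuming the only disk it should see is sda:
#   extra=$(unexpected_disks "$(lsblk -dn -o NAME)" "sda")
#   [ -z "$extra" ] || { echo "Still visible: $extra" >&2; exit 1; }
```

If anything is printed, the node can still reach a disk it should not, and the install should wait until the mapping is fixed.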