I face the following problem:
HPC environment: a GPFS cluster of NSD servers and several GPFS client clusters, all defined in a private IP network (10.x.x.x, let's call it IP1) running IPoIB on native InfiniBand.
Now a new external GPFS cluster should mount the file system from the NSD server cluster.
That external cluster also runs IPoIB internally, but its IB fabric is not linked to the fabric of the existing cluster. The plan has been to use BridgeX and VNICs on the NSD servers and the new remote cluster nodes for communication.
One would, according to the textbooks, have to switch all daemon interfaces to the VNIC addresses. That, however, does not work: first, the existing client and NSD clusters already have VNICs defined for other purposes and use several different IP subnets for them, with no routes interlinking them. Second, the customer is not willing to allow such a redefinition of the GPFS daemon interfaces.
Thus we need to use the current IPoIB subnet of the old cluster (IP1) as the "public" subnet and make sure the new remote cluster can reach the NSD cluster on those addresses. For that we plan to set up an IP routing tunnel using OpenVPN (or something similar; in principle one just needs tunnel devices at the two tunnel endpoints and the proper routing) between the IP1 network and the IPoIB network of the new remote cluster (IP2), which can transport the layer-3 traffic.
However, such a tunnel will for sure become a bottleneck as soon as the remote cluster tries to access data. For that reason we would like to resort to the VNIC network (the NSD servers and the new remote cluster will have VNICs with routing between them, or even in the same IP subnet; let's call this IP3).
And here arise some more problems with GPFS:
1. GPFS will, if subnet definitions are given, deny initial contact via private network addresses (taken from https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+%28GPFS%29/page/GPFS+Network+Communication+Overview, quoting: "When using multiple networks in a GPFS cluster the primary cluster IP address (the address displayed when running the mmlscluster command) must not be a private IP address."). IP1 is a private network. This could surely be changed in the code, but that would be a dirty move.
2. We want data transfer between just the new remote cluster and the NSD servers to run via their VNIC interfaces (IP3, IP4), while the NSD servers should continue to use the IP1 (IPoIB) network towards the existing (old) clients and amongst themselves. The documentation (Advanced Administration Guide) only presents the (usual) case where all clusters that see the "private" network (here: IP3) make use of it. Likewise, the new remote cluster (REM_cl) should continue to talk via its IPoIB net (IP2) internally and use IP3 only for getting/sending data from/to the NSD cluster (NSD_cl).
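If I read the description of the subnets parameter correctly, the optional cluster-name list appended to a subnet might express exactly this restriction. A sketch, assuming IP2 = 10.2.0.0 and IP3 = 10.3.0.0 and the cluster names NSD_cl and REM_cl (all placeholders):

```shell
# On the NSD cluster: advertise the VNIC subnet (IP3) for traffic with the
# remote cluster only; traffic within NSD_cl and towards the old client
# clusters stays on the daemon addresses (IP1).
mmchconfig subnets="10.3.0.0/REM_cl"

# On the remote cluster: use the VNIC subnet (IP3) only towards NSD_cl,
# and the internal IPoIB subnet (IP2) for traffic within REM_cl itself.
mmchconfig subnets="10.3.0.0/NSD_cl 10.2.0.0/REM_cl"
```

Whether a per-cluster restriction like this really keeps the NSD-internal and old-client traffic on IP1 is precisely what I am unsure about.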
Purely from theory, I'd try the following config in GPFS:
- On new remote cluster:
mmremotecluster add NSD_cl -n <IP1_addr_of_NSD_cl>
- On (existing) NSD cluster:
mmauth grant REM_cl
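For completeness, the full cross-cluster mount sequence would presumably look something like the following; the file system name gpfs0, the local device name rgpfs0, the mount point, and the key file paths are placeholders:

```shell
# On both clusters: generate/refresh the authentication key and exchange
# the public key files out of band.
mmauth genkey new

# On the NSD cluster (NSD_cl): register the remote cluster and grant it
# access to the file system.
mmauth add REM_cl -k /tmp/REM_cl_id_rsa.pub
mmauth grant REM_cl -f gpfs0

# On the remote cluster (REM_cl): register the owning cluster via its
# IP1 contact address, then define and mount the remote file system.
mmremotecluster add NSD_cl -n <IP1_addr_of_NSD_cl> -k /tmp/NSD_cl_id_rsa.pub
mmremotefs add rgpfs0 -f gpfs0 -C NSD_cl -T /gpfs/rgpfs0
mmmount rgpfs0
```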
The text under the link given above says: "For private IP addresses GPFS assumes that two IP addresses are on the same subnet only if the two nodes are within the same cluster, or if the other node is in one of the clusters explicitly listed in the subnets configuration parameter." We use private addresses throughout, and we do have subnets defined for the respective other cluster. I also understand from that statement that we cannot use different subnets on the VNICs of NSD_cl and REM_cl, even when routed. For that matter: when a node registers with the cluster manager, does it just send its subnet list according to mmlsconfig, or does it announce all subnets it has interfaces in (independent of GPFS)?
If that indeed works, we are still facing problem 1: GPFS refusing initial contact on private networks when multiple networks are defined (at least according to what I read). How could that be overcome?