Latest post: 2013-04-30T14:18:00Z by ufa

Use of dedicated subnet for just one remote cluster of many

2013-04-23T12:05:23Z | gpfs subnets


I face the following problem:

HPC environment: a GPFS cluster of NSD servers and several GPFS client clusters, defined in a private IP network (10.x.x.x, let's call it IP1) running IPoIB on native IB.

Now, a new external GPFS cluster should mount the FS from the NSD server cluster.

That external cluster also runs IPoIB internally, but its IB fabric is not linked to the IB fabric of the existing cluster. The plan is to use BridgeX and VNICs on the NSD servers and on the new remote cluster nodes for communication.

According to the textbooks, one would have to switch all daemon interfaces to the VNIC addresses. That does not work here, however. First, the existing client and NSD clusters already have VNICs defined for other purposes and use several different IP subnets for them, with no routes interlinking them. Second, the customer is not willing to allow such a redefinition of the GPFS daemon interfaces.

Thus we need to use the current IPoIB subnet of the old cluster (IP1) as the "public" subnet and make sure the new remote cluster can reach the NSD cluster on those addresses. For that we plan to set up an IP routing tunnel using OpenVPN (or something similar; in principle one just needs tunnel devices at the two tunnel ends and the proper routing) between the IP1 network and the IPoIB network of the new remote cluster (IP2), which can transport the layer 3 traffic.

However, such a tunnel will surely become a bottleneck as soon as the remote cluster tries to access data. For the data traffic we would therefore like to resort to the VNIC network: the NSD servers and the new remote cluster will have VNICs with routing between them, or even in the same IP subnet; let's call this IP3.

And here arise some more problems with GPFS:

1. If subnet definitions are given, GPFS will deny initial contact via private network addresses (I got that from !/wiki/General+Parallel+File+System+%28GPFS%29/page/GPFS+Network+Communication+Overview, which states: "When using multiple networks in a GPFS cluster the primary cluster IP address (the address displayed when running the mmlscluster command) must not be a private IP address."). IP1 is a private network. This could surely be changed in the code, but that would be a dirty move.

2. We want the data transfer between just the new remote cluster and the NSD servers to run via their VNIC interfaces (IP3, IP4), while the NSD servers should continue to use the IP1 (IPoIB) network towards the existing (old) clients and amongst themselves. The documentation (Advanced Administration Guide) presents the (usual) case where all clusters that see the "private" network (here: IP3) make use of it. Likewise, the new remote cluster (REM_cl) should continue to talk via its IPoIB network (IP2) internally and use IP3 only for getting data from and sending data to the NSD cluster (NSD_cl).
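To make the intended state easier to check against later, the traffic layout described above can be written down as a small lookup table. This is only a sketch of the wish list in this post, not of anything GPFS does: IP1, IP2 and IP3 are the labels used here (not real addresses), and CLI_cl is used as a label for the existing client clusters.

```python
# Intended network for traffic between (or within) clusters, as described in the post.
# IP1..IP3 are the post's own labels, not real addresses.
INTENDED_NETWORK = {
    frozenset({"NSD_cl"}): "IP1",            # NSD servers amongst themselves (IPoIB)
    frozenset({"NSD_cl", "CLI_cl"}): "IP1",  # NSD servers <-> existing clients (IPoIB)
    frozenset({"REM_cl"}): "IP2",            # new remote cluster internally (its own IPoIB)
    frozenset({"NSD_cl", "REM_cl"}): "IP3",  # NSD servers <-> new remote cluster (VNIC)
}


def intended_network(cluster_a: str, cluster_b: str) -> str:
    """Return the network label the post wants for traffic between two clusters."""
    pair = frozenset({cluster_a, cluster_b})
    if pair not in INTENDED_NETWORK:
        # The post states no requirement for this pair (e.g. REM_cl <-> CLI_cl).
        raise KeyError(f"no intended network stated for {sorted(pair)}")
    return INTENDED_NETWORK[pair]


print(intended_network("REM_cl", "NSD_cl"))  # IP3
print(intended_network("NSD_cl", "NSD_cl"))  # IP1
```

Any subnets configuration would have to reproduce exactly this table.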

Just from theory I'd try the following config in GPFS:

  • On new remote cluster: 

mmremotecluster add NSD_cl -n <IP1_addr_of_NSD_cl>

mmchconfig 'subnets=IP3/NSD_cl'


  • On (existing) NSD cluster:

mmauth grant REM_cl -f <fs_device>

mmchconfig 'subnets=IP3/REM_cl'

The text under the link given says: "For private IP addresses GPFS assumes that two IP addresses are on the same subnet only if the two nodes are within the same cluster, or if the other node is in one of the clusters explicitly listed in the subnets configuration parameter." We use private addresses throughout, and we do have subnets defined for the respective other cluster. I also understand from that statement that we cannot use different subnets on the VNICs of NSD_cl and REM_cl, even when they are routed. On that note: when a node registers with the cluster manager, does it just send its subnet list according to mmlsconfig, or does it announce all subnets it has interfaces in (independent of GPFS)?
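The quoted rule can be sketched as a small predicate to reason about the configs above. This is a simplified model of the quoted sentence only, not GPFS's actual connection logic; the function name, the 192.168.30.0/24 stand-in for IP3 and all node addresses are made up for illustration.

```python
import ipaddress


def treated_as_same_subnet(local_addr, remote_addr, subnet,
                           local_cluster, remote_cluster, listed_clusters):
    """Simplified model of the quoted rule for private addresses.

    Both addresses must lie in `subnet`, and the remote node must be in the
    local cluster or in a cluster explicitly listed for that subnet.
    """
    net = ipaddress.ip_network(subnet)
    both_in_subnet = (ipaddress.ip_address(local_addr) in net
                      and ipaddress.ip_address(remote_addr) in net)
    cluster_allowed = (remote_cluster == local_cluster
                       or remote_cluster in listed_clusters)
    return both_in_subnet and cluster_allowed


# Hypothetical stand-in for IP3; the post gives no concrete addresses.
IP3 = "192.168.30.0/24"

# Seen from an NSD_cl node configured with subnets=IP3/REM_cl:
print(treated_as_same_subnet("192.168.30.11", "192.168.30.21", IP3,
                             "NSD_cl", "REM_cl", {"REM_cl"}))  # True: REM_cl is listed
print(treated_as_same_subnet("192.168.30.11", "192.168.30.31", IP3,
                             "NSD_cl", "CLI_cl", {"REM_cl"}))  # False: CLI_cl is not listed
print(treated_as_same_subnet("192.168.30.11", "192.168.40.21", IP3,
                             "NSD_cl", "REM_cl", {"REM_cl"}))  # False: different subnets
```

The last call corresponds to my reading that different VNIC subnets on the two clusters would not match, even if they are routed.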

If that works indeed, we are still facing problem 1: GPFS refusing initial contact on private networks if multiple networks are defined (at least according to what I read). How could that be overcome?
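Whether a daemon address falls under that restriction can at least be checked up front. The sketch below assumes that "private" means the RFC 1918 ranges, which is my assumption about how GPFS classifies addresses; the helper name and the sample addresses are made up (10.1.2.3 merely stands for something in the 10.x.x.x range of IP1).

```python
import ipaddress

# RFC 1918 private IPv4 ranges (assumed to be what "private" means here).
RFC1918_NETS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(addr: str) -> bool:
    """Return True if addr lies in one of the RFC 1918 private ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in RFC1918_NETS)


print(is_rfc1918("10.1.2.3"))  # True: any IP1 daemon address is private
print(is_rfc1918("8.8.8.8"))   # False: a public address
```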




Updated on 2013-04-23T12:07:38Z by ufa
  • ufa
    161 Posts

    Re: use of dedicated subnet for just one remote cluster of many


    I did some tests using virtual machines on my notebook.

    Looks like my concerns can be ruled out:


    1. Daemon interfaces of REM_cl in, of NSD_cl and CLI_cl in REM_cl has no interface in, but routing is in place so it can reach NSD_cl. REM_cl can register with NSD_cl, and so can CLI_cl.

    2. Using the following subnets, I can achieve the intended state that NSD_cl talks just to REM_cl on but internally, and to CLI_cl, it remains on

    NSD_cl:  subnets

    REM_cl: subnets

    CLI_cl: subnets





    Updated on 2013-04-30T14:19:06Z by ufa