We are planning to migrate an existing IB compute cluster to new hardware. The compute nodes will be migrated, along with the IB infrastructure.
We have four NSD servers to interface to storage and around 30 compute nodes connected via IB and GBit ethernet. The GPFS traffic is routed over IB (IPoIB, not RDMA, at this time).
The idea is to attach a second IB card into the NSD servers to connect to the new IB switch and compute nodes. The old and new IB network will not be interconnected. The new IB network will obtain a new, separate IP subnet. The IB subnets are not connected/routed to anything else. The GBit network does provide connectivity between all nodes.
As a result, the data-carrying network (IB) will be split in two, with the NSD servers as the connection point and two client groups that cannot communicate with each other.
Will that be a problem?
If we decide to enable RDMA, will that cause trouble?
We think we can run like that, provided we configure the admin and daemon names and subnets. Any caveats?
Single cluster - two IB networks
Updated on 2013-02-26T18:33:21Z by SystemAdmin
Re: Single cluster - two IB networks (2013-02-26T16:34:55Z, accepted answer)

Every node in a given cluster, server or client, must be able to communicate with every other node in the cluster, so there has to be a network that connects all nodes. What you can do is define the cluster over GigE, and then use the "subnets" config parameter to designate the two IB subnets as preferred. Then, for each connection, if both nodes are on the same IB fabric, the IB subnet will be used; otherwise GigE will be used. You could also take the multi-cluster approach, but that would be overkill for your situation.
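For the setup described in the question, the "subnets" approach might look roughly like this. This is only a sketch: the subnet addresses 10.0.1.0 (old IB fabric) and 10.0.2.0 (new IB fabric) are hypothetical placeholders, and the cluster is assumed to be defined over the GigE hostnames.

```shell
# Cluster is defined over the GigE interfaces; tell GPFS to prefer the
# two IPoIB subnets for any pair of nodes that share one of them.
# Replace the subnet addresses with your actual IB subnets.
mmchconfig subnets="10.0.1.0 10.0.2.0"

# A subnets change takes effect when the daemons restart.
mmshutdown -a
mmstartup -a
```

With this in place, an NSD server (which has an interface on both fabrics) talks to each client group over that group's IB subnet, while any traffic between the two client groups falls back to GigE.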
Re: Single cluster - two IB networks (2013-02-26T18:33:21Z) by SystemAdmin 110000D4XK
All nodes will be able to communicate over GigE. This is covered.
The Advanced Administration Guide actually has a picture of a configuration close to my situation (SC23-5182-06, page 3), but the guide presumes that you have multiple clusters. So multiple clusters is my fall-back plan. Its main disadvantage is the need for quorum nodes and server licenses in the client cluster.
There seems to be a presumption that a single cluster is uniform and that all nodes can communicate over all interfaces. But that presumption is not stated explicitly anywhere, hence my question.