tomcec
2 Posts

round robin on verbsports

‏2013-04-19T21:34:49Z | infiniband rdma verbs

I have a simple GPFS cluster on an IB network: 1 NSD server and 2 clients.

GPFS 3.5.0-6

RHEL 6.2

 

The server has 2 dual-port HCAs.

Each client has 1 dual-port HCA.

There are 2 IB switches.

 

This is the verbs configuration:

 

[C1]
verbsPorts mlx4_0/1 mlx4_0/2
[C2]
verbsPorts mlx4_0/2 mlx4_0/1
[S1]
verbsPorts mlx4_1/1 mlx4_1/2 mlx4_3/1 mlx4_3/2
 

The question:

How does GPFS decide which ports to use when establishing connections between nodes?

For example, which port will C1 use to connect to which port of S1?

I am asking because I have found that the order of the ports in the lists matters for getting RDMA connections to work properly.

 

Thanks,

Tommaso

 

Updated on 2013-04-19T21:35:07Z by tomcec
  • dichung
    25 Posts
    ACCEPTED ANSWER

    Re: round robin on verbsports

    ‏2013-05-06T13:21:57Z  

    The GPFS FAQ states we support one IB subnet.

    Assuming you have one IB subnet,

    > [C1]

    > verbsPorts mlx4_0/1 mlx4_0/2

    > [C2]

    > verbsPorts mlx4_0/2 mlx4_0/1

    > [S1]

    > verbsPorts mlx4_1/1 mlx4_1/2 mlx4_3/1 mlx4_3/2

    The RDMA connections are created as:

    C1 mlx4_0/1 <--> S1 mlx4_1/1 and mlx4_0/2 <--> S1 mlx4_1/2

    C2 mlx4_0/2 <--> S1 mlx4_1/1 and mlx4_0/1 <--> S1 mlx4_1/2

    For example, the C1 client port list "mlx4_0/1 mlx4_0/2" is sent to server S1, and the server pairs those ports, in order, with the ports in its own list "verbsPorts mlx4_1/1 mlx4_1/2 mlx4_3/1 mlx4_3/2". Since the client sends only two ports, the server uses only its first two ports.
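
The positional pairing described above can be modelled in a few lines of Python. This is only an illustrative sketch of the behaviour as described in this post, not GPFS code; the function name `pair_ports` is made up for the example.

```python
def pair_ports(client_ports, server_ports):
    """Pair the i-th client port with the i-th server port.

    Illustrative model only (not GPFS code). zip() stops at the
    shorter list, so server ports beyond the client's count stay unused.
    """
    return list(zip(client_ports, server_ports))


# Port lists taken from the configuration quoted above.
S1 = ["mlx4_1/1", "mlx4_1/2", "mlx4_3/1", "mlx4_3/2"]
C1 = ["mlx4_0/1", "mlx4_0/2"]
C2 = ["mlx4_0/2", "mlx4_0/1"]

for name, ports in (("C1", C1), ("C2", C2)):
    for client_port, server_port in pair_ports(ports, S1):
        print(f"{name} {client_port} <--> S1 {server_port}")
```

With two-entry client lists, only the first two server ports ever appear in the output, which matches the connection list given above.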

    Assuming a single IB fabric in which every IB port can reach every other IB port on a single IB subnet, you may want to configure the clients as:

    C1 mlx4_0/1 mlx4_0/2 mlx4_0/1 mlx4_0/2

    C2 mlx4_0/2 mlx4_0/1 mlx4_0/2 mlx4_0/1

    S1 mlx4_1/1 mlx4_1/2 mlx4_3/1 mlx4_3/2 (no change)

    This allows each client to use all 4 server ports, giving 4 RDMA connections per client.
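
As a rough check of that claim, the same positional-pairing model (an assumption based on the description in this post, not GPFS code; `pair_ports` is a hypothetical helper) shows that repeating the client ports makes every server port appear in a pair:

```python
def pair_ports(client_ports, server_ports):
    """Pair ports by position; illustrative model only, not GPFS code."""
    return list(zip(client_ports, server_ports))


S1 = ["mlx4_1/1", "mlx4_1/2", "mlx4_3/1", "mlx4_3/2"]
# Client list with each port repeated, as suggested above.
C1 = ["mlx4_0/1", "mlx4_0/2", "mlx4_0/1", "mlx4_0/2"]

pairs = pair_ports(C1, S1)
used_server_ports = {server_port for _, server_port in pairs}

for client_port, server_port in pairs:
    print(f"C1 {client_port} <--> S1 {server_port}")
print(f"{len(pairs)} connections, {len(used_server_ports)} server ports in use")
```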

     

  • tomcec
    2 Posts

    Re: round robin on verbsports

    ‏2013-05-06T13:43:12Z  

    Thanks, good to know. I was asking precisely because I have 2 IB switches that are not interlinked, so specifying the HCA ports in the wrong order prevented RDMA connections from being established between the clients and the server.

    From some of the tests I did, I was also under the impression that repeating the same ports, as you suggest above, was not necessary because GPFS does this automatically to fully utilize all of the NSD servers' ports. But I might be mistaken.
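
To illustrate why port order matters when the switches are not interlinked, here is a small sketch that flags positional pairs whose two ends sit on different switches. The cabling map `switch_of`, the switch names `sw1`/`sw2`, and the helper `unreachable_pairs` are all hypothetical; the thread does not say which port is cabled to which switch.

```python
def unreachable_pairs(client_ports, server_ports, switch_of):
    """Return positional pairs whose two ends are on different switches.

    Illustrative model only; assumes ports are paired by list position.
    """
    return [
        (c, s)
        for c, s in zip(client_ports, server_ports)
        if switch_of[("client", c)] != switch_of[("server", s)]
    ]


# Hypothetical cabling: port 1 of each HCA on sw1, port 2 on sw2.
switch_of = {
    ("client", "mlx4_0/1"): "sw1",
    ("client", "mlx4_0/2"): "sw2",
    ("server", "mlx4_1/1"): "sw1",
    ("server", "mlx4_1/2"): "sw2",
}
server = ["mlx4_1/1", "mlx4_1/2"]

good = unreachable_pairs(["mlx4_0/1", "mlx4_0/2"], server, switch_of)
bad = unreachable_pairs(["mlx4_0/2", "mlx4_0/1"], server, switch_of)
print("matching order:", good)
print("swapped order: ", bad)
```

Under this assumed cabling, the matching order yields no unreachable pairs, while the swapped order leaves both pairs spanning the two isolated switches.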

     

    Thanks again.

    Tommaso