Topic
6 replies. Latest post: 2013-12-04T12:02:29Z by kostty

kostty
6 Posts

Pinned topic: GPFS remote mount or partitioned cluster

2013-12-02T21:06:27Z

Hi all,

I'd like to ask for your expertise on an issue with expanding a GPFS cluster.

We have an existing GPFS cluster: 2 NSD servers, 50+ clients, all sitting on a single private IB network.

Now we've got some more client nodes with their own IB switch, and we need to add all of them to the existing GPFS cluster. For configuration reasons we can't simply connect the new IB switch to the existing one, so we have two independent interconnect fabrics and two NSD servers.

The IB fabric is the only network available to the new nodes; they don't have Ethernet ports.

So what we did was put one IB interface of each NSD server on one IB fabric and the other port of each NSD server on the other fabric. As a result, nodes from different groups can't see each other, while the NSD servers can reach both groups of nodes.

A small picture to make things a little clearer:

                               +----+
        +------------+ <------+|NSD1|+---> +-------------+
        |  IB #1     |         +----+      | IB #2       |
        |  w/nodes1  |         +----+      | w/ nodes2   |
        +------------+ <------+|NSD2|+---> +-------------+
                               +----+

What are the possible ways to have these two mutually isolated groups of clients share the same filesystem?

As we see it, the options are the following:

a) add the new nodes to the existing cluster;

b) build a new cluster from the new nodes and set up remote access to the filesystem in the first cluster.

a) When we simply add the new nodes, everything works perfectly until a new node tries to mount the FS. Then it throws an error saying it couldn't get the configuration file from the manager node. Passwordless ssh access is there; it's been checked 100 times. Scp works as well.

We even went as far as running mmauth genkey new, propagating the key, and mmauth update with AUTHONLY, following some forum thread (roughly the sketch below).

Admin mode is set to central (the default).

So we couldn't get any further with plan A. If anyone can suggest what should be checked or done, I'd greatly appreciate it.
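For reference, what we ran was roughly the following (a sketch from memory of what that thread suggested, not a verified recipe):

    # Roughly the sequence from that forum thread (sketch only):
    mmauth genkey new                # generate a new key pair for the cluster
    mmauth genkey propagate          # push the new key out to the nodes
    mmauth update . -l AUTHONLY      # set cipherList to AUTHONLY for the local cluster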


b) Almost the same story: we built a new cluster and set up all the mmauth and mmremotecluster pieces as described in the infocenter guide.
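Roughly, the setup followed the usual pattern (a sketch; the filesystem names and key file paths are placeholders, the cluster and server names are as in the log below):

    # On the owning cluster (cluster1); sketch only:
    mmauth genkey new
    mmauth update . -l AUTHONLY
    mmauth add cluster2 -k /tmp/cluster2_id_rsa.pub   # public key received from cluster2
    mmauth grant cluster2 -f gpfs0                    # let cluster2 mount filesystem gpfs0

    # On the accessing cluster (cluster2):
    mmremotecluster add cluster1 -n gpfs1serv,gpfs2serv -k /tmp/cluster1_id_rsa.pub
    mmremotefs add rgpfs0 -f gpfs0 -C cluster1 -T /gpfs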

On mounting the remote FS, it says:

Mon Dec  2 14:26:49.615 2013: mmfsd initializing. {Version: 3.5.0.11   Built: Jun 25 2013 17:03:02} ...

Mon Dec  2 14:26:49.907 2013: OpenSSL library loaded and initialized.

Mon Dec  2 14:26:51.324 2013: Node 172.19.11.1 (server0) is now the Group Leader.

Mon Dec  2 14:26:51.325 2013: This node (172.19.11.1 (server0)) is now Cluster Manager for cluster2

Mon Dec  2 14:26:51.398 2013: mmfsd ready

Mon Dec  2 14:26:51 GMT 2013: mmcommon mmfsup invoked. Parameters: 172.10.10.1 172.10.10.1 all

Mon Dec  2 14:27:30.295 2013: Waiting to join remote cluster cluster1

Mon Dec  2 14:27:31.296 2013: Connecting to 172.19.11.79 gpfs2serv <c1p2>

Mon Dec  2 14:27:31.297 2013: Remote host 172.19.11.79 gpfs2serv <c1p2> refused connection because it is not configured to accept connections from the IP address 172.10.10.1.  Ensure the local nodes name service is returning the same IP address which the host is configured to use for GPFS communication.

Mon Dec  2 14:27:31.298 2013: Connecting to 172.10.10.80 gpfs1serv <c1p1>

Mon Dec  2 14:27:31.297 2013: Remote host 172.10.10.80 gpfs1serv <c1p1> refused c

Once again: what should I check here? Any ideas are appreciated. It feels like there is just one small thing left to fix to make all of this work, and I'm missing it.


The NSD servers do have two different IPs on ib0 and ib1, and these carry the same name in the two isolated node groups: the IP of ib0 in group 1 reverse-resolves to the same name as the IP of ib1 does in group 2, all done through /etc/hosts files on the nodes.
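To illustrate the naming scheme (the addresses below are made up), the /etc/hosts files look roughly like this:

    # /etc/hosts on group 1 nodes (IB fabric #1); example addresses only
    10.1.0.1   nsd1     # NSD1, ib0 on fabric #1
    10.1.0.2   nsd2     # NSD2, ib0 on fabric #1

    # /etc/hosts on group 2 nodes (IB fabric #2); same names, different IPs
    10.2.0.1   nsd1     # NSD1, ib1 on fabric #2
    10.2.0.2   nsd2     # NSD2, ib1 on fabric #2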

If I've missed something, please ask. Thanks!

  • dlmcnabb
    1012 Posts

    Re: GPFS remote mount or partitioned cluster

    2013-12-02T21:22:38Z  in response to kostty

    The GPFS requirement is that all nodes in a remote cluster MUST be able to connect to every node in the main, filesystem-owning cluster, and vice versa. You have only made this possible for the NSD servers; all the manager/quorum/client nodes in the main cluster must also be able to initiate connections in either direction. So either you get Ethernet connections for the new remote cluster and define the nodes in that cluster using the ETH adapters (and use the subnets configuration setting to specify the local IB network for connections within the remote cluster), or you provide routing nodes at your site so that all the new remote nodes and the main cluster can find each other.
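    For example, a sketch of the first option (hostnames, designations, and the subnet are placeholders):

        # Define the remote cluster's nodes by their Ethernet names:
        mmcrcluster -N rnode1-eth:quorum,rnode2-eth,rnode3-eth -p rnode1-eth \
                    -r /usr/bin/ssh -R /usr/bin/scp

        # Let nodes that share the remote cluster's private IB subnet use it
        # for traffic between themselves after the initial Ethernet connection:
        mmchconfig subnets="10.2.0.0"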

    Two remote clusters, however, never need to have connectivity to each other.

    • kostty
      6 Posts

      Re: GPFS remote mount or partitioned cluster

      2013-12-02T21:36:40Z  in response to dlmcnabb

      Hi Daniel,

      thanks for the answer. We will consider your suggestion.

      If we have routing between all the nodes, isn't it then better to make one single cluster out of all of them? That seems more appropriate given the server licenses needed in the remote-cluster case. And we don't have any local FS in the remote cluster itself; it would just mount the one from the original cluster.

      • dlmcnabb
        1012 Posts

        Re: GPFS remote mount or partitioned cluster

        2013-12-02T23:35:05Z  in response to kostty

        Your choice. If a cluster does not have any filesystems, then all it needs is a single quorum node. That node becomes Cluster Manager, and all it does is keep track of who is in the cluster. It can even die without causing problems. The client nodes are only interested in joining the clusters that have filesystems, so they basically ignore whether or not they have quorum in the remote cluster they belong to.

        • kostty
          6 Posts

          Re: GPFS remote mount or partitioned cluster

          2013-12-03T21:13:51Z  in response to dlmcnabb

          Daniel,

          we tried the suggested solution, but there is still no success.

          The cluster gives:

          server0e:  mmremote: Run the command from an active terminal or enable global passwordless access.
          server0e:  mmremote: Unable to retrieve GPFS cluster files from node gpfs2serv
          server0e:  mmremote: Unexpected error from gpfsInit.  Return code: 1

          What we did was:

          1) We tried adding Ethernet routing between the existing cluster's client subnet and the new clients. So they were not on the same subnet, but they could ping/ssh each other through a router.

          Neither adding the nodes to the first cluster nor remote mounting between two clusters worked. We got the same errors: "not being able to get configuration from NSD nodes" in the first case and "NSD servers not being configured to accept remote connections" in the second.

          2) We plugged the new clients directly into the Ethernet subnet of the existing cluster's clients, so all the clients were on the same ETH subnet. But the NSD servers could still only see the clients through two isolated IB fabrics: one fabric for the old clients and one for the new ones.

          This simple drawing will hopefully make it a little clearer:

                                         +----------------------------+
                                         |     ETH-switch             |
                                    +--->|                            |<--+
                                    |    +----------------------------+   |
                                    |                                     |
                                    |            ib0  +----+ ib1          |
                               +----+-------+ <------+|NSD1|+---> +-------+-----+
                               |  IB #1     |         +----+      | IB #2       |
                               |  w/nodes1  |    ib0  +----+ ib1  | w/ nodes2   |
                               +------------+ <------+|NSD2|+---> +-------------+
                                                      +----+

          With this setup we still just got the error output above.

          So the new clients are defined as: ib_name:client:eth_name

          The old nodes are still defined as: ib_name:client:ib_name

          No luck.
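          For concreteness, a hypothetical example of adding nodes with that kind of descriptor (the hostnames are made up):

              # Descriptor format: daemon_node_name:designation:admin_node_name
              # (IB name for the GPFS daemon, ETH name for admin traffic):
              mmaddnode -N newnode1-ib:client:newnode1-eth,newnode2-ib:client:newnode2-eth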

          So can I ask once again: which requirements are crucial concerning inter-node connectivity?

          Do the nodes merely have to be "able to connect", or do they have to be "on the same subnet with the same netmask" and so on?

          Is a setup like the one above supported: multiple subnets, none of which includes ALL the cluster nodes and servers, only a subset of them, while all the subnets together DO provide a way for every node to talk to every other (clients over ETH, NSD servers over the two separate IB subnets)?

          And what are the requirements for the subnets parameter within a single cluster? We tried different variants of it, including only ETH, only IB, and all subnets, with no success. Is it really necessary to restart the whole cluster for all the nodes to reread subnets from the config?

          • dlmcnabb
            1012 Posts

            Re: GPFS remote mount or partitioned cluster

            2013-12-03T22:23:54Z  in response to kostty

            Make sure that the contact node names you specify in the remote clusters resolve to the same IP addresses that the contact nodes' names resolve to as defined in the main cluster. The remote node sends over the IP address it thinks it should contact, and the receiving node matches that IP address against what it thinks its own address is in the cluster. If there is no match, there is no connection.
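            One way to check this (a sketch, using gpfs1serv from the log above as the example contact node):

                # On the owning cluster: the address GPFS has for the contact node
                mmlscluster | grep gpfs1serv

                # On a node of the remote cluster: what that name resolves to locally
                getent hosts gpfs1serv

                # The two addresses must match, or the connection is refused.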

            When defining a multi-cluster setup, it is imperative that all the nodes are defined to their own cluster using publicly reachable adapter names (usually Ethernet), and that these names are used as the contact nodes. Initial connections use these public names.

            After the initial connection is established, the nodes share all their adapter IP addresses with each other, and the "subnets" setting lets them reconnect over a higher-speed connection if they share a common subnet (like a local IB fabric).

            If the clusters have private IP network addresses (192.168.*.* or 10.*.*.*) and those networks are really connected on the same fabric so that no IP addresses overlap, then you should list the cluster names that share a common fabric in the "subnets" setting. Otherwise GPFS will assume that private IP addresses may not be unique, and will not use what looks like a common subnet.
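            For example (the subnet and cluster names below are placeholders):

                # Tell GPFS these two clusters really share the 10.0.0.0 fabric:
                mmchconfig subnets="10.0.0.0/cluster1.example.com;cluster2.example.com"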


            • kostty
              6 Posts

              Re: GPFS remote mount or partitioned cluster

              2013-12-04T12:02:29Z  in response to dlmcnabb

              Satisfying the first requirement from the post above did the trick. We ended up with one set of IP addresses for all the nodes, which resolve uniquely and are reachable from everywhere in the cluster.

              In our case, we had to set up a route between the new nodes and the ib0 interfaces of the NSD servers, not ib1; the ib0 interfaces are the ones facing the old part of the cluster, as per the drawing in the post above.
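              For anyone hitting the same thing, the route on a new node looked roughly like this (the addresses are placeholders):

                  # Reach the NSD servers' ib0 subnet via a gateway on our fabric:
                  ip route add 10.1.0.0/24 via 10.2.0.254 dev ib0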

              So basically that's it.

              Thank you a lot, dlmcnabb, for your assistance!