Installing IBM Spectrum Computing Suite for High Performance Analytics on a shared file system

Use the shared directory installation to install components of IBM Spectrum Computing Suite for High Performance Analytics (HPA) on a shared file system. The shared directory installation enables you to update the LSF GUI hosts independently of the LSF master, server, and client hosts. You can also patch the contents of the shared directory from the deployer host.

Before you begin

See Requirements for IBM Spectrum Computing Suite for High Performance Analytics Suite installation for general prerequisites for installation.

You have already done the following steps, described in IBM Spectrum Computing Suite for HPA Suite installation overview:
  • Download the .bin package files for IBM Spectrum Computing Suite for HPA.
  • Run the .bin files to create the deployer host. This host contains the Ansible playbooks and repositories for installation.
  • Check host prerequisites and decide host roles.
Requirements:
  • The NFS or IBM Spectrum Scale file system must be mounted outside of the /opt/ibm directory. The installation automatically creates the required symbolic links to the /opt/ibm/lsfsuite directory.
  • All hosts listed in the lsf-inventory file must mount the shared directory. Installation fails on hosts that do not have this directory mounted.
  • The shared directory must allow write access to root on the deployer host.
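The mount and write-access requirements can be checked up front with a small script. The following sketch is illustrative only: the check_shared_dir function is a hypothetical helper, not part of the product, and /gpfs/lsf_suite stands in for your actual mount point.

```shell
#!/bin/sh
# Illustrative pre-flight check: confirm that a shared directory is
# mounted (exists) and is writable by the current user (run as root
# on the deployer host). check_shared_dir is a hypothetical helper.
check_shared_dir() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "ERROR: $dir does not exist or is not mounted" >&2
        return 1
    fi
    testfile="$dir/.write_test.$$"
    if touch "$testfile" 2>/dev/null; then
        rm -f "$testfile"
        echo "OK: $dir is writable"
        return 0
    fi
    echo "ERROR: $dir is not writable" >&2
    return 1
}

# Example: check_shared_dir /gpfs/lsf_suite
```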

About this task

In this task, the LSF server role hosts and LSF client hosts use the LSF binary files from a shared directory. The LSF master hosts and GUI hosts also use this shared directory to host the configuration files and work directories for High Availability. Using this configuration saves local disk space on the LSF server hosts and LSF clients, and enables you to patch the binary files for these roles from the deployer host.
IBM Spectrum Computing Suite for HPA shared directory installation

The deployer host can be a separate host, as shown in the figure, or it can be a host that has a GUI or master host role. The deployer host is responsible for generating the contents of the shared directory: the installation extracts the necessary components from its repositories and places them in the shared directory. The GUI binary files are installed locally on the GUI hosts, while the LSF master, server, and client hosts get their binary and configuration files from the shared directory.

Components that are installed locally on the LSF hosts and LSF GUI hosts for better performance:
  • Elastic Stack and dependencies (the Elasticsearch Beats filebeat-for-lsf, metricbeat-for-lsf, lsf-beat, and gpfsio-collector-for-lsf, and Logstash)
  • IBM Spectrum MPI
  • LSF GUI binary files (installed locally on the GUI hosts)

The following installation uses a deployer host to create the contents of the shared directory, then configures an LSF master host, a host with the GUI role and database host role, and several server hosts.

Procedure

  1. With the NFS or IBM Spectrum Scale file system mounted on all of the hosts you want to install, log in as root to the host you want to use as the deployer host and test access from the deployer host as root.
    For example, the following commands make sure that root can write to the /gpfs/lsf_suite/ directory.
    cd /gpfs/lsf_suite
    touch testfile
    ls -l testfile
    -rw-r--r--. 1 root root 0 Nov 1 2017 testfile
    
    The testfile should be owned by root.

  2. Change to the /opt/ibm/lsf_installer/playbook directory to customize the installation.

  3. Edit the lsf-inventory file to set up host roles for your cluster.
    LSF_Masters, GUI_Hosts, and DB_Host
    Specify the names of the primary and secondary LSF master hosts in the LSF_Masters option. Specify the name of the secondary LSF master candidate host in the GUI_Hosts and DB_Host options.
    For example,
    [LSF_Masters]
    hosta
    hostb
    ...
    [GUI_Hosts]
    hostb
    ...
    [DB_Host]
    hostb
    
    LSF_Servers
    List your LSF server hosts in the LSF_Servers option, one host or host name expression per line. The expression in the following example configures four server hosts (hosta1, hosta2, hosta3, and hosta4):
    [LSF_Servers]
    hosta[1:4]
    
    LSF_Clients
    List LSF client hosts in the LSF_Clients option, one host or host name expression per line. Users submit jobs from these hosts, but the hosts do not run workload. The expression in the following example configures three hosts (hostb1, hostb2, and hostb3):
    [LSF_Clients]
    hostb[1:3]
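
To preview how your inventory expressions expand before you install, you can run ansible all -i lsf-inventory --list-hosts from the playbook directory. As an illustration of the expansion rule (the numeric range is inclusive on both ends), the following hypothetical helper mimics it for simple patterns:

```shell
# Illustrative only: expand an Ansible-style numeric host range such
# as "hosta[1:4]" into individual host names. Real inventories should
# be checked with "ansible all -i lsf-inventory --list-hosts".
expand_range() {
    pattern="$1"
    prefix=${pattern%%\[*}          # text before the bracket, e.g. hosta
    range=${pattern#*\[}            # strip up to and including "["
    range=${range%\]}               # strip the trailing "]"
    start=${range%%:*}
    end=${range##*:}
    for i in $(seq "$start" "$end"); do
        echo "${prefix}${i}"
    done
}

expand_range 'hosta[1:4]'   # prints hosta1 hosta2 hosta3 hosta4, one per line
```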
    

  4. Test that the shared file system is mounted on all the hosts.
    For example, to test that the file /gpfs/lsf_suite/testfile is accessible from all the hosts in the cluster, run the following command from the /opt/ibm/lsf_installer/playbook directory on the deployer host:
    ansible all -i lsf-inventory -m command -a "ls /gpfs/lsf_suite/testfile"
    This command runs on all the hosts listed in the lsf-inventory file, and uses the command module to run the ls command as an argument.
  5. Edit the lsf-config.yml file to set the cluster name and other properties.
    Remember: Make sure you maintain indentation in the lsf-config.yml file.
    1. Set the cluster name.
      For example,
      LSF:
        # Set my_cluster_name to the name of the cluster.
        my_cluster_name: cluster1
      ...
      
    2. Use the NFS_install_dir option to specify the shared directory that is mounted on all the hosts. When NFS_install_dir is set, leave the HA_shared_dir option as none; the high availability shared directory is implied. For example:
      LSF:
      ...
        HA_shared_dir: none
      ...
        NFS_install_dir: /gpfs/lsf_suite
      ...

  6. Run some pre-installation tests and correct any errors.
    1. Check the configuration file.
      ansible-playbook -i lsf-inventory lsf-config-test.yml
    2. Run the pre-deployment test.
      ansible-playbook -i lsf-inventory lsf-predeploy-test.yml

      This test runs on each host to check network connectivity and host name resolution, minimum disk space, and available memory. The test takes a few minutes to run.

    Correct any errors from the tests and run them again before you run the installation.

  7. If you want the LSF master hosts to install the LSF RPM packages locally, edit the group_vars/all file and change the LSF_MASTERS_ON_LOCAL variable from its default value of false to true.
    Otherwise, the LSF master hosts use the shared directory by default.
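For example, the change can be scripted. This is a sketch under the assumption that the variable appears in group_vars/all in plain key: value YAML form; verify the file's actual layout before relying on it.

```shell
# Hypothetical helper: set LSF_MASTERS_ON_LOCAL to true in a
# group_vars file, assuming a "LSF_MASTERS_ON_LOCAL: false" line.
set_masters_local() {
    file="$1"
    sed -i 's/^LSF_MASTERS_ON_LOCAL:.*/LSF_MASTERS_ON_LOCAL: true/' "$file"
}

# Example: set_masters_local /opt/ibm/lsf_installer/playbook/group_vars/all
```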

  8. Perform the shared directory installation. Navigate to the /opt/ibm/lsf_installer/playbook directory on the deployer.
    1. Build the shared directory contents.
      From the deployer, run:
      ansible-playbook -i lsf-inventory lsf-nfs-setup.yml
    2. Deploy the cluster.
      From the deployer, run:
      ansible-playbook -i lsf-inventory lsf-deploy.yml
      Note: Use the --limit option to install new LSF_Servers hosts or to reinstall existing ones without redeploying the whole cluster:
      ansible-playbook -i lsf-inventory --limit {Some Host or hosts} lsf-deploy.yml

      This playbook deploys the software on the LSF master and GUI hosts, sets up the necessary symbolic links to the shared directory, and installs the local components on the LSF server hosts, which use the shared directory for the LSF binary files.

      When the installation is finished, it gives you a URL for the IBM Spectrum Computing Suite for HPA portal.

      For example,
      http://hostb.company.com:8080
      Where hostb is the name of the GUI host that you configured in the lsf-inventory file.

  9. Run some commands to verify the installation.
    1. Log out of the deployer host, and log in to a host in the cluster.
    2. Run the lsid command to see your cluster name and master host name.
      For example:
      lsid
      IBM Spectrum LSF 10.1.0.3, Sep 13 2017
      Suite Edition:  IBM Spectrum LSF Suite for Enterprise 10.2.0
      Copyright International Business Machines Corp, 1992-2017.
      US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
      
      My cluster name is cluster1
      My master name is hosta
      
    3. Run the lshosts command to see both LSF master hosts; they are members of the management group, which is indicated by the mg resource. The four LSF server hosts and the three client hosts are also listed in this example.
      lshosts
      HOST_NAME      type    model    cpuf ncpus   maxmem  maxswp  server RESOURCES 
      hosta        X86_64  Intel_EM   60.0    16   255.8G    3.9G    Yes  (mg)
      hostb        X86_64  Intel_EM   60.0    16   255.8G    3.9G    Yes  (mg)
      hosta1       X86_64  Opteron8   60.0     1     7.9G      1G    Yes  ()
      hosta2       X86_64  Opteron8   60.0     1     7.9G      1G    Yes  ()
      hosta3       X86_64  Opteron8   60.0     1     7.9G      1G    Yes  ()
      hosta4       X86_64  Opteron8   60.0     1     7.9G      1G    Yes  ()
      hostb1       X86_64    PC6000  116.1     1    31.9G    3.9G     No  ()
      hostb2       X86_64    PC6000  116.1     1    31.9G    3.9G     No  ()
      hostb3       X86_64  Opteron8   60.0     1     7.9G      1G     No  ()
      
    4. Run the bhosts command to check that the status of each host is ok, and the cluster is ready to accept work.
      bhosts
      HOST_NAME          STATUS       JL/U    MAX  NJOBS    RUN  SSUSP  USUSP    RSV
      hosta1             ok              -      1     0       0      0      0      0
      hosta2             ok              -      1     0       0      0      0      0
      hosta3             ok              -      1     0       0      0      0      0
      hosta4             ok              -      1     0       0      0      0      0
      hosta              ok              -     16     0       0      0      0      0
      hostb              ok              -     16     0       0      0      0      0
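
If you want to script this readiness check rather than read the output by eye, the ok hosts can be counted from the bhosts output. A minimal sketch (count_ok_hosts is a hypothetical helper, not an LSF command):

```shell
# Count hosts whose STATUS column is exactly "ok" in bhosts output
# read from stdin; the first line (the header) is skipped.
count_ok_hosts() {
    awk 'NR > 1 && $2 == "ok" { n++ } END { print n + 0 }'
}

# Example: bhosts | count_ok_hosts
```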
      
    5. Log in to one of the server hosts to check that it's using the shared directory.
      For example,
      # ssh hosta1
      # cd /opt/ibm/lsfsuite
      # ls
      ext  lsf
      # ls -al 
      total 0
      drwxr-xr-x. 3 lsfadmin root 28 Nov   2 13:26 .
      drwxr-xr-x. 6 root     root 92 Nov   2 13:26 ..
      drwxr-xr-x. 2 lsfadmin root  6 Nov   2 13:26 ext
      lrwxrwxrwx. 1 root     root 39 Nov   2 13:26 lsf -> /gpfs/lsf_suite/lsf
      
      See that the lsf directory is a symbolic link into the shared directory /gpfs/lsf_suite/lsf, which confirms that this host uses the shared binary files.
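To run the same check without logging in to each server host interactively (for example, through the ansible command module), the link target can be compared programmatically. The helper below is illustrative; the paths follow the example above and must be adapted to your mount point.

```shell
# Illustrative check: confirm that a path is a symbolic link that
# points at the expected shared-directory target.
check_lsf_link() {
    link="$1"
    expected="$2"
    target=$(readlink "$link") || {
        echo "ERROR: $link is not a symbolic link" >&2
        return 1
    }
    if [ "$target" = "$expected" ]; then
        echo "OK: $link -> $target"
    else
        echo "ERROR: $link -> $target (expected $expected)" >&2
        return 1
    fi
}

# Example: check_lsf_link /opt/ibm/lsfsuite/lsf /gpfs/lsf_suite/lsf
```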

  10. Log in to the GUI as the lsfadmin user.
    Note: If the lsfadmin user was created by the installation and did not exist in your system, you might need to create a password for lsfadmin with the passwd command.
    1. Open your browser and enter the GUI portal URL from the installation.
    2. Log in with the lsfadmin user or any other user that you know exists.
    On the Resources > Dashboard page, you can see the six hosts that you deployed to.

What to do next

When you install IBM Spectrum Computing Suite for HPA for the first time, HTTPS is enabled by default. Additional configuration steps are required for High Availability. For the detailed steps, see https://www.ibm.com/support/knowledgecenter/SSZRJV_10.2.0/install_guide/https_enable_ha.html. To configure the Energy Data Collector, see Configuring Energy Data Collector for IBM Spectrum Computing Suite for High Performance Analytics.