Deploy HDP or IBM Spectrum Scale service on pre-existing IBM Spectrum Scale file system

If you have one of the following configurations, you can deploy HDP with the IBM Spectrum® Scale service:

  1. Pre-existing IBM Spectrum Scale cluster in FPO configuration.
  2. Pre-existing IBM Spectrum Scale cluster with remote mount file system configuration.
  3. Pre-existing IBM Spectrum Scale cluster in which the GPFS client nodes belong to an ESS-based IBM Storage® Scale cluster.

The steps for deployment are as follows:

  1. Select a quorum node in the pre-existing IBM Spectrum Scale cluster as the IBM Spectrum Scale Master node.
  2. Ensure that IBM Spectrum Scale is active and mounted.
    [root@c902f09x13 ~]# mmgetstate -a
    
    Node number Node name GPFS state
    -------------------------------------------
    1 c902f09x13 active
    2 c902f09x16 active
    3 c902f09x14 active
    4 c902f09x15 active
    [root@c902f09x13 ~]# mmlsmount bigpfs -L
    
    File system bigpfs is mounted on 4 nodes:
    192.0.2.0 c902f09x13
    192.0.2.1 c902f09x14
    192.0.2.2 c902f09x15
    192.0.2.3 c902f09x16
    [root@c902f09x13 ~]#
    Note: Ensure that the FPO or the local Hadoop IBM Spectrum Scale cluster is set to automount on reboot by running the following command:
    /usr/lpp/mmfs/bin/mmchfs <filesystem name> -A yes
  3. Follow the steps in Create HDP cluster.
  4. Follow the steps in Install Mpack package.
  5. Follow the steps in Deploy the IBM Spectrum Scale service, with the following deviations:
    • If you have not started the IBM Spectrum Scale cluster by the time you reach the Ambari Assign Slaves and Clients page, click the Previous button to go back to the Assign Masters page in Ambari. Then start the IBM Spectrum Scale cluster and mount the file system on all the nodes. Return to the Ambari GUI and continue to the Assign Slaves and Clients page.
    • Verify that gpfs.storage.type is set to one of the following values:
      • local for FPO
      • shared for Single cluster with all Hadoop nodes as IBM Spectrum Scale nodes
      • remote for Remote mount with all Hadoop nodes as IBM Spectrum Scale nodes
    • Verify that the yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs values are set to available, mounted, local partitioned directories that already exist in your file system.

      For example:

      Mounted local partitioned directories - /opt/mapred/local<NUM>

      yarn.nodemanager.local-dirs=/opt/mapred/local1/yarn, /opt/mapred/local2/yarn, /opt/mapred/local3/yarn

      yarn.nodemanager.log-dirs=/opt/mapred/local1/yarn/logs, /opt/mapred/local2/yarn/logs, /opt/mapred/local3/yarn/logs

    • Do not set the GPFS NSD stanza file field.
      For FPO, the IBM Spectrum Scale NSD stanza file is not required because the file system already exists. Because Ambari does not allow a blank value, leave the default value in the IBM Spectrum Scale NSD stanza file field.
      Note: If you accidentally enter a value in the GPFS NSD stanza file field, which was originally blank, and then try to remove it, you must leave a “blank” character in the field for Ambari to proceed.
    • Single Scale cluster configuration

      For gpfs.storage.type=shared, the local cluster hosts that have GPFS components (GPFS_Master or GPFS_Node) selected in the UI are added to the ESS/Shared IBM Spectrum Scale cluster.

      • For shared storage, setting gpfs.storage.type=shared creates a single Scale cluster configuration.
      • For ESS, setting gpfs.storage.type=shared and creating the /var/lib/ambari-server/resources/shared_gpfs_node.cfg file on the Ambari server creates a single Scale cluster configuration. The file must contain only one FQDN of a node in the shared management host cluster, and password-less SSH must be configured from the Ambari server to this node. Ambari uses this one node to join the GNR/ESS cluster. Ensure that the file has at least 444 permissions.
      • [Optional] To create local cache disks, see the Deploy the IBM Storage Scale service > Customize Services > Create Hadoop local cache disks section.
        Note: If you are not using shared storage, you do not need this configuration, and you can leave this local cache disk parameter unchanged in the Ambari GUI.
        • Verify that the following fields contain the correct information and that it matches your preinstalled IBM Spectrum Scale file system (GPFS) cluster:
          • GPFS cluster name
          • GPFS quorum nodes
          • GPFS File System Name
          • gpfs.mnt.dir
          • gpfs.storage.type
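
Several of the checks above can be scripted before you open the Ambari wizard. The following is a minimal sketch, not an IBM-provided tool: the function names all_nodes_active, missing_dirs, and valid_storage_type are hypothetical, and the parsing assumes that mmgetstate -a prints one row per node in the three-column form shown earlier (node number, node name, GPFS state).

```shell
#!/usr/bin/env bash
# Hypothetical pre-deployment helpers. The function names are illustrative
# and are not part of IBM Spectrum Scale or Ambari.

# Reads `mmgetstate -a` output on stdin. Succeeds only if at least one node
# row is present and every node row reports the "active" state.
# Assumes node rows have the form: <node number> <node name> <GPFS state>
all_nodes_active() {
    awk '
        $1 ~ /^[0-9]+$/ { rows++; if ($3 != "active") bad++ }
        END { exit (rows > 0 && bad + 0 == 0) ? 0 : 1 }
    '
}

# Takes a comma-separated yarn.nodemanager.local-dirs or log-dirs value.
# Prints each listed directory that does not exist on this host and
# returns non-zero if any directory is missing.
missing_dirs() {
    local value=$1 dir missing=0
    local IFS=','
    for dir in $value; do
        # Trim the whitespace that may surround each comma-separated entry.
        dir=${dir#"${dir%%[![:space:]]*}"}
        dir=${dir%"${dir##*[![:space:]]}"}
        [ -z "$dir" ] && continue
        if [ ! -d "$dir" ]; then
            printf '%s\n' "$dir"
            missing=1
        fi
    done
    return "$missing"
}

# Succeeds if the argument is one of the gpfs.storage.type values listed
# in this topic: local, shared, or remote.
valid_storage_type() {
    case $1 in
        local|shared|remote) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, after sourcing the file as root on a cluster node, running /usr/lpp/mmfs/bin/mmgetstate -a | all_nodes_active returns a zero exit status only when every listed node is active, and missing_dirs "/opt/mapred/local1/yarn, /opt/mapred/local2/yarn" prints any of those directories that is absent on the host. Run the directory check on each Hadoop node, because the local partitions are per-host.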