Limitations and information

This section describes known limitations, workarounds, and other information for the IBM Spectrum Scale and HDFS Transparency integration.

General
  • The IBM Spectrum® Scale service does not support the rolling upgrade of IBM Spectrum Scale and Transparency from the Ambari GUI.
  • The rolling upgrade of Hortonworks HDP cluster is not supported if the IBM Spectrum Scale service is still integrated.
  • IBM Spectrum Scale 4.1 or later is recommended. HDFS Transparency does not depend on the IBM Spectrum Scale version.
  • A manual Kerberos setup requires the Kerberos setting in Ambari to be disabled before the IBM Spectrum Scale mpack is deployed. If the IBM Spectrum Scale service is already installed, HDFS Transparency must be unintegrated before Kerberos is enabled in Ambari.
  • Federation Support

    Federation is supported for the open source Apache Hadoop stack. The HDFS Transparency connector supports two or more IBM Spectrum Scale file systems acting as one uniform file system for Hadoop applications. For more information, see Overview.

  • The latest JDK version supported by Ambari is 1.8.0.77.
  • In a non-root environment, Ambari must be restarted as root to avoid exceptions.
  • All configuration changes must be made through the Ambari GUI, not set manually in the HDFS configuration files or in the HDFS Transparency configuration files. This ensures that the configuration changes are propagated properly.
  • If, in your existing cluster, the HDFS settings in the HDFS Transparency configuration files were changed manually (for example, settings in core-site, hdfs-site, or log4j.properties in /var/mmfs/hadoop/etc/hadoop) and those changes were not also made in the existing native HDFS configuration files, the HDFS Transparency configuration is replaced by the Ambari UI HDFS configuration during the deployment of Ambari IOP or HDP and the IBM Spectrum Scale service. Therefore, save the changes made to the HDFS Transparency configuration files so that these values can later be applied through the Ambari GUI.
  • For FPO systems, ensure that you follow the proper steps to stop and start IBM Spectrum Scale. Otherwise, the IBM Spectrum Scale NSDs might not be recoverable after they go down and auto recovery fails. This can occur when you run STOP ALL/START ALL from Ambari, which stops IBM Spectrum Scale without properly handling the NSDs in FPO mode (Mpack 2.4.2.6 and earlier, and Mpack 2.7.0.0). For more information, see IBM Spectrum Scale NSD are not able to be recovered in FPO clusters (Stop/Start of Scale service via Ambari GUI).
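
The advice above about saving manually changed HDFS Transparency configuration files can be sketched as a small backup step. This is a minimal sketch assuming a POSIX shell; the backup_conf helper and the destination path are illustrative, not product tooling.

```shell
#!/bin/sh
# Minimal sketch: archive a configuration directory before an Ambari
# deployment can overwrite it. backup_conf and the paths shown are
# assumptions for this example, not part of the product.
backup_conf() {
    src="$1"
    dest="$2"
    [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
    mkdir -p "$dest"
    # Keep a timestamped tarball of the whole directory.
    tar -czf "$dest/$(basename "$src")-$(date +%Y%m%d%H%M%S).tar.gz" \
        -C "$(dirname "$src")" "$(basename "$src")"
}

# Example (HDFS Transparency configuration directory):
# backup_conf /var/mmfs/hadoop/etc/hadoop /root/conf-backups
```

The archived values can then be reapplied through the Ambari GUI after deployment.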
Installation
  • Ambari supports creating only an IBM Spectrum Scale FPO file system.
  • While creating an Ambari IOP or HDP cluster, you do not need to create a local partition file system for HDFS if you plan to install IBM Spectrum Scale FPO through Ambari. The IBM Spectrum Scale Ambari management pack creates the recommended partitions for the local temp disks and the IBM Spectrum Scale disks. The local temp disks are mounted and used for the Yarn local directories.
  • If disks are partitioned before the IBM Spectrum Scale FPO file system is created through Ambari, standard NSDs must be used.
  • Ensure that the GPFS Master and the Ambari server are colocated: the Ambari server must be part of both the Ambari cluster and the GPFS cluster. That is, the Ambari server host must be defined as an Ambari agent host in the Add Hosts UI panel when setting up the Hadoop cluster. Otherwise, the IBM Spectrum Scale service fails to install.
  • If you need to deploy IOP or HDP over an existing IBM Spectrum Scale FPO cluster, either store Yarn's intermediate data in the IBM Spectrum Scale file system, or use idle disks formatted as a local file system. The latter method is recommended. If a new IBM Spectrum Scale cluster is created through the Ambari deployment, all Yarn NodeManager nodes should be FPO nodes, with the same number of disks specified for each node in the NSD stanza.
  • If you are deploying Ambari HDP on top of an existing IBM Spectrum Scale and HDFS Transparency cluster:
    • Perform a backup of the existing HDFS and HDFS Transparency configuration before you deploy Ambari IOP or HDP, or before you deploy the IBM Spectrum Scale service with Ambari on a system that already has HDFS Transparency installed.
    • Ensure that the HDFS configuration provided through the Ambari UI is consistent with the existing HDFS configuration.
      • The existing HDFS NameNode and DataNode values must match the Ambari HDFS UI NameNode and DataNode values. Otherwise, the existing HDFS configuration will be overwritten by the default Ambari UI HDFS parameters after the Add Service Wizard completes.
      • The HDFS DataNodes assigned in the Assign Slaves and Clients page in Ambari must include the existing HDFS Transparency DataNodes. If a host does not have the HDFS DataNode and GPFS Node components set in Ambari, data on that host is not accessible and the cluster might be under-replicated. If a node was not configured as an HDFS DataNode and GPFS node during Assign Slaves and Clients, add those components to the host through the HOSTS component panel to resolve the issue. For more information, see Adding GPFS node component.
    • The HDFS NameNodes specified in the Ambari GUI during configuration must match the existing HDFS Transparency NameNodes.
    • Verify that the host names used are the data network addresses that IBM Spectrum Scale uses for its cluster setup. Otherwise, in an existing or shared file system, the IBM Spectrum Scale service fails during installation because of a wrong host name.
    • While deploying HDP over an existing IBM Spectrum Scale file system, the IBM Spectrum Scale cluster must be started, and the file system must be mounted on all the nodes before starting the Ambari deployment.
  • When deploying the Ambari IOP or HDP cluster, ensure that there are no shared mount points in the cluster. Otherwise, Ambari takes the shared mount point directory as the directory for the open source services, which causes different nodes to write to the same directory.
  • Ensure that all the hosts for the IBM Spectrum Scale cluster contain the same domain name while creating the cluster through Ambari.
  • IBM Spectrum Scale service requires that all the NameNodes and DataNodes are GPFS nodes.
  • The IBM Spectrum Scale Ambari management pack uses the manual installation method and not the IBM Spectrum Scale installation toolkit.
  • If installing a new FPO cluster through Ambari, Ambari creates the IBM Spectrum Scale with the recommended settings for FPO, and builds the GPFS portability layer on each node.
  • It is recommended to run the HDFS Transparency NameNode on a GPFS node that has metadata disks.
  • It is recommended to run the Yarn ResourceManager on the node that runs the HDFS Transparency NameNode.
  • When you are deploying the IBM Spectrum Scale service, the gpfs.replica.enforced parameter might appear as dfs in the Ambari Scale service GUI, even though HDFS Transparency (3.1.0.5 and later) sets it to gpfs by default. Therefore, it is important to set the gpfs.replica.enforced parameter value to gpfs in Ambari. Otherwise, HDFS Transparency will use dfs as the value for the gpfs.replica.enforced parameter instead of gpfs.

    Update the gpfs.replica.enforced parameter to gpfs in the service wizard and proceed with the deployment.
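
One of the checks above, that no shared mount points exist before deployment, can be sketched as follows. This is a minimal sketch; the set of file system types treated as shared is an assumption for this example and may need extending for your environment.

```shell
#!/bin/sh
# Minimal sketch: list mount points whose file system type suggests a
# shared (network) mount that Ambari could pick up as a service directory.
# The list of types checked is an assumption for this example.
shared_mounts() {
    mount | awk '$5 ~ /^(nfs|nfs4|cifs|smbfs|gpfs|glusterfs)$/ {print $3}'
}

if [ -n "$(shared_mounts)" ]; then
    echo "shared mount points found:"
    shared_mounts
else
    echo "no shared mount points found"
fi
```

Run the check on every node before starting the Ambari deployment.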

Configuration
  • After adding and removing nodes from Ambari, some aspects of the IBM Spectrum Scale configuration, such as page pool, as seen by running the mmlsconfig command, are not refreshed until after the next restart of the IBM Spectrum Scale Ambari service. However, this does not impact the functionality.
  • Short circuit is disabled when IBM Spectrum Scale service is installed. For information on how to enable or disable Short Circuit, see Short Circuit Read Configuration.
Ambari GUI
  • If any GPFS node other than the GPFS Master is stopped, the IBM Spectrum Scale panel does not display any alert.
  • The NFS gateway is displayed on the HDFS dashboard but is not used by HDFS Transparency; the NFS gateway is not supported. If your application requires an NFS interface, use the IBM Spectrum Scale protocol support for better scaling.
  • The IBM Spectrum Scale Service UI Panel > Actions > Collect_Snap_Data does not work if you configure an optional argument file (/var/lib/ambari-server/resources/gpfs.snap.args).
  • For the IBM Spectrum Scale GUI quick link, the IBM Spectrum Scale management GUI must be initialized before it can be accessed through the Ambari quick links. See IBM Spectrum Scale management GUI.
Node management
  • Ambari adds nodes and installs the IBM Spectrum Scale software on the existing IBM Spectrum Scale cluster, but does not create or add NSDs to the existing file system.
  • Adding a node in Ambari fails if the node does not have the same IBM Spectrum Scale version or the same HDFS Transparency version as the one currently installed on the Ambari IBM Spectrum Scale HDFS Transparency cluster. Ensure that the node to be added is at the same IBM Spectrum Scale and HDFS Transparency levels as the existing cluster.
  • Decommissioning a DataNode is not supported when IBM Spectrum Scale is integrated.
  • Moving a NameNode from the Ambari HDFS UI when HDFS Transparency is integrated is not supported. To manually move the NameNode, see Moving a NameNode.
  • New key-value pairs added through the IBM Spectrum Scale Ambari management pack GUI Advanced configuration Custom Add Property panel do not take effect in the IBM Spectrum Scale file system. Therefore, any values not shown in the Standard or Advanced configuration panels must be set manually on the command line by using the IBM Spectrum Scale /usr/lpp/mmfs/bin/mmchconfig command.
IBM Spectrum Scale
  • Ensure that bi-directional password-less SSH is set up between all GPFS Nodes.
  • The Hadoop service user IDs and group IDs must have the same values across the cluster. Any user name that writes to the file system needs a user ID in the OS or in the Active Directory service; IBM Spectrum Scale requires this.
    • If you are using LDAP/AD: Create the IDs and groups on the LDAP server, and ensure that all nodes can authenticate the users.
    • If you are using local IDs: Create the IDs locally with the same user ID and group ID values on all nodes.
  • IBM Spectrum Scale only supports installation through a local repository.
  • The management pack does not support IBM Spectrum Scale protocol and Transparent Cloud Tiering (TCT) packages.
  • Ensure that in an HDFS Transparency environment, the IBM Spectrum Scale file system is set to permit any supported POSIX ACL types. Issue mmlsfs <Device> -k to ensure the -k value is set to all.
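
The user-ID consistency requirement above can be verified with a short sketch. This is a minimal sketch assuming bi-directional password-less SSH between the nodes; the check_ids helper and the node names are illustrative, not product tooling.

```shell
#!/bin/sh
# Minimal sketch: compare a service user's UID and GID on the local node
# with the values reported by each remote node. check_ids and the node
# names are assumptions for this example.
check_ids() {
    user="$1"; shift
    local_ids="$(id -u "$user" 2>/dev/null):$(id -g "$user" 2>/dev/null)"
    echo "$user local uid:gid = $local_ids"
    for node in "$@"; do
        remote_ids="$(ssh "$node" "echo \$(id -u $user):\$(id -g $user)" 2>/dev/null)"
        if [ "$remote_ids" = "$local_ids" ]; then
            echo "$node: OK ($remote_ids)"
        else
            echo "$node: MISMATCH (got '$remote_ids', expected '$local_ids')"
        fi
    done
}

# Example (hypothetical user and node names):
# check_ids hdfs node1.example.com node2.example.com
```

Any mismatch must be corrected before that node writes to the IBM Spectrum Scale file system.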
HDP
  • The Manage JournalNodes action is shown in the HDFS > Actions submenu. Do not use this function when the IBM Spectrum Scale service is deployed.
  • The + character is not supported when using hftp://namenode:50070.