Installation prerequisites

Set up the basic IBM Storage Scale installation prerequisites before installing CES HDFS.

See the Installation prerequisites section in the IBM Storage Scale: Concepts, Planning, and Installation Guide for base Scale installation requirements.

  • NTP setup

    It is recommended that Network Time Protocol (NTP) be configured on all the nodes in your system to ensure that the clocks of all the nodes are synchronized. Unsynchronized clocks cause debugging issues and authentication problems with the protocols. Across all the HDFS Transparency and Hadoop nodes, follow the steps that are listed in Configure NTP to synchronize the clocks in HDFS Transparency.

  • SSH and network setup
    Set up passwordless SSH as follows:
    • From the admin node to the other nodes in the cluster.
    • From protocol nodes to other nodes in the cluster.
    • From every protocol node to the rest of the protocol nodes in the cluster.
    • On fresh Red Hat® Enterprise Linux® 8 installations, you must create passwordless SSH keys by using the ssh-keygen -m PEM command.
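    The key-generation step can be sketched as follows. This is a minimal example, assuming root-to-root passwordless SSH; the node names in the commented loop are placeholders:

```shell
# Create a PEM-format SSH key pair (required on fresh RHEL 8
# installations) if one does not already exist.
mkdir -p "$HOME/.ssh"
KEYFILE="$HOME/.ssh/id_rsa"
if [ ! -f "$KEYFILE" ]; then
    ssh-keygen -m PEM -t rsa -N "" -f "$KEYFILE"
fi
# Then copy the public key to every node this node must reach, e.g.:
# for node in protocol1 protocol2 protocol3; do
#     ssh-copy-id -i "$KEYFILE.pub" root@"$node"
# done
```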
  • CES public IP
    • A set of CES public IPs (or Export IPs) is required. These IPs are used to export data using the protocols. Export IPs are shared among all protocols and are organized in a public IP pool. See Adding export IPs section under Deploying protocols topic in the IBM Storage Scale: Concepts, Planning, and Installation Guide.
    • If you are using only the HDFS protocol, it is sufficient to have just one CES Public IP.
    • The CES IP/hostname used for CES HDFS must be resolvable by the DNS service, not just by an entry in your /etc/hosts file. Otherwise, you might encounter errors when you add the Hadoop services.
      Note: This is a Java™ requirement.
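      A quick name-resolution check can be sketched as follows. The hostname `ces1.example.com` is a placeholder for your CES hostname. Note that `getent` consults the full NSS stack, including /etc/hosts, so additionally confirm that the name is actually served by your DNS server and not only by a local hosts-file entry:

```shell
# Report whether a hostname resolves through the system resolver.
check_resolves() {
    if getent hosts "$1" >/dev/null; then
        echo "resolvable: $1"
    else
        echo "NOT resolvable: $1"
    fi
}
# Placeholder CES hostname; substitute your own.
check_resolves ces1.example.com
```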
  • ACL

    In general, the recommendation is to configure the file system to support NFSv4 ACLs. NFSv4 ACLs are a requirement for ACL usage with the SMB and NFS protocols. However, ALL ACL semantics are a requirement for ACL usage with the HDFS protocol. If the protocol node serves multiple protocols, the final ACL setting after deployment should be -k all if you are using the HDFS protocol.

    For more information, see examples under the mmchfs command topic in the IBM Storage Scale: Command and Programming Reference Guide.
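    As a sketch, assuming a file system named fs1 (a placeholder), the setting could be applied and verified on a Storage Scale node with:

```shell
# "fs1" is a placeholder file system name.
# Set ACL semantics to "all" so that HDFS (and any coexisting
# SMB/NFS exports) can use ACLs:
mmchfs fs1 -k all
# Display the current ACL semantics to verify:
mmlsfs fs1 -k
```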

  • Packages

    The corresponding kernel-headers, kernel-devel, cpp, gcc, gcc-c++, binutils, and make packages must be installed:

    yum install kernel-devel cpp gcc gcc-c++ binutils make

    Note: If you are using CDP Private Cloud Base, you need to install Python 2.7 on Red Hat Enterprise Linux 8.0 nodes. By default, Python 3 might be installed on Red Hat Enterprise Linux 8.0 nodes. CDP Private Cloud Base with CES HDFS requires the nodes to have both Python 2.7 and Python 3.
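    A quick sketch for checking that both interpreters are available on a node:

```shell
# CDP Private Cloud Base on RHEL 8 needs Python 2.7 alongside Python 3.
# Report the version of each interpreter, or flag it as missing.
for py in python2 python3; do
    if command -v "$py" >/dev/null 2>&1; then
        echo "$py: $("$py" --version 2>&1)"
    else
        echo "$py: MISSING"
    fi
done
```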
  • UID/GID consistency value under IBM Storage Scale

    Ensure that all the user IDs and group IDs used in the IBM Storage Scale cluster for running jobs, accessing the IBM Storage Scale file system, or running the Hadoop services are created with the same values across all the IBM Storage Scale nodes. This is an IBM Storage Scale requirement.

    You can also use the /usr/lpp/mmfs/hadoop/scripts/gpfs_create_hadoop_users_dirs.py script that is provided with HDFS Transparency 3.1.1-3 and later. Any users or groups that are created with this script are guaranteed to have consistent UID/GID across all the nodes.
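    A simple way to audit consistency is to print the UID/GID pair for each account and compare the output across nodes. A minimal sketch, where `hdfs` is a placeholder account name:

```shell
# Print "UID:GID" for the given account via the NSS database.
uid_gid_of() {
    getent passwd "$1" | cut -d: -f3,4
}
# Placeholder account; compare this value on every node.
uid_gid_of hdfs
# Across the whole cluster, for example:
# mmdsh -N all "getent passwd hdfs | cut -d: -f3,4"
```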

  • Upgrading to a 3.2.2-stream version from 3.2.2-6 onwards is only possible from gpfs.hdfs-protocol version 3.2.2-5 or later, so users must first upgrade to version 3.2.2-5. For example, to upgrade from version 3.2.2-1 to 3.2.2-8, first upgrade from 3.2.2-1 to 3.2.2-5, and then from 3.2.2-5 to 3.2.2-8.
  • To upgrade to a 3.1.1-stream version from 3.1.1-15 onwards, gpfs.hdfs-protocol version 3.1.1-14 must already be installed, so users must first upgrade to version 3.1.1-14. For example, to upgrade from version 3.1.1-10 to 3.1.1-17, first upgrade from 3.1.1-10 to 3.1.1-14, and then from 3.1.1-14 to 3.1.1-17.
    For 3.1.1-x on all HDFS Transparency nodes, complete the following steps:
    1. If it does not exist, create the path /opt/hadoop/jars by using the command:
      $ mkdir -p /opt/hadoop/jars
    2. Download hadoop-3.1.4.tar.gz from Apache by issuing the following commands:
      $ cd /opt/hadoop/jars
      $ wget https://archive.apache.org/dist/hadoop/core/hadoop-3.1.4/hadoop-3.1.4.tar.gz
    3. Extract the content of the tar file by using this command:
      $ tar -xvf hadoop-3.1.4.tar.gz
    4. Download additional JAR files from the maven repository and save them in /opt/hadoop/jars.
      The additional JAR files that are needed are:
      • curator-client-2.12.0.jar
      • curator-framework-2.12.0.jar
      • curator-recipes-2.12.0.jar
      • guava-11.0.2.jar
      • hadoop-annotations-3.1.1.jar
      • hadoop-auth-3.1.1.jar
      • jsch-0.1.54.jar
      • jsr305-3.0.0.jar
      • xz-1.0.jar

      Alternatively, download hadoop-3.1.1.tar.gz from Apache and extract it in /opt/hadoop/jars.

    5. Proceed with the installation.
    To consult the installation steps for 3.2.2-6 onwards, see Upgrading.
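    The additional JAR downloads in step 4 can be scripted. The following is a sketch only: the group/artifact coordinates are assumptions based on the standard Maven Central layout and should be verified before use. The `wget` line is commented out so the script only prints the URLs it would fetch:

```shell
# Fetch the additional JARs from Maven Central into /opt/hadoop/jars.
BASE=https://repo1.maven.org/maven2
DEST=/opt/hadoop/jars
JARS="
org/apache/curator/curator-client/2.12.0/curator-client-2.12.0.jar
org/apache/curator/curator-framework/2.12.0/curator-framework-2.12.0.jar
org/apache/curator/curator-recipes/2.12.0/curator-recipes-2.12.0.jar
com/google/guava/guava/11.0.2/guava-11.0.2.jar
org/apache/hadoop/hadoop-annotations/3.1.1/hadoop-annotations-3.1.1.jar
org/apache/hadoop/hadoop-auth/3.1.1/hadoop-auth-3.1.1.jar
com/jcraft/jsch/0.1.54/jsch-0.1.54.jar
com/google/code/findbugs/jsr305/3.0.0/jsr305-3.0.0.jar
org/tukaani/xz/1.0/xz-1.0.jar
"
for path in $JARS; do
    echo "would fetch: $BASE/$path"
    # wget -q -P "$DEST" "$BASE/$path"   # uncomment to download
done
```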

The following sections provide installation steps with snippets from the IBM Storage Scale installation documentation: