Activating HA for HDFS NameNode through virtual IP

Follow this procedure to activate HA for the HDFS NameNode in virtual IP mode.

Before you begin

Before activating the HA functionality in virtual IP mode, make sure of the following configuration:
  • The NameNode must join the IBM® Spectrum Symphony cluster as a management host; that is, it must be a part of the ManagementHosts (mg) group. To configure a host as a management host, use the egoconfig mghost shared_dir command. For details, refer to the Reference Guide.
  • All IBM Spectrum Symphony configuration data and NameNode metadata must be stored on an NFS shared file system that is accessible to all primary-candidate hosts. To configure this location, set the dfs.name.dir property in $HADOOP_HOME/conf/hdfs-site.xml on the HDFS server and all standby primary hosts.
Note: Virtual IP mode is available only for the HDFS NameNode. The Secondary NameNode must use a static IP address and must not run on either the primary NameNode host or the backup NameNode host. When virtual IP is configured for the NameNode, the virtual IP for the NameNode daemon is reassigned automatically. The Secondary NameNode address (set as dfs.secondary.http.address in $HADOOP_HOME/conf/hdfs-site.xml), however, is not dynamically updated. Ensure the following configuration for the Secondary NameNode:
  • Configure a static IP address for the Secondary NameNode daemon.
  • Restrict the HA SecondaryService to run only on a specific host by defining only one host in the SecondaryNodeRG resource group in IBM Spectrum Symphony.
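
The two hdfs-site.xml properties named above can be checked before you start the procedure. The following is a minimal sketch, not part of the product: it assumes the standard Hadoop configuration layout (property elements with name and value children), and the sample path and address are placeholders. The helper names read_properties and missing_properties are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample content; the shared path and the static address
# below are placeholders, not values taken from the product documentation.
SAMPLE_HDFS_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/share/hadoop/name</value>
  </property>
  <property>
    <name>dfs.secondary.http.address</name>
    <value>192.0.2.20:50090</value>
  </property>
</configuration>
"""

# Properties that this topic asks you to set in hdfs-site.xml.
REQUIRED = ("dfs.name.dir", "dfs.secondary.http.address")


def read_properties(xml_text):
    """Return a dict of property name -> value from Hadoop-style XML."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name:
            props[name.strip()] = (value or "").strip()
    return props


def missing_properties(xml_text, required=REQUIRED):
    """Return the required property names that are absent or empty."""
    props = read_properties(xml_text)
    return [name for name in required if not props.get(name)]


if __name__ == "__main__":
    print(missing_properties(SAMPLE_HDFS_SITE))
```

To check a real file, read $HADOOP_HOME/conf/hdfs-site.xml into a string and pass it to missing_properties. The sketch only confirms that the properties are present; it does not verify that the directory is on the NFS share or that the address is static.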

About this task

Follow these steps to configure HA for HDFS NameNode in virtual IP mode.

Procedure

  1. From the cluster management console, configure the NameNode (NameNodeRG), SecondaryNode (SecondaryNodeRG), and DataNode (DataNodeRG) resource groups.
    Note: By default, DataNodeRG shares slots with ComputeHosts on the same host. ComputeHosts have MapReduce compute slots (for example, slots equal to the number of CPUs) while DataNodeRG has only one overlapped slot to run the DataNode daemon. NameNode and SecondaryNode groups include only the primary host and management hosts. NameNode and SecondaryNode groups share metadata in the NFS shared file system.
    1. Start the cluster management console, which is available by default at http://host_name:8080/platform.
    2. Log in with your credentials.
    3. From the Dashboard's Common Tasks menu, click Resources > Resource Planning > Resource Groups.
    4. Click NameNodeRG from the list.
    5. Choose Static (List of Names) from the Resource Selection Method drop-down list.

      The page refreshes to display a list of possible hosts.

    6. Select the hosts that you want to add and click Apply.
    7. Select SecondaryNodeRG, and then DataNodeRG, from the list and repeat steps 5 and 6 for each resource group.
  2. Configure the NameNode service profile.
    1. From the cluster management console, go to Workload > EGO > Service Profiles.
    2. Click the NameNode service.

      The Service Profile editor opens.

    3. Locate the sc::ActivityDescription section.
    4. In the Actions drop-down list of the ego:ActivitySpecification parameter, click Insert "ego:EnvironmentVariable", and then set the name to SYM_HA_HDFS_VIRTUAL_IP and the value to the virtual IP that you have chosen.
    5. In the Actions drop-down list of the ego:ActivitySpecification parameter, click Insert "ego:ExecutionUser" and set its value to that of the HDFS administrative OS user.
    6. In the ego:EnvironmentVariable parameters, add or modify the values for the following variables:
      • HADOOP_HOME: Set this value to $HADOOP_HOME.
      • HADOOP_CONF_DIR: Set this value to $HADOOP_CONF_DIR.
      • PMR_HDFS_PORT: Set this value to the HDFS port, which is 8020 by default.
    7. Click Save and OK.
  3. Repeat step 2 for the SecondaryNode and DataNode service profiles.
    Note: Set SYM_HA_HDFS_VIRTUAL_IP in all three HA services to the same static, accessible virtual IP.
  4. (Optional) Add the following environment variables to customize the virtual IP network alias configuration:
    Environment variable     Description                                Default
    SYM_HA_HDFS_BROADCAST    Broadcast address for the virtual IP.      x.y.z.255 for virtual IP x.y.z.q
    SYM_HA_HDFS_NETMASK      Netmask for the virtual IP.                255.255.255.0
    SYM_HA_HDFS_ETH          Ethernet device for the virtual IP.        eth0
    SYM_HA_HDFS_ETH_ALIAS    Ethernet alias index for the virtual IP.   0 (the Ethernet alias is created as eth0:0)
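
To see what the optional variables resolve to when you leave them unset, the defaults in the table above can be sketched as follows. This is an illustration only, not the product's implementation: the function name alias_defaults and the extra "interface" key are hypothetical, and the sample address is a placeholder.

```python
import ipaddress


def alias_defaults(virtual_ip, env=None):
    """Resolve the optional alias settings for a virtual IP.

    Values supplied in env take precedence; otherwise the documented
    defaults apply. The "interface" key is a convenience added by this
    sketch and is not a product variable.
    """
    env = env or {}
    # Validate the address; raises ValueError for a malformed IPv4 address.
    addr = ipaddress.IPv4Address(virtual_ip)
    octets = str(addr).split(".")
    # Documented default: x.y.z.255 for virtual IP x.y.z.q
    default_broadcast = ".".join(octets[:3] + ["255"])
    eth = env.get("SYM_HA_HDFS_ETH", "eth0")
    alias = env.get("SYM_HA_HDFS_ETH_ALIAS", "0")
    return {
        "SYM_HA_HDFS_BROADCAST": env.get("SYM_HA_HDFS_BROADCAST", default_broadcast),
        "SYM_HA_HDFS_NETMASK": env.get("SYM_HA_HDFS_NETMASK", "255.255.255.0"),
        "SYM_HA_HDFS_ETH": eth,
        "SYM_HA_HDFS_ETH_ALIAS": alias,
        # With the defaults, the alias is created as eth0:0.
        "interface": f"{eth}:{alias}",
    }


if __name__ == "__main__":
    # 192.0.2.10 is a placeholder virtual IP used for illustration.
    for key, value in alias_defaults("192.0.2.10").items():
        print(key, value)
```

Passing a dictionary such as {"SYM_HA_HDFS_ETH": "eth1", "SYM_HA_HDFS_ETH_ALIAS": "2"} as env shows how overriding the device and alias index changes the resulting interface name while the other defaults stay in place.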