Configuring the Docker instance and HDFS transparency

  1. Docker (version 1.9+) requires Red Hat 7 or later. Upgrade the selinux-policy and device-mapper-libs packages from the Red Hat Yum repositories by running the following commands:
    • yum upgrade selinux-policy
    • yum upgrade device-mapper-libs
  2. To install the Docker engine (version 1.9+), see link.
  3. Configure the network bridge adapter on the physical machines. Only one network bridge adapter can be configured on each machine.
    Note: These configuration files must be modified under /etc/sysconfig/network-scripts/:
    [root@c3m3n04 network-scripts]# cat ifcfg-br0
    DEVICE=br0
    TYPE=Bridge
    BOOTPROTO=static
    IPADDR=172.17.0.1
    NETMASK=255.255.255.0
    ONBOOT=yes
    
    [root@c3m3n04 network-scripts]# cat ifcfg-enp11s0f0
    # Generated by dracut initrd
    DEVICE="enp11s0f0"
    ONBOOT=yes
    NETBOOT=yes
    UUID="ca481ab0-4cdf-482e-b5d3-82be13a7621c"
    IPV6INIT=yes
    BOOTPROTO=static
    HWADDR="e4:1f:13:be:5c:28"
    TYPE=Ethernet
    NAME="enp11s0f0"
    IPADDR=192.168.3.2
    BROADCAST=192.168.255.255
    NETMASK=255.255.255.0
    Note: You must modify the IPADDR, BROADCAST, and NETMASK according to your network configuration.

    In this example, the br0 bridge adapter is bound to the enp11s0f0 physical adapter. You must apply these configuration changes on all the physical machines on which the Docker instances will run.
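    On Red Hat systems, a physical adapter is typically attached to a bridge by adding a BRIDGE= line to the adapter's ifcfg file. The following sketch, assuming the br0 and enp11s0f0 names from the example above, can help verify the resulting topology after the network service is restarted:

```shell
# Sketch, assuming the adapter names from the example above.
# The physical adapter is usually attached to the bridge by adding
# this line to ifcfg-enp11s0f0, then restarting the network service:
#   BRIDGE=br0
brctl show br0     # should list enp11s0f0 under the br0 interfaces
ip addr show br0   # should show the bridge address (172.17.0.1/24 here)
```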

  4. Modify the Docker service script and start the Docker engine daemons on each node:
    vim /usr/lib/systemd/system/docker.service
    ExecStart=/usr/bin/docker daemon -b br0 -H fd://
    
    service docker stop 
    service docker start
  5. Configure the network route table on each machine:
    route add -net 172.17.1.0/24 gw <replace-physical-node-ip-here> dev enp11s0f0

    where <replace-physical-node-ip-here> is the physical IP address of the node that hosts the Docker instances in the target subnet (172.17.1.0/24 in this example).
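    As a concrete sketch, assume three physical nodes with hypothetical addresses: node1 (192.168.3.1, Docker subnet 172.17.0.0/24), node2 (192.168.3.2, 172.17.1.0/24), and node3 (192.168.3.3, 172.17.2.0/24). On node1, you would then add one route per remote Docker subnet:

```shell
# Hypothetical addresses; adjust to your own network layout.
# Each remote Docker subnet is reached through the physical IP
# of the node that hosts it:
route add -net 172.17.1.0/24 gw 192.168.3.2 dev enp11s0f0   # node2's Docker subnet
route add -net 172.17.2.0/24 gw 192.168.3.3 dev enp11s0f0   # node3's Docker subnet
```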

  6. The IP addresses of the Docker instances on different physical nodes must be different so that the Docker instances on one physical node can access the Docker instances on another physical node. Check that you can connect to the br0 IP address of each node from the other nodes.
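    For example, from node1 you might verify that the other nodes' bridge addresses are reachable (hypothetical IPs, following the subnet layout sketched in the previous step):

```shell
# Hypothetical br0 addresses of node2 and node3:
ping -c 3 172.17.1.1
ping -c 3 172.17.2.1
```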
  7. Configure HDFS transparency and start the HDFS transparency services. Modify /usr/lpp/mmfs/hadoop/etc/hadoop/core-site.xml and /usr/lpp/mmfs/hadoop/etc/hadoop/slaves, selecting the IP addresses from the Docker network bridge adapters. Then pull the Hadoop Docker image on each node:
    docker pull sequenceiq/hadoop-docker:2.7.0
    Note: This example uses the Hadoop Docker image from sequenceiq.
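    A minimal sketch of the two HDFS transparency files, assuming the hypothetical bridge addresses 172.17.0.1, 172.17.1.1, and 172.17.2.1 from the earlier examples (adjust the NameNode address and port to your deployment):

```shell
# Sketch only: point fs.defaultFS at the NameNode's bridge address in
# /usr/lpp/mmfs/hadoop/etc/hadoop/core-site.xml:
#   <property>
#     <name>fs.defaultFS</name>
#     <value>hdfs://172.17.0.1:8020</value>
#   </property>
# and list the bridge address of every node in the slaves file:
cat > /usr/lpp/mmfs/hadoop/etc/hadoop/slaves <<'EOF'
172.17.0.1
172.17.1.1
172.17.2.1
EOF
```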
  8. Start all Docker instances on each node by running the following command:
    #docker run -h <this-docker-instance-hostname> -it sequenceiq/hadoop-docker:2.7.0 /etc/bootstrap.sh -bash

    You can start multiple Docker instances over the same physical node. This command starts a Docker instance with the hostname <this-docker-instance-hostname>.
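    For example, to start the first instance on the first physical node, using the node1docker1.gpfs.net hostname from the /etc/hosts mappings in the next step:

```shell
# -h sets the hostname inside the instance; /etc/bootstrap.sh -bash
# runs the image's bootstrap script and then opens an interactive shell.
docker run -h node1docker1.gpfs.net -it sequenceiq/hadoop-docker:2.7.0 /etc/bootstrap.sh -bash
```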

  9. For each Docker instance, change the /etc/hosts to map the Docker instance IP addresses to the hostname:
    #vi /etc/hosts
    172.17.0.2 node1docker1.gpfs.net node1docker1 
    172.17.0.4 node2docker1.gpfs.net node2docker1 
    172.17.0.6 node3docker1.gpfs.net node3docker1
    Note: This must be done on the console of each Docker instance. You must add all Docker instances here if you want to set them up as one Hadoop cluster.

    After a Docker instance is stopped, all changes are lost and you will have to make this change again after a new Docker instance has been started.
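    Because the edits are lost whenever an instance stops, it can help to keep the mappings in a small re-runnable script and apply it on each console after a restart. A minimal sketch, using the instance IPs from the example above (TARGET and the hosts.docker scratch file are hypothetical; on a real Docker instance, point TARGET at /etc/hosts):

```shell
# Sketch of a re-runnable helper for re-applying the host mappings.
# TARGET is a scratch file here for illustration; on a real Docker
# instance, set TARGET=/etc/hosts instead.
TARGET=${TARGET:-./hosts.docker}
while read -r ip fqdn alias; do
    # Append each mapping only if the hostname is not already present.
    grep -q "$fqdn" "$TARGET" 2>/dev/null || printf '%s %s %s\n' "$ip" "$fqdn" "$alias" >> "$TARGET"
done <<'EOF'
172.17.0.2 node1docker1.gpfs.net node1docker1
172.17.0.4 node2docker1.gpfs.net node2docker1
172.17.0.6 node3docker1.gpfs.net node3docker1
EOF
```

    Running the script a second time leaves the file unchanged, so it is safe to re-apply after every instance restart.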

  10. Select a Docker instance and start the Yarn ResourceManager on it:
    #cd /usr/local/hadoop-2.7.0/sbin
    #./start-yarn.sh

    You cannot run two ResourceManagers in the same Hadoop cluster. Therefore, you run this ResourceManager in the selected Docker instance.

  11. Start Yarn NodeManager on other Docker instances by running the following command:
    #/usr/local/hadoop-2.7.0/sbin/yarn-daemon.sh --config /usr/local/hadoop/etc/hadoop/ start nodemanager
  12. Run hadoop dfs -ls / to verify that HDFS is accessible and that you can now run Map/Reduce jobs in Docker. To stop the Yarn services running in Docker, perform the following steps:
    -> On the Yarn ResourceManager Docker instance:
    cd /usr/local/hadoop-2.7.0/sbin
    ./stop-yarn.sh
    -> On the Yarn NodeManager Docker instances:
    /usr/local/hadoop-2.7.0/sbin/yarn-daemon.sh --config /usr/local/hadoop/etc/hadoop/ stop nodemanager
    Note: With HDFS transparency, data locality is not supported for the Map/Reduce jobs running in Docker.
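    To go beyond a directory listing, you can submit a small sample job from the ResourceManager instance. The sketch below uses the pi example jar that ships with Hadoop 2.7.0 (paths as laid out under /usr/local/hadoop-2.7.0; adjust if your layout differs):

```shell
# Estimate pi with 2 map tasks x 10 samples each; a successful run
# confirms that Yarn can schedule Map/Reduce tasks across the
# Docker instances.
cd /usr/local/hadoop-2.7.0
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar pi 2 10
```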