Dual network interfaces
This section describes how to use two network interfaces in FPO mode and in centralized storage modes such as IBM Storage® Scale System or SAN-based storage.
FPO mode
- The first option is to bond the two network interfaces and deploy both the IBM Storage Scale cluster and the Hadoop cluster over the bonded interface.
- The second option is to configure one network interface for the Hadoop services, including the HDFS Transparency service, and configure the other network interface for IBM Storage Scale data traffic. This configuration can minimize interference between disk I/O and application communication. To ensure that Hadoop applications can use data locality for better performance, perform the following steps:
- Configure the first network interface with an address on one subnet (for example, 192.0.2.0). Configure the second network interface with an address on a different subnet (for example, 198.51.100.0).
- Create the IBM Storage Scale cluster and NSDs with the IP addresses or hostnames from the first network interface.
- Install the Hadoop cluster and HDFS Transparency services by using the IP addresses or hostnames from the first network interface.
- Run mmchconfig subnets=198.51.100.0 -N all.
Note: 198.51.100.0 is the subnet used for IBM Storage Scale data traffic.
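The plan only works if the two interfaces really sit on separate subnets and the value passed to subnets is the data network rather than the cluster network. The following is a minimal Python sketch of such a pre-check, assuming both networks are written in CIDR form. The prefixes 192.0.2.0/24 and 198.51.100.0/24 and the helper name build_subnets_command are hypothetical, and the script only prints the command; it does not run it.

```python
import ipaddress


def build_subnets_command(cluster_net: str, data_net: str) -> str:
    """Validate a two-subnet plan and return the mmchconfig command text.

    cluster_net: subnet of the first interface (cluster, NSDs, Hadoop services).
    data_net: subnet of the second interface (IBM Storage Scale data traffic).
    Both are hypothetical CIDR strings; ip_network raises ValueError if a
    value has host bits set.
    """
    cluster = ipaddress.ip_network(cluster_net)
    data = ipaddress.ip_network(data_net)
    if cluster.overlaps(data):
        raise ValueError(f"{cluster} and {data} overlap; use two distinct subnets")
    # Pass the data subnet's network address in dotted-decimal form,
    # matching the style of the mmchconfig example in the steps.
    return f"mmchconfig subnets={data.network_address} -N all"


if __name__ == "__main__":
    # Prints: mmchconfig subnets=198.51.100.0 -N all
    print(build_subnets_command("192.0.2.0/24", "198.51.100.0/24"))
    # Overlapping ranges are not two separate networks and are rejected.
    try:
        build_subnets_command("192.0.2.0/24", "192.0.2.128/25")
    except ValueError as err:
        print("rejected:", err)
```

Keeping the check separate from the command lets an administrator review the generated line before running it on a real cluster.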
For Hadoop MapReduce jobs, the YARN scheduler checks the block locations. HDFS Transparency returns the hostname that was used to create the IBM Storage Scale cluster as the block location to YARN. If that hostname is not in the NodeManager list, YARN cannot schedule tasks according to data locality. The configuration described above ensures that the hostname returned as the block location appears in the YARN NodeManager list, so YARN can schedule tasks according to data locality.
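The locality condition can be illustrated with a deliberately simplified model: a task is treated as data-local only when a hostname reported as a block location exactly matches a registered NodeManager hostname. This Python sketch is not YARN's scheduler (which also considers racks, delay scheduling, and name resolution); the function name local_nodes and the example.com hostnames are hypothetical.

```python
from typing import Iterable, List


def local_nodes(block_hosts: Iterable[str], node_managers: Iterable[str]) -> List[str]:
    """Return the block-location hosts that also appear as NodeManagers.

    Simplified model: a task can run data-local only on a host whose name,
    as reported by HDFS Transparency, exactly matches a NodeManager name.
    """
    registered = set(node_managers)
    return [host for host in block_hosts if host in registered]


if __name__ == "__main__":
    managers = ["node1.example.com", "node2.example.com"]
    # Cluster created with first-interface hostnames: the names match.
    print(local_nodes(["node1.example.com"], managers))
    # Cluster created with second-interface hostnames: no match, locality is lost.
    print(local_nodes(["node1-data.example.com"], managers))
```

The second call returns an empty list, which corresponds to YARN placing the task without regard to where the data lives.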
For a Hadoop distribution such as IBM® BigInsights® IOP, all Hadoop components are managed by Ambari™. In this scenario, all Hadoop components, HDFS Transparency, and the IBM Storage Scale cluster must be created by using the same network interface, and the second network interface must be used for GPFS™ data traffic.
Centralized storage modes (IBM Storage Scale System, ECE, SAN-based)
- The first option is to bond the two adapters and then deploy Hortonworks HDP and IBM Storage Scale over the bonded adapter.
- The second option is to configure one adapter for the IBM Storage Scale cluster and Hortonworks HDP, and configure the other adapter on the subnet that IBM Storage Scale uses for data traffic (the subnets configuration attribute). For more information on subnets, see GPFS and network communication in the IBM Storage Scale: Concepts, Planning, and Installation Guide. Perform the following steps:
- Configure the first network interface with an address on one subnet (for example, 192.0.2.0). Configure the second network interface with an address on a different subnet (for example, 198.51.100.0).
- Create the IBM Storage Scale cluster with the IP addresses or hostnames from the first network interface.
- Install the Hadoop cluster and HDFS Transparency services by using the IP addresses or hostnames from the first network interface.
- Run mmchconfig subnets=198.51.100.0 -N all.
Note: 198.51.100.0 is the subnet used for IBM Storage Scale data traffic.
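The effect of the subnets setting can be pictured with a simplified model of how a node chooses the address for reaching a peer: if both nodes have an address on a configured subnet, that address is used; otherwise traffic falls back to the peer's daemon (cluster) address. The following Python sketch is only an approximation for illustration, not the IBM Storage Scale implementation. The function name pick_address and all addresses are hypothetical, and the subnets are written in CIDR form for the ipaddress module.

```python
import ipaddress
from typing import Sequence


def pick_address(
    local_addrs: Sequence[str],
    peer_addrs: Sequence[str],
    peer_daemon_addr: str,
    subnets: Sequence[str],
) -> str:
    """Simplified model: choose the address used to reach a peer node.

    Walks the configured subnets in order. For the first subnet on which
    both the local node and the peer have an address, returns the peer's
    address on that subnet. Otherwise returns the peer's daemon address.
    """
    for subnet in subnets:
        net = ipaddress.ip_network(subnet)
        if not any(ipaddress.ip_address(a) in net for a in local_addrs):
            continue  # the local node has no interface on this subnet
        for addr in peer_addrs:
            if ipaddress.ip_address(addr) in net:
                return addr
    return peer_daemon_addr


if __name__ == "__main__":
    node_a = ["192.0.2.11", "198.51.100.11"]
    node_b = ["192.0.2.12", "198.51.100.12"]
    # Data subnet configured and shared: traffic uses the second adapter.
    print(pick_address(node_a, node_b, "192.0.2.12", ["198.51.100.0/24"]))
    # No subnets configured: traffic stays on the daemon (first) interface.
    print(pick_address(node_a, node_b, "192.0.2.12", []))
```

A peer without an address on the data subnet also falls back to its daemon address, which is why both adapters should be configured on every node.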