Recommended hardware resource configuration
A 10Gb Ethernet network is the minimum recommended configuration for Hadoop nodes. Higher-speed networks, such as 25Gb, 40Gb, 100Gb, or InfiniBand, can provide better overall performance. Hadoop nodes should have a minimum of 100GB of memory and at least four physical cores. If Hadoop services run on the same nodes as the HDFS Transparency service, a minimum of eight physical cores is recommended. If an IBM Storage Scale FPO deployment pattern is used, 10-20 internal SAS/SATA disks per node are recommended.
In a production cluster, the minimum number of nodes for HDFS Transparency is three: the first node as the active NameNode, the second as the standby NameNode, and the third as a DataNode. In a test cluster, one node is sufficient for the HDFS Transparency cluster, and that node can be configured as both NameNode and DataNode.
HDFS Transparency is a lightweight daemon, and one modern logical processor (for example, one core of a 4-core or 8-core CPU with a 2+GHz frequency) is usually sufficient. Its memory requirements are listed in the following tables:
Ranger Support | HDFS Transparency NameNode | HDFS Transparency DataNode |
---|---|---|
Ranger support is off [1] | 2GB or 4GB | 2GB |
Ranger support is on (by default) | Depends on the number of files that the Hadoop applications will access [2]: 1024 bytes * inode number | 2GB |
HDFS Transparency NameNode | HDFS Transparency DataNode |
---|---|
Depends on the number of files that the Hadoop applications will access: 700 bytes * inode number. | 2GB |
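The per-inode figures in these tables can be turned into a quick heap estimate. The following Python sketch is illustrative only (the function name and the example inode count are hypothetical); it applies the 1024 bytes per inode rule from the first table and the 700 bytes per inode rule from the second, and a real deployment should round up and leave headroom.

```python
def namenode_heap_gib(inode_count, bytes_per_inode=1024):
    """Estimate NameNode memory from the number of inodes (files and
    directories) that the Hadoop applications will access.

    bytes_per_inode: 1024 when Ranger support is on (first table),
    700 for the second sizing rule. Hypothetical helper for
    illustration; add headroom for a real deployment.
    """
    return inode_count * bytes_per_inode / 2**30  # bytes -> GiB

# Example: 100 million inodes
print(f"{namenode_heap_gib(100_000_000):.1f} GiB")       # ~95.4 GiB (Ranger on)
print(f"{namenode_heap_gib(100_000_000, 700):.1f} GiB")  # ~65.2 GiB
```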
For SAN-based storage or IBM Storage Scale System, the number of Hadoop nodes required for scaling depends on the workload type. If the workload is I/O-sensitive, you can calculate the number of Hadoop nodes from the bandwidth of the IBM Storage Scale System head nodes and the bandwidth of each Hadoop node. For example, if the network bandwidth from your IBM Storage Scale System head nodes is 100Gb and each Hadoop node is configured with a 10Gb network, then 10 Hadoop nodes (100Gb/10Gb) will drive the full network bandwidth of your IBM Storage Scale System head nodes for I/O-sensitive workloads. Because most Hadoop workloads are not pure I/O reading/writing workloads, you can use 10~15 Hadoop nodes in this configuration.
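The bandwidth division above can be written as a small calculation. This sketch is a minimal illustration, not an official sizing tool: the function and parameter names are hypothetical, and the 1.5x factor is one way to express the 10~15 node guidance for workloads that are not purely I/O-bound.

```python
import math

def hadoop_node_range(head_node_bw_gb, hadoop_node_bw_gb, mixed_factor=1.5):
    """Nodes needed to saturate the head-node bandwidth, plus a rough
    upper bound for mixed (not purely I/O-bound) workloads."""
    base = math.ceil(head_node_bw_gb / hadoop_node_bw_gb)
    return base, math.ceil(base * mixed_factor)

low, high = hadoop_node_range(head_node_bw_gb=100, hadoop_node_bw_gb=10)
print(f"{low}~{high} Hadoop nodes")  # 10~15 Hadoop nodes
```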
IBM® ECE minimum hardware requirements
At a high level, you must have between 4 and 32 storage servers per recovery group (RG), and each server must be an x86_64 server running Red Hat® Enterprise Linux® version 7.5 or 7.6. The storage configuration must be identical for all the storage servers. The supported storage types are SAS-attached HDD or SSD drives using specified LSI adapters, or enterprise-class NVMe drives. Each storage server must have at least one SSD or NVMe drive, which is used as a fast write cache as well as for user data storage.
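As an illustration, the requirements in this paragraph can be captured in a preflight check. The sketch below is not an official tool; the Server fields and check logic are hypothetical names that simply mirror the stated requirements (4 to 32 servers per RG, identical storage configuration, x86_64 architecture, at least one SSD or NVMe drive per server).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Server:
    # Illustrative fields mirroring the stated ECE requirements.
    arch: str            # must be "x86_64"
    hdd_count: int
    ssd_count: int
    nvme_count: int

def validate_recovery_group(servers):
    """Check a candidate ECE recovery group against the requirements above."""
    errors = []
    if not 4 <= len(servers) <= 32:
        errors.append("a recovery group needs 4 to 32 storage servers")
    if len({(s.arch, s.hdd_count, s.ssd_count, s.nvme_count) for s in servers}) > 1:
        errors.append("storage configuration must be identical on all servers")
    for i, s in enumerate(servers):
        if s.arch != "x86_64":
            errors.append(f"server {i}: architecture must be x86_64")
        if s.ssd_count + s.nvme_count < 1:
            errors.append(f"server {i}: needs at least one SSD or NVMe drive")
    return errors

# Example: four identical servers, each with one NVMe drive for the fast write cache
rg = [Server("x86_64", hdd_count=10, ssd_count=0, nvme_count=1)] * 4
print(validate_recovery_group(rg))  # [] -> requirements satisfied
```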
For more information about hardware requirements, see ECE Minimum hardware requirements in the IBM Storage Scale Erasure Code Edition guide.