Hardware high availability on rack-mounted models
High availability of rack-mounted Integrated Analytics System appliances is provided by the following redundant and resilient hardware elements.
Robust hardware elements
- Flash storage
- The IBM FlashSystem 900 provides a completely resilient storage subsystem to the appliance. The storage array is protected by two load-sharing power supplies, redundant fans, and two separate storage controllers. The flash storage is organized into a two-dimensional RAID 5 configuration. The storage controllers ensure that the data is written coherently and consistently in a RAID 5 layout across the flash elements within each flash module, and then again in a RAID 5 layout across the flash modules. Because of this level of robustness in the FlashSystem subsystem and the multiple connections to it, Integrated Analytics System treats the FlashSystem storage as a completely resilient subsystem.
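The two-dimensional parity idea can be illustrated with a small sketch. It is not the FlashSystem 900 implementation: the 3 x 3 layout, the block contents, and the function names are assumptions made for illustration, and real RAID 5 also rotates parity across devices, which is omitted here. The sketch only shows that XOR parity kept in two dimensions lets a lost block be rebuilt inside a module, and a lost module be rebuilt from the other modules.

```python
from functools import reduce
from operator import xor


def xor_parity(blocks):
    """Return the byte-wise XOR of a sequence of equally sized byte blocks."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


def rebuild(surviving_blocks, parity):
    """Recover the single missing block from the survivors and the parity."""
    return xor_parity(list(surviving_blocks) + [parity])


# Hypothetical layout: 3 flash modules, each holding 3 data blocks.
modules = [
    [b"\x01\x02", b"\x03\x04", b"\x05\x06"],
    [b"\x10\x20", b"\x30\x40", b"\x50\x60"],
    [b"\xa1\xb2", b"\xc3\xd4", b"\xe5\xf6"],
]

# First dimension: parity across the blocks inside each module.
inner_parity = [xor_parity(blocks) for blocks in modules]

# Second dimension: parity across the modules, one per block position.
outer_parity = [xor_parity(position) for position in zip(*modules)]

# Lose one block inside module 1 and rebuild it from that module's parity.
recovered_block = rebuild([modules[1][0], modules[1][2]], inner_parity[1])

# Lose all of module 2 and rebuild it from the matching blocks of the others.
recovered_module = [
    rebuild([modules[0][i], modules[1][i]], outer_parity[i])
    for i in range(3)
]
```

In this sketch, either dimension alone tolerates the loss of one member; keeping both means that a failed flash element and a failed flash module are each covered by their own parity set.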
Redundant hardware elements
- Data fabric
- The high-speed data network among the processing nodes of the appliance is implemented as two separate redundant fabrics. Pairs of switches are used to provide complete failover redundancy. In normal operation, the links are bonded together to give full cross-sectional bandwidth to the processing cluster. The total bandwidth is provisioned with enough headroom to provide high performance even if one of the switch fabrics fails.
- Management network
- The management network, which is used to configure and monitor the hardware elements of the appliance, is also implemented using a fully redundant pair of switches with bonded interfaces on the processing nodes.
- Storage Fibre Channel network
- The Fibre Channel connections between the processing nodes and the flash storage arrays in each rack of the appliance are made through a redundant pair of Fibre Channel switches. Each node has multiple connections to each of the switches, and the switches have multiple connections to the flash arrays. These connections provide high-bandwidth, highly available storage connectivity. All of the connections are managed by a multipathing component on the processing nodes of the appliance.
- Processing nodes
- The processing nodes of the appliance are organized into a highly available cluster to provide continuous operation if one of the nodes fails. The processing requirements of a failed node can be redistributed to the surviving nodes within the same rack as the failed node. The system is designed with some excess processing capacity so that it can continue to provide high performance in the presence of a node failure. The system is designed to operate with as few as one half of the original nodes, plus one.
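The "one half of the original nodes, plus one" rule can be worked through with a short sketch. The function names are hypothetical, and the rounding is an assumption: the text does not say how an odd node count is halved, so the sketch rounds the half down before adding one.

```python
def min_operational_nodes(original_nodes):
    """Smallest node count the rule allows: half of the original, plus one.

    Assumption: for an odd node count, the half is rounded down.
    """
    if original_nodes < 1:
        raise ValueError("original_nodes must be at least 1")
    return original_nodes // 2 + 1


def can_keep_operating(original_nodes, surviving_nodes):
    """Return True if the surviving nodes meet the half-plus-one minimum."""
    return surviving_nodes >= min_operational_nodes(original_nodes)
```

Under these assumptions, a system that starts with 6 nodes needs at least 4 surviving nodes, so `can_keep_operating(6, 4)` is true and `can_keep_operating(6, 3)` is false.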