Network considerations for IBM Storage Ceph
An important aspect of a cloud storage solution is that a storage cluster can run out of IOPS because of network latency and other factors, or run out of throughput because of bandwidth constraints, long before it runs out of storage capacity. As a result, the network hardware configuration must support the chosen workloads to meet price versus performance requirements.
Storage administrators prefer that a storage cluster recovers as quickly as possible. Carefully consider bandwidth requirements for the storage cluster network, be mindful of network link oversubscription, and separate the intra-cluster traffic from the client-to-cluster traffic. Network performance becomes increasingly important when you use solid-state drives (SSDs), flash, NVMe, and other high-performance storage devices.
- Allocate bandwidth to the storage cluster network so that it is a multiple of the public network bandwidth. On replicated pools, use the osd_pool_default_size parameter as the basis for the multiple. Run the public and storage cluster networks on separate network cards.
- Use 10 Gb/s Ethernet for IBM Storage Ceph deployments in production. A 1 Gb/s Ethernet network is not suitable for production storage clusters.
In the case of a drive failure, replicating 1 TB of data across a 1 Gb/s network takes about 3 hours, and replicating 10 TB, which is a typical drive size, takes about 30 hours. By contrast, with a 10 Gb/s Ethernet network, replication takes about 20 minutes for 1 TB and about 3 hours for 10 TB.
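The replication times above can be sanity-checked with a rough calculation. The following Python sketch is illustrative only: the function name `ideal_transfer_seconds` and its `efficiency` parameter are hypothetical and not part of any Ceph tooling. It assumes decimal units (1 TB = 10^12 bytes, 1 Gb/s = 10^9 bits per second) and a link dedicated to recovery traffic. The ideal figure for 1 TB over 1 Gb/s is 8000 seconds, or about 2.2 hours, which is a lower bound. The 3 hours quoted above is higher, presumably because real recovery includes protocol overhead and competing traffic.

```python
def ideal_transfer_seconds(data_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Estimate the time in seconds to move data_tb terabytes over a link.

    Assumptions (not taken from the product documentation):
    - decimal units: 1 TB = 1e12 bytes, 1 Gb/s = 1e9 bits per second
    - efficiency is a hypothetical factor in (0, 1] for the usable share
      of the link; 1.0 gives the theoretical best case
    """
    if data_tb < 0 or link_gbps <= 0 or not (0 < efficiency <= 1):
        raise ValueError("data_tb must be >= 0, link_gbps > 0, 0 < efficiency <= 1")
    bits = data_tb * 1e12 * 8                  # terabytes to bits
    usable_bps = link_gbps * 1e9 * efficiency  # usable bits per second
    return bits / usable_bps


if __name__ == "__main__":
    # Best-case times for the drive sizes and link speeds discussed above.
    for tb, gbps in [(1, 1), (10, 1), (1, 10), (10, 10)]:
        hours = ideal_transfer_seconds(tb, gbps) / 3600
        print(f"{tb:>2} TB over {gbps:>2} Gb/s: at least {hours:.2f} h")
```

Lowering `efficiency` models a link that is shared with client traffic. For example, `ideal_transfer_seconds(1, 1, efficiency=0.5)` doubles the best-case estimate.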
The failure of a larger domain, such as a rack, means that the storage cluster uses considerably more bandwidth. When building a storage cluster consisting of multiple racks, which is common for large storage implementations, consider using as much network bandwidth as possible between switches in a "fat tree" design for optimal performance. A typical 10 Gb/s Ethernet switch has 48 10 Gb/s ports and four 40 Gb/s ports. Use the 40 Gb/s ports on the spine for maximum throughput. Alternatively, consider aggregating unused 10 Gb/s ports with QSFP+ and SFP+ cables into more 40 Gb/s ports to connect to other rack and spine routers. To bond network interfaces, you can use LACP (bonding mode 4). Use jumbo frames with a maximum transmission unit (MTU) of 9000, especially on the backend or cluster network.
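To make the oversubscription concern concrete, the following Python sketch computes the ratio of host-facing bandwidth to uplink bandwidth for the switch described above. The function name `oversubscription_ratio` is hypothetical, and the calculation assumes that every port runs at line rate and that all 40 Gb/s ports are used as uplinks.

```python
from fractions import Fraction


def oversubscription_ratio(down_ports: int, down_gbps: int,
                           up_ports: int, up_gbps: int) -> Fraction:
    """Return host-facing bandwidth divided by uplink bandwidth.

    A result of 1 means the uplinks can carry all host traffic at line
    rate; larger values mean the uplinks are oversubscribed.
    """
    if min(down_ports, down_gbps, up_ports, up_gbps) <= 0:
        raise ValueError("port counts and speeds must be positive")
    return Fraction(down_ports * down_gbps, up_ports * up_gbps)


# 48 x 10 Gb/s host ports against 4 x 40 Gb/s uplinks: 480 / 160
ratio = oversubscription_ratio(48, 10, 4, 40)
print(f"{ratio.numerator}:{ratio.denominator}")  # prints 3:1
```

Under these assumptions the switch is oversubscribed 3:1. Aggregating more ports into additional uplinks lowers that ratio.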