IBM Support

IBM FlashSystem iSER clustering support for HyperSwap

Technical Blog Post


Abstract

NVMe-oF (Non-Volatile Memory Express over Fabrics) is an exciting new storage network technology that lets you take full advantage of IBM FlashCore Modules (FCM) from an Ethernet or InfiniBand network. IBM FlashSystem supports end-to-end NVMe-oF for Fibre Channel solutions as well as Ethernet-based RDMA solutions.

Body

By Jack Tedjai, IBM Technology Expert Labs

NVMe-oF is designed to extend the performance of NVMe technology across the network using Remote Direct Memory Access (RDMA). RDMA is direct memory access from the memory of one computer into that of another without involving either computer's operating system.

Figure 1: NVMe over Fabrics (NVMe-oF)

Figure 2: NVMe protocol Fabrics (NVMe-oF)

RDMA is provided either by iWARP, which layers RDMA services on top of the Transmission Control Protocol (TCP) and therefore runs over an existing Ethernet setup without a large hardware investment, or by RoCE (RDMA over Converged Ethernet), which does not need the TCP layer and therefore provides lower latency.

IBM FlashSystem supports end-to-end NVMe-oF for Fibre Channel solutions as well as Ethernet solutions based on RoCE (UDP) or iWARP (TCP).

RoCE uses UDP and needs lossless Ethernet, which is provided by the Data Center Bridging (DCB) protocol suite. The DCB capabilities are negotiated by DCBX, a discovery and capability exchange protocol that uses LLDP as the underlying protocol to exchange parameters with the peer. As a result, RoCE needs more configuration and DCB-capable switches.

iWARP runs over standard TCP. Because TCP handles retransmission and packet loss, iWARP can be used with any switch.

The requirements for running iSER are:

  • Applications that can use the SCSI and iSCSI layers (in the client example detailed in Figure 3 and following, this is VMware ESXi 7.x)
  • A network capable of the required speed, cabled, for example, with 25 GbE SR SFP28 transceivers over 50 µm OM4 multimode optical fiber (MMF)
  • Adapter cards that support RDMA (Ethernet or InfiniBand), for example, Mellanox ConnectX-4 Lx 25 GbE dual-port (RoCE) or Chelsio T6 2x25 Gbps adapter (iWARP); a quick way to verify them on the ESXi host is shown after this list
  • RDMA over Converged Ethernet switches with Priority-based Flow Control (IEEE 802.1Qbb), for example, Dell switch S5048-F
  • A target that supports iSER clustering (RoCE or iWARP), for example, any IBM FlashSystem with FlashCore Modules (FCM)
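As a quick check on the ESXi side, the RDMA-capable adapters can be listed before any iSER configuration is attempted. This is a minimal sketch; the device and NIC names reported by your host will differ:

esxcli rdma device list
esxcli network nic list

The RDMA device list should show the ConnectX-4 (RoCE) or T6 (iWARP) ports; if it comes back empty, the RDMA driver for the adapter is not loaded.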

In this client case, the following layers were used (see Figure 3):

  • Front-end iSCSI on the existing 100 GbE iSCSI network (shared Dell network, dedicated VLAN)
  • Back-end iSER network for 100 GbE iSER clustering (shared Dell network, dedicated VLAN)
  • iSER interconnect between the data centers over >10 GbE dark fiber
  • iSER (RoCE) for the new VMware ESXi host deployment

 

Figure 3: Client global design

Figure 4: iSER clustering (iWARP) IP address setup from the FlashSystem Service Assistant

Figure 5: Overall view of all network cards

Figure 6: iSER clustering Ethernet connectivity between both FlashSystems

If both FlashSystems can see each other, start creating the HyperSwap cluster by adding the second control enclosure to the existing standard cluster, as shown below.

#lscontrolenclosurecandidate
#addcontrolenclosure -iogrp 1 -sernum 78Y00XX
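
Once the second control enclosure has been added, a quick check that both I/O groups and all node canisters are online might look like this (output omitted):

#lsnodecanister
#lsiogrp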

Finally, create the IP quorum for the HyperSwap cluster:
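
A minimal sketch of the usual flow, assuming a quorum host at a third site with Java installed and IP connectivity to both FlashSystems (the exact steps can vary by code level):

#mkquorumapp

This generates ip_quorum.jar in the /dumps directory on the configuration node. Copy the file to the quorum host and start it there:

java -jar ip_quorum.jar

Then verify the quorum state from the FlashSystem:

#lsquorum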

Figure 7: IP quorum overview

VMware design and VMware MPIO setup

Figure 8: VMware ESXi 7.x multipath design

Note: iSER does not support NIC teaming. When configuring port binding, use only one RDMA adapter per vSwitch.
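
On each ESXi host, the software iSER adapter must be enabled once so that it binds to the RDMA-capable uplink. A minimal sketch, assuming the dedicated vSwitch and VMkernel port already exist:

esxcli rdma iser add
esxcli rdma device list

After a rescan (or reboot), a new vmhba appears under Storage Adapters as a Mellanox iSCSI over RDMA (iSER) adapter.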

VMware ESXi Storage adapters settings:
From the “Configuration” page, open the “Storage adapters” view. Select the device under “Mellanox iSCSI over RDMA (iSER) Adapter” and click “Properties” to add the network configuration settings.
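
The same network binding and target discovery can also be done from the ESXi command line. A sketch, assuming the iSER adapter came up as vmhba67, the dedicated VMkernel port is vmk2, and 192.168.100.10 is one of the FlashSystem iSER target ports (all three values are placeholders, not taken from this setup):

esxcli iscsi networkportal add --adapter=vmhba67 --nic=vmk2
esxcli iscsi adapter discovery sendtarget add --adapter=vmhba67 --address=192.168.100.10
esxcli storage core adapter rescan --adapter=vmhba67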

Figure 9: ESXi Storage adapters settings

Note: Double-check the MTU size setting.
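
If jumbo frames are used, the MTU must match end to end: the vSwitch, the VMkernel port, the physical switches, and the FlashSystem Ethernet ports. A sketch of the typical commands, with vSwitch1, vmk2, I/O group 0, and port 5 as placeholder values (the cfgportip syntax can vary by code level):

esxcli network vswitch standard set --vswitch-name=vSwitch1 --mtu=9000
esxcli network ip interface set --interface-name=vmk2 --mtu=9000

And on the FlashSystem:

#cfgportip -mtu 9000 -iogrp 0 5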


Figure 10: MTU settings on the Storwize


Figure 11: VLAN support
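
Because the iSER traffic in this design rides a dedicated VLAN on a shared switch infrastructure, the VLAN tag can be set on the FlashSystem Ethernet port as part of its IP configuration. A sketch with placeholder values for the node, IP address, VLAN ID, and port (exact parameters depend on the code level):

#cfgportip -node 1 -ip 192.168.100.20 -mask 255.255.255.0 -gw 192.168.100.1 -vlan 100 5

The matching VLAN ID must also be set on the ESXi VMkernel port group and on the switch ports.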

Figure 12: VMware ESXi 7.x host attachment overview from the FlashSystem
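
For reference, defining an iSER-attached ESXi host and mapping a volume to it from the FlashSystem CLI looks roughly like the following; the host name, IQN, and volume name are placeholders:

#mkhost -name ESXi01 -iscsiname iqn.1998-01.com.vmware:esxi01-1a2b3c4d
#mkvdiskhostmap -host ESXi01 HyperSwap_vol01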


Figure 13: FlashSystem performance test run overview on FlashCore Modules (FCM)


Conclusion
In closing, NVMe-oF offers better performance and efficiency, especially for small I/O:

  • Lower latency
  • Higher IOPS
  • Less CPU utilization
  • Less power consumption

IBM FlashSystem supports end-to-end NVMe-oF solutions.

Looking for Support?

IBM Technology Expert Labs offers infrastructure services to help organizations build hybrid cloud and enterprise IT. Our storage consultants can help you secure your enterprise with physical and software-defined storage solutions for on-premises, cloud, converged, and virtualized environments.
Contact IBM Technology Expert Labs today to learn more.

[{"Business Unit":{"code":"BU054","label":"Systems w\/TPS"},"Product":{"code":"HW206","label":"Storage Systems"},"Component":"","Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"","Edition":"","Line of Business":{"code":"LOB26","label":"Storage"}}]

UID

ibm16165051