Ceph NVMe-oF gateway
Storage administrators can install and configure an NVMe over Fabrics (NVMe-oF) gateway for an IBM Storage Ceph cluster. With Ceph NVMe-oF gateways, you can effectively run a fully integrated block storage infrastructure with all features and benefits of a conventional Storage Area Network (SAN).
- The NVMe-oF gateway supports VMware vSphere APIs (VAAI), which includes support for vMotion, compare and write, unmap, and write zero.
- NVMe reservations are supported for host clustering.
Previously, block-level access to a Ceph storage cluster was limited to QEMU and librbd. Block-level access can now also use the NVMe-oF standard to provide data storage.
Use the Ceph Dashboard to easily configure and manage the Ceph NVMe-oF gateway. For more information, see Managing the Ceph NVMe-oF gateway.
The NVMe-oF gateway integrates IBM Storage Ceph with the NVMe over TCP (NVMe/TCP) protocol to provide an NVMe/TCP target that exports Ceph Block Device (RBD) images. The NVMe/TCP protocol allows clients, which are known as initiators, to send NVMe-oF commands to storage devices, which are known as targets, over an Internet Protocol network. Initiators can be Linux clients, VMware clients, or both. For VMware clients, the NVMe/TCP volumes are shown as VMFS datastores. For Linux clients, the NVMe/TCP volumes are shown as block devices.
For more information about the NVMe over Fabrics (NVMe-oF) protocol, see NVMe over Fabrics.
NVMe-oF in a stretch cluster
An NVMe-oF stretch cluster extends an IBM Storage Ceph deployment across two sites to provide continuous NVMe-oF block storage access during site-level failures and planned workload movement.
In a stretch cluster configuration, Ceph storage nodes and NVMe-oF gateways are distributed across two sites, commonly referred to as Site A and Site B. The sites are connected by a network with controlled latency, typically in the range of 2–10 milliseconds.
NVMe-oF initiators, such as VMware ESXi hosts, access Ceph Block Device (RBD) storage through NVMe-oF gateways deployed in both sites. The stretch cluster architecture is designed to support business continuity and disaster recovery (BCDR) use cases, including infrastructure failures, planned maintenance, and controlled migration of workloads between sites.
For more information about stretch cluster behavior, supported architectures, and failure scenarios, see NVMe-oF in a stretch cluster.
For general information about using IBM Storage Ceph stretch clusters, see Stretch clusters for Ceph storage.
High Availability with NVMe-oF gateway group
High Availability (HA) provides I/O and control path redundancy for the host initiators. High Availability is also sometimes referred to as failover and failback support. This redundancy protects against the failure of one or more gateways. With HA, the host can continue its I/O, possibly with increased latency, until the failed gateways are back and functioning correctly.
High Availability is enabled by default. To use High Availability, a minimum of two gateways and listeners must be defined. For more information, see Deploying the NVMe-oF gateway.
It is important to create redundancy between the host and the gateways. To create fully redundant network connectivity, be sure that the host has two Ethernet ports that are connected to the gateways over a redundant network (for example, two network switches).
The HA feature uses the Active/Standby approach for each namespace. Using Active/Standby means that at any point in time, only one of the NVMe-oF gateways serves I/O from the host to a specific namespace. To properly use all NVMe-oF gateways, each namespace is assigned to a different load-balancing group. The number of load-balancing groups is equal to the number of NVMe-oF gateways in the gateway group.
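The Active/Standby behavior can be illustrated with a small model. The following Python sketch is not the gateway's actual algorithm; it assumes a simple round-robin assignment of namespaces to load-balancing groups and a hypothetical takeover rule in which a failed gateway's group moves to the surviving gateway that currently owns the fewest groups. The function names, group numbering, and gateway names are illustrative only.

```python
from collections import defaultdict


def assign_load_balancing_groups(namespaces, gateways):
    """Spread namespaces round-robin over one load-balancing group per gateway."""
    if not gateways:
        raise ValueError("at least one gateway is required")
    group_count = len(gateways)  # one group per gateway in the gateway group
    # Group IDs start at 1 purely for readability (an assumption of this sketch).
    return {ns: (i % group_count) + 1 for i, ns in enumerate(namespaces)}


def active_gateway_per_group(gateways, failed=()):
    """Map each group ID to the single gateway that currently serves its I/O.

    Group N is normally served by the N-th gateway. If that gateway has
    failed, this toy model hands the group to the surviving gateway that
    currently owns the fewest groups (a hypothetical takeover rule).
    """
    survivors = [gw for gw in gateways if gw not in failed]
    if not survivors:
        raise RuntimeError("no gateway is available")
    owners = {}
    load = defaultdict(int)
    # First pass: healthy gateways keep their own group.
    for group_id, gw in enumerate(gateways, start=1):
        if gw not in failed:
            owners[group_id] = gw
            load[gw] += 1
    # Second pass: groups of failed gateways fail over to a survivor.
    for group_id, gw in enumerate(gateways, start=1):
        if gw in failed:
            takeover = min(survivors, key=lambda s: load[s])
            owners[group_id] = takeover
            load[takeover] += 1
    return owners


if __name__ == "__main__":
    gws = ["gw1", "gw2", "gw3"]
    print(assign_load_balancing_groups(["ns1", "ns2", "ns3", "ns4"], gws))
    print(active_gateway_per_group(gws, failed={"gw2"}))
```

In this model every namespace always has exactly one active gateway, and failback is simply a matter of calling `active_gateway_per_group` again once the failed gateway is removed from `failed`.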
With HA, if an NVMe-oF gateway fails, the initiator continues trying to reconnect. How long it keeps trying depends on the reconnect settings that are defined on the initiator. For more information about defining the reconnect time for the initiator and general configuration instructions, see Configuring the NVMe-oF gateway initiator.
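As a rough illustration of how the reconnect window translates into retry attempts, the following Python sketch assumes a Linux initiator that is configured with the nvme-cli connect options `--ctrl-loss-tmo` and `--reconnect-delay`, and assumes that the number of attempts is the timeout divided by the delay, rounded up. Other initiators, such as VMware ESXi, may behave differently, and the values shown are examples rather than recommendations.

```python
import math


def max_reconnect_attempts(ctrl_loss_tmo_s, reconnect_delay_s):
    """Return how many reconnect attempts fit in the controller-loss timeout.

    A negative timeout is treated as "retry forever" and returns None.
    """
    if reconnect_delay_s <= 0:
        raise ValueError("reconnect delay must be positive")
    if ctrl_loss_tmo_s < 0:
        return None  # assumption: negative timeout means unlimited retries
    return math.ceil(ctrl_loss_tmo_s / reconnect_delay_s)


if __name__ == "__main__":
    # Example values only: a 600-second window with a 10-second delay.
    print(max_reconnect_attempts(600, 10))
```

A longer timeout gives a failed gateway more time to return before the initiator gives up on that path, at the cost of holding on to a dead path for longer.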
Scaling-out with NVMe-oF gateway
- Up to 4 NVMe-oF gateway groups.
- Up to 8 NVMe-oF gateways in a gateway group.
- Up to 128 NVMe-oF subsystems within a gateway group.
- Up to 512 hosts per gateway group.
Note: If more than 512 hosts are required, contact IBM Support.
- Up to 4096 namespaces per gateway group.
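When planning a deployment, these limits can be checked programmatically. The following Python sketch is a planning aid only, not part of any Ceph tooling; the `GatewayGroupPlan` class and `check_plan` function are hypothetical names, and the constants are copied from the list above.

```python
from dataclasses import dataclass

# Limits copied from the scaling list in this section.
MAX_GATEWAY_GROUPS = 4
MAX_GATEWAYS_PER_GROUP = 8
MAX_SUBSYSTEMS_PER_GROUP = 128
MAX_HOSTS_PER_GROUP = 512
MAX_NAMESPACES_PER_GROUP = 4096


@dataclass
class GatewayGroupPlan:
    """Planned object counts for one gateway group (hypothetical helper)."""
    name: str
    gateways: int
    subsystems: int
    hosts: int
    namespaces: int


def check_plan(groups):
    """Return a list of human-readable limit violations (empty if none)."""
    problems = []
    if len(groups) > MAX_GATEWAY_GROUPS:
        problems.append(
            f"{len(groups)} gateway groups exceeds the limit of {MAX_GATEWAY_GROUPS}"
        )
    for g in groups:
        checks = [
            ("gateways", g.gateways, MAX_GATEWAYS_PER_GROUP),
            ("subsystems", g.subsystems, MAX_SUBSYSTEMS_PER_GROUP),
            ("hosts", g.hosts, MAX_HOSTS_PER_GROUP),
            ("namespaces", g.namespaces, MAX_NAMESPACES_PER_GROUP),
        ]
        for label, value, limit in checks:
            if value > limit:
                problems.append(
                    f"{g.name}: {value} {label} exceeds the limit of {limit}"
                )
    return problems


if __name__ == "__main__":
    plan = [GatewayGroupPlan("group-a", 4, 16, 600, 1024)]
    for problem in check_plan(plan):
        print(problem)
```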
NVMe Discovery
The IBM Storage Ceph NVMe-oF gateway supports NVMe Discovery. Each gateway instance that runs in the Ceph cluster also runs a Discovery Controller, which reports the IP addresses of all gateways in the group that are configured with listeners.
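The reporting rule described above can be modeled in a few lines. The following Python sketch is a simplified illustration, not the Discovery Controller implementation: the `Gateway` class, the `discovery_addresses` function, and the example addresses (taken from the 192.0.2.0/24 documentation range) are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Gateway:
    """A gateway in the group with zero or more (ip, port) listeners."""
    name: str
    listeners: list = field(default_factory=list)


def discovery_addresses(gateway_group):
    """Return the sorted, unique IPs of gateways that have a listener.

    Gateways without any listener are omitted, mirroring the rule that
    only gateways configured with listeners are reported.
    """
    addresses = set()
    for gw in gateway_group:
        for ip, _port in gw.listeners:
            addresses.add(ip)
    return sorted(addresses)


if __name__ == "__main__":
    group = [
        Gateway("gw1", [("192.0.2.11", 4420)]),
        Gateway("gw2"),  # no listener yet, so it is not reported
        Gateway("gw3", [("192.0.2.13", 4420)]),
    ]
    print(discovery_addresses(group))
```

Because every gateway in the group returns the same list in this model, an initiator can query any one gateway and still learn about all of the others.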
For configuration information, see Configuring the NVMe-oF gateway initiator.
NVMe-oF gateway in-band authentication
The Ceph NVMe-oF gateway uses in-band authentication to reject connection requests from unknown initiators, which helps ensure that only authorized hosts can access a subsystem. The NVMe-oF gateway uses SPDK for DH-HMAC-CHAP authentication, allowing users to authenticate in either unidirectional or bidirectional mode.
Table 1 describes the two authentication modes. The difference between unidirectional and bidirectional authentication is whether the subsystem has a key. If only the host has a DH-HMAC-CHAP key, and the subsystem does not, unidirectional authentication is used. If both the host and the subsystem have keys, bidirectional authentication is used.
| Authentication mode type | Description | Direction |
|---|---|---|
| Unidirectional | Target verifies the host | Initiator to target |
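| Bidirectional | Target verifies the host, and the host verifies the target | Initiator to target, and target to initiator |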
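The key rule that selects the authentication mode can be expressed as a small decision function. The following Python sketch is illustrative only: the function name and return strings are hypothetical, the key values are placeholders rather than real DH-HMAC-CHAP keys, and the sketch assumes that a subsystem key without a host key is an invalid configuration.

```python
from typing import Optional


def authentication_mode(host_key: Optional[str], subsystem_key: Optional[str]) -> str:
    """Return the in-band authentication mode implied by the configured keys."""
    if subsystem_key and not host_key:
        # Assumption of this sketch: a subsystem key requires a host key.
        raise ValueError("a subsystem key requires the host to have a key as well")
    if host_key and subsystem_key:
        return "bidirectional"   # target verifies host, host verifies target
    if host_key:
        return "unidirectional"  # target verifies host only
    return "none"                # no keys, so no in-band authentication


if __name__ == "__main__":
    # Placeholder strings stand in for real keys.
    print(authentication_mode("<host-key>", None))
    print(authentication_mode("<host-key>", "<subsystem-key>"))
```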
The Ceph NVMe-oF gateway supports in-band authentication by using DH-HMAC-CHAP to control access between NVMe initiator hosts and NVMe-oF subsystems. Authentication keys can be configured at different scopes to support deployments that require unique authentication per host connection. For more information, see NVMe-oF in-band authentication.