Protocols support overview: Integration of protocol access methods with GPFS

Starting with V4.1.1, IBM Spectrum Scale™ provides additional protocol access methods in the Standard and Advanced editions of the product. Providing these additional file and object access methods and integrating them with GPFS™ offers several benefits: it enables users to consolidate various sources of data efficiently in one global namespace, it provides a unified data management solution, and it enables efficient space utilization while avoiding unnecessary data movement simply because access methods differ.

The additional protocol access methods integrated with GPFS are file access using NFS and SMB, and object access using OpenStack Swift. While each of these server functions (NFS, SMB, and Object) uses open source technologies, this integration adds value by providing the ability to scale and by providing high availability using the clustering technology in GPFS.

The integration of file and object serving with GPFS makes it possible to create NFS exports, SMB shares, and OpenStack Swift containers whose data resides in GPFS file systems, for access by client systems that do not run GPFS. Some nodes in the GPFS cluster (at least two are recommended) must be designated as protocol nodes (also called CES nodes), from which non-GPFS clients can access data residing in and managed by GPFS through the appropriate protocol artifacts (exports, shares, or containers). The protocol nodes need GPFS server license designations. The protocol nodes should be configured with "external" network addresses that are used to access the protocol artifacts from clients; these external network addresses are different from the GPFS cluster addresses used to add the protocol nodes to the GPFS cluster. The integration allows the artifacts to be accessed from any of the protocol nodes via the configured network addresses, and it allows the network addresses associated with a protocol node to fail over to other protocol nodes when that protocol node fails. All the protocol nodes have to be running the Red Hat Enterprise Linux 7.x or the SLES 12 operating system, and the protocol nodes must be all Power® (in big endian mode) or all Intel (although the other nodes in the GPFS cluster could be on other platforms and operating systems).
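As an illustrative sketch, designating protocol nodes and assigning external CES addresses might look like the following. The node names and IP addresses here are hypothetical placeholders; check the exact options against the command reference for your release.

```shell
# Designate two existing cluster nodes as protocol (CES) nodes
# (node names "prnode1" and "prnode2" are hypothetical examples)
mmchnode --ces-enable -N prnode1,prnode2

# Assign external network addresses that clients will use;
# these are distinct from the cluster addresses of the nodes
mmces address add --ces-ip 192.0.2.101,192.0.2.102

# Verify which nodes are CES nodes and their addresses
mmlscluster --ces
```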

Note that like GPFS, the protocol serving functionality is delivered only as software. The intent of the functionality is to provide access to data managed by GPFS via additional access methods. While the protocol function provides several aspects of NAS file serving, the delivery is not a NAS appliance. In other words, the GPFS-style command line interface requiring root access is still available, so from an administrative management perspective it is not like an appliance; role-based access control of the command line interface is not offered. Further, the type of workloads suited for this delivery continues to be those that require the scaling and consolidation aspects associated with traditional GPFS. It is important to note that some NAS workloads may not be suited for delivery in the current release (for instance, very extensive use of snapshots, or support for a very large number of SMB users). For more information, see the IBM Spectrum Scale FAQ in IBM® Knowledge Center.

Along with the protocol-serving function, the delivery includes the spectrumscale installation toolkit as well as some performance monitoring infrastructure. The GPFS code, including the server function for the three (NFS, SMB, Object) protocols, along with the installation toolkit and performance monitoring infrastructure, are delivered via a self-extracting archive package (just like traditional GPFS). The use of the protocol server function requires additional licenses that need to be accepted. A GPFS package without protocols continues to be provided for those users who do not wish to accept these additional license terms. Note that even though some of the components provided are open source, the specific packages provided should be used. If there are existing versions of these open source packages on your system, they should be removed before installing our software.

Several new commands have been introduced to enable the use of the function described in the preceding sections. The new commands are spectrumscale, mmces, mmuserauth, mmnfs, mmsmb, mmobj, and mmperfmon. In addition, mmdumpperfdata and mmprotocoltrace have been provided to help with data collection and tracing. Existing GPFS commands that have been expanded with some options for protocols include mmlscluster, mmchnode, and mmchconfig. Further, gpfs.snap has been extended to include data gathering about the protocols to help with problem determination.
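As a hedged sketch of how some of these commands are used to create protocol artifacts: the file system paths, share name, and client specification below are hypothetical examples, and the exact syntax should be verified against the command reference.

```shell
# Create an NFS export of a GPFS directory, read-only for all clients
mmnfs export add /gpfs/fs1/exports/data --client "*(Access_Type=RO)"

# Create an SMB share backed by a directory in the same file system
mmsmb export add datashare /gpfs/fs1/shares/data

# List the configured exports and shares
mmnfs export list
mmsmb export list
```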

IBM Spectrum Scale 4.1.1 adds cluster export services (CES) infrastructure to support the integration of the NFS, SMB, and object servers. The NFS server supports NFS v3 and the mandatory features in NFS v4.0. The SMB server supports SMB 2, SMB 2.1, and the mandatory features of SMB 3.0. The object server supports the Kilo release of OpenStack Swift along with Keystone v3. The CES infrastructure is responsible for (a) managing the setup for high-availability clustering used by the protocols; (b) monitoring the health of these protocols on the protocol nodes and raising events/alerts in the event of failures; and (c) managing the addresses used for accessing these protocols, including failover and failback of these addresses because of protocol node failures. For information on the use of CES, including administering and managing the protocols, see the Implementing Cluster Export Services chapter of IBM Spectrum Scale: Advanced Administration Guide.
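The CES monitoring and address management described above can be inspected from any protocol node. The following is an illustrative sketch; the IP address and node name are placeholders, and the exact options should be confirmed against the mmces command reference.

```shell
# Show CES node state and the health of the protocol services
mmces state show -a

# List the external addresses and which protocol node
# currently hosts each one
mmces address list

# Example: move an address to another protocol node manually
# (addresses also fail over automatically on node failure)
mmces address move --ces-ip 192.0.2.101 --ces-node prnode2
```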

IBM Spectrum Scale enables you to build a data ocean solution to eliminate silos, improve infrastructure utilization, and automate data migration to the best location or tier of storage anywhere in the world. You can start small with just a few commodity servers fronting commodity storage devices and then grow to a data lake architecture or even an ocean of data. IBM Spectrum Scale is a proven solution in some of the most demanding environments with massive storage capacity under the single global namespace. Furthermore, your data ocean can store either files or objects and you can run analytics on the data in-place, which means that there is no need to copy the data to run your jobs.

The spectrumscale installation toolkit is provided to help with the installation and configuration of GPFS as well as protocols. While it was designed for a user who may not be familiar with GPFS, it can help ease the installation and configuration process of protocols even for experienced GPFS administrators.
Note: The installation toolkit can help with prechecks to validate the environment, distribution of the RPMs from one node to the other nodes, and multiple GPFS administrative steps. spectrumscale deploy can be used to configure protocols on an existing GPFS cluster with an existing GPFS file system.
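A hedged sketch of a spectrumscale toolkit flow for deploying protocols on an existing cluster follows; the node names and addresses are hypothetical, and the subcommands and flags should be checked against the toolkit documentation for your release.

```shell
# Define cluster nodes; -p marks a node as a protocol node
spectrumscale node add prnode1 -p
spectrumscale node add prnode2 -p

# Configure the external (CES) addresses and enable protocols
spectrumscale config protocols -e 192.0.2.101,192.0.2.102
spectrumscale enable nfs smb

# Validate the environment, then deploy protocols on the
# existing GPFS cluster and file system
spectrumscale deploy --precheck
spectrumscale deploy
```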

In addition to the installation toolkit, IBM Spectrum Scale 4.1.1 and later also adds a performance monitoring toolkit. Sensors to collect performance information are installed on all protocol nodes, and one of these nodes is designated as a collector node. The mmperfmon query command can be used to view the performance counters that have been collected.
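As an illustrative sketch of querying the collected counters: the metric name, bucket count, and query form below are examples and should be verified against the mmperfmon command reference.

```shell
# Query recent values of a metric collected by the sensors
mmperfmon query cpu_user --number-buckets 10

# Compare a metric across the protocol nodes
mmperfmon query compareNodes cpu_user
```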

Some protocol use considerations: