IBM InfoSphere BigInsights Version 3.0

IBM General Parallel File System (GPFS)

IBM® General Parallel File System (GPFS™) is an enterprise file system that InfoSphere® BigInsights™ supports as an alternative to HDFS.

GPFS is similar to HDFS in the following way: on the Linux command line, the file system permissions for the /tmp directory on GPFS are associated with the HDFS user rather than with the GPFS or root user.

GPFS supports local disks on cluster nodes and storage area networks (SANs). Both logical isolation and physical isolation are supported: file sets can act as separate file systems within a file system (logical isolation), or they can belong to separate storage pools (physical isolation). InfoSphere BigInsights uses a customized version of GPFS that supports all existing GPFS commands and provides additional interfaces and commands.

GPFS supports thousands of nodes and petabytes of storage, so you can scale the file system to meet your most demanding needs. Data is replicated on multiple nodes so that no single point of failure exists, whereas the NameNode is a single point of failure in HDFS. You can push updates synchronously or asynchronously, which lets you choose how changes are propagated from a primary system to a secondary system.

If a node fails, changes are replicated to the other nodes. When the failed node becomes operational again, GPFS quickly determines which blocks must be recovered. Changes that occurred while the node was down are copied to the previously failed node so that it is synchronized with the other nodes in the cluster.
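The recovery behavior described above can be illustrated with a small sketch. This is not GPFS code and does not reflect GPFS internals; the `ReplicaNode` and `Cluster` classes and their methods are hypothetical names invented for illustration. The sketch assumes the cluster records which blocks changed while a node was down, so that only those blocks are copied when the node returns.

```python
class ReplicaNode:
    """Hypothetical node holding block replicas (illustration only)."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}      # block id -> block contents
        self.online = True


class Cluster:
    """Hypothetical cluster that tracks blocks missed by offline nodes."""

    def __init__(self, nodes):
        self.nodes = {node.name: node for node in nodes}
        self.missed = {node.name: set() for node in nodes}

    def write(self, block_id, data):
        # Online nodes receive the change; for offline nodes, only the
        # id of the changed block is remembered.
        for name, node in self.nodes.items():
            if node.online:
                node.blocks[block_id] = data
            else:
                self.missed[name].add(block_id)

    def recover(self, name):
        # Copy only the blocks that changed while the node was down.
        node = self.nodes[name]
        source = next(
            (n for n in self.nodes.values() if n.online and n is not node),
            None,
        )
        if source is None:
            raise RuntimeError("no online replica to recover from")
        stale = sorted(self.missed[name])
        for block_id in stale:
            node.blocks[block_id] = source.blocks[block_id]
        self.missed[name].clear()
        node.online = True
        return stale


a, b = ReplicaNode("a"), ReplicaNode("b")
cluster = Cluster([a, b])
cluster.write(1, b"x")
b.online = False              # node b fails
cluster.write(2, b"y")        # changes made while b is down
cluster.write(1, b"z")
recovered = cluster.recover("b")
```

In this sketch, `recovered` lists only the block ids that changed during the outage, so the returning node is brought up to date without copying every block again.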

Applications define their own logical block size by segmenting data into file blocks. The size of each file block is determined by the effective block size, also called the chunk size. Applications can also determine the replication layout by using wide striping over the network, write affinity on a local disk, or a combination of both. Allowing applications to dictate block size and replication layout provides greater performance and efficiency than HDFS.
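As a rough illustration of these two ideas, the following sketch splits data into blocks of a chosen chunk size and then assigns replicas under two layouts. It is a conceptual model only, not the GPFS interface: `split_into_blocks`, `place_replicas`, and the node names are hypothetical, and the round-robin placement is an assumed stand-in for wide striping.

```python
def split_into_blocks(data, chunk_size):
    """Segment data into file blocks of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def place_replicas(num_blocks, nodes, writer, replicas, write_affinity):
    """Return, for each block, the nodes that hold a replica.

    write_affinity=False: every replica is striped round-robin over all nodes.
    write_affinity=True:  the first replica stays on the writing node and
                          the remaining replicas are striped over the others.
    """
    if writer not in nodes:
        raise ValueError("writer must be one of the nodes")
    if not 1 <= replicas <= len(nodes):
        raise ValueError("replicas must be between 1 and the node count")
    others = [n for n in nodes if n != writer]
    layout = []
    for block in range(num_blocks):
        if write_affinity:
            chosen = [writer] + [
                others[(block + r) % len(others)] for r in range(replicas - 1)
            ]
        else:
            chosen = [nodes[(block + r) % len(nodes)] for r in range(replicas)]
        layout.append(chosen)
    return layout


blocks = split_into_blocks(b"abcdefghij", 4)
nodes = ["n1", "n2", "n3"]
striped = place_replicas(len(blocks), nodes, "n1", 2, write_affinity=False)
local_first = place_replicas(len(blocks), nodes, "n1", 2, write_affinity=True)
```

With write affinity, the first replica of every block stays on the node that wrote it, which favors local reads; with wide striping, replicas are spread evenly across the cluster, which favors aggregate throughput.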

To learn more about the features of GPFS, see the Cluster Products Knowledge Center. The following links provide more information about some of the enterprise features that distinguish GPFS.