Parallelism in Data Management applications

Given the multiple-node environment of GPFS, it is desirable to exploit parallelism in the Data Management application as well.

This can be accomplished in several ways:
  • On a given session node, multiple DM application threads can access the same file in parallel, using the same session. There is no limit on the number of threads that can invoke DMAPI functions simultaneously on each node.
  • Multiple sessions, each with event dispositions for a different file system, can be created on separate nodes. Thus, files in different file systems can be accessed independently and simultaneously, from different session nodes.
  • Dispositions for events of the same file system can be partitioned among multiple sessions, each on a different node. This distributes the management of one file system among several session nodes (see the first sketch following this list).
  • Although GPFS routes all events to a single session node, data movement may occur on multiple nodes. The function calls dm_read_invis, dm_write_invis, dm_probe_hole, and dm_punch_hole are honored when issued by a root process on another node, provided it presents the session ID of a session established on the session node.
    A DM application may create a worker process, which can run on any node within the GPFS cluster. This worker process can move data to or from GPFS using the dm_read_invis and dm_write_invis functions (see the second sketch following this list). The worker processes must adhere to these guidelines:
    1. They must run as root.
    2. They must present a valid session ID that was obtained on the session node.
    3. Parallel writes to the same file must be performed in multiples of the file system block size, so that GPFS can manage the underlying disk blocks correctly.
    4. No DMAPI calls other than dm_read_invis, dm_write_invis, dm_probe_hole, and dm_punch_hole may be issued on nodes other than the session node. This means that any rights required on a file must be obtained within the session on the session node, prior to the data movement.
    5. There is no persistent state on the nodes hosting the worker processes. It is the responsibility of the DM application to recover from any failure of GPFS or of the data movement process.
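
The first sketch below illustrates how a session node might create a session and register dispositions for only a subset of one file system's events, leaving the remaining events to sessions on other nodes. It is a minimal example assuming the standard DMAPI calls declared in dmapi.h; the mount point /gpfs/fs1 and the choice of the READ event are illustrative only.

  #include <dmapi.h>
  #include <stdio.h>

  int main(void)
  {
      dm_sessid_t    sid;
      void          *fs_hanp;
      size_t         fs_hlen;
      dm_eventset_t  eventset;
      char          *version;

      if (dm_init_service(&version) != 0) {
          perror("dm_init_service");
          return 1;
      }

      /* Create a new session.  The returned session ID is what worker
       * processes on other nodes must present for invisible I/O.      */
      if (dm_create_session(DM_NO_SESSION, "fs1 read events", &sid) != 0) {
          perror("dm_create_session");
          return 1;
      }

      /* Obtain the handle of the managed file system.                 */
      if (dm_path_to_fshandle("/gpfs/fs1", &fs_hanp, &fs_hlen) != 0) {
          perror("dm_path_to_fshandle");
          return 1;
      }

      /* Register dispositions for READ events only; other events of
       * the same file system can be claimed by sessions on different
       * nodes, partitioning management of the file system.            */
      DMEV_ZERO(eventset);
      DMEV_SET(DM_EVENT_READ, eventset);
      if (dm_set_disp(sid, fs_hanp, fs_hlen, DM_NO_TOKEN,
                      &eventset, DM_EVENT_MAX) != 0) {
          perror("dm_set_disp");
          return 1;
      }

      printf("session established; READ events of /gpfs/fs1 will be delivered here\n");
      return 0;
  }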
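The second sketch outlines a worker process performing invisible writes on a node other than the session node. The session ID, the opaque file handle, and the file system block size are assumed to have been obtained on the session node (along with any required access rights) and passed to the worker by the DM application; the stage_region helper and its parameters are hypothetical.

  #include <dmapi.h>
  #include <stdio.h>

  /*
   * Hypothetical helper: write one region of a file with invisible I/O,
   * in chunks whose length is a multiple of the file system block size.
   * sid, hanp/hlen, and blocksize are assumed to have been obtained on
   * the session node and passed to this worker, which runs as root.
   */
  static int
  stage_region(dm_sessid_t sid, void *hanp, size_t hlen,
               dm_off_t off, dm_size_t len, dm_size_t blocksize,
               char *buf)
  {
      dm_size_t done = 0;

      while (done < len) {
          dm_size_t chunk = len - done;
          if (chunk > blocksize)
              chunk = blocksize;   /* full blocks, except possibly the
                                      tail of the assigned region      */

          /* Only dm_read_invis, dm_write_invis, dm_probe_hole, and
           * dm_punch_hole may be issued on this node.  Access rights
           * were acquired by the session on the session node, so no
           * token is presented here.                                  */
          ssize_t rc = dm_write_invis(sid, hanp, hlen, DM_NO_TOKEN,
                                      0, off + done, chunk, buf + done);
          if (rc <= 0) {
              perror("dm_write_invis");
              return -1;
          }
          done += (dm_size_t)rc;
      }
      return 0;
  }

In such a scheme the session node acquires any needed rights, partitions the file into block-aligned byte ranges, and hands each range to a worker. Because GPFS keeps no persistent state on the worker nodes, the DM application must track which ranges have completed and reissue any that fail.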