Performing parallel copy with the mmxcp command

Use the mmxcp enable command to perform parallel copies of files from a source directory to a target directory in a single IBM Storage Scale cluster. The copy can occur within a single file system or across different file systems in the same cluster. The source can be a live file system or a global or independent fileset snapshot. The mmxcp command is closely tied to the mmapplypolicy command, which it calls to run the policy engine.

The mmxcp sync command performs a synchronize operation from the source directory to the target directory. It uses only a single process, but it copies only files that are missing from the target or that appear to be different.

The mmxcp verify command performs a quick comparison of the source and target directories. Any difference in the metadata is flagged.
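To make the idea of a metadata-only comparison concrete, the following Python sketch walks a source tree and flags files that are missing from the target or whose metadata differs. It is an illustration only and does not describe how mmxcp verify is implemented. The function name metadata_differences and the choice of compared fields (file size and whole-second modification time) are assumptions made for this example.

```python
import os


def metadata_differences(source, target):
    """Return (relative_path, reason) pairs for files that differ by metadata.

    Assumption: only size and whole-second mtime are compared. The fields
    that mmxcp verify actually checks are not stated in this topic.
    """
    flagged = []
    for dirpath, _dirnames, filenames in os.walk(source):
        for name in filenames:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, source)
            dst = os.path.join(target, rel)
            try:
                s = os.lstat(src)
                d = os.lstat(dst)
            except FileNotFoundError:
                # The file is absent from the target (or vanished from the source).
                flagged.append((rel, "missing"))
                continue
            if s.st_size != d.st_size:
                flagged.append((rel, "size"))
            elif int(s.st_mtime) != int(d.st_mtime):
                flagged.append((rel, "mtime"))
    return flagged
```

Because no file data is read, a check of this kind is fast, but it cannot detect content changes that leave the size and modification time unchanged.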

The mmxcp command also lists configuration information about any currently running mmxcp commands and allows you to configure or display the maximum number of mmxcp commands that can run at the same time.

For more information about running the mmxcp command, see mmxcp command.

Storage pools

When a file is copied in parallel, the storage pool in which the copy is placed depends on the following conditions:

A copied file is placed into the storage pool that matches the policy rules in effect at the time the file is copied.
If the target is a different file system without a storage pool defined, the copied files are created in the default storage pool of that file system.
If the target is a different file system with a storage pool defined, the copied files are created in the default storage pool of that file system.

Hardlinks

The mmxcp command copies hardlinks (multiple file names pointing to the same inode) correctly in most cases. However, by default it is limited by the way the policy engine splits up the work for faster parallel execution. By default, the policy engine splits the work into blocks of 100 entries (files or directories). Hardlinked files that fall within the same block of entries are handled correctly, but related hardlinked files in other blocks are handled independently. All files are copied, but not all of them might point to the same inode.
Use the --hardlinks option to ensure that all hardlinks are copied correctly. It adds a second pass through the source files that targets only hardlinked files and processes all of them as a single block. The --hardlinks option causes the mmxcp command to run for a longer period of time.
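The effect of the block split on hardlinks can be illustrated with a small Python simulation. This is a sketch under stated assumptions, not the policy engine's actual logic: it assumes that entries are cut into consecutive blocks in the order given, and the function names split_into_blocks and inodes_split_across_blocks are hypothetical. It reports the inodes whose file names land in more than one block, which are the hardlink groups that the default mode would handle independently.

```python
from collections import defaultdict


def split_into_blocks(entries, block_size=100):
    """Cut the entry list into consecutive blocks (assumed ordering)."""
    return [entries[i:i + block_size] for i in range(0, len(entries), block_size)]


def inodes_split_across_blocks(entries, block_size=100):
    """entries is a list of (path, inode) pairs.

    Returns the sorted inodes whose paths are spread over several blocks.
    """
    blocks_by_inode = defaultdict(set)
    for block_no, block in enumerate(split_into_blocks(entries, block_size)):
        for _path, inode in block:
            blocks_by_inode[inode].add(block_no)
    return sorted(ino for ino, blocks in blocks_by_inode.items() if len(blocks) > 1)


# 250 entries, each with its own inode, then two hardlink pairs are introduced.
entries = [(f"file{i}", i) for i in range(250)]
entries[20] = ("file20", 10)    # file10 and file20 share inode 10, same block
entries[150] = ("file150", 50)  # file50 and file150 share inode 50, different blocks

print(inodes_split_across_blocks(entries))  # prints [50]
```

In this simulation only the pair that straddles a block boundary is reported; that is the case the second pass of the --hardlinks option is described as covering.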

Fileset

If the source file system has an independent or dependent fileset and the target does not have a fileset, a subdirectory is created with the same name as the fileset and the files are copied into that subdirectory.

File heat

If file heat is enabled on the cluster, source files might not have the gpfs.fileHeat extended attribute (EA), but the copied files might have it because the copy process generates I/O activity on the files.
Snapshots do not store gpfs.fileHeat EAs.

The DMAPI extended attributes are not copied, because a file that is copied by using the --copy-migrated flag loses its migrated state.

File clone

File clones and their relationships are not preserved by the mmxcp command. All files are copied, but the copied clone files consume additional disk space.
The gpfs.CLONE EA is not copied.

File compression

The mmxcp sync command does not apply file compression. The target files are not compressed.
By default, the mmxcp enable command does not compress the target files.
If any of the source files are compressed by using the mmchattr command, specify the --copy-attrs compression option to compress the target files after the copy. The target files are compressed with the same compression library that was used for the source.
If the source file is marked as illcompressed and the compression attribute is specified, the target file is compressed.
Compressing the target files might cause the mmxcp command to run for a longer period of time.

File appendonly and immutable attributes

The mmxcp sync command does not copy the appendonly and immutable attributes.
By default, the mmxcp enable command does not copy the appendonly and immutable attributes.
The appendonly and immutable attributes can be copied by using the --copy-attrs option and specifying appendonly, immutable, or both when running the mmxcp enable command.
Using the --copy-attrs option might cause the mmxcp command to run for a longer period of time.
If preexisting files in the target directory have the appendonly or immutable attributes set, an error might occur because these files cannot be overwritten until the appendonly or immutable attributes are removed. See the mmchattr command in the IBM Storage Scale: Command and Programming Reference Guide.

Policy engine interactions

The mmxcp command calls the mmapplypolicy command directly and uses the policy engine LIST rule functions to execute the /usr/lpp/mmfs/bin/xcputil.sh script.

See mmapplypolicy command for functional and performance hints about the following mmxcp flags. These flags are passed directly to the mmapplypolicy command.

You can use the -g option to specify a global work directory, or the -s option to specify a local work directory, in which one or more nodes can store temporary files. For better performance, you can use the -N option to specify a set of nodes that run parallel instances of the policy code. The -m option specifies the number of threads that are created and dispatched within each mmapplypolicy process during the policy execution phase. The -a option specifies the number of threads and sort pipelines that each node runs during the parallel inode scan and policy evaluation. The -n option specifies the number of threads that are created and dispatched within each mmapplypolicy process during the directory scan phase.

To control the number of files that are passed for each invocation of the EXEC script, use the -B option. To control the level of information that is displayed by the mmapplypolicy command, use the -L n option. The --sort-buffer-size option sets the sort-buffer size that is passed to the sort command. The --qos option specifies the Quality of Service for I/O operations (QoS) class to which the instance of the command is assigned.
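As a summary of the pass-through flags described above, the following Python sketch assembles them into an argument list. It is a hypothetical helper, not part of IBM Storage Scale: the function name passthrough_options, the setting names, and the sample values are assumptions for illustration, and the sketch deliberately leaves out the rest of the mmxcp command line, which is documented in mmxcp command.

```python
# Maps illustrative setting names (assumptions) to the flags named in this topic.
PASSTHROUGH_FLAGS = {
    "global_work_dir": "-g",
    "local_work_dir": "-s",
    "nodes": "-N",
    "exec_threads": "-m",
    "inode_scan_threads": "-a",
    "dir_scan_threads": "-n",
    "files_per_exec": "-B",
    "info_level": "-L",
    "sort_buffer_size": "--sort-buffer-size",
    "qos_class": "--qos",
}


def passthrough_options(**settings):
    """Build the flag/value list for the options forwarded to mmapplypolicy."""
    unknown = set(settings) - set(PASSTHROUGH_FLAGS)
    if unknown:
        raise ValueError(f"unknown settings: {sorted(unknown)}")
    args = []
    for name, flag in PASSTHROUGH_FLAGS.items():
        if name not in settings:
            continue
        value = settings[name]
        if isinstance(value, (list, tuple)):
            value = ",".join(value)  # node lists are joined with commas
        args += [flag, str(value)]
    return args


# Sample values are placeholders, not recommendations.
print(passthrough_options(nodes=["node1", "node2"], files_per_exec=200,
                          global_work_dir="/gpfs/fs1/tmp"))
# prints ['-g', '/gpfs/fs1/tmp', '-N', 'node1,node2', '-B', '200']
```

Keeping the mapping in one table makes it easy to see which settings reach the policy engine unchanged.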

General specifications

Both the source file system and the destination file system must be mounted on the node where the command is initiated.
The same mmxcp enable operation cannot be run more than once at the same time. The command verifies that the source and destination are not already part of a copy operation that is running in the cluster.
The same mmxcp sync operation cannot be run more than once at the same time. The command verifies that the source and destination are not already part of a sync operation that is running in the cluster.
By default, a maximum of 10 concurrent copy or sync operations can run on the cluster at any one time. This limit can be changed by using the mmxcp config command.
By default, any other Linux® nodes in the cluster that have both the source file system and the destination file system mounted might also be used for the copy command. If the -N option is included, only the nodes specified through the -N option that have both file systems mounted are used.
Errors from the command are written to the log file /var/adm/ras/mmxcp.log. Errors from other nodes that take part in the command are written to the mmxcp.log file on the node where the code is run.
Messages from the policy engine that go to standard output are redirected to the log file /var/adm/ras/mmxcp.log. Messages to standard error are displayed on the screen.
Currently, no message is issued when files are skipped because they are in the migrated state.
When you use the --copy-migrated option, use the -N option to specify nodes on which the IBM Storage Protect client is installed.
When copying files from a live file system, be aware that files that are created during the mmxcp execution might not be copied, and files that are deleted during the mmxcp execution might cause the mmxcp command to fail.
Use the mmxcp verify command to perform a quick comparison of metadata between a source directory or snapshot and a target directory from a previously executed mmxcp copy or sync.
When migrated files are synced, the files are recalled before they are copied.
Examine the /var/adm/ras/mmxcp.log file to see the total number of files updated during the sync command.
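The node selection rule in the list above (nodes with both file systems mounted, optionally narrowed by the -N option) can be sketched in Python as a set intersection. This is only an illustration of the stated rule: the function name select_copy_nodes and the mounts_by_node mapping are hypothetical, and the mapping is assumed to contain only Linux nodes.

```python
def select_copy_nodes(mounts_by_node, source_fs, target_fs, requested=None):
    """Return the nodes that can take part in the copy.

    mounts_by_node maps a node name to the set of file systems mounted on it
    (hypothetical input). requested stands for the node list given with -N;
    None means that the -N option was not used.
    """
    eligible = {
        node
        for node, mounted in mounts_by_node.items()
        if source_fs in mounted and target_fs in mounted
    }
    if requested is not None:
        # Keep only the requested nodes that have both file systems mounted.
        eligible &= set(requested)
    return sorted(eligible)


mounts = {
    "node1": {"fs1", "fs2"},
    "node2": {"fs1"},
    "node3": {"fs1", "fs2"},
}
print(select_copy_nodes(mounts, "fs1", "fs2"))                      # prints ['node1', 'node3']
print(select_copy_nodes(mounts, "fs1", "fs2", ["node2", "node3"]))  # prints ['node3']
```

In the second call, node2 is dropped even though it was requested, because it does not have the target file system mounted.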