Information lifecycle

The Information Lifecycle Management (ILM) feature of the IBM Spectrum Scale system facilitates automated tiered storage management. You create a set of policies and rules that automatically determine where your data is physically stored, regardless of where it is placed in the logical directory structure. Proper file management ensures that premium and less expensive storage resources are used efficiently and in balance.

Policies and rules are used to assign files to specific file system pools. A file system pool typically contains a set of volumes that provide a specific quality of service for a specific use. For example, frequently accessed files can be stored in a high-performance premium pool and infrequently accessed files in a less expensive pool.

A policy is a set of rules that describes the lifecycle of user data based on the file's attributes. Each rule defines an operation or definition, such as placing new files into different pools or migrating files from one pool to another pool. A policy rule is an SQL-like statement that tells the file system what to do with a file in a specific file system pool if the file meets specific criteria. A rule can apply to any file within a file system or only to files within a specific fileset or group of filesets.

The following are the main functions of ILM rules:
  • Initial file placement
  • File management activities such as migration of files from one storage pool to another pool, automatic file deletion, file compression, and file encryption
  • File restoration
The system determines various actions to be performed on the newly created and existing files based on the ILM rules. You can define the following types of rules:
Migration
Manages migration of files to other storage pools. Migration starts when certain thresholds and criteria are met or when you manually run the mmapplypolicy command. If a lower threshold is defined, the migration runs until the lower threshold is reached. The GUI installs the callback handler GUI_THRESHOLD_MIGRATION to react to the threshold limits. The callback handler invokes the mmapplypolicy command when a threshold limit of the active policy of a file system is reached.

Based on the scope of the migration, the system migrates either all files in a file system or the files in individual filesets. You can define which files to consider as candidates for migration by specifying file selection criteria such as file name and file size. You can also define the order in which files are selected for migration, for example, based on file temperature, last access time, or file size.

Note: Only a migration rule can target an external pool. An external pool cannot be a valid target for other types of rules, such as placement, file compression, encryption, or deletion rules.
Migration to external pool

GPFS file management policy rules can also control data migration into external storage pools. Before you can write a migration policy with an external storage pool as the target, you must define the external storage pool that the policy references. After you define the target storage pool, you can then create policies that set thresholds that trigger data migration into the referenced external pool.

You can use the migration to external pool rule to migrate the data and metadata stored in the local pool to external pools such as tape or transparent cloud tiering. You define the migration scope and select the files for migration in the same way as for the migration rule that is described earlier.

You can specify a premigration threshold in the GUI. With premigration, files are copied from an internal pool to an external pool, so a premigrated file is available in both the internal and external pools. Premigration is possible only when migrating from an internal pool to an external pool. Its main purpose is to copy files to the external pool before the internal pool reaches the migration threshold. When the pool does reach the migration threshold, the migration is fast because the files do not need to be transferred to the external pool again. Premigration therefore improves the efficiency of ILM policies. Premigration is not a backup: if the file in the internal pool is deleted, there is no longer a reference to the file on the external pool.

Files are premigrated if the storage pool occupancy is between the lower limit of the migration threshold (%) and the premigration threshold (%). The premigration threshold value must be between 0 and the lower limit of the migration threshold.
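The following Python sketch works through the threshold arithmetic for a single pool under a simplified model. It assumes three things: a run starts when occupancy reaches the high threshold, data is migrated until occupancy falls to the lower limit, and enough further data is then premigrated to cover the band between the lower limit and the premigration threshold. The function name threshold_plan and the pool sizes are hypothetical. The real policy engine selects individual files by weight rather than exact byte counts.

```python
def threshold_plan(capacity_gb: float, used_gb: float, high_pct: float,
                   low_pct: float, premig_pct: float) -> dict:
    """Estimate how much data a threshold-driven run moves (simplified model)."""
    # The premigration threshold must lie between 0 and the lower limit.
    if not 0 <= premig_pct <= low_pct <= high_pct <= 100:
        raise ValueError("expected 0 <= premigrate <= low <= high <= 100")
    occupancy_pct = 100.0 * used_gb / capacity_gb
    if occupancy_pct < high_pct:
        # Below the high threshold: no threshold-triggered run starts.
        return {"triggered": False, "migrate_gb": 0.0, "premigrate_gb": 0.0}
    # Migrate (copy, then free locally) until occupancy drops to the lower limit.
    migrate_gb = used_gb - capacity_gb * low_pct / 100.0
    # Premigrate (copy only; data stays resident) the band from low down to premigrate.
    premigrate_gb = capacity_gb * (low_pct - premig_pct) / 100.0
    return {"triggered": True, "migrate_gb": migrate_gb, "premigrate_gb": premigrate_gb}


# Hypothetical 1000 GB pool at 92% occupancy with high 90%, low 70%, premigrate 50%.
print(threshold_plan(1000, 920, 90, 70, 50))
# {'triggered': True, 'migrate_gb': 220.0, 'premigrate_gb': 200.0}
```

In this model, migrating 220 GB brings the pool down to 70%. A further 200 GB is then copied to the external pool but kept resident, so a later run can free that space without transferring the data again.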

Placement
When a file is created or restored, the placement policy determines the file system pool of the file. You can specify the placement scope of a rule to define whether the rule applies to files that are newly created or restored in the entire file system or only in specific filesets. If you do not apply a placement rule, all data is stored in the system storage pool. If the system pool is configured to store only metadata, data cannot be stored in it, and you must configure a placement rule that chooses another pool for data. Modifying the placement policy of a file system has no effect on existing files. Newly created files are always placed according to the currently active placement policy.
Note: If the system pool holds only metadata but another storage pool is defined for data, the policy engine automatically selects one of the data pools as the default pool in which file data is initially placed. To ensure that a specific data-only storage pool is selected, you must define a rule that places newly created files in that pool.

Use the placement criteria to define the conditions for placing files into a pool. The placement criterion can be the file extension, the user ID of the file owner, or the group ID of the owning group. Combining the placement scope and placement criteria lets you restrict the rule to a specific set of files.

You can specify the number of data replicas for files that are placed by a placement rule. If the placement rule does not specify an explicit number of replicas, the default number of replicas that is defined for the file system is used.
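The following Python sketch renders placement rules as policy text to show how placement scope, placement criteria, and replicas combine. The helper placement_rule and the pool and fileset names (gold, data, projects) are hypothetical. The clause order (SET POOL, REPLICATE, FOR FILESET, WHERE) and the NAME, USER_ID, and GROUP_ID attributes follow the GPFS policy rule syntax as we understand it. Check the generated text against the policy reference for your release before you use it.

```python
from typing import List, Optional, Sequence


def placement_rule(name: str, pool: str,
                   filesets: Optional[Sequence[str]] = None,
                   extensions: Optional[Sequence[str]] = None,
                   uid: Optional[int] = None,
                   gid: Optional[int] = None,
                   replicas: Optional[int] = None) -> str:
    """Render one placement rule as policy text (illustrative helper, not an IBM API)."""
    clauses = [f"RULE '{name}' SET POOL '{pool}'"]
    if replicas is not None:
        # If omitted, the file system's default number of data replicas applies.
        clauses.append(f"REPLICATE ({replicas})")
    if filesets:
        # Placement scope: restrict the rule to the listed filesets.
        clauses.append("FOR FILESET (" + ",".join(f"'{f}'" for f in filesets) + ")")
    conditions: List[str] = []
    if extensions:
        # Placement criterion: file extension.
        conditions.append("(" + " OR ".join(f"NAME LIKE '%.{e}'" for e in extensions) + ")")
    if uid is not None:
        # Placement criterion: user ID of the file owner.
        conditions.append(f"USER_ID = {uid}")
    if gid is not None:
        # Placement criterion: group ID of the owning group.
        conditions.append(f"GROUP_ID = {gid}")
    if conditions:
        clauses.append("WHERE " + " AND ".join(conditions))
    return " ".join(clauses)


rules = [
    placement_rule("media", "gold", filesets=["projects"], extensions=["mp4", "mov"], replicas=2),
    # Catch-all rule so that data does not depend on the default pool choice.
    placement_rule("default", "data"),
]
print("\n".join(rules))
```

Because rules are evaluated in order, the specific rule comes first and the catch-all rule last.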

Compression
Manages compression and decompression of files. You can create a file compression rule for either compression or decompression. Compression can be done at the file system level or at the individual fileset level.

You can define which files to consider as candidates for compression by specifying certain file selection criteria such as file name, file size, and so on. You can also define the order in which the files must be selected for compression or decompression.

Migration and compression
Manages compression of files that are migrated to another storage pool. In this case, you cannot select an external pool as the target pool.
Deletion
Manages deletion of files from a storage pool. You can automate the deletion process by defining certain thresholds in the deletion rule. When the predefined thresholds are met, the system starts and stops the deletion process.

You can also specify the scope and order in which the files are selected for deletion.

Exclude
Excludes a set of files from the subsequent rules based on the scope and criteria that are defined in the exclude rule. When multiple rules exist, an exclusion rule can speed up rule evaluation by reducing the number of candidate files.
Encryption
Manages the encryption of the newly created files. An encryption rule requires an encryption specification rule to be created first.
Encryption specification
Defines how files are encrypted. In this rule, you can specify the encryption parameters, the key combination mode, and the wrapping mode for the file encryption key (FEK). You can also choose the default NIST SP 800-131A compliant encryption algorithm.
Encryption exclude
Excludes a set of files from the subsequent encryption rule.
External pool
Manages the movement of data from online storage to an offline or near-line external storage pool. This allows IBM Spectrum Scale to transparently control offline storage and provide a tiered storage solution that includes tape or other storage media.

You can create migration, external pool, deletion, encryption, and file compression rules for effective file management. Over the life of the file, data can be migrated to a different storage pool any number of times. Files can be compressed or decompressed, files can be deleted, and files can be restored from an external storage pool. These file management rules can also be used to control the space utilization of internal storage pools.

If you have defined a migration, file compression, or deletion rule, you must register a callback script. When the callback script is registered and the utilization of an online pool exceeds the specified high threshold value, the IBM Spectrum Scale system invokes the callback script. The script then starts the migration, file compression, or deletion to reduce the utilization of the pool.

The scope of the migration, file compression, and deletion rules defines whether the rule is applicable to the entire file system or specific to individual filesets. You can also define criteria based on which files are filtered for each type of action. You can specify the following criteria for each of these rules:
  • File path or extension
  • File size
  • User ID and group ID
  • Time since the file was created
  • Time since the file was last modified
  • Time since the metadata of the file was last modified
  • Time since the file was last accessed
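The following minimal Python sketch shows how these criteria narrow the candidate list. It models each file as a record and a rule's criteria as optional thresholds. FileRecord, Criteria, and matches are hypothetical names, not IBM Spectrum Scale APIs. In a real policy, these conditions are written in the SQL-like WHERE clause of the rule and evaluated by the policy engine.

```python
from dataclasses import dataclass
from typing import Optional

DAY = 86400.0  # seconds per day


@dataclass
class FileRecord:
    path: str
    size: int          # bytes
    uid: int
    gid: int
    created: float     # creation time, seconds since the epoch
    modified: float    # last data modification
    changed: float     # last metadata change
    accessed: float    # last access


@dataclass
class Criteria:
    path_prefix: Optional[str] = None        # file path, e.g. "/fs1/logs/"
    suffix: Optional[str] = None             # file extension, e.g. ".log"
    min_size: Optional[int] = None           # bytes
    uid: Optional[int] = None
    gid: Optional[int] = None
    min_days_since_created: Optional[float] = None
    min_days_since_modified: Optional[float] = None
    min_days_since_changed: Optional[float] = None
    min_days_since_accessed: Optional[float] = None


def matches(f: FileRecord, c: Criteria, now: float) -> bool:
    """Return True if the file satisfies every criterion that is set."""
    if c.path_prefix is not None and not f.path.startswith(c.path_prefix):
        return False
    if c.suffix is not None and not f.path.endswith(c.suffix):
        return False
    if c.min_size is not None and f.size < c.min_size:
        return False
    if c.uid is not None and f.uid != c.uid:
        return False
    if c.gid is not None and f.gid != c.gid:
        return False
    ages = [
        (c.min_days_since_created, f.created),
        (c.min_days_since_modified, f.modified),
        (c.min_days_since_changed, f.changed),
        (c.min_days_since_accessed, f.accessed),
    ]
    for min_days, timestamp in ages:
        if min_days is not None and (now - timestamp) / DAY < min_days:
            return False
    return True


now = 100 * DAY
files = [
    FileRecord("/fs1/logs/a.log", 5 << 20, 1001, 100, 0, 10 * DAY, 10 * DAY, 60 * DAY),
    FileRecord("/fs1/logs/b.log", 5 << 20, 1001, 100, 0, 95 * DAY, 95 * DAY, 98 * DAY),
    FileRecord("/fs1/data/c.dat", 9 << 20, 1002, 100, 0, 10 * DAY, 10 * DAY, 20 * DAY),
]
# Log files of at least 1 MiB that have not been accessed for 30 days.
cold_logs = Criteria(suffix=".log", min_size=1 << 20, min_days_since_accessed=30)
print([f.path for f in files if matches(f, cold_logs, now)])  # ['/fs1/logs/a.log']
```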

When a storage pool reaches the defined threshold, the system generates a list of files and invokes a user-provided script or program that issues the appropriate commands for the external data management application to process the files. This allows GPFS to transparently control offline storage and provide a tiered storage solution that includes tape or other media.

You can define multiple external storage pools at any time by using GPFS policy rules. To move data to an external storage pool, the GPFS policy engine evaluates the rules that determine which files qualify for transfer to the external pool. From that information, GPFS provides a list of candidate files and runs the script that is specified in the rule that defines the external pool. That executable script is the interface to the external application, such as IBM Spectrum Protect, which does the actual migration of data into an external pool. Using the external pool interface, GPFS facilitates the following tasks:
  1. Move files and their extended attributes onto low-cost near-line or offline storage when demand for the files diminishes.
  2. Recall the files, with all of their previous access information, onto online storage whenever the files are needed.
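The candidate list is passed to the external script in batches. The Maximum number of files per batch run setting (default 100) controls the batch size, so a longer list results in several invocations. The following Python sketch simulates that splitting. run_external_script is a hypothetical stand-in for the script named in the external pool rule. It only records batch sizes and does not call a real data management application such as IBM Spectrum Protect.

```python
from typing import Callable, List, Sequence


def dispatch_in_batches(candidates: Sequence[str], max_files: int,
                        run_external_script: Callable[[List[str]], None]) -> int:
    """Call the external script once per batch of at most max_files paths."""
    if max_files < 1:
        raise ValueError("max_files must be at least 1")
    invocations = 0
    for start in range(0, len(candidates), max_files):
        run_external_script(list(candidates[start:start + max_files]))
        invocations += 1
    return invocations


batch_sizes: List[int] = []
paths = [f"/fs1/archive/file{i:03d}" for i in range(250)]
count = dispatch_in_batches(paths, 100, lambda batch: batch_sizes.append(len(batch)))
print(count, batch_sizes)  # 3 [100, 100, 50]
```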

GPFS evaluates policy rules in order, from first to last, as they appear in the policy. The first rule that matches determines what is to be done with that file. For example, when a client creates a file, GPFS scans the list of rules in the active file placement policy to determine which rule applies to the file.
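The following Python sketch shows first-match evaluation, including how an earlier exclude rule shields files from later rules. Rule and first_matching_rule are hypothetical names. The shell-style path patterns (fnmatch) stand in for the SQL-like WHERE clauses that real rules use.

```python
import fnmatch
from typing import NamedTuple, Optional, Sequence


class Rule(NamedTuple):
    name: str
    action: str      # e.g. "EXCLUDE" or "MIGRATE"
    pattern: str     # shell-style path pattern standing in for a WHERE clause


def first_matching_rule(rules: Sequence[Rule], path: str) -> Optional[Rule]:
    """Evaluate rules in order; the first rule whose condition matches wins."""
    for rule in rules:
        if fnmatch.fnmatch(path, rule.pattern):
            return rule
    return None  # no rule applies to this file


rules = [
    Rule("skip-scratch", "EXCLUDE", "/fs1/scratch/*"),
    Rule("iso-to-tape", "MIGRATE", "*.iso"),
]
for path in ["/fs1/scratch/build.iso", "/fs1/images/base.iso", "/fs1/docs/readme.txt"]:
    rule = first_matching_rule(rules, path)
    print(path, "->", rule.name if rule else "no rule")
```

The ISO image under /fs1/scratch matches the exclude rule first, so it never reaches the migration rule. Swapping the two rules would change that outcome.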

Creating and applying policy

Select Policy Repository to create a new policy and define its rules. You can also modify existing policies and apply a policy as the active policy for a file system. Select Active Policy to view the active policy for a file system. You can also modify the active policy as required.

The following steps explain how to create and apply a policy:
  1. Go to Files > Information Lifecycle.
  2. Select Policy Repository.
  3. Click Create Policy and specify the required details.

    The policy is created. Now add rules to the policy to manage the files in the system.

  4. Click Add Rule in the Policy Repository and define rules with the required rule types. You can create multiple rules in a policy. Drag the rules in the rules list to change the order in which they are applied. The Add Rule option supports only placement, migration, file compression, and deletion rules, and external pool definitions. To add encryption, exclusion, or list rules, you must modify the policy text by using the text editor.
  5. Optionally, edit the policy text in the text editor. To launch the text editor, click the Policy Text option in the upper right corner of the GUI page. To work with encryption, exclusion, or list rules, you must modify the policy text through the text editor.
  6. After editing the policy details, click Apply Changes.
  7. If you want to apply a policy as the active policy for a file system, select the policy from the Policy Repository and then select Apply as Active Policy option that is available in the Actions menu. You can also change the active policy of the file system.
Note: The policies invoked by the GUI are called with default tuning parameters of the mmapplypolicy command. On larger systems, you can improve the performance of this operation by running the policies directly in the CLI and adapting the parameters that the mmapplypolicy command offers.
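When you run policies from the CLI, the GUI run settings described under Defining the policy run settings appear to map onto these mmapplypolicy options: -N (nodes), -s and -g (local and global work directories), --choice-algorithm, -a, -n, -m, and -B. The following Python sketch only builds the command line. The helper name, file system name, policy file path, and node names are hypothetical. Confirm each option against the mmapplypolicy reference for your release before you pass the list to subprocess.run on a cluster node.

```python
import shlex
from typing import List, Optional, Sequence

MMAPPLYPOLICY = "/usr/lpp/mmfs/bin/mmapplypolicy"


def mmapplypolicy_args(device: str,
                       policy_file: Optional[str] = None,
                       nodes: Optional[Sequence[str]] = None,
                       local_work_dir: Optional[str] = None,
                       global_work_dir: Optional[str] = None,
                       choice_algorithm: Optional[str] = None,
                       iscan_threads: Optional[int] = None,
                       dir_scan_threads: Optional[int] = None,
                       exec_threads: Optional[int] = None,
                       max_files_per_batch: Optional[int] = None) -> List[str]:
    """Build an mmapplypolicy argument list; options left as None are not passed."""
    args = [MMAPPLYPOLICY, device]
    if policy_file:
        args += ["-P", policy_file]              # policy file to evaluate
    if nodes:
        args += ["-N", ",".join(nodes)]          # nodes (or a node class) that run the policy
    if local_work_dir:
        args += ["-s", local_work_dir]           # local work directory
    if global_work_dir:
        args += ["-g", global_work_dir]          # global work directory in a shared file system
    if choice_algorithm:
        if choice_algorithm not in ("best", "exact", "fast"):
            raise ValueError("choice_algorithm must be best, exact, or fast")
        args += ["--choice-algorithm", choice_algorithm]
    if iscan_threads is not None:
        args += ["-a", str(iscan_threads)]       # threads and sort pipelines per node
    if dir_scan_threads is not None:
        args += ["-n", str(dir_scan_threads)]    # directory scan threads (default 24)
    if exec_threads is not None:
        args += ["-m", str(exec_threads)]        # policy execution threads (default 24)
    if max_files_per_batch is not None:
        args += ["-B", str(max_files_per_batch)]  # files per external script call (default 100)
    return args


args = mmapplypolicy_args("fs1", policy_file="/root/ilm.pol", nodes=["node1", "node2"],
                          global_work_dir="/gpfs/fs1/.policytmp", choice_algorithm="fast",
                          dir_scan_threads=32, max_files_per_batch=500)
print(shlex.join(args))
```

Building the arguments as a list rather than as a single string avoids shell quoting problems when the command is later run through subprocess.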

Editing policy by using text editor

To define or modify file placement, migration, file compression, deletion, or external pool rules, the GUI provides an easy-to-use graphical editing mode. To work with rules such as encryption, exclusion, and list rules, you must manually edit the SQL-like policy text by using the text editor.

If the policy contains even one rule that is not supported in the graphical editing mode, the whole policy can be displayed or modified only through the policy text editor.

Defining the policy run settings

You can define some of the policy run parameters that are used every time the ILM policy is run from the Information Lifecycle page in the GUI.
Note: The policy run settings that you set in the GUI are applicable when the policy execution is triggered by the default threshold callback or when using the Run Policy action in the GUI. These parameters are not applicable when a custom callback script is registered or when you run the policy by using the mmapplypolicy command in the CLI.
You can specify the following details that determine the policy run characteristics:
  • Nodes that run the policies
    The ILM policy can run in parallel on multiple nodes. The following types of nodes are available for selection:
    • Nodes of a node class
    • Default helper nodes. Nodes can be marked as helper nodes by using the defaultHelperNodes parameter of the mmchconfig command.
    • Manager nodes. These are nodes from which file system managers and token managers are selected.
    • Individual nodes
  • Local work directory

    The directory to be used for temporary storage during policy execution. This is a local directory like /tmp used on each helper node. A significant amount of temporary storage is required if the file system or directories contain many files.

  • Global work directory

    A global directory to be used for temporary storage during policy execution. The specified directory must exist within a shared file system, and it must be mounted and available for reading and writing on each of the nodes. Using a global work directory enables high-performance, fault-tolerant protocols during policy execution.

  • File selection algorithm. The following algorithm types are available for selection:
    • Exact: Sorts all the candidate files by weight, then serially considers each file from the highest weight to the lowest weight by choosing feasible candidates for migration, deletion, or listing according to any applicable rule LIMITs and current storage-pool occupancy.
    • Fast: Uses a combination of statistical, heuristic, and parallel computing methods to favor higher-weight candidate files, but the set of chosen candidates might differ from the set that the exact method chooses.
    • Best: Chooses the optimal method based on the rest of the input parameters.
  • Average number of CPU cores per node.

    The number of threads and sort pipelines that each node runs during the parallel inode scan and policy evaluation.

  • Number of threads per policy scan.

    The number of threads that are created and dispatched within each mmapplypolicy process during the directory scan phase. The default value is 24.

  • Number of threads for policy execution.

    The number of threads that are created and dispatched within each mmapplypolicy process during the policy execution phase. The default value is 24.

  • Maximum number of files per batch.

    Specifies how many files are passed for each invocation of the EXEC script. The default value is 100. If the number of files exceeds the value that is specified, the mmapplypolicy command starts the external program multiple times.
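The Exact file selection algorithm can be sketched in Python for a single migration from a source pool to a target pool. The sketch makes three assumptions. Candidate weights are already computed (in a real policy, they come from the rule's WEIGHT expression, for example one based on file temperature or access age). The run stops once the source pool falls to its lower threshold. The target pool has a LIMIT occupancy percentage. Candidate, exact_select, and the sizes are hypothetical. The sketch does not model multiple rules or pools, or the Fast method's heuristics.

```python
from typing import List, NamedTuple, Sequence


class Candidate(NamedTuple):
    path: str
    size: int        # any consistent unit, e.g. GB
    weight: float    # higher weight = preferred candidate


def exact_select(candidates: Sequence[Candidate],
                 src_used: float, src_capacity: float, low_pct: float,
                 tgt_used: float, tgt_capacity: float, tgt_limit_pct: float) -> List[str]:
    """Pick migration candidates in strict weight order, honoring occupancy and LIMIT."""
    src_goal = src_capacity * low_pct / 100.0           # stop once the source is this full
    tgt_ceiling = tgt_capacity * tgt_limit_pct / 100.0  # target must stay at or below this
    chosen: List[str] = []
    for cand in sorted(candidates, key=lambda x: x.weight, reverse=True):
        if src_used <= src_goal:
            break      # source pool is already at or below the low threshold
        if tgt_used + cand.size > tgt_ceiling:
            continue   # infeasible: would push the target past its LIMIT
        chosen.append(cand.path)
        src_used -= cand.size
        tgt_used += cand.size
    return chosen


files = [Candidate("a", 12, 9.0), Candidate("b", 6, 7.0),
         Candidate("c", 4, 5.0), Candidate("d", 3, 1.0)]
print(exact_select(files, src_used=95, src_capacity=100, low_pct=80,
                   tgt_used=80, tgt_capacity=100, tgt_limit_pct=90))  # ['b', 'c']
```

File a has the highest weight but is skipped because it would push the target pool past its LIMIT. The serial scan then takes the next feasible candidates in weight order.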

Log files

The policy executions that are invoked by using the Run Policy action log the details in the /var/log/cnlog/ilm directory.

The policy executions can also be triggered based on a threshold that is managed by the callback handler, which is installed on the GUI node. Such policy execution details are logged in the /var/adm/ras directory and also in the /var/adm/ras/mmfs.log file.