File compression

You can compress or decompress files either with the mmchattr command or with the mmapplypolicy command with a MIGRATE rule. You can do the compression or decompression synchronously or defer it until a later call to mmrestripefile or mmrestripefs.

IBM Spectrum Scale™ V4.2 adds file compression to reduce the size of data at rest. File compression is intended primarily for cold data and favors saving space over access speed. File compression can be driven by policies that enable administrators to compress only files that have not been accessed for a specified time. Data is decompressed inline for each read access.

Comparison with object compression

File compression and object compression use the same compression technology but are available in different environments and are configured in different ways. Object compression is available in the Cluster Export Services (CES) environment and is configured with the mmobj policy command. With object compression, you can create an object storage policy that periodically compresses new objects and files in a GPFS™ fileset.

File compression is available in non-CES environments and is configured with the mmapplypolicy command or directly with the mmchattr command.

When to use file compression

File compression in this release is designed to be used only for compressing cold data or write-once objects and files. Compressing other types of data can result in performance degradation. File compression uses the zlib data compression library and favors saving space over speed.

Setting up file compression and decompression

The sample script /usr/lpp/mmfs/samples/ilm/mmcompress.sample, installed with IBM Spectrum Scale, provides examples of how to compress or decompress a fileset or a directory tree.

You can do file compression or decompression with either the mmchattr command or the mmapplypolicy command.
Note: File compression and decompression with the mmapplypolicy command is not supported on Windows.
With the mmchattr command, you specify the --compression option and the names of the files or filesets that you want to compress or decompress. For example, the following command compresses a file:
mmchattr --compression yes trcrpt.150913.13.30.13.3518.txt
The following command decompresses the same file:
mmchattr --compression no trcrpt.150913.13.30.13.3518.txt
For more information, see mmchattr command.
With the mmapplypolicy command, you create a MIGRATE rule that specifies the COMPRESS option and run mmapplypolicy to apply the rule. For example, the following rule, which applies to files with names that begin with the string green, migrates files out of a storage pool and compresses them:
RULE 'COMPR1' MIGRATE FROM POOL 'datapool' COMPRESS('yes') WHERE NAME LIKE 'green%'
The following rule migrates and decompresses the same set of files:
RULE 'COMPR1' MIGRATE FROM POOL 'datapool' COMPRESS('no') WHERE NAME LIKE 'green%'
In the following example, the first rule excludes from compression any file whose name ends with .mpg or .jpg, regardless of case. The second rule automatically compresses any file that was not accessed in the last 30 days:
RULE 'NEVER_COMPRESS' EXCLUDE WHERE lower(NAME) LIKE '%.mpg' OR lower(NAME) LIKE '%.jpg'
RULE 'COMPRESS_COLD' MIGRATE COMPRESS('yes') WHERE (CURRENT_TIMESTAMP - ACCESS_TIME) >
(INTERVAL '30' DAYS)
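
To make the selection logic of these two rules concrete, the following Python sketch mimics it outside GPFS. It is an illustration only: the name select_for_compression and the two constants are hypothetical, and in practice mmapplypolicy evaluates the rules itself against file metadata.

```python
from datetime import datetime, timedelta

# Hypothetical helper that models the two sample rules above.
# It is not part of IBM Spectrum Scale.
EXCLUDED_SUFFIXES = (".mpg", ".jpg")
COLD_AFTER = timedelta(days=30)

def select_for_compression(name, access_time, now):
    """Return True if the sample rules would compress this file."""
    # Rule NEVER_COMPRESS: lower(NAME) LIKE '%.mpg' OR lower(NAME) LIKE '%.jpg'
    if name.lower().endswith(EXCLUDED_SUFFIXES):
        return False
    # Rule COMPRESS_COLD: (CURRENT_TIMESTAMP - ACCESS_TIME) > INTERVAL '30' DAYS
    return (now - access_time) > COLD_AFTER
```

The exclusion is checked first because a file that matches an EXCLUDE rule is not considered by the rules that follow it.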
For more information, see the following help topics:

When you do file compression, you can defer the compression operation to a later time. For more information, see the subtopic Deferred file compression.

Warning

Doing any of the following operations while the mmrestorefs command is running can corrupt file data:
  • Doing file compression or decompression. This includes compression or decompression with the mmchattr command or with a policy and the mmapplypolicy command.
  • Running the mmrestripefile command or the mmrestripefs command, either to complete a deferred file compression or decompression, or for any other reason.

Reported size of compressed files

After a file is compressed, operating system commands, such as ls -l, display the uncompressed size. Use du or the GPFS command mmdf to display the actual, compressed size. You can also make the stat() system call to find how many blocks the file occupies.
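
As a rough illustration of the stat() approach, the following Python sketch returns both the logical size and the allocated size of a file. The function name size_report is hypothetical. The sketch assumes a POSIX platform, where st_blocks is reported in 512-byte units; it has not been verified against a GPFS file system.

```python
import os

def size_report(path):
    """Return (apparent_bytes, allocated_bytes) for a file."""
    st = os.stat(path)
    # st_size is the logical length, which is what ls -l displays.
    # st_blocks counts the 512-byte units that are actually allocated
    # (available on POSIX platforms only).
    return st.st_size, st.st_blocks * 512
```

For a compressed file, the second value would be expected to be noticeably smaller than the first.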

Deferred file compression

By default, the command that launches a file compression or decompression does not return until after the compression or decompression operation is completed. However, with both the mmchattr command and the mmapplypolicy command, you can defer the compression or decompression operation and have the command return as soon as it completes any other operations. By deferring compression or decompression, you can complete the operation later when the system is not heavily loaded with processes or I/O.

To defer the compression, with either command, specify the -I defer option. For example, the following command marks the specified file as needing compression but defers the compression operation:
mmchattr -I defer --compression yes trcrpt.150913.13.30.13.3518.txt
With the mmapplypolicy command, the -I defer option defers compression or decompression as well as data movement or deletion. For example, the following command applies the rules in the file policyfile but defers the file operations that are specified in the rules, including compression or decompression:
mmapplypolicy fs1 -P policyfile -I defer
To complete a deferred compression or decompression, run the mmrestripefile command or the mmrestripefs command with the -z option. (Do not run either of these commands if an mmrestorefs command is running. See the warnings in the preceding subtopic Warning.) The following command completes the deferred compression or decompression of the specified file:
mmrestripefile -z trcrpt.150913.13.30.13.3518.txt
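
The two phases of a deferred compression can be scripted. The following Python sketch only assembles the command lines shown in this subtopic and, by default, prints them instead of running them. The function names are hypothetical. As stated above, neither command should be run while an mmrestorefs command is running.

```python
import subprocess

def deferred_compression_commands(path):
    """Return the two command lines, in order, for a deferred compression."""
    mark = ["mmchattr", "-I", "defer", "--compression", "yes", path]
    finish = ["mmrestripefile", "-z", path]
    return [mark, finish]

def run_all(commands, dry_run=True):
    """Print each command, or run it when dry_run is False."""
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            # check=True stops the sequence if a command fails.
            subprocess.run(argv, check=True)
```

In practice the second command would be scheduled for a period of low load rather than run immediately after the first.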

Indicators of file compression or decompression

The mmlsattr command displays two indicators that together describe the state of compression or decompression of the specified file:
COMPRESSED
The mmlsattr command displays the COMPRESSED indicator on the Misc attributes line of its output. See the example of mmlsattr output in Figure 1. If present, COMPRESSED indicates that the file is compressed or is marked for deferred compression. If absent, the absence indicates that the file is uncompressed or is marked for deferred decompression.
Note:

This indicator reflects the state of the GPFS_IWINFLAG_COMPRESSED flag in the gpfs_iattr64_t structure of the inode of the file. For more information about this structure, see the topic gpfs_iattr64_t structure.

illCompressed
The mmlsattr command displays the illCompressed indicator on the flags line of its output. See Figure 1. If present, illCompressed indicates that the file is marked for compression or decompression but that compression or decompression is not completed. If absent, the absence indicates that compression or decompression is completed.
Note:
  • This indicator reflects the state of the GPFS_IAFLAG_ILLCOMPRESSED flag in the gpfs_iattr64_t structure of the inode of the file. For more information about this structure, see the topic gpfs_iattr64_t structure.

  • Some file system events can cause the illCompressed flag to be set. Consider the following examples:
    • When data is written into an already compressed file, the existing data remains compressed but the new data is uncompressed. The illCompressed flag is set for this file.
    • When a compressed file is memory-mapped, the memory-mapped area of the file is decompressed before it is read into memory. The illCompressed flag is set for this file.
    For more information, see the subtopic Updates to compressed files.
In the following example, the output from the mmlsattr command includes both the COMPRESSED indicator and the illCompressed indicator. This combination indicates that the file is marked for compression but that compression is not completed:
Figure 1. Compression and decompression indicators
mmlsattr -L green02.51422500687
file name:            green02.51422500687
metadata replication: 1 max 2
data replication:     2 max 2
immutable:            no 
appendOnly:           no
flags:                illCompressed
storage pool name:    datapool
fileset name:         root
snapshot name:
creation time:        Wed Jan 28 19:05:45 2015
Misc attributes:      ARCHIVE COMPRESSED
Encrypted:            no
       

Together the COMPRESSED and illCompressed indicators indicate the compressed or uncompressed state of the file. See the following table:

Table 1. COMPRESSED and illCompressed indicators
State of the file                 COMPRESSED is displayed?   illCompressed is displayed?
Uncompressed.                     No                         No
Decompression is not complete.    No                         Yes
Compressed.                       Yes                        No
Compression is not complete.      Yes                        Yes
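
The table can be applied mechanically to mmlsattr -L output. The following Python sketch assumes the output layout shown in Figure 1 (a "flags:" line and a "Misc attributes:" line); the function name compression_state is hypothetical.

```python
# Keys are (COMPRESSED displayed?, illCompressed displayed?), as in Table 1.
STATES = {
    (False, False): "uncompressed",
    (False, True): "decompression is not complete",
    (True, False): "compressed",
    (True, True): "compression is not complete",
}

def compression_state(mmlsattr_output):
    """Map the two indicators in mmlsattr -L output to a state from Table 1."""
    fields = {}
    for line in mmlsattr_output.splitlines():
        # Split on the first colon only; values such as times contain colons.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.split()
    compressed = "COMPRESSED" in fields.get("Misc attributes", [])
    ill = "illCompressed" in fields.get("flags", [])
    return STATES[(compressed, ill)]
```

For the output in Figure 1, both indicators are present, so the function returns "compression is not complete".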

Partially compressed files

The COMPRESSED flag is set when the user selects the file to be compressed through the mmchattr --compression yes command or a policy run. The flag indicates that the user wants the file to be compressed.

If the user specifies the -I defer option with the mmchattr command or the policy run, the illCompressed flag is set during the command execution or the policy run. The illCompressed flag indicates that the request to compress the file has not yet been fulfilled. If the -I defer option was used, the flag is reset when the mmrestripefs -z or mmrestripefile -z command finishes compressing the file. The illCompressed flag can be set again when an update to the file contents causes part of the file to be decompressed.

The compressibility of a file can change over time if its contents are changed. Different parts of a file can have different compressibility. Based on the 10% space-saving criterion (see the subtopic Limitations), some compression groups of a file (each group spans up to 10 data blocks) might be compressed while others are not.
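
The per-group decision can be modeled with a short Python sketch that uses the standard zlib module. This is a simplified model under stated assumptions, not the file system's implementation: it measures savings in bytes rather than in allocated blocks, and the names plan_groups, GROUP_BLOCKS, and MIN_SAVING are hypothetical.

```python
import zlib

GROUP_BLOCKS = 10   # a compression group spans up to ten data blocks
MIN_SAVING = 0.10   # groups that save less than 10% are left uncompressed

def plan_groups(data, block_size):
    """Return one (offset, length, compress) tuple per compression group."""
    group_size = GROUP_BLOCKS * block_size
    plan = []
    for offset in range(0, len(data), group_size):
        group = data[offset:offset + group_size]
        # Fraction of space saved if this group alone were compressed.
        saving = 1 - len(zlib.compress(group)) / len(group)
        plan.append((offset, len(group), saving >= MIN_SAVING))
    return plan
```

A file that consists of highly repetitive data followed by random data would yield a plan in which the first group is compressed and the second is skipped, which is how a file ends up partially compressed.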

In summary, the state of the COMPRESSED flag, on or off, indicates whether the user intends the file to be compressed. The illCompressed flag indicates the compression execution status. The actual compression status of the data blocks depends on the illCompressed and COMPRESSED flags as well as the compressibility of the current data.

Updates to compressed files

When a compressed file is updated by a write operation, the file system automatically decompresses the region of the file that contains the affected data and sets the illCompressed flag. The file system then makes the update. To recompress the file, run the mmrestripefile command with the -z option, as in the following example:
mmrestripefile -z trcrpt.150913.13.30.13.3518.txt

The mmrestorefs command can cause a compressed file in the active file system to become decompressed if it is overwritten by the restore process. To recompress the file, run the mmrestripefile command with the -z option.

For more information, see the preceding subtopic Deferred file compression.

File compression and memory mapping

You can memory-map a file that is already compressed. The file system automatically decompresses the paged-in region and sets the illCompressed flag. To recompress the file, run the mmrestripefile command with the -z option.

As a convenience, the file system does not compress an uncompressed file or partially decompressed file if the file is memory-mapped. Compressing the file would not be effective because memory mapping decompresses any compressed data in the regions that are paged in.

File compression and direct I/O

You can open a compressed file for direct I/O, but internally the direct I/O reads and writes are replaced by buffered decompressed I/O reads and writes.

As a convenience, the file system does not compress a file that is opened for direct I/O. Compressing the file would not be effective because direct I/O would be replaced by buffered decompressed I/O.

Backing up and restoring compressed files

Files are decompressed when they are moved out of storage that is directly managed by IBM Spectrum Scale. This fact affects file backups by products like IBM Spectrum Protect™, IBM Spectrum Protect for Space Management (HSM), IBM Spectrum Archive™, Transparent Cloud Tiering (TCT), and others. When you back up a file with these products, the file system decompresses the file data inline when it is read by the backup agent. The file system also sets the illCompressed flag in the file properties. The backed-up file data is not compressed.

When you restore a file to the IBM Spectrum Scale file system, the file data remains uncompressed but the illCompressed flag is still set. You can recompress the file by running mmrestripefs or mmrestripefile with the -z option.

FPO environment

File compression supports a File Placement Optimizer (FPO) environment or horizontal storage pools.

FPO block group factor: Before you compress files in a File Placement Optimizer (FPO) environment, you must set the block group factor to a multiple of 10. If you do not, then data block locality is not preserved and performance is slower.
For compatibility reasons, before you do file compression with an AFM cache or FPO files, you must upgrade the whole cluster to version 4.2.1 or later. To verify that the cluster is upgraded, follow these steps:
  1. At the command line, enter the mmlsconfig command with no parameters.
  2. In the output, verify that the value of minReleaseLevel is 4.2.1.0 or later.
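
The check in step 2 can be automated. The following Python sketch assumes that the mmlsconfig output contains a line of the form "minReleaseLevel 4.2.1.0" with whitespace between the name and the value; that layout and the function names are assumptions. Release levels are compared as tuples of integers because a plain string comparison would rank 4.2.10.0 below 4.2.2.0.

```python
def parse_release(text):
    """Turn '4.2.1.0' into (4, 2, 1, 0) so that levels compare numerically."""
    return tuple(int(part) for part in text.split("."))

def min_release_ok(mmlsconfig_output, required="4.2.1.0"):
    """Return True if minReleaseLevel in the output is at least `required`."""
    for line in mmlsconfig_output.splitlines():
        words = line.split()
        # Assumed layout: attribute name, whitespace, then the release level.
        if len(words) >= 2 and words[0] == "minReleaseLevel":
            return parse_release(words[1]) >= parse_release(required)
    # No minReleaseLevel line was found, so the level cannot be confirmed.
    return False
```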

Limitations

File compression has the following limitations:
  • File compression in this release is designed to be used only for compressing cold data or write-once objects and files. Compressing other types of data can result in performance degradation. File compression uses the zlib data compression library and favors saving space over speed.
  • File compression processes each compression group within a file independently. A compression group consists of one to ten consecutive data blocks within a file. If the file contains fewer than ten data blocks, the whole file is one compression group. If the space savings for a compression group is less than 10%, file compression does not compress it but skips to the next compression group.
  • For file compression in an FPO-enabled file system, the block group factor must be a multiple of 10 so that the compressed data maintains data locality. If the block group factor is not a multiple of 10, the data locality is broken.
  • Direct I/O is not supported for compressed files.
  • The following operations are not supported:
    • Compressing files in snapshots
    • Compressing a clone
    • Compressing files in an AFM cache site or in an AFM-based asynchronous Disaster Recovery (DR) fileset.
    • Compressing small files (files that occupy fewer than two subblocks) or compressing small files into an inode.
    • Compressing files other than regular files, such as directories.
    • Cloning a compressed file
  • On Windows:
    • Compression or decompression with the mmapplypolicy command is not supported.
    • Compression of files in Windows hyper allocation mode is not supported.
    • The following Windows APIs are not supported:
      • FSCTL_SET_COMPRESSION to enable/disable compression on a file
      • FSCTL_GET_COMPRESSION to retrieve compression status of a file
    • In Windows Explorer, in the Advanced Attributes window, the compression feature is not supported.