Enabling compression

The Ceph Object Gateway supports server-side compression of uploaded objects using any of Ceph’s compression plugins.

The supported compression plugins include the following:
  • zlib
  • snappy
  • zstd

Configuring compression

To enable compression on a zone’s placement target, provide the --compression=TYPE option to the radosgw-admin zone placement modify command. The compression TYPE refers to the name of the compression plugin to use when writing new object data.

Each compressed object stores the compression type. Changing the setting does not hinder the ability to decompress existing compressed objects, nor does it force the Ceph Object Gateway to re-compress existing objects.

This compression setting applies to all new objects uploaded to buckets using this placement target.

To disable compression on a zone’s placement target, run the radosgw-admin zone placement modify command and set the --compression option to an empty string or to none.

Example
[root@host01 ~]# radosgw-admin zone placement modify --rgw-zone=default --placement-id=default-placement --compression=zlib
{
...
    "placement_pools": [
        {
            "key": "default-placement",
            "val": {
                "index_pool": "default.rgw.buckets.index",
                "data_pool": "default.rgw.buckets.data",
                "data_extra_pool": "default.rgw.buckets.non-ec",
                "index_type": 0,
                "compression": "zlib"
            }
        }
    ],
...
}
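
When these commands are scripted, it can help to assemble the argument list in one place. The following is a minimal sketch, not part of the product: the helper name build_compression_command and the allow-list are hypothetical, and the allow-list assumes only the plugins named in this section plus the two values that disable compression. It builds the command but does not run it.

```python
import shlex

# Assumption: this allow-list mirrors only the plugins named in this section,
# plus the two values that disable compression ("" and "none").
KNOWN_COMPRESSION_VALUES = {"zlib", "snappy", "zstd", "none", ""}


def build_compression_command(zone, placement_id, compression):
    """Return the argument list that sets compression on a placement target."""
    if compression not in KNOWN_COMPRESSION_VALUES:
        raise ValueError(f"unexpected compression type: {compression!r}")
    return [
        "radosgw-admin", "zone", "placement", "modify",
        f"--rgw-zone={zone}",
        f"--placement-id={placement_id}",
        f"--compression={compression}",
    ]


# Enable zlib, then disable compression, on the default placement target.
enable_cmd = build_compression_command("default", "default-placement", "zlib")
disable_cmd = build_compression_command("default", "default-placement", "none")
print(shlex.join(enable_cmd))
print(shlex.join(disable_cmd))
```

On a host where radosgw-admin is installed, the returned list could be passed to subprocess.run. The sketch only prints the command so that it can be reviewed first.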

After enabling or disabling compression, restart the Ceph Object Gateway instance so the change will take effect.

Note: Ceph Object Gateway creates a default zone and a set of pools. For production deployments, see Creating a realm.

Compression statistics

While all existing commands and APIs continue to report object and bucket sizes based on their uncompressed data, the radosgw-admin bucket stats command includes compression statistics for all buckets.

The usage types for the radosgw-admin bucket stats command are as follows:
  • rgw.main, for regular entries or objects.
  • rgw.multimeta, for the metadata of incomplete multipart uploads.
  • rgw.cloudtiered, for objects that a lifecycle policy has transitioned to a cloud tier. When the policy is configured with retain_head_object=true, a head object remains. It no longer contains data but can still serve the object's metadata through HeadObject requests. These stub head objects use the rgw.cloudtiered category. For more information, see Transitioning data to Amazon S3 cloud service.

Syntax
radosgw-admin bucket stats --bucket=BUCKET_NAME
{
...
    "usage": {
        "rgw.main": {
            "size": 1075028,
            "size_actual": 1331200,
            "size_utilized": 592035,
            "size_kb": 1050,
            "size_kb_actual": 1300,
            "size_kb_utilized": 579,
            "num_objects": 104
        }
    },
...
}

The size is the accumulated size of the objects in the bucket, uncompressed and unencrypted. The size_kb is the accumulated size in kilobytes and is calculated as size/1024. In this example, 1075028/1024 is approximately 1049.8, which rounds up to 1050.

The size_actual is the actual size of all the objects after each object is rounded up with the ceiling function to the nearest multiple of 4096 bytes, because objects are stored in 4096-byte chunks. If a bucket has two objects, one of 4100 bytes and the other of 8500 bytes, the first object is rounded up to 8192 bytes and the second is rounded up to 12288 bytes, so their total for the bucket is 20480 bytes. The size_kb_actual is the actual size in kilobytes and is calculated as size_actual/1024. In this example, it is 1331200/1024 = 1300.
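
The two-object rounding example can be checked with a few lines of Python. This is an illustrative sketch only: the function name rounded_size is hypothetical, and the 4096-byte chunk size is taken from the description above.

```python
CHUNK_SIZE = 4096  # bytes, as stated in the description above


def rounded_size(size_bytes, chunk=CHUNK_SIZE):
    """Round a size up to the next multiple of the chunk size."""
    # Integer ceiling division avoids floating-point error.
    return -(-size_bytes // chunk) * chunk


sizes = [4100, 8500]
rounded = [rounded_size(s) for s in sizes]
print(rounded, sum(rounded))  # [8192, 12288] 20480
```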

The size_utilized is the total size of the data in bytes after compression and encryption are applied. Encryption can increase the size of an object, while compression can decrease it. The size_kb_utilized is the total size in kilobytes and is calculated as size_utilized/1024. In this example, 592035/1024 is approximately 578.2, which rounds up to 579.
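
One way to judge how effective compression is for a bucket is to compare size_utilized with size. The sketch below parses a fragment shaped like the example output above and computes that fraction. The fraction is a derived value, not a field that radosgw-admin reports, and the function name stored_fraction is hypothetical. Because size_utilized also reflects encryption, the result is only an approximation of the compression ratio.

```python
import json

# Fragment copied from the example output above (trimmed to the fields used here).
stats_fragment = """
{
    "usage": {
        "rgw.main": {
            "size": 1075028,
            "size_actual": 1331200,
            "size_utilized": 592035,
            "num_objects": 104
        }
    }
}
"""


def stored_fraction(usage_entry):
    """Return size_utilized / size, or None for an empty category."""
    size = usage_entry["size"]
    if size == 0:
        return None
    return usage_entry["size_utilized"] / size


main = json.loads(stats_fragment)["usage"]["rgw.main"]
fraction = stored_fraction(main)
print(f"stored/logical: {fraction:.3f}, saved: {1 - fraction:.1%}")
```

For the example bucket, the stored data is roughly 55 percent of the logical size.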

Here, all the sizes in kilobytes are rounded up with the ceiling function.
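
The kilobyte values in the example output can be reproduced from the byte values with a ceiling division. This is a minimal sketch for checking the arithmetic, and the helper name to_kb is hypothetical.

```python
def to_kb(size_bytes):
    """Convert bytes to kilobytes, rounding up (ceiling)."""
    return -(-size_bytes // 1024)


# Byte values taken from the example output above.
byte_values = {
    "size_kb": 1075028,
    "size_kb_actual": 1331200,
    "size_kb_utilized": 592035,
}
kb_values = {name: to_kb(value) for name, value in byte_values.items()}
print(kb_values)
# {'size_kb': 1050, 'size_kb_actual': 1300, 'size_kb_utilized': 579}
```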