Enabling compression

The Ceph Object Gateway supports server-side compression of uploaded objects using any of Ceph’s compression plugins. These include:

  • zlib: Supported.

  • snappy: Supported.

  • zstd: Supported.

Configuration

To enable compression on a zone’s placement target, provide the --compression=TYPE option to the radosgw-admin zone placement modify command. The compression TYPE refers to the name of the compression plugin to use when writing new object data.

Each compressed object stores the compression type. Changing the setting does not hinder the ability to decompress existing compressed objects, nor does it force the Ceph Object Gateway to re-compress existing objects.

This compression setting applies to all new objects uploaded to buckets using this placement target.

To disable compression on a zone’s placement target, provide the --compression=TYPE option to the radosgw-admin zone placement modify command and specify an empty string or none.

Example

[root@host01 ~] radosgw-admin zone placement modify --rgw-zone=default --placement-id=default-placement --compression=zlib
{
...
    "placement_pools": [
        {
            "key": "default-placement",
            "val": {
                "index_pool": "default.rgw.buckets.index",
                "data_pool": "default.rgw.buckets.data",
                "data_extra_pool": "default.rgw.buckets.non-ec",
                "index_type": 0,
                "compression": "zlib"
            }
        }
    ],
...
}

After enabling or disabling compression, restart the Ceph Object Gateway instance so the change will take effect.

Note: Ceph Object Gateway creates a default zone and a set of pools. For production deployments, see Creating a realm.

Statistics

While all existing commands and APIs continue to report object and bucket sizes based on their uncompressed data, the radosgw-admin bucket stats command includes compression statistics for all buckets.

The usage types for the radosgw-admin bucket stats command are: -
  • rgw.main refers to regular entries or objects.
  • rgw.multimeta refers to the metadata of incomplete multipart uploads.
  • rgw.cloudtiered refers to objects that a lifecycle policy has transitioned to a cloud tier. When configured with retain_head_object=true, a head object is left behind that no longer contains data, but can still serve the object's metadata with HeadObject requests. These stub head objects use the rgw.cloudtiered category. See Transitioning data to Amazon S3 cloud service for more information.

Syntax

radosgw-admin bucket stats --bucket=BUCKET_NAME
{
...
    "usage": {
        "rgw.main": {
            "size": 1075028,
            "size_actual": 1331200,
            "size_utilized": 592035,
            "size_kb": 1050,
            "size_kb_actual": 1300,
            "size_kb_utilized": 579,
            "num_objects": 104
        }
    },
...
}

The size is the accumulated size of the objects in the bucket, uncompressed and unencrypted. The size_kb is the accumulated size in kilobytes and is calculated as ceiling(size/1024). In this example, it is ceiling(1075028/1024) = 1050.

The size_actual is the accumulated size of all the objects after each object is distributed in a set of 4096-byte blocks. If a bucket has two objects, one of size 4100 bytes and the other of 8500 bytes, the first object is rounded up to 8192 bytes, and the second one rounded 12288 bytes, and their total for the bucket is 20480 bytes. The size_kb_actual is the actual size in kilobytes and is calculated as size_actual/1024. In this example, it is 1331200/1024 = 1300.

The size_utilized is the total size of the data in bytes after it has been compressed and/or encrypted. Encryption could increase the size of the object while compression could decrease it. The size_kb_utilized is the total size in kilobytes and is calculated as ceiling(size_utilized/1024). In this example, it is ceiling(592035/1024)= 579.

Here, all the sizes in kilobytes is rounded up with the ceiling function.