Adjusting garbage collection settings
Adjust garbage collection settings to increase priority or tune the garbage collection queue.
The Ceph Object Gateway allocates storage for new and overwritten objects immediately. The parts of a multi-part upload also use some storage.
The Ceph Object Gateway purges the storage space that is used for deleted objects after deleting the objects from the bucket index. Similarly, the Ceph Object Gateway will delete data that is associated with a multi-part upload after the multi-part upload completes or when the upload has gone inactive or failed to complete for a configurable amount of time.
To view the queue of objects awaiting garbage collection, run the following command:

radosgw-admin gc list

Garbage collection is a background activity that runs continuously or during times of low load, depending on how the storage administrator configures the Ceph Object Gateway. By default, the Ceph Object Gateway conducts garbage collection operations continuously. Because garbage collection operations are a normal function of the Ceph Object Gateway, especially with object delete operations, objects eligible for garbage collection exist most of the time.
Some workloads can temporarily or permanently outpace the rate of garbage collection activity. This is especially true of delete-heavy workloads, where many objects get stored for a short period of time and then deleted. For these types of workloads, storage administrators can increase the priority of garbage collection operations relative to other operations with the following configuration parameters:
- The rgw_gc_obj_min_wait configuration option specifies the minimum length of time, in seconds, that the gateway waits before purging a deleted object’s data. The default value is two hours, or 7200 seconds. The object is not purged immediately because a client might still be reading it. Under heavy workloads, this wait can cause deleted objects to consume too much storage or leave a large backlog of deleted objects to purge. Do not set this value below 30 minutes, or 1800 seconds.
- The rgw_gc_processor_period configuration option specifies the garbage collection cycle run time, that is, the amount of time between the start of consecutive runs of garbage collection threads. If garbage collection runs longer than this period, the Ceph Object Gateway does not wait before running a garbage collection cycle again.
- The rgw_gc_max_concurrent_io configuration option specifies the maximum number of concurrent IO operations that the gateway garbage collection thread uses when purging deleted data. Under delete-heavy workloads, consider increasing this setting to a larger number of concurrent IO operations.
- The rgw_gc_max_trim_chunk configuration option specifies the maximum number of keys to remove from the garbage collector log in a single operation. Under delete-heavy workloads, consider increasing the maximum number of keys so that more objects are purged during each garbage collection operation.
- The rgw_gc_max_deferred_entries_size configuration option sets the maximum size of deferred entries in the garbage collection queue.
- The rgw_gc_max_queue_size configuration option sets the maximum queue size that is used for garbage collection. Do not set this value greater than osd_max_object_size minus rgw_gc_max_deferred_entries_size minus 1 KB.
- The rgw_gc_max_deferred configuration option sets the maximum number of deferred entries that are stored in the garbage collection queue.
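As a rough sketch of how these options might be applied, the following shell fragment defines two helpers. The first computes the upper bound for rgw_gc_max_queue_size from the rule stated above. The second raises garbage collection parallelism with ceph config set. The function names, the client.rgw target, the values 20 and 64, and the reading of 1 KB as 1024 bytes are illustrative assumptions, not recommendations. Check the current values in your own cluster with ceph config get before changing anything.

```shell
# Hypothetical helper: print the largest rgw_gc_max_queue_size, in bytes,
# allowed by the rule "osd_max_object_size minus
# rgw_gc_max_deferred_entries_size minus 1 KB".
#   $1 = osd_max_object_size in bytes
#   $2 = rgw_gc_max_deferred_entries_size in bytes
# Assumption: 1 KB is taken to mean 1024 bytes.
gc_max_queue_size_limit() {
    echo $(( $1 - $2 - 1024 ))
}

# Hypothetical helper: raise garbage collection parallelism for the gateways.
# Assumptions: the gateways read their settings from the client.rgw section,
# and 20 and 64 are example values only.
# The function is defined here but not called; run it only against a
# cluster that you administer.
apply_aggressive_gc_settings() {
    ceph config set client.rgw rgw_gc_max_concurrent_io 20
    ceph config set client.rgw rgw_gc_max_trim_chunk 64
}
```

For example, gc_max_queue_size_limit 134217728 3072 prints 134213632: with a 128 MiB object size limit and 3072 bytes reserved for deferred entries, rgw_gc_max_queue_size should not exceed that many bytes. The input sizes in this example are assumed values, so substitute the ones that your cluster reports.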
The storage cluster status switches to the HEALTH_ERR state if garbage collection falls so far behind delete operations that the cluster fills. Aggressive settings for parallel garbage collection tunables significantly delayed the onset of storage cluster fill in testing and can be helpful for many workloads. Typical real-world storage cluster workloads are not likely to cause a storage cluster fill primarily due to garbage collection.