Allocation of input and output resources
This section describes how the QoS controls work internally with reservation, limit, and weight allocation. The user is not expected to set these controls as the mClock profiles automatically set them. Tuning these controls can only be performed using the available mClock profiles.
The dmClock algorithm allocates the input and output (I/O) resources of the Ceph cluster in proportion to weights. It implements the constraints of minimum reservation and maximum limitation to ensure the services can compete for the resources fairly.
Currently, the mclock_scheduler operation queue divides Ceph services involving I/O resources into the following buckets:
- client op: the input and output operations per second (IOPS) issued by a client.
- pg deletion: the IOPS issued by the primary Ceph OSD.
- snap trim: the snapshot trimming-related requests.
- pg recovery: the recovery-related requests.
- pg scrub: the scrub-related requests.
The resources are partitioned using the following three sets of tags, meaning that the share of each type of service is controlled by these three tags:
- Reservation
- Limit
- Weight
Reservation
The minimum IOPS allocated for the service. The higher the reservation of a service, the more resources it is guaranteed to receive, as long as it needs them.
For example, a service with the reservation set to 0.1 (or 10%) always has 10% of the OSD's IOPS capacity allocated for itself. Therefore, even if the clients start to issue large amounts of I/O requests, they do not exhaust all the I/O resources, and the service's operations are not starved even in a cluster with high load.
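As a quick check of the arithmetic in the example above, the following minimal Python sketch converts a reservation fraction into guaranteed IOPS. The function name reserved_iops and the 3000 IOPS capacity are hypothetical values chosen for illustration. They are not Ceph settings or measured numbers.

```python
def reserved_iops(osd_capacity_iops: float, reservation_fraction: float) -> float:
    """Return the IOPS guaranteed to a service by its reservation.

    Both arguments are illustrative: the capacity is an assumed figure for
    one OSD, and the fraction is the reservation expressed as 0.0 to 1.0.
    """
    if not 0.0 <= reservation_fraction <= 1.0:
        raise ValueError("reservation_fraction must be between 0 and 1")
    return osd_capacity_iops * reservation_fraction


# Hypothetical OSD that can sustain 3000 IOPS, with a 10% reservation.
assumed_capacity = 3000.0
guaranteed = reserved_iops(assumed_capacity, 0.1)
print(f"guaranteed IOPS: {guaranteed:.0f}")
```

Under these assumed numbers, roughly 300 IOPS stay available to the service no matter how much load the other services generate.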
Limit
The maximum IOPS allocated for the service. The service does not get more than the set number of requests per second serviced, even if it demands more and no other services are competing with it. If a service crosses the enforced limit, the operation remains in the operation queue until the limit is restored.
If the limit is set to 0 (disabled), the service is not restricted by the limit setting and can use all the resources if there is no other competing operation. This is represented as "MAX" in the mClock profiles.
Weight
The proportional share of capacity when there is extra capacity or when the system is oversubscribed. A service can use a larger portion of the I/O resources if its weight is higher than that of its competitors.
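To see how the three controls can interact, the following Python sketch models a simplified steady-state split of one OSD's capacity. It is not the dmClock scheduler, which works with per-request tags. It assumes that reservations are satisfied first and that spare capacity is then shared in proportion to weight up to each service's limit. The Service class, the allocate function, and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Service:
    demand: float                  # IOPS the service is asking for
    reservation: float             # guaranteed minimum IOPS
    weight: float                  # relative share of spare capacity (> 0)
    limit: Optional[float] = None  # maximum IOPS; None models "MAX" (no limit)


def allocate(capacity: float, services: Dict[str, Service]) -> Dict[str, float]:
    """Split capacity: reservations first, then spare capacity by weight."""
    # A service never receives more than it asks for or more than its limit.
    ceiling = {
        name: s.demand if s.limit is None else min(s.demand, s.limit)
        for name, s in services.items()
    }
    # Step 1: satisfy reservations (bounded by the ceiling).
    alloc = {name: min(s.reservation, ceiling[name]) for name, s in services.items()}
    spare = capacity - sum(alloc.values())
    if spare < 0:
        raise ValueError("reservations exceed the available capacity")

    # Step 2: hand out spare capacity in proportion to weight. When a service
    # reaches its ceiling, what it could not use is redistributed.
    active = {name for name in services if alloc[name] < ceiling[name]}
    while spare > 1e-9 and active:
        total_weight = sum(services[name].weight for name in active)
        shares = {n: spare * services[n].weight / total_weight for n in active}
        for name, share in shares.items():
            grant = min(share, ceiling[name] - alloc[name])
            alloc[name] += grant
            spare -= grant
            if ceiling[name] - alloc[name] <= 1e-9:
                active.discard(name)
    return alloc


# Hypothetical OSD with 1000 IOPS and three competing services.
result = allocate(
    1000.0,
    {
        "client": Service(demand=2000.0, reservation=500.0, weight=2.0),
        "recovery": Service(demand=2000.0, reservation=100.0, weight=1.0, limit=300.0),
        "scrub": Service(demand=50.0, reservation=0.0, weight=1.0),
    },
)
for name, iops in result.items():
    print(f"{name}: {iops:.1f} IOPS")
```

In this model the client keeps at least its reservation, recovery stays under its limit, and the capacity that scrub does not need is shared between the other two services by weight.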
If the weight is set to W, then for a given class of requests the next request that enters gets a weight tag equal to 1/W plus the previous weight tag, or the current time, whichever is larger. This means that if W is too large, and 1/W is therefore too small, the calculated tag might never be assigned because the request always gets the value of the current time instead. Therefore, weight values should always be lower than the number of requests expected to be serviced each second.
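The weight tag rule described above can be sketched in a few lines of Python. This is an illustration of the rule as stated in this section, not code from the dmClock library. The function name next_weight_tag, the arrival rate of 100 requests per second, and the two weights are assumptions made for the example.

```python
from typing import List


def next_weight_tag(prev_tag: float, weight: float, now: float) -> float:
    """Weight tag for the next request: max(previous tag + 1/W, current time)."""
    return max(prev_tag + 1.0 / weight, now)


def tags_for(weight: float, arrivals: List[float]) -> List[float]:
    """Compute the weight tags for requests arriving at the given times."""
    tags = []
    prev_tag = 0.0
    for now in arrivals:
        prev_tag = next_weight_tag(prev_tag, weight, now)
        tags.append(prev_tag)
    return tags


# Assumed workload: one request every 0.01 s, that is 100 requests per second.
arrivals = [i * 0.01 for i in range(1, 6)]

# W = 10 is below the request rate: 1/W = 0.1 s, so the calculated tag wins
# and consecutive tags are spaced by the weight.
tags_low_weight = tags_for(10.0, arrivals)

# W = 1000 is above the request rate: 1/W = 0.001 s, so the current time
# always wins and the weight no longer influences the tags.
tags_high_weight = tags_for(1000.0, arrivals)

print(tags_low_weight)
print(tags_high_weight)
```

With the larger weight every tag collapses to the arrival time, which is why the weight should stay below the expected number of requests serviced per second.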