Resource sharing

Watson Machine Learning Accelerator enforces GPU resource sharing according to the share ratio that is set for the resource plans within a cluster.

Resource sharing refers to the temporary allocation of unowned resources from a share pool or cluster to a project with unsatisfied requests. You can configure resource sharing for your projects by setting the share ratio, which specifies the ratio of resources to be allocated from the share pool or cluster.

Distribution model

By default, planned share ratios are enforced at the child resource plan level in a namespace. This means that existing share policies guarantee that each application (registered at a child resource plan level) receives its requested number of resources when needed. If an application does not have sufficient need to warrant receiving all its requested resources, the unused resources are distributed to all projects within the resource plan and filtered down to child resource plans as per their relative share ratios.

For example, if all applications (registered to child resource plans or projects) are configured with equal share ratios, each project has a 1:1 share of the resources distributed to its parent (the top-level project in a resource plan). Assuming all applications have the same number of requests, they all receive the same number of resources.

Building on the example resource distribution model, if one of the applications no longer demonstrates a resource demand, its requested resources are distributed to other child resource plans in the project according to their configured share ratios (in this case, the share ratios are equal).
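
The distribution model described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the product's scheduler: the function name distribute, the (share_ratio, demand) tuples, and the use of fractional resource counts are all assumptions made for the example. It splits a pool among child resource plans in proportion to their share ratios, caps each plan at its demand, and redistributes whatever is left over to the plans that still have unmet demand.

```python
def distribute(total, plans):
    """Split `total` resources among plans by share ratio, capped by demand.

    `plans` maps a (hypothetical) plan name to a (share_ratio, demand) tuple.
    Returns a dict mapping each plan name to its allocated amount.
    """
    alloc = {name: 0.0 for name in plans}
    remaining = float(total)
    # Only plans with a positive ratio and some demand compete for resources.
    active = {n for n, (ratio, demand) in plans.items() if ratio > 0 and demand > 0}

    while remaining > 1e-9 and active:
        ratio_sum = sum(plans[n][0] for n in active)
        satisfied = set()
        handed_out = 0.0
        for n in active:
            ratio, demand = plans[n]
            offer = remaining * ratio / ratio_sum      # proportional offer
            take = min(offer, demand - alloc[n])       # never exceed demand
            alloc[n] += take
            handed_out += take
            if demand - alloc[n] <= 1e-9:
                satisfied.add(n)
        remaining -= handed_out
        if not satisfied:
            break  # every active plan took its full offer; nothing left to pass on
        active -= satisfied  # unused share flows to the plans still in need
    return alloc


# Equal ratios and equal demand: every plan receives the same amount.
equal = distribute(12, {"a": (1, 10), "b": (1, 10), "c": (1, 10)})

# Plan "c" no longer has demand: its share is split between "a" and "b".
idle_c = distribute(12, {"a": (1, 10), "b": (1, 10), "c": (1, 0)})
```

With a pool of 12 and equal ratios, each of the three plans receives 4. When plan "c" has no demand, "a" and "b" receive 6 each.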

Reclaim behavior

If reclaim is triggered, a child resource plan takes back its requested number of resources that are in use by other projects, up to its planned share. For example, if a child project has unmet demand, it reclaims resources directly from another child resource plan that is using more than its planned share of resources.

The first child resource plan reclaims without consideration of the needs or planned share ratio of the second project. As a result, the project that it reclaims from can fall below that project's deserved number of resources.
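
The reclaim behavior can be illustrated with a small Python sketch. This is one reading of the description above and not the product's implementation: the function name reclaim, the dictionaries usage, planned, and demand, and the plan names are hypothetical. The claimant takes resources from a lender that is using more than its planned share, up to the claimant's own planned share, and does not check how far the lender drops.

```python
def reclaim(claimant, lender, usage, planned, demand):
    """Return the usage after `claimant` reclaims resources from `lender`.

    All three dicts map a (hypothetical) plan name to a resource count.
    """
    new_usage = dict(usage)
    # The claimant reclaims only up to its planned share, and only what it needs.
    target = min(demand[claimant], planned[claimant])
    shortfall = max(0, target - usage[claimant])

    # Assumption: only a lender using more than its planned share is eligible.
    if usage[lender] <= planned[lender]:
        return new_usage

    # The amount taken is NOT limited to the lender's excess, so the lender
    # can end up below its own planned share.
    taken = min(shortfall, usage[lender])
    new_usage[claimant] += taken
    new_usage[lender] -= taken
    return new_usage


planned = {"a": 4, "b": 3, "c": 3}
usage = {"a": 0, "b": 4, "c": 6}      # "b" and "c" borrowed idle resources
demand = {"a": 4, "b": 4, "c": 6}     # "a" now has unmet demand

after = reclaim("a", "b", usage, planned, demand)
```

In this example, plan "a" recovers its planned share of 4 entirely from plan "b", which is left with 0 resources even though its planned share is 3.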