Changing instance resource quota in Analytics Engine Powered by Spark

When you submit jobs, the resources required by the jobs can sometimes exceed the resources that are available for the instance. You can control the resources, such as CPU and memory, that are used in a single Spark service instance by specifying a quota at the Analytics Engine Powered by Spark service instance level.

You can change the instance resource quota in the following ways:

The V3 resource quota APIs are deprecated and will be removed in an upcoming release.

For a new instance, the V3 instance quota is the default quota limit. To use the V4 instance quota, you must explicitly call the V4 instance quota APIs. For more information, see Setting instance resource quota limits.

Working with resource quotas for an instance using the V4 API

To use the V4 resource quota APIs, the Cloud Pak for Data Scheduling service must be installed on the cluster and enabled so that it can programmatically enforce the quotas that you set on a service instance.

Required services: The Scheduling service must be installed. You can install the scheduling service along with the Cloud Pak for Data platform. See Installing or upgrading the scheduling service.

Required role: You must be an OpenShift administrator or OpenShift project administrator to make changes to the Analytics Engine custom resource (CR).

You can enable using the V4 resource quota APIs in provisioned instances in one of two ways:

  • Either by using the following patch command:

    oc patch AnalyticsEngine analyticsengine-sample --namespace ${PROJECT_CPD_INST_OPERANDS} --type merge --patch '{"spec":{"serviceConfig":{"schedulerForQuotaAndQueuing":"ibm-cpd-scheduler"}}}'
    

    Wait about 5-10 minutes for the operator to reconcile and reach the Completed state.

  • Or by carrying out the following steps:

    1. Log in to the Cloud Pak for Data cluster.
    2. Update the spec.serviceConfig.schedulerForQuotaAndQueuing property in the Analytics Engine CR YAML file that was used to set up Analytics Engine powered by Apache Spark.
    3. Apply the changes to the existing deployed CR by using the following command:
      oc apply -f cr.yaml -n ${PROJECT_CPD_INST_OPERANDS}
      
    4. Wait for the Analytics Engine CR to be in Completed state:
      oc get analyticsengine -n ${PROJECT_CPD_INST_OPERANDS}
      
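For reference, the CR change in step 2 amounts to setting a single field. The following fragment is a sketch of only the fields involved; the apiVersion, kind, and all other fields in your existing CR stay as they are:

```yaml
# Sketch: only the relevant fields of the Analytics Engine CR are shown.
spec:
  serviceConfig:
    schedulerForQuotaAndQueuing: ibm-cpd-scheduler
```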

The following V4 resource quota APIs are available:

Setting instance resource quota limits

To set resource quota limits on the instance that you create:

  1. Get the instance ID:

    1. From the Navigation menu on the IBM Cloud Pak for Data web user interface, click Services > Instances, find the instance and click it to view the instance details.
    2. From the Spark jobs V4 endpoint, get the instance ID. The format of the endpoint is: https://<CloudPakforData_URL>/v4/analytics_engines/<INSTANCE_ID>/spark_applications
  2. Generate a token if you haven't already done so. See Generating an access token.

  3. Set the Cloud Pak for Data URL, instance ID and access token using the following commands:

    export CPD_URL="<CloudPakforData_URL>"
    export INSTANCE_ID="<INSTANCE_ID>"
    export ACCESS_TOKEN="<ACCESS_TOKEN>"
    
  4. Set your resource quota limits, for example:

    curl -X PUT \
    "$CPD_URL/v4/analytics_engines/$INSTANCE_ID/resource_consumption_limits" \
    --header "Authorization: ZenApiKey ${ACCESS_TOKEN}" \
    --header 'Content-Type: application/json' \
    --data-raw '{
    "max_cores": "50",
    "max_memory": "50G"
    }'
    

    Here, "max_cores" is the maximum number of cores that the instance can consume at any given moment, and "max_memory" is the maximum amount of memory that the instance can consume at any given moment.

    A successful request returns 200 (OK).

    Note that you are responsible for managing the resource quota. It can be deleted, modified, or overridden through the Cloud Pak for Data Scheduling service.
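Before sending the PUT request, it can help to validate the payload locally and to capture the HTTP status code separately from the response body. The following sketch assumes python3 is available; the commented curl line shows how the status code would be captured against a live instance:

```shell
# Validate the quota payload locally before sending it (sketch).
payload='{"max_cores": "50", "max_memory": "50G"}'
if echo "$payload" | python3 -m json.tool >/dev/null 2>&1; then
  payload_ok=yes
else
  payload_ok=no
fi
echo "payload valid: $payload_ok"

# Against a live instance, capture only the HTTP status code:
#   status=$(curl -s -o /dev/null -w '%{http_code}' -X PUT \
#     "$CPD_URL/v4/analytics_engines/$INSTANCE_ID/resource_consumption_limits" \
#     --header "Authorization: ZenApiKey ${ACCESS_TOKEN}" \
#     --header 'Content-Type: application/json' \
#     --data-raw "$payload")
```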

Getting the instance resource quota limits

To retrieve the resource quota limits, use the following command:

curl -X GET \
  "$CPD_URL/v4/analytics_engines/$INSTANCE_ID/resource_consumption_limits" \
  --header "Authorization: ZenApiKey ${ACCESS_TOKEN}"

A successful request returns 200 (OK).

Sample output:

{
  "max_cores": "50",
  "max_memory": "50G"
}
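To use the limits in a script, you can parse the response fields, for example with python3 (a sketch; jq would work equally well). The JSON below mirrors the sample output:

```shell
# Parse quota limits from a saved GET response (sketch; python3 assumed).
response='{"max_cores": "50", "max_memory": "50G"}'
max_cores=$(echo "$response" | python3 -c 'import sys, json; print(json.load(sys.stdin)["max_cores"])')
max_memory=$(echo "$response" | python3 -c 'import sys, json; print(json.load(sys.stdin)["max_memory"])')
echo "cores limit: $max_cores, memory limit: $max_memory"
```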

Getting instance resource quota consumption

To get the current resource quota consumption of a specific instance, use the following command:

curl -X GET \
  "$CPD_URL/v4/analytics_engines/$INSTANCE_ID/current_resource_consumption" \
  --header "Authorization: ZenApiKey ${ACCESS_TOKEN}"

A successful request returns 200 (OK).

Sample output:

{
  "running": {
    "cores": "2000m",
    "memory": "2048Gi"
  },
  "pending": {
    "cores": "1000m",
    "memory": "1024Gi"
  }
}
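The running and pending figures can be combined to see total demand against the quota. A sketch, assuming cores are reported in Kubernetes millicore notation (the `m` suffix) and python3 is available:

```shell
# Sum running and pending cores from a saved consumption response (sketch).
consumption='{"running":{"cores":"2000m","memory":"2048Gi"},"pending":{"cores":"1000m","memory":"1024Gi"}}'
total_millicores=$(echo "$consumption" | python3 -c '
import sys, json
d = json.load(sys.stdin)
# Strip the trailing "m" (millicores) and add the two figures.
print(sum(int(d[k]["cores"].rstrip("m")) for k in ("running", "pending")))
')
echo "total demand: ${total_millicores}m ($((total_millicores / 1000)) cores)"
```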

Deleting instance resource quota limits

Deleting the instance resource quota limits removes the current instance quota, and the Spark resources then run with an unlimited quota.

To delete the instance resource quota:

curl -X DELETE \
  "$CPD_URL/v4/analytics_engines/$INSTANCE_ID/resource_consumption_limits" \
  --header "Authorization: ZenApiKey ${ACCESS_TOKEN}"

A successful request returns 204 (No Content).

Working with resource quotas for an instance using the V3 API (deprecated)

The V3 resource quota APIs are deprecated and will be removed in an upcoming release.

You can see the total quota for an instance and edit resource quota details if required.

  1. From the Spark jobs V3 endpoint, get the instance ID. See Managing Analytics Engine powered by Apache Spark instances. The format of the endpoint is: https://<CloudPakforData_URL>/v2/spark/v3/instances/<INSTANCE_ID>/spark/applications.

  2. To edit the resource quota details of an existing instance to 200 CPU and 800 GB for example, use the following cURL command. Insert the instance ID you got from the Spark jobs endpoint:

    curl -iX PUT -k -H "Content-Type: application/json" -H "Authorization: ZenApiKey ${TOKEN}" https://<CloudPakforData_URL>/v2/spark/v3/instances/<INSTANCE_ID>/resource_quota -d '{
            "cpu_quota": 200,
            "memory_quota_gibibytes": 800
    }'
    
  3. To get the total CPU and memory quota along with the available resources for that instance, use the following command:

    curl -ikX GET -H "Content-Type: application/json" -H "Authorization: ZenApiKey ${TOKEN}" https://<CloudPakforData_URL>/v2/spark/v3/instances/<INSTANCE_ID>
    

If you try to submit a job when there aren't enough instance resources available, the job API fails with a 400 Bad Request error indicating that the resources requested by the job exceeded the instance quota.
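A submission script can branch on that status code explicitly. The following sketch defines a small helper; the commented curl line shows where the status would come from against a live instance (`job.json` is a hypothetical job payload):

```shell
# Interpret the HTTP status of a Spark job submission (sketch).
check_submit_status() {
  case "$1" in
    2??) echo "accepted" ;;
    400) echo "rejected: requested resources exceed the instance quota" ;;
    *)   echo "failed with status $1" ;;
  esac
}

# Against a live instance, the status would come from something like:
#   status=$(curl -s -o response.json -w '%{http_code}' -k -X POST \
#     -H "Content-Type: application/json" -H "Authorization: ZenApiKey ${TOKEN}" \
#     https://<CloudPakforData_URL>/v2/spark/v3/instances/<INSTANCE_ID>/spark/applications \
#     --data @job.json)
result=$(check_submit_status 400)
echo "$result"
```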

Updating instance resource quota limits using Cloud Pak for Data user interface

To update the resource quota limits, do the following steps:

  1. Log in to the Cloud Pak for Data cluster.
  2. From the Navigation menu on the IBM Cloud Pak for Data user interface, select Services > Instances, then select the Analytics Engine powered by Apache Spark instance.
  3. In the Instance resource quota details section, click the Edit icon (Edit icon) to update the resource quota details.
  4. Select any one of the following options from the Quota managed by field:
    • Analytics Engine powered by Apache Spark: Indicates that the instance resource quota is managed by Analytics Engine powered by Apache Spark. Selecting this option internally invokes the V3 API. You must meet the prerequisites for using the V3 resource quota APIs. For more information, see Working with resource quotas for an instance using the V3 API.
    • Scheduling service: Indicates that the instance resource quota is managed by the scheduler. Selecting this option internally invokes the V4 API. You must meet the prerequisites for using the V4 resource quota APIs. For more information, see Working with resource quotas for an instance using the V4 API.
  5. Increase or decrease the CPU allocation (by using the + or - icon) from the CPU limit field.
  6. Increase or decrease the memory allocation (by using the + or - icon) from the Memory limit (GB) field.
  7. Click Save. The quota limits are saved.

Parent topic: Administering Analytics Engine powered by Apache Spark