Managing the platform

From the IBM® Cloud Pak for Data web client, you can monitor the services that are running on the platform, understand how you are using cluster resources, and be aware of issues as they arise. You can also set quotas on the platform and on individual services to help mitigate unexpected spikes in resource use.

Accessing the Platform management page

Required permissions:
To access the Platform management page, you must have one of the following permissions:
  • Administer platform
  • Manage and monitor platform
To access the Platform management page:
  1. Log in to the Cloud Pak for Data web client.
  2. From the navigation menu, select Administration > Platform management.
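From the Platform management page, you can monitor the platform at a glance and set and enforce quotas, as described in the following sections.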

At-a-glance platform monitoring

From the Platform management page, you can see the status of the following items on the platform:
For each available card, the following sections describe the status information that the card shows and how to get more detailed information.
Services

Services are software that is installed on the platform. Services consume resources as part of their regular operations.

From the Platform management page, you can see:
  • How many services are installed on the platform.
  • If there are any issues related to a service.

    If there is an issue, it means that at least one pod that is associated with the service is in a failed or unknown state.

Click the Services card to see:
  • The status (or health) of each service.
  • The number of service instances, environments, and jobs that are associated with the service (if applicable).
  • The current resource use for the service.
  • The vCPU quota status and the memory quota status (if set).
You can select a service to see:
  • The service quotas.
  • The pods that are associated with the service.
Service instances

Some services can be deployed multiple times after they are installed. Each deployment is called a service instance.

Service instances consume resources as part of their normal operations.

From the Platform management page, you can see:
  • How many service instances are deployed on the platform.
  • If there are any issues related to a service instance.

    If there is an issue, it means that at least one pod that is associated with a service instance is in a failed or unknown state.

Click the Service instances card to see:
  • The status (or health) of each service instance.
  • The service that the service instance is associated with.
  • Who provisioned the instance and when.
  • The number of users who have access to the service instance.
  • The number of pods associated with the service instance.
  • The current resource use for the service.

You can select a service instance to see the pods that are associated with the service instance.

Additionally, you can click the Options icon for a service instance to:
  • Manage access to the instance
  • Delete the instance

However, to complete either of these tasks, you must be an administrator of the service instance or you must have the Administer platform permission.

Environments

Environments specify the hardware and software configurations for runtimes for analytical assets and jobs. Environments consume resources as part of their regular operations.

By default, this card is not displayed on the platform. It is displayed only if you install a service that uses environments.

From the Platform management page, you can see:
  • How many environments are currently running on the platform.
  • If there are any issues related to an environment.

    If there is an issue, it means that at least one pod that is associated with an environment failed.

Click the Environments card to see:
  • The status (or health) of each environment.
  • Who started the environment and when.
  • The project or deployment space where the environment is running.
  • The number of GPU requests.
  • The current resource use for the environment.

You can select an environment to see the pods that are associated with the environment.

Additionally, you can click the Stop runtime instance icon to stop the environment.

Pods

Services are composed of Kubernetes pods.

If a pod is failed or unknown, it can impact the health of the service. If a pod is pending, the service might not be able to process specific requests until the pod is running.

From the Platform management page, you can see:
  • How many pods are associated with the platform.
  • If there are any issues related to pods.

    If there is an issue, it means that at least one pod is in a failed or unknown state.

  • If there are any pods that are pending.
    If a pod is pending, Kubernetes is attempting to create and schedule the pod. The pod might remain in the pending state if:
    • Kubernetes is waiting for a process to complete or doesn’t have sufficient resources to fulfill the pod request.
    • The platform or service quota settings are preventing new pods from starting.
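
The issue and pending indicators described above can be illustrated with a short Python sketch. It is not how the web client is implemented; the function name `summarize_pods` and the lowercase status strings are assumptions made for this example.

```python
from collections import Counter

# Hypothetical status strings; the web client may label states differently.
ISSUE_STATES = {"failed", "unknown"}


def summarize_pods(pod_states):
    """Count pods and flag issues the way the Pods card is described."""
    counts = Counter(state.lower() for state in pod_states)
    return {
        "total": sum(counts.values()),
        # An issue means at least one pod is failed or unknown.
        "has_issue": any(counts[state] > 0 for state in ISSUE_STATES),
        # Pending pods are reported separately from issues.
        "pending": counts["pending"],
    }


if __name__ == "__main__":
    print(summarize_pods(["Running", "Pending", "Failed", "Running"]))
    # {'total': 4, 'has_issue': True, 'pending': 1}
```

In this sketch a pending pod does not count as an issue by itself, which matches the distinction the card makes between failed or unknown pods and pods that are still being scheduled.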
Click the Pods card to see:
  • The status (or health) of each pod.
  • What service the pod is associated with.
  • The Red Hat® OpenShift® project (namespace) where the pod is running.
  • Whether the pod is associated with a fixed resource, service instance, or environment.
  • The function or application of the pod.
  • The service instance, job, project, or deployment space that the pod is associated with.
  • When the pod was created.
  • How many times the pod has restarted.
  • The current resource use for the pod.
Additionally, you can click the Options icon for a pod to:
  • See the details of the pod
  • View the pod logs
  • Restart the pod

Setting and enforcing quotas

A quota specifies the maximum amount of memory and vCPU that you want the platform or a specific service to use. It is a target against which you can measure your actual memory and vCPU use, and it acts as a benchmark that lets you know when your use is approaching or surpassing that target.

Note: Setting a quota is not the same thing as scaling.

Scaling impacts the overall capacity of a service by adjusting the number of pods in the service. (You can also scale the Cloud Pak for Data control plane.) When you scale a service up, the service becomes more resilient. Additionally, the service might have increased parallel processing capacity.

Setting a quota on a service does not change the scale. Scale and quota are independent settings.

Refresh 2 or later: In addition to setting a quota, you can optionally enable quota enforcement. When you enforce quotas, new pods cannot be created if they would push your use above your quota.

Important: To use quota enforcement, you must install the scheduling service.

The behavior of the quota enforcement feature depends on whether you set your quotas on pod requests or limits. (For an in-depth explanation of requests and limits, see Managing Resources for Containers in the Kubernetes documentation.)
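
To make the difference between requests and limits concrete, the following Python sketch totals both values for one pod. It assumes the standard Kubernetes pod layout (`spec.containers[].resources.requests` and `.limits`) and handles only the `m`, `Ki`, `Mi`, and `Gi` quantity suffixes. The helper names are hypothetical, and init containers are ignored for simplicity.

```python
MEMORY_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def parse_cpu(value):
    """Convert a CPU quantity such as '500m' or '2' to vCPU."""
    if value.endswith("m"):
        return int(value[:-1]) / 1000  # millicores to cores
    return float(value)


def parse_memory(value):
    """Convert a memory quantity such as '512Mi' or '2Gi' to bytes."""
    for suffix, factor in MEMORY_UNITS.items():
        if value.endswith(suffix):
            return int(value[:-len(suffix)]) * factor
    return int(value)  # plain number of bytes


def pod_totals(pod, kind):
    """Sum 'requests' or 'limits' across the containers of a pod."""
    cpu = memory = 0
    for container in pod["spec"]["containers"]:
        resources = container.get("resources", {}).get(kind, {})
        cpu += parse_cpu(resources.get("cpu", "0"))
        memory += parse_memory(resources.get("memory", "0"))
    return cpu, memory


# Illustrative pod with two containers; the values are made up.
example_pod = {"spec": {"containers": [
    {"resources": {"requests": {"cpu": "500m", "memory": "1Gi"},
                   "limits": {"cpu": "1", "memory": "2Gi"}}},
    {"resources": {"requests": {"cpu": "250m", "memory": "512Mi"},
                   "limits": {"cpu": "500m", "memory": "1Gi"}}},
]}}

if __name__ == "__main__":
    print(pod_totals(example_pod, "requests"))  # (0.75, 1610612736)
    print(pod_totals(example_pod, "limits"))    # (1.5, 3221225472)
```

For this example pod, a quota set on limits counts twice as much vCPU and memory as a quota set on requests.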

Enforcing quotas on pod requests
A request is the amount of vCPU or memory that the pod expects to use as part of its normal operations.
When you set quotas on pod requests, you have more flexibility in how your resources are allocated:
  • If you enforce the platform quotas, the control plane and any services that are running on this instance of Cloud Pak for Data are prevented from creating new pods if the requests in the new pod would push the platform over either the platform memory quota or the vCPU quota. These pods remain in the pending state until there are sufficient resources available. However, the existing pods can use more memory or vCPU than the platform quota.
  • If you enforce a service quota, the service is prevented from creating new pods if the requests in the new pod would push the service over either the memory quota or the vCPU quota. These pods remain in the pending state until there are sufficient resources available. However, the existing pods can use more memory or vCPU than the service quota.
Enforcing quotas on pod limits
A limit is the absolute maximum amount of vCPU or memory that the pod can use. If a container in the pod tries to use more memory than its limit, the container can be terminated; vCPU use above the limit is throttled. In most cases, the requested resources (the requests) are less than the limits.
When you set quotas on pod limits, you have more control over your resources:
  • If you enforce platform quotas, the control plane and any services that are running on this instance of Cloud Pak for Data are prevented from creating new pods if the limits in the new pods would push the platform over either the platform memory quota or the vCPU quota. These pods remain in the pending state until there are sufficient resources available. When you enforce platform quotas on pod limits, the quota is a cap on the total resources that existing pods can use.
  • If you enforce service quotas, the service is prevented from creating new pods if the limits in the new pod would push the service over either the memory quota or the vCPU quota. These pods remain in the pending state until there are sufficient resources available. When you enforce service quotas on pod limits, the quota is a cap on the total resources that the existing pods can use.
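
The enforcement rule described above can be sketched as a simple admission check. This is a minimal illustration, not the scheduling service's actual logic. The `Quota` class and `can_schedule` function are hypothetical names, and the sketch assumes that a pod that brings use exactly to the quota is still allowed.

```python
from dataclasses import dataclass


@dataclass
class Quota:
    vcpu: float
    memory_gib: float


def can_schedule(quota, in_use_vcpu, in_use_memory_gib, new_vcpu, new_memory_gib):
    """Return True if a new pod fits under both the vCPU and memory quotas.

    The in-use and new-pod values are totals of requests or of limits,
    depending on which one the quota is set on.
    """
    fits_vcpu = in_use_vcpu + new_vcpu <= quota.vcpu
    fits_memory = in_use_memory_gib + new_memory_gib <= quota.memory_gib
    # Exceeding either quota is enough to keep the pod pending.
    return fits_vcpu and fits_memory


if __name__ == "__main__":
    quota = Quota(vcpu=10, memory_gib=32)
    print(can_schedule(quota, 8, 20, 1, 4))  # True: the pod can start
    print(can_schedule(quota, 8, 20, 3, 4))  # False: the pod stays pending
```

The check looks only at the declared requests or limits of the new pod, not at measured use, which is why existing pods can still use more than a quota that is set on requests.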

If you don't enforce quotas, the quota has no impact on the behavior of the platform or services. If you are approaching or surpassing your quota settings, it's up to you whether you want to allow processes to consume resources or whether you want to stop processes to release resources.

To set quotas:
  1. To set the platform quota:
    1. On the Platform management page, click Set platform quotas or Edit platform quotas.
    2. Select Monitor platform resource use against your target use.
    3. Specify whether you want to set quotas on pod Requests or Limits.
    4. Specify your vCPU quota. This is the target maximum amount of vCPU you want the platform to use.
    5. Specify your vCPU alert threshold. When you reach the specified percent of vCPU in use, the platform will alert you based on your alert settings.
    6. Specify your Memory quota. This is the target maximum amount of memory you want the platform to use.
    7. Specify your Memory alert threshold. When you reach the specified percent of memory in use, the platform will alert you.
    8. If you want to automatically enforce the platform quota settings, select Enforce quotas.
    9. Click Save.
  2. To set service quotas:
    1. On the Platform management page, click Edit service quotas.
    2. Locate the service for which you want to edit the quota, and click the Edit icon.
    3. Select Monitor platform resource use against your target use.
    4. Specify whether you want to set quotas on pod Requests or Limits.
    5. Specify your vCPU quota. This is the target maximum amount of vCPU you want the service to use.
    6. Specify your vCPU alert threshold. When you reach the specified percent of vCPU in use, the platform will alert you based on your alert settings.
    7. Specify your Memory quota. This is the target maximum amount of memory you want the service to use.
    8. Specify your Memory alert threshold. When you reach the specified percent of memory in use, the platform will alert you.
    9. If you want to automatically enforce the service quota settings, select Enforce quotas.
    10. Click Save.
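
As a worked example of how a quota and its alert threshold relate, the following Python sketch classifies current use against both values. The function name `quota_status` and the returned labels are assumptions for illustration; they are not platform settings or API values.

```python
def quota_status(in_use, quota, alert_threshold_percent):
    """Classify current use against a quota and its alert threshold.

    in_use and quota share a unit (for example vCPU or GiB).
    alert_threshold_percent is the percent of the quota that triggers an alert.
    """
    if in_use > quota:
        return "over quota"
    # Compare with multiplication to avoid floating-point division surprises.
    if in_use * 100 >= alert_threshold_percent * quota:
        return "alert"
    return "ok"


if __name__ == "__main__":
    # A vCPU quota of 10 with an alert threshold of 80 percent.
    print(quota_status(7, 10, 80))    # ok
    print(quota_status(8.5, 10, 80))  # alert
    print(quota_status(11, 10, 80))   # over quota
```

With these values, an alert is expected when use reaches 8 vCPU, while use above 10 vCPU means that the target is surpassed. Whether new pods are then blocked depends on whether Enforce quotas is selected.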