Security on Cloud Pak for Data

IBM Cloud Pak® for Data supports several different mechanisms for securing your environment and your data.


Secure engineering practices

Cloud Pak for Data follows IBM Security and Privacy by Design (SPbD). SPbD at IBM is a set of focused security and privacy practices, including vulnerability management, threat modeling, penetration testing, privacy assessments, security testing, and patch management.

For more information about the IBM Secure Engineering Framework (SEF) and SPbD, see the following resources:

Basic security features on Red Hat OpenShift Container Platform

Security is important to every enterprise, especially for organizations in the government, financial services, and healthcare sectors. Red Hat® OpenShift® Container Platform provides a set of security features to protect sensitive customer data with strong encryption controls and improve the oversight of access control across applications and the platform itself.

Cloud Pak for Data builds on the security features provided by Red Hat OpenShift Container Platform by creating service accounts and roles so that Cloud Pak for Data pods and users have the lowest level of privileges necessary. Cloud Pak for Data is also security hardened on Red Hat OpenShift Container Platform and is installed in a secure and transparent manner.

Red Hat OpenShift Container Platform uses security context constraints (SCCs) to enforce the security context of a pod or a container.
  • Most Cloud Pak for Data services use the restricted or restricted-v2 SCC. This SCC denies access to all host features and requires pods to run with a UID and an SELinux context that are allocated to the namespace.
  • Some Cloud Pak for Data services require custom SCCs.

Cloud Pak for Data is installed in Red Hat OpenShift Container Platform projects. Cloud Pak for Data inherits the SCCs, UID ranges, and SELinux-based controls on processes, memory, and file systems from the projects where the software is installed.
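The UID range that a project contributes can be illustrated with a short Python sketch. It assumes that the project carries the OpenShift annotation `openshift.io/sa.scc.uid-range` with a value of the form `<start>/<size>` (for example, `1000680000/10000`) and checks whether a UID falls inside that range, which is the condition that the restricted SCCs enforce. The helper functions are hypothetical; in a real cluster, the OpenShift admission controller performs this check, not your code.

```python
from typing import Mapping

# Assumed annotation that OpenShift sets on each project (namespace).
UID_RANGE_ANNOTATION = "openshift.io/sa.scc.uid-range"


def parse_uid_range(annotations: Mapping[str, str]) -> range:
    """Return the UID range allocated to a project.

    Assumes a value of the form "<start>/<size>", for example "1000680000/10000".
    """
    start_text, size_text = annotations[UID_RANGE_ANNOTATION].split("/", 1)
    start, size = int(start_text), int(size_text)
    return range(start, start + size)


def uid_allowed(uid: int, annotations: Mapping[str, str]) -> bool:
    """Hypothetical check: is the UID inside the project's allocated range?"""
    return uid in parse_uid_range(annotations)


project_annotations = {UID_RANGE_ANNOTATION: "1000680000/10000"}
print(uid_allowed(1000680000, project_annotations))  # True: first UID in the range
print(uid_allowed(0, project_annotations))           # False: root is outside the range
```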

For more information, see Basic security features on Red Hat OpenShift Container Platform.

Authentication and authorization

By default, Cloud Pak for Data user records are stored in an internal LDAP, which is used during the initial setup of Cloud Pak for Data. However, after you set up Cloud Pak for Data, it is recommended that you use an enterprise-grade password management solution, such as SAML SSO or an external LDAP provider.

User management
For more information, see the following resources:
Authorization
Cloud Pak for Data provides user management capabilities to authorize users. For more information, see Managing users.
Tokens and API keys
Bearer tokens
  • When a user signs in to Cloud Pak for Data, the platform automatically generates a bearer token and a cookie. The cookie marks the user's session. The bearer token is cached in the platform and is automatically renewed based on the idle session timeout settings. The bearer token is removed from the cache when the token expires or when the user is logged out because of inactivity.
  • Cloud Pak for Data provides an encrypted bearer token in model deployment details that an application developer can use for evaluating models online with REST APIs. The token never expires and is limited to the model it is associated with.
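A client application typically presents a bearer token in the HTTP `Authorization` header, using the `Bearer` scheme defined in RFC 6750. The following Python sketch builds, but does not send, such a request with the standard library. The endpoint URL, token value, and payload shape are hypothetical placeholders; use the endpoint and input schema shown in your own model deployment details.

```python
import json
import urllib.request


def build_scoring_request(url: str, token: str, payload: dict) -> urllib.request.Request:
    """Build, but do not send, a POST request that authenticates with a bearer token."""
    body = json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(url, data=body, method="POST")
    # RFC 6750: the token goes in the Authorization header with the "Bearer" scheme.
    request.add_header("Authorization", f"Bearer {token}")
    request.add_header("Content-Type", "application/json")
    return request


# Hypothetical endpoint, token, and payload, for illustration only.
req = build_scoring_request(
    "https://cpd.example.com/hypothetical/deployment/scoring",
    "example-token",
    {"input_data": [{"fields": ["x"], "values": [[1.0]]}]},
)
print(req.get_method(), req.get_header("Authorization"))  # POST Bearer example-token
# Sending the request requires network access and a valid token, for example:
# with urllib.request.urlopen(req) as response: print(response.read())
```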
API keys
JWT tokens
Internally, Cloud Pak for Data uses a JSON Web Token (JWT) to authenticate to:
Services
Some services support JWT authentication. Services that support JWT tokens can use the Cloud Pak for Data credentials to authenticate to the service. For more information, see:
Data sources
Some data sources support JWT authentication. When you create a connection to a data source that supports JWT tokens, you can select the Use my platform login credentials option to enable the connection to use the user's Cloud Pak for Data credentials for authentication.

When a user logs in to Cloud Pak for Data with their user name and password, Cloud Pak for Data returns a JWT token to the browser. The token is forwarded to the data source. The user does not need to enter credentials to access the data source.

The token expires based on the idle session timeout settings.
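A JWT consists of three base64url-encoded segments (header, claims, and signature) separated by dots. The following Python sketch decodes the claims segment so that you can inspect when a token expires. It assumes that the token carries the standard `exp` claim (seconds since the epoch), and it deliberately does not verify the signature, so use it only for troubleshooting, never to make authorization decisions.

```python
import base64
import json
import time
from typing import Optional


def decode_jwt_claims(token: str) -> dict:
    """Decode the claims segment of a JWT WITHOUT verifying the signature.

    For inspection only; never base authorization decisions on this output.
    """
    _header, claims_segment, _signature = token.split(".")
    # JWT segments are unpadded base64url; restore the padding before decoding.
    padded = claims_segment + "=" * (-len(claims_segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def seconds_until_expiry(token: str, now: Optional[float] = None) -> float:
    """Seconds left before the (assumed) "exp" claim; negative if already expired."""
    current = time.time() if now is None else now
    return decode_jwt_claims(token)["exp"] - current
```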

Idle web client session timeout
You can configure the length of time the user can be idle before their web client session expires in accordance with your security and compliance requirements. When a user leaves their session idle in a web browser for the specified length of time, the user is automatically logged out of the web client. You can optionally set a shorter session timeout for users with the Administer platform permission. For more information, see Setting the idle session timeout.
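The timeout rule can be modeled as follows. This Python sketch is a hypothetical illustration of the behavior, not how Cloud Pak for Data implements it: a session expires once the time since its last activity reaches the configured limit, and an optional, shorter limit applies to users who have the Administer platform permission. Times are plain seconds so that the logic is easy to test.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class IdleSession:
    """Hypothetical web client session that tracks the time of its last activity."""

    is_admin: bool
    last_activity: float = field(default_factory=time.monotonic)

    def record_activity(self, now: Optional[float] = None) -> None:
        """Reset the idle clock, for example on each request from the browser."""
        self.last_activity = time.monotonic() if now is None else now

    def is_expired(self, user_timeout: float, admin_timeout: Optional[float] = None,
                   now: Optional[float] = None) -> bool:
        """True once idle time reaches the applicable limit (in seconds).

        If admin_timeout is set, it applies instead of user_timeout to users
        with the Administer platform permission (is_admin=True).
        """
        current = time.monotonic() if now is None else now
        limit = admin_timeout if (self.is_admin and admin_timeout is not None) else user_timeout
        return current - self.last_activity >= limit


# A regular user with a 30-minute limit and an admin with a 10-minute limit.
user = IdleSession(is_admin=False, last_activity=0.0)
admin = IdleSession(is_admin=True, last_activity=0.0)
print(user.is_expired(user_timeout=1800, admin_timeout=600, now=900))   # False
print(admin.is_expired(user_timeout=1800, admin_timeout=600, now=900))  # True
```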
Concurrent session limit
You can specify the maximum number of concurrent sessions that Cloud Pak for Data users can have. A session is created each time the user logs in to Cloud Pak for Data. If the user does not log out of a session, they can end up with multiple, concurrent sessions. If you limit the number of concurrent sessions, a user's oldest session is automatically removed when the user reaches the limit. For more information, see Limiting the number of concurrent user sessions.
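The eviction rule (the oldest session is removed when a new login would exceed the limit) can be sketched in Python with an insertion-ordered dictionary per user. The `SessionRegistry` class and its methods are hypothetical and only illustrate the policy.

```python
import itertools
from collections import OrderedDict, defaultdict
from typing import List


class SessionRegistry:
    """Hypothetical registry that caps the number of concurrent sessions per user."""

    def __init__(self, max_sessions: int) -> None:
        if max_sessions < 1:
            raise ValueError("max_sessions must be at least 1")
        self.max_sessions = max_sessions
        # user -> OrderedDict of session IDs, oldest first
        self._sessions = defaultdict(OrderedDict)
        self._ids = itertools.count(1)

    def login(self, user: str) -> int:
        """Create a session, first evicting the user's oldest sessions if at the limit."""
        sessions = self._sessions[user]
        while len(sessions) >= self.max_sessions:
            sessions.popitem(last=False)  # remove the oldest session
        session_id = next(self._ids)
        sessions[session_id] = None
        return session_id

    def logout(self, user: str, session_id: int) -> None:
        self._sessions[user].pop(session_id, None)

    def active(self, user: str) -> List[int]:
        """Active session IDs for the user, oldest first."""
        return list(self._sessions[user])


registry = SessionRegistry(max_sessions=2)
first, second, third = (registry.login("alice") for _ in range(3))
print(registry.active("alice"))  # [2, 3]: session 1 was evicted by the third login
```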
Shared credentials for connections
By default, users can choose whether to use shared or personal credentials when they create a connection. (The default selection in the web client is Shared.) However, an instance administrator can turn off shared credentials to enforce the use of personal credentials.

With shared credentials, users who have access to the connection are not prompted for credentials when they access the connection; therefore, you cannot determine who accessed the data. If you must comply with regulations that require security and individual accountability, an administrator can disable shared credentials. For more information, see Disabling shared credentials.

Important: When you disable shared credentials, the setting affects only new connections. Existing connections are not affected.

If you want to prevent users from creating connections with shared credentials, change this setting before you give users access to Cloud Pak for Data.

Encryption

Cloud Pak for Data supports protection of data at rest and in motion.

Data
In general, data security is managed by your remote data sources. For more information about encryption, see Storage considerations.
To ensure that your data in Cloud Pak for Data is stored securely, you can encrypt your storage partition. For more information, see Encrypting and mirroring disks during installation in the Red Hat OpenShift Container Platform documentation:
Communications
You can use TLS or SSL to encrypt communications to and from Cloud Pak for Data.
  • If you plan to use your own TLS certificate and private key (both in PEM format) to enable an HTTPS connection to Cloud Pak for Data, see Using a custom TLS certificate for HTTPS connections to the platform.
    Best practice: Replace the default certificate that is used by the Red Hat OpenShift Container Platform ingress controller. For more information, see Setting a custom default certificate in the Red Hat OpenShift Container Platform documentation:

    You can use the same certificate for the ingress controller and the web client or you can use a different certificate for the Cloud Pak for Data route.

  • If you plan to use SSL for an IBM Db2 connection or IBM Db2 Warehouse connection, select the Use SSL option when you create the connection to the data source.
    Important: If the IBM Db2 database uses a self-signed certificate or a certificate that is signed by a local certificate authority, see Setting up a Db2 connection that uses TLS and SSL.
  • If you plan to use SSL for an Apache Kafka connection, you need:

    • A truststore with the certificate authority (CA) certificate installed.
    • A keystore with a key pair and a certificate that is signed by the CA.
In addition, it is recommended that you disable TLS 1.0 and TLS 1.1 on the Red Hat OpenShift Container Platform HAProxy routers on port 443. For more information, see Disable TLS1.0 and TLS1.1 in HAproxy routers.
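Client applications that connect to Cloud Pak for Data or to a data source over TLS can enforce the same minimum protocol version on their side. The following Python sketch uses the standard `ssl` module to create a client context that refuses TLS 1.0 and TLS 1.1, verifies the server certificate, and optionally trusts a private CA bundle (for example, for a self-signed or locally signed Db2 certificate). It assumes PEM files; Java-based Kafka clients use truststore and keystore files instead. It complements, but does not replace, the router configuration.

```python
import ssl
from typing import Optional


def make_client_tls_context(ca_file: Optional[str] = None,
                            cert_file: Optional[str] = None,
                            key_file: Optional[str] = None) -> ssl.SSLContext:
    """Create a client TLS context that refuses TLS 1.0 and TLS 1.1.

    ca_file: PEM bundle of trusted CA certificates (for example, a local CA);
        if None, the system trust store is used.
    cert_file, key_file: optional PEM client certificate and private key,
        for servers that require mutual TLS.
    """
    # Verifies the server certificate and host name by default.
    context = ssl.create_default_context(cafile=ca_file)
    # Reject anything older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    if cert_file is not None:
        context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return context


ctx = make_client_tls_context()
print(ctx.minimum_version.name)  # TLSv1_2
```

For example, you can pass the context as the `context` argument of `urllib.request.urlopen`.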
FIPS
Cloud Pak for Data supports FIPS (Federal Information Processing Standard) compliant encryption. For more information, see:

Network access requirements

To ensure secure transmission of network traffic to and from the Cloud Pak for Data cluster, you need to configure the communication ports used by the Cloud Pak for Data cluster.

Primary port
The primary port is what the Red Hat OpenShift router exposes. For more information, see Configuring the Ingress Controller in the Red Hat OpenShift Container Platform documentation:
Communication ports for services
When you provision a new service or integration on your Cloud Pak for Data cluster, the service might require connections from outside the cluster.
For more information, see Securing communication ports.
DNS service name
When you install the Cloud Pak for Data control plane, the installation points to the default Red Hat OpenShift DNS service name. If your OpenShift cluster is configured to use a custom name for the DNS service, a cluster administrator or instance administrator must update the DNS service name to prevent performance problems.
Network policies
You can use network policies to isolate the software on your cluster. By default, all the pods in a project can be accessed by other pods and network endpoints. An instance administrator can create network policies to specify the pods and network endpoints from which a pod accepts incoming connections. Some Cloud Pak for Data services automatically create defensive network policies. For more information, see Network policies implemented by individual services.
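A network policy is an ordinary Kubernetes `NetworkPolicy` object (API version `networking.k8s.io/v1`). The following Python sketch builds one as a dictionary that admits ingress to every pod in a project only from pods in the same project and from the OpenShift router. The project name is hypothetical, and the router namespace label `network.openshift.io/policy-group: ingress` is an assumption that depends on your OpenShift version and ingress configuration; check the Red Hat documentation, and the policies that individual services create, before you apply anything like it.

```python
import json


def same_namespace_ingress_policy(namespace: str, name: str = "allow-same-namespace") -> dict:
    """Build a NetworkPolicy that selects every pod in `namespace` and accepts
    ingress only from pods in the same namespace and from the router."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "podSelector": {},  # empty selector: applies to all pods in the namespace
            "policyTypes": ["Ingress"],
            "ingress": [
                # A podSelector without a namespaceSelector means "pods in this namespace".
                {"from": [{"podSelector": {}}]},
                # Assumed label on the OpenShift ingress (router) namespace.
                {"from": [{"namespaceSelector": {"matchLabels": {
                    "network.openshift.io/policy-group": "ingress"}}}]},
            ],
        },
    }


# "cpd-instance" is a hypothetical project name.
print(json.dumps(same_namespace_ingress_policy("cpd-instance"), indent=2))
```

Because `oc apply -f -` accepts JSON, you could pipe the reviewed output to it.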

Using an allowlist to prevent SSRF attacks

In a Server Side Request Forgery (SSRF) attack, an attacker causes a vulnerable server to send requests on the attacker's behalf. Typically, this happens when an application accepts URLs, IP addresses, or domain names from a user who has access to the server. The attacker can exploit this vulnerability by injecting URLs that contain port details or internal IP addresses, and then observe the internal network or cause the application to process malicious code.

The most robust way to avoid an SSRF attack is to set up an allowlist for the DNS name or IP address that your application needs to access. Alternatively, if you use a blocklist, it's important to validate the user input properly. For example, do not allow requests to private (nonroutable) IP addresses.
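The allowlist and blocklist approaches can be sketched in Python with the standard library. `is_url_allowed` accepts a user-supplied URL only if its scheme, host, and port match an allowlist; `is_blocked_ip` is the weaker blocklist fallback that rejects nonroutable IP literals. The allowed hosts are hypothetical placeholders. A production check must also resolve host names and validate the resolved addresses at connection time to resist DNS rebinding, which this sketch does not do.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical allowlist: the only hosts that this application needs to call.
ALLOWED_HOSTS = {"api.example.com", "data.example.org"}
ALLOWED_SCHEMES = {"https"}


def is_url_allowed(url: str) -> bool:
    """Allowlist check: https only, exact host match, default port only."""
    parts = urlsplit(url)
    if parts.scheme not in ALLOWED_SCHEMES:
        return False
    try:
        port = parts.port
    except ValueError:  # non-numeric or out-of-range port
        return False
    if port not in (None, 443):
        return False
    # .hostname strips user info ("user@host") and lowercases the host.
    return parts.hostname in ALLOWED_HOSTS


def is_blocked_ip(host: str) -> bool:
    """Blocklist fallback: reject private and other nonroutable IP literals."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False  # not an IP literal; DNS names need resolving and rechecking
    return (addr.is_private or addr.is_loopback or addr.is_link_local
            or addr.is_reserved or addr.is_multicast or addr.is_unspecified)
```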

Multitenancy and network security

To make effective use of infrastructure and reduce operational expenses, you can run Cloud Pak for Data in multi-tenant mode on a single Red Hat OpenShift Container Platform cluster, while still maintaining security, compliance, and independent operability.

Security in a multi-tenant cluster is based on:
Best practice: Use groups to identify the projects that are associated with a specific instance of Cloud Pak for Data. You can use the groups when you create network policies. For more information, see Best practice: Creating groups to manage projects in a multitenant environment.

Audit logging

Audit logging provides accountability, traceability, and regulatory compliance. Cloud Pak for Data can be configured to forward auditable events to several security information and event management (SIEM) solutions. For more information, see Auditing Cloud Pak for Data.

Regulatory compliance

Cloud Pak for Data is assessed for various privacy and compliance regulations. Cloud Pak for Data provides features that customers can use to prepare for privacy and compliance assessments. The features described here are not exhaustive. An exhaustive list is difficult to assemble because customers can choose and configure the features in many ways, and because Cloud Pak for Data can be used as a stand-alone product or with third-party applications and systems.

Cloud Pak for Data is not aware of the nature of the data that it handles other than at a technical level (for example, encoding, data type, and size). Therefore, Cloud Pak for Data cannot determine whether personal data is present. Customers must track whether personal information is present in the data that Cloud Pak for Data uses.

For more information, see What regulations does Cloud Pak for Data comply with?

Additional security measures

To protect your Cloud Pak for Data instance, consider the following best practices.

Network isolation
As a best practice, use network policies to isolate the Red Hat OpenShift projects (namespaces) where Cloud Pak for Data is deployed. Ensure that only the appropriate services are accessible outside the project or outside the cluster.
For more information, see the following information in the Red Hat OpenShift documentation:
Setting up an elastic load balancer
To filter out unwanted network traffic, such as Distributed Denial of Service (DDoS) attacks, use an elastic load balancer that accepts only full HTTP connections. An elastic load balancer that is configured with an HTTP profile inspects the packets and forwards only complete HTTP requests to the Cloud Pak for Data web server.
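To illustrate what "complete" means here, the following Python sketch decides whether a buffered HTTP/1.1 request is ready to forward: the header block must end with a blank line, and the full body declared by `Content-Length` must have arrived. Connections that trickle headers or bodies (as in slow-request attacks such as Slowloris) never pass this check, so they are never forwarded to the back end. This is a simplified, hypothetical illustration, not how any particular load balancer works; for example, it does not handle chunked transfer encoding.

```python
def is_complete_http_request(buffer: bytes) -> bool:
    """Return True only when `buffer` holds a complete HTTP/1.1 request.

    Returns False for incomplete requests and also for requests this sketch
    does not support (chunked bodies, malformed or duplicate Content-Length);
    a real proxy would reject those instead of waiting.
    """
    header_end = buffer.find(b"\r\n\r\n")
    if header_end == -1:
        return False  # headers are still arriving
    header_lines = buffer[:header_end].split(b"\r\n")[1:]  # skip the request line
    content_length = 0
    seen_length = False
    for line in header_lines:
        name, sep, value = line.partition(b":")
        if not sep:
            return False  # malformed header line
        name = name.strip().lower()
        if name == b"transfer-encoding":
            return False  # chunked bodies are not handled in this sketch
        if name == b"content-length":
            value = value.strip()
            if seen_length or not value.isdigit():
                return False
            seen_length = True
            content_length = int(value)
    body_received = len(buffer) - (header_end + 4)
    return body_received >= content_length
```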