Storage considerations

To run stateful applications, developers need to store persistent data in managed storage that is backed by physical storage. Volumes allow state to persist across pods.

About this task

The supported storage providers for a production deployment of Cloud Pak for Business Automation include the IBM Storage Suite for IBM Cloud Paks. For more information, see IBM Documentation and Why IBM Storage Suite for IBM Cloud Paks.

The following table lists the disk space requirements of the capabilities for production deployments. Ranges are for small to large environments.

Note: For information about sizing and storage options for the foundational services, see Hardware requirements and recommendations and Storage options.

To pull all of the images from the IBM Entitled Registry, you need 300 - 500 GB of disk space. The actual size depends on the type of registry that you use, because some registry types require more storage than others.

The following table provides the storage requirements for the included capabilities and runtimes. Kubernetes access modes include ReadWriteOnce (RWO), ReadWriteMany (RWX), and ReadOnlyMany (ROX).

Table 1. Storage requirements
For each capability or runtime, the following entries list the storage type, disk space, access mode, number of persistent volumes for non-HA/HA deployments, and whether POSIX compliance is needed.

Automation Decision Services
  • Storage type: [optional] mongo_embedded: File or Block; [optional] runtime storage: File
  • Disk space: 50 GB
  • Access mode: RWO (mongo_embedded); RWX (runtime storage)
  • Persistent volumes: 1 (non-HA) for mongo_embedded; 1 (shared) for runtime storage
  • POSIX compliance: Needed for mongo_embedded; not needed for runtime storage

Automation Document Processing
  • Storage type: File; Block (for mongo)
  • Disk space: 64 GB
  • Access mode: RWX
  • Persistent volumes: 1 for CDS, 3 for CDRA, 2 for CPDS, 7 for ViewOne, 3 for Gitgateway, 1 for mongo, and (optional) 2 for the Document Processing engine
  • POSIX compliance: Needed

Automation Workstream Services
  • Storage type: File
  • Disk space: 14 GB
  • Access mode: RWX
  • Persistent volumes: 1 for jms, 3 for the BAW server, 2 for pfs
  • POSIX compliance: Not needed

Business Automation Application
  • Storage type: File
  • Disk space: Application Engine: 50 MB
  • Access mode: RWX
  • Persistent volumes: 1
  • POSIX compliance: Not needed

Business Automation Insights
  • Storage type: File
  • Disk space: Flink: 20 GB (Flink using Elasticsearch snapshot storage: 30 GB; Flink using Elasticsearch data storage: 10 GB). Sizing depends on the size of the projects:
    - Kafka: 2 GB x event rate x event size x retention duration x 1.10 x number of replicas
    - Flink: 10 GB + 2 x event rate x event size x average duration of an event x number of replicas
    - Elasticsearch: event rate x event size x retention duration x (number of replicas + 1)
  • Access mode: RWX
  • POSIX compliance: Not needed

Business Automation Navigator
  • Storage type: File
  • Access mode: RWX
  • Persistent volumes: 6 for BAN
  • POSIX compliance: Not needed

Business Automation Studio
  • Storage type: File
  • Disk space: 2 - 6 GB
  • Access mode: RWO (jms); RWX (playback, dump, index, log)
  • Persistent volumes: 1 for jms; 4 (playback, dump, index, log)
  • POSIX compliance: Not needed

Business Automation Workflow with or without Automation Workstream Services
  • Storage type: File
  • Disk space: 64 GB
  • Access mode: RWO (jms); RWX (other volumes)
  • Persistent volumes:
    - Runtime: 1 for jms, 3 for the BAW server, 2 for pfs, 3 for machine learning
    - Authoring: 1 for jms, 9 (3 for machine learning, 2 for pfs dump/log, 4 for authoring log/dump/index/file)
  • POSIX compliance: Not needed

FileNet Content Manager
  • Storage type: File
  • Disk space: Component logs and configuration information: 500 MB for each component. Temporary working space: workload dependent, with a minimum of 1 GB for each component. CPE and CSS content and index stores: the size of the content elements that your users store and index, which grows over time.
  • Access mode: RWX
  • Persistent volumes: 6 for CPE, 5 for CSS, 2 for GraphQL, 3 for Task Manager (TM), 2 for CMIS, 2 for ExtShare
  • POSIX compliance: Not needed

Operational Decision Manager
  • Storage type: [optional] File, for product customization; [optional] DB support, to retrieve the JDBC driver
  • Disk space: 10 - 100 GB. If the database is outside the cluster, the size of the internal storage is reduced to 1 GB.
  • Access mode: RWX for both volumes
  • Persistent volumes: 1 (non-HA) for each volume
  • POSIX compliance: Not needed

Resource Registry
  • Storage type: File
  • Disk space: 5 - 16 GB
  • Access mode: RWX
  • Persistent volumes: 1
  • POSIX compliance: Not needed

Workflow Process Service Authoring
  • Storage type: File
  • Disk space: 2 - 6 GB
  • Access mode: RWO or RWX
  • Persistent volumes: 1 for jms; 1 for EDB PostgreSQL, if you are using it; 4 for the authoring file, dump, index, and log volumes
  • POSIX compliance: Not needed

Persistent storage can be defined as static persistent volumes or dynamic storage classes. The storage classes and persistent volumes describe the type of storage to use, which is then configured for an application by users of the cluster.

Persistent volumes
A cluster administrator defines and creates a persistent volume (PV) by providing the cloud infrastructure with the details of the implementation of the storage. The storage can be of a number of different types, including Network File System (NFS) or a cloud-specific storage system.

For more information, see Persistent volumes.
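
For example, a cluster administrator might define an NFS-backed PV like the following sketch. The name, label, server, path, and capacity are placeholders for illustration; replace them with your own values.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: cp4ba-file-pv          # hypothetical name, for illustration only
  labels:
    pool: cp4ba-file           # optional label that a PVC selector can match on
spec:
  capacity:
    storage: 20Gi
  accessModes:
    - ReadWriteMany            # RWX, as most Cloud Pak volumes require
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: nfs.example.com    # placeholder NFS server
    path: /exports/cp4ba       # placeholder export path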

File system security permissions are needed to secure the Kubernetes environment for the Cloud Pak and to allow workloads to access storage. Access modes describe how the nodes access the storage. Note that some default storage classes support only ReadWriteOnce (RWO), whereas Cloud Pak for Business Automation needs ReadWriteMany (RWX) for volume access.
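
As a generic Kubernetes illustration (not a Cloud Pak deployment artifact), a pod can be granted group access to its mounted volumes by setting fsGroup in the pod security context; the pod, image, and claim names here are hypothetical:

apiVersion: v1
kind: Pod
metadata:
  name: storage-access-example       # hypothetical pod
spec:
  securityContext:
    fsGroup: 65534                   # files on supporting volumes are owned by this group ID
  containers:
    - name: app
      image: registry.example.com/app:latest   # placeholder image
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: cp4ba-file-pvc    # hypothetical claim name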

Storage classes
A StorageClass object describes and classifies dynamically provisioned storage that can be requested on demand. The objects can also be used to manage and control access to the storage. Cluster administrators define and create the objects that users can request without needing to know all of the details about the underlying storage sources.

Installation needs two storage class names: one that is associated with the dynamic storage that you plan to use for file-based ReadWriteMany (RWX) access, and another for block storage with ReadWriteOnce (RWO) access. Your storage must have sufficient space for the deployment. For more information, see Storage Classes.
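
As a sketch of where these names are supplied, recent releases take the storage class names in the custom resource under shared_configuration.storage_configuration. The parameter names follow the pattern shown here, but check the custom resource reference for your release; the class names are placeholders, and the three file entries can point to the same class, as described later in this section:

spec:
  shared_configuration:
    storage_configuration:
      sc_slow_file_storage_classname: managed-nfs-storage    # file storage (RWX)
      sc_medium_file_storage_classname: managed-nfs-storage  # file storage (RWX)
      sc_fast_file_storage_classname: managed-nfs-storage    # file storage (RWX)
      sc_block_storage_classname: cp4ba-block                # block storage (RWO)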

Storage classes must be POSIX-compliant, for example when they are backed by NFS or a Common Internet File System (CIFS).

For more information about storage class parameters, see Product Documentation for Red Hat OpenShift Container Storage. For example, to allow a deployment to be deleted and redeployed without losing the data and files that the deployment created, use reclaimPolicy: Retain. For cloud platforms where a file system group owner is needed, use gidAllocate: "true" to request one.
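
For example, a file storage class that retains data and requests a group ID might look like the following sketch. The class name is hypothetical, the provisioner shown is the IBM Cloud File Storage provisioner, and the available parameters depend on your storage provider:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cp4ba-file-retain      # hypothetical class name
provisioner: ibm.io/ibmc-file  # example provisioner (IBM Cloud File Storage)
reclaimPolicy: Retain          # keep the data when the claim is deleted
parameters:
  gidAllocate: "true"          # request a file system group owner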

Important: If you plan to use Portworx storage on ROKS in a multi-zone region (MZR), use the portworx-shared-sc storage class. You cannot use the portworx-db-sc storage class. Task Manager and Aspera integration with Business Automation Navigator do not work with Portworx storage in an MZR, because it might take up to an hour for the applications to become available again if one zone of the MZR goes down.

To use a persistent volume or a pool of storage that is defined by a storage class, a persistent volume claim (PVC) is needed to consume the storage resources. A PVC is a claim for storage by a user that can include requests for a specific size and access modes.

  • If static provisioning is used, the PV and PVC must be created in the cluster and the PVC name is specified in the custom resource.

    You can create a pool of volumes that can be used by many different workloads, and leave it to each persistent volume claim to bind to a volume from the pool. You can create different pools of storage by using PV labels and PVC selectors, as shown in the sketch after this list.

  • If dynamic provisioning is used, three storage classes are needed to meet the "slow", "medium", and "fast" storage needs of the Cloud Pak components. You must make sure that the "slow", "medium", and "fast" storage classes exist on the cluster. The PVC names that are specified in the custom resource are used when the claim is created.

    The deployment script on a private OpenShift Container Platform (OCP) cluster needs a storage class name that the installer can use. The administrator must make a note of the class to use and provide this name to the user who runs the deployment script. If you do not have three storage classes, or you do not want to create them, you can use the same storage class for "slow", "medium", and "fast".

    Note: When dynamic provisioning is used, labels and selectors are not supported; a PV is automatically bound to the PVC.
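
The following sketch shows a claim that selects a volume from a labeled pool of statically provisioned PVs, such as the PV example earlier in this section. For dynamic provisioning, set storageClassName instead and omit the selector. The claim name and label are illustrative:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cp4ba-file-pvc         # hypothetical claim name, as referenced in the custom resource
spec:
  accessModes:
    - ReadWriteMany            # RWX
  resources:
    requests:
      storage: 20Gi
  selector:
    matchLabels:
      pool: cp4ba-file         # bind only to PVs that carry this label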

Example YAML files to create storage classes on Red Hat OpenShift Kubernetes Service (ROKS) are provided in the cert-kubernetes/descriptors folder. For more information about downloading cert-kubernetes, see Preparing a client to connect to the cluster.

Note: You can get the existing storage classes in the environment by running the following command:
oc get storageclass

Take note of the storage classes that you want to use for your deployment.

Important: Many of the Cloud Pak components can use external databases to persist data. You must provision the database instances and make sure that they are accessible from the cluster, or reuse existing database instances. To improve performance, minimize the latency between the applications or containers and the database server.