Planning for GCP cloud

Learn about prerequisites, configurations, and other important planning information you must consider before deploying an IBM Storage Scale cluster on GCP.

Consider the following steps when planning for an IBM Storage Scale cluster on the public cloud of Google Cloud Platform (GCP):
  1. Preparing the GCP environment to deploy the IBM Storage Scale cluster.
  2. Planning the virtual private cloud (VPC) architecture.
  3. Creating an IBM Storage Scale virtual machine (VM) image in advance.
  4. Planning for the IBM Storage Scale deployment architecture on GCP.
  5. Planning for IBM Storage Scale cluster deployment profiles.
  6. Determining your performance, scalability, data availability, and data protection requirements.
  7. Planning for encryption at rest.

Preparing the GCP environment

Complete the following steps to prepare the GCP environment:
  1. Create a service account with sufficient privileges and quota to provision all the required resources, along with the credentials to access the GCP API. For information about creating a service account from the GCP console, see Create a service account in GCP documentation.
  2. Make sure that the following roles are configured. These roles are required for the GCP service account to execute cloudkit.
    Artifact Registry Administrator 
    Cloud KMS CryptoKey Encrypter/Decrypter 
    Compute Instance Admin (v1) 
    Compute Network Admin 
    Compute Security Admin 
    DNS Administrator 
    Service Account User 
    Storage Admin
    Browser
    Note: Roles can be added or removed by following the procedures described in GCP documentation.
  3. Verify the GCP quota limits and ensure that enough quota exists for the GCP instance types that you intend to deploy. You can view the quota limits from Google Cloud console > Quota page.

    If necessary for the GCP instance types that you intend to deploy, request a service limit increase. For more information, see Manage your quota using the Google Cloud console in GCP documentation.

  4. Prepare the installer node where the cloudkit will run. For more information, see Preparing the installer node.
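
As a rough illustration of the role check in step 2, the following Python sketch compares a set of granted roles against the list above. The role names are copied from this section. The helper name missing_roles, and the idea of passing in role display names that you collected yourself, are assumptions made for illustration and are not part of cloudkit.

```python
# Display names of the roles listed in step 2 of this section.
REQUIRED_ROLES = frozenset({
    "Artifact Registry Administrator",
    "Cloud KMS CryptoKey Encrypter/Decrypter",
    "Compute Instance Admin (v1)",
    "Compute Network Admin",
    "Compute Security Admin",
    "DNS Administrator",
    "Service Account User",
    "Storage Admin",
    "Browser",
})


def missing_roles(granted_roles):
    """Return the required roles that are absent from granted_roles, sorted by name.

    Hypothetical helper: granted_roles is any iterable of role display names,
    for example names copied from the IAM page of the GCP console.
    """
    granted = {role.strip() for role in granted_roles}
    return sorted(REQUIRED_ROLES - granted)
```

Calling missing_roles with the roles that are currently granted returns the names that still need to be added before cloudkit is run.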

Planning the virtual private cloud (VPC) architecture for GCP

When deploying resources in the cloud, the cloudkit can either create a new virtual private cloud (VPC) and provision the resources into it or make use of a previously created VPC.

When the cloudkit creates a new VPC, the cloudkit designs your network infrastructure from scratch. It chooses the subnets, network address ranges, and security groups that best suit the IBM Storage Scale deployment.

The following deployment options are available to users:

? VPC Mode:[Use arrows to move, type to filter]
> New
Existing
  • Deploy IBM Storage Scale into a new VPC with a single availability zone: This option builds a new GCP environment that consists of the VPC, subnets, firewall rules, bastion hosts, and other infrastructure components; then, it deploys IBM Storage Scale into this new VPC with a single availability zone.
  • Deploy IBM Storage Scale into an existing VPC: This option provisions IBM Storage Scale in your existing GCP infrastructure.
When deploying the IBM Storage Scale cluster in an existing VPC, the following requirements must be met:
  • Mandatory: Private subnets (with allocatable IP addresses) in the availability region (minimum of one private subnet per availability region).
  • Optional: A public subnet (a subnet with an internet gateway attached) is required only if either of the following cases applies:
    1. The cluster is planned to be accessed via a jump host. For more information, see Other considerations.
    2. There is a need to pre-create an IBM Storage Scale VM image. For more information, see Pre-creating an IBM Storage Scale VM image.
  • Optional: A cloud NAT attached to the private subnet is required only when the IBM Storage Scale instances need access to the internet (for example, for operating system updates and security patches).
  • Optional: If you configured a cloud DNS, you must provide the cloud DNS information while deploying the cluster.
    ? Do you wish to use GCP Cloud DNS:  Yes
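
The requirements for an existing VPC can be expressed as a small pre-flight check. The following Python sketch is only an illustration under stated assumptions: the ExistingVpcPlan class, its field names, and check_existing_vpc are hypothetical and do not correspond to any cloudkit interface, and the check assumes a single region.

```python
from dataclasses import dataclass


@dataclass
class ExistingVpcPlan:
    """Hypothetical description of an existing VPC in one region."""
    private_subnets: int                 # private subnets with allocatable IP addresses
    has_public_subnet: bool = False
    has_cloud_nat: bool = False
    use_jump_host: bool = False
    precreate_image: bool = False
    needs_internet_access: bool = False  # for example, for OS updates


def check_existing_vpc(plan):
    """Return a list of unmet requirements, following the list above."""
    problems = []
    if plan.private_subnets < 1:
        problems.append("at least one private subnet is required")
    if (plan.use_jump_host or plan.precreate_image) and not plan.has_public_subnet:
        problems.append("a public subnet is required for a jump host or image pre-creation")
    if plan.needs_internet_access and not plan.has_cloud_nat:
        problems.append("a cloud NAT on the private subnet is required for internet access")
    return problems
```

An empty result means that the plan satisfies the mandatory requirement and every optional requirement that its settings trigger.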

Pre-creating an IBM Storage Scale VM image

When the cloudkit is used to create an IBM Storage Scale cluster using the cloudkit create cluster command, it can either automatically create a stock VM image or the customer can provide a previously created custom image.

The cloudkit can automatically create an image using Red Hat Enterprise Linux (RHEL) 8 (latest minor version) or RHEL 9 (latest minor version). When the option to use a stock image is chosen, the cloudkit automatically performs all required actions to create an IBM Storage Scale image and create the IBM Storage Scale cluster.

A custom image can be created using the cloudkit create image command. The create image command accepts an existing image as input, or it can create an image from a Red Hat operating system image.

A custom existing image consists of:
  1. A Red Hat version that is supported by IBM Storage Scale.
  2. Optional: Any customer applications that are already pre-installed. The cloudkit create image command installs all required IBM Storage Scale packages on top of this existing image.

Planning for the IBM Storage Scale deployment architecture on GCP

Before the IBM Storage Scale cluster can be created on the cloud, the deployment architecture must be planned.

The cloudkit offers the following deployment models for IBM Storage Scale clusters:
  • Combined-compute-storage: A unified IBM Storage Scale cluster with both storage and compute nodes. It is recommended that any customer workload runs only on the compute nodes.
  • Compute-only: An IBM Storage Scale cluster that only consists of compute nodes. This cluster does not have any local filesystem and therefore must remote mount the filesystem from an IBM Storage Scale storage cluster.
  • Storage-only: An IBM Storage Scale cluster that consists only of storage nodes, which have access to the storage where the filesystem is created. This filesystem can be remote mounted by any number of compute-only IBM Storage Scale clusters.

When you run the cloudkit create cluster command, the following prompt allows you to choose between the different offered deployment models:


? IBM Spectrum Scale deployment model:[Use arrows to move, type to filter, ? for more help]
> Storage-only
Compute-only
Combined-compute-storage
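
The differences between the three models can be summarized in a few lines of Python. This is a minimal sketch for planning purposes only: the DeploymentModel class and the MODELS table are hypothetical names, and the rule that a cluster without storage nodes must remote mount a filesystem is taken from the descriptions above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeploymentModel:
    """Hypothetical summary of one cloudkit deployment model."""
    has_storage_nodes: bool
    has_compute_nodes: bool

    @property
    def needs_remote_mount(self):
        # A cluster without storage nodes has no local filesystem,
        # so it must remote mount one from a storage cluster.
        return not self.has_storage_nodes


# Keys match the choices shown in the cloudkit prompt.
MODELS = {
    "Storage-only": DeploymentModel(has_storage_nodes=True, has_compute_nodes=False),
    "Compute-only": DeploymentModel(has_storage_nodes=False, has_compute_nodes=True),
    "Combined-compute-storage": DeploymentModel(has_storage_nodes=True, has_compute_nodes=True),
}
```

For example, MODELS["Compute-only"].needs_remote_mount evaluates to True, which is a reminder that a storage cluster must exist before a compute-only cluster is useful.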

IBM Storage Scale cluster deployment profiles

Cloudkit offers the following deployment profiles:
  • Throughput-Performance-Persistent-Storage: This profile uses a single availability zone with persistent storage, which means that the file system device retains data after the instance is stopped. In this mode, the number of storage instances is limited to 64 and the number of compute instances is limited to 65. This profile offers these disk types to choose from: pd-balanced, pd-standard, and pd-ssd.
    Important: When you choose this profile, the file system is configured in such a manner that the data is not replicated; only the metadata is replicated across the instances.
  • Throughput-Performance-Scratch-Storage: This profile uses a single availability zone and a placement group with cluster policy, which means that it packs instances close together inside an availability zone. This strategy enables workloads to achieve the low-latency network performance necessary for tightly-coupled node-to-node communication that is typical of high-performance computing (HPC) applications. This profile uses instance or temporary storage, meaning that the filesystem device loses data after the instance is stopped.

    In this mode, the number of storage instances is limited to 10 and the number of compute instances is limited to 65.

    Important: This profile uses local NVMe SSD or instance storage, which offers high performance and low latency for data-intensive workloads. However, the data stored in this mode is volatile and can be lost if the instance is stopped or terminated. Therefore, it is recommended to take frequent backups.

    This profile must not be used for long term storage or if the data is not backed up elsewhere.

  • Balanced: This profile uses multiple (3) availability zones to deploy the IBM Storage Scale cluster into. In this mode, the number of storage instances will be limited to 64. It offers a choice of disk types pd-balanced, pd-standard, and pd-ssd.
    Important: The IBM Storage Scale filesystem will be configured in such a way that data and metadata will be replicated across availability zones.
    Instances are spread across the first two availability zones chosen during the selection, and the tie-breaker instance is provisioned in the third availability zone provided during the selection.
    ? Tuning profile: [Use arrows to move, type to filter, ? for more help]
      Throughput-Performance-Scratch-Storage
    > Throughput-Performance-Persistent-Storage 
      Balanced
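
The instance limits and the Balanced placement rule can be checked before running the deployment. The following Python sketch uses the limits stated in this section. The function names are hypothetical, no compute limit is recorded for the Balanced profile because this section does not state one, and the even alternation between the first two zones is an assumption (the text only says that instances are spread across them).

```python
# Instance limits per deployment profile, as stated in this section.
# None means that the section does not state a limit for that node type.
PROFILE_LIMITS = {
    "Throughput-Performance-Persistent-Storage": {"storage": 64, "compute": 65},
    "Throughput-Performance-Scratch-Storage": {"storage": 10, "compute": 65},
    "Balanced": {"storage": 64, "compute": None},
}


def check_instance_counts(profile, storage, compute):
    """Return a list of limit violations for the requested instance counts."""
    limits = PROFILE_LIMITS[profile]
    errors = []
    for kind, requested in (("storage", storage), ("compute", compute)):
        limit = limits[kind]
        if limit is not None and requested > limit:
            errors.append(f"{kind} instances: {requested} exceeds the limit of {limit}")
    return errors


def plan_balanced_placement(zones, storage_instances):
    """Sketch of Balanced-profile placement across three selected zones.

    Assumption: instances alternate between the first two zones.
    The tie-breaker instance always goes to the third zone.
    """
    if len(zones) != 3:
        raise ValueError("the Balanced profile expects exactly three availability zones")
    placement = {zone: 0 for zone in zones}
    for i in range(storage_instances):
        placement[zones[i % 2]] += 1
    placement[zones[2]] += 1  # tie-breaker instance
    return placement
```

The zone names passed to plan_balanced_placement are whatever zones you select at the prompt; the order of selection determines which zone receives the tie-breaker.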

Determining your performance, scalability, data availability, and data protection requirements

Before deploying the IBM Storage Scale cluster, it is important to understand your requirements in terms of performance, scalability, data availability, and data protection. These criteria determine which GCP instance types to use for the storage and compute nodes, as well as which disk types to use.

The cloudkit provides the following choices for VM instance types:

IBM Storage Scale compute nodes:

> n1-standard-2  | vCPU(2)  | RAM (7.50 GB) | Egress Network Bandwidth (Up to 10 Gbps)
  n2-standard-4  | vCPU(4)  | RAM (16.0 GB) | Egress Network Bandwidth (Up to 10 Gbps)
  e2-highmem-2   | vCPU(2)  | RAM (16 GB)   | Egress Network Bandwidth (Up to 4 Gbps)
  e2-highmem-4   | vCPU(4)  | RAM (32 GB)   | Egress Network Bandwidth (Up to 8 Gbps)
  e2-highmem-8   | vCPU(8)  | RAM (64 GB)   | Egress Network Bandwidth (Up to 16 Gbps)
  e2-highcpu-8   | vCPU(8)  | RAM (8 GB)    | Egress Network Bandwidth (Up to 16 Gbps)
  c2-standard-4  | vCPU(4)  | RAM (16 GB)   | Egress Network Bandwidth (Up to 10 Gbps)

IBM Storage Scale storage nodes:

  • For Throughput-Performance-Persistent-Storage and Balanced profiles:
    > n2-standard-2  | vCPU(2)  | RAM (8.0 GB)   | Egress Network Bandwidth (Up to 10 Gbps)
      n2-standard-4  | vCPU(4)  | RAM (16.0 GB)  | Egress Network Bandwidth (Up to 10 Gbps)
      n2-standard-8  | vCPU(8)  | RAM (32.0 GB)  | Egress Network Bandwidth (Up to 16 Gbps)
      n2-standard-16 | vCPU(16) | RAM (64.0 GB)  | Egress Network Bandwidth (Up to 32 Gbps)
      n2-standard-32 | vCPU(32) | RAM (128.0 GB) | Egress Network Bandwidth (Up to 50 Gbps)
      n2-standard-48 | vCPU(48) | RAM (192.0 GB) | Egress Network Bandwidth (Up to 50 Gbps)
  • For Throughput-Performance-Scratch-Storage profile:

    > n2-standard-32 | vCPU(32) | RAM (128.0 GB) | Egress Network Bandwidth (Up to 50 Gbps)
      n2-standard-48 | vCPU(48) | RAM (192.0 GB) | Egress Network Bandwidth (Up to 50 Gbps)

For more information on choosing instance types, see Machine families resource and comparison guide in GCP documentation.
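
When sizing storage nodes, it can help to pick the smallest listed instance type that meets a vCPU and memory target. The following Python sketch does this for the storage-node choices of the Throughput-Performance-Persistent-Storage and Balanced profiles. The values are copied from the list above; the smallest_fit helper is hypothetical and ignores network bandwidth and cost.

```python
# (name, vCPUs, RAM in GB), copied from the storage-node list for the
# Throughput-Performance-Persistent-Storage and Balanced profiles.
STORAGE_TYPES = [
    ("n2-standard-2", 2, 8.0),
    ("n2-standard-4", 4, 16.0),
    ("n2-standard-8", 8, 32.0),
    ("n2-standard-16", 16, 64.0),
    ("n2-standard-32", 32, 128.0),
    ("n2-standard-48", 48, 192.0),
]


def smallest_fit(min_vcpus, min_ram_gb, choices=STORAGE_TYPES):
    """Return the name of the smallest type meeting both minimums, or None."""
    candidates = [c for c in choices if c[1] >= min_vcpus and c[2] >= min_ram_gb]
    if not candidates:
        return None
    # Order by vCPU count first, then by memory.
    return min(candidates, key=lambda c: (c[1], c[2]))[0]
```

A result of None indicates that no listed type satisfies the request, in which case the requirements or the profile choice need to be revisited.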

The cloudkit provides the following choices for the disks that can be attached to the IBM Storage Scale storage nodes:
? Disk type:   [Use arrows to move, type to filter, ? for more help]
> pd-balanced
  pd-standard
  pd-ssd

Planning for encryption at rest

The cloudkit offers an easy and simplified way to enable disk encryption used by IBM Storage Scale instances.

GCP provides built-in disk encryption options that you can enable during disk creation. Data is encrypted by default. The following encryption methods are the most common:
  • Google-managed encryption key. No configuration is required.
  • Customer-managed encryption key (CMEK). Managed via Google Cloud key management service.
  • Customer-supplied encryption key (CSEK). Managed outside of Google Cloud.
During the interactive method of cloudkit create cluster, the following questions related to encryption are shown:

? Data stored in boot and data disk(s) are encrypted automatically. Select an encryption key management solution:   [Use arrows to move, type to filter]
> Google-managed-encryption-key
  Customer-managed-encryption-key

You must select an encryption key management solution.

Note:
  • If the key used for encrypting IBM Storage Scale disk volumes is deleted, data cannot be retrieved.
  • An invalid key, or a user that was not configured to execute cloudkit and therefore lacks permission to read the key, leads to failures.
  • CSEK (managed outside of Google Cloud) is currently not supported.

The Google-managed-encryption-key solution does not require any further configuration.

If you choose the Customer-managed-encryption-key solution, make the following configurations:
  • In these prompts, input a KMS key ring and a KMS key name:
    
    ? Data stored in boot and data disk(s) are encrypted automatically. Select an encryption key management solution:  Customer-managed-encryption-key
    ? Google Cloud key ring:  cloudkit-uscentral
    ? Customer-managed encryption key (CMEK):  cloudkit-key
  • To be able to read the key, ensure that the service user has the viewer role for the cloudkms.cryptoKeyEncrypterDecrypter and the Cloud KMS.
  • Apart from the service user, the default compute engine service user must also be granted cloudkms.cryptoKeyEncrypterDecrypter.
Note: If you choose this solution, consider these aspects of its implementation:
  • When the cluster deployment configuration is storage only, both data and boot volumes are encrypted.
  • When the cluster deployment configuration is compute only, only boot volumes are encrypted.

Limitations

  • Both boot and data-related volumes are encrypted.
  • All data and boot volumes are encrypted using a single key.
  • Scratch or instance storage (local-ssd) disks are hardware-encrypted internally. Hence, they do not require external encryption.
  • CSEK is out of scope.
  • The KMS key ring must be local to the region of operation.
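
The last limitation can be verified from the full resource name of the key. Cloud KMS key resource names follow the pattern projects/PROJECT/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY. The following Python sketch parses such a name and compares its location with the deployment region. The helper name key_is_local_to_region is hypothetical, and note that the cloudkit prompts shown earlier ask only for the key ring and key name, not for the full resource name.

```python
import re

# Pattern of a Cloud KMS crypto key resource name.
_KEY_PATTERN = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/keyRings/(?P<key_ring>[^/]+)/cryptoKeys/(?P<key>[^/]+)$"
)


def key_is_local_to_region(key_resource_name, region):
    """Return True if the key ring location matches the deployment region.

    Hypothetical helper for planning; it does not contact GCP.
    """
    match = _KEY_PATTERN.match(key_resource_name)
    if match is None:
        raise ValueError(f"not a Cloud KMS key resource name: {key_resource_name!r}")
    return match.group("location") == region


# "my-project" is a placeholder; the key ring and key names are the
# example values from the prompts in this section.
EXAMPLE_KEY = (
    "projects/my-project/locations/us-central1"
    "/keyRings/cloudkit-uscentral/cryptoKeys/cloudkit-key"
)
```

With the example above, the check passes for a deployment in us-central1 and fails for any other region, which signals that a key ring local to that region is needed.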