Planning for GCP cloud
Learn about prerequisites, configurations, and other important planning information you must consider before deploying an IBM Storage Scale cluster on GCP.
- Preparing the GCP environment to deploy the IBM Storage Scale cluster.
- Planning the virtual private cloud (VPC) architecture.
- Creating an IBM Storage Scale virtual machine (VM) image in advance.
- Planning for the IBM Storage Scale deployment architecture on GCP.
- Planning for IBM Storage Scale cluster deployment profiles.
- Determining your performance, scalability, data availability, and data protection requirements.
- Planning for encryption at rest.
Preparing the GCP environment
- Create a service account with sufficient privileges and quota to provision all the required resources, along with the credentials to access the GCP API. For information about creating a service account from the GCP console, see Create a service account in GCP documentation.
- Make sure that the following roles are configured. These roles are required for the GCP service account to run the cloudkit:
  - Artifact Registry Administrator
  - Cloud KMS CryptoKey Encrypter/Decrypter
  - Compute Instance Admin (v1)
  - Compute Network Admin
  - Compute Security Admin
  - DNS Administrator
  - Service Account User
  - Storage Admin
  - Browser
Note: Roles can be added or removed by following the procedures described in GCP documentation.
- Verify the GCP quota limits and ensure that enough quota exists for the GCP instance types that you intend to deploy. If necessary, request a service limit increase for those instance types. For more information, see Manage your quota using the Google Cloud console in GCP documentation.
- Prepare the installer node where the cloudkit will run. For more information, see Preparing the installer node.
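The service account preparation described above can be sketched with the gcloud CLI. Here `PROJECT_ID`, the account name `cloudkit-sa`, and the key file name are illustrative placeholders; each role from the list above is granted with its own policy binding.

```shell
# Create a dedicated service account for the cloudkit (name is a placeholder).
gcloud iam service-accounts create cloudkit-sa \
    --project=PROJECT_ID \
    --display-name="cloudkit service account"

# Grant one of the required roles; repeat for each role in the list above.
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:cloudkit-sa@PROJECT_ID.iam.gserviceaccount.com" \
    --role="roles/compute.instanceAdmin.v1"

# Create a credentials file that the cloudkit can use to access the GCP API.
gcloud iam service-accounts keys create cloudkit-sa-key.json \
    --iam-account="cloudkit-sa@PROJECT_ID.iam.gserviceaccount.com"
```
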
Planning the virtual private cloud (VPC) architecture for GCP
When deploying resources in the cloud, the cloudkit can either create a new virtual private cloud (VPC) and provision the resources into it or make use of a previously created VPC.
When the cloudkit creates a new VPC, the cloudkit designs your network infrastructure from scratch. It chooses the subnets, network address ranges, and firewall rules that best suit the IBM Storage Scale deployment.
? VPC Mode:[Use arrows to move, type to filter]
> New
Existing
- Deploy IBM Storage Scale into a new VPC with a single availability zone: This option builds a new GCP environment that consists of the VPC, subnets, firewall rules, bastion hosts, and other infrastructure components; then, it deploys IBM Storage Scale into this new VPC with a single availability zone.
- Deploy IBM Storage Scale into an existing VPC: This option provisions IBM Storage Scale in your existing GCP infrastructure.
- This is a mandatory step. Private subnets (with allocatable IP addresses) must exist in the availability region (minimum of 1 private subnet per availability region).
- This is an optional step. A public subnet (a subnet with an internet gateway attached) is only required if either of the following cases is met:
  - The cluster is planned to be accessed via a jump host. For more information, see Other considerations.
  - There is a need to pre-create an IBM Storage Scale VM image. For more information, see Pre-creating an IBM Storage Scale VM image.
- This is an optional step. A cloud NAT attached to the private subnet is only required when the IBM Storage Scale instances need access to the internet (for operating system updates and security patches, and so on).
- This is an optional step. If you configured a cloud DNS, then you must provide the cloud DNS information while deploying the cluster.
? Do you wish to use GCP Cloud DNS: Yes
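As a sketch of the prerequisite network pieces for the existing-VPC path, the following gcloud commands create a custom-mode VPC, one private subnet, and a Cloud NAT gateway for outbound internet access. All names, the region, and the address range are illustrative placeholders, not values the cloudkit requires.

```shell
# Create a custom-mode VPC (no auto-created subnets).
gcloud compute networks create scale-vpc --subnet-mode=custom

# Create one private subnet in the target region (range is an example).
gcloud compute networks subnets create scale-private-subnet \
    --network=scale-vpc --region=us-central1 --range=10.0.1.0/24

# Create a Cloud Router and a Cloud NAT so private instances can reach
# the internet for OS updates and security patches.
gcloud compute routers create scale-router \
    --network=scale-vpc --region=us-central1
gcloud compute routers nats create scale-nat \
    --router=scale-router --region=us-central1 \
    --auto-allocate-nat-external-ips --nat-all-subnet-ip-ranges
```
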
Pre-creating an IBM Storage Scale VM image
When the cloudkit is used to create an IBM Storage Scale cluster using the cloudkit create cluster command, it can either automatically create a stock VM image or use a custom image that was previously created by the customer.
The cloudkit can automatically create an image using Red Hat Enterprise Linux (RHEL) 8 (latest minor version) or RHEL 9 (latest minor version). When the option to use a stock image is chosen, the cloudkit automatically performs all required actions to create an IBM Storage Scale image and then creates the IBM Storage Scale cluster.
A custom image can be created using the cloudkit create image command. The create image command can accept an existing image as input, or it can create an image from a Red Hat operating system image.
- A Red Hat version that is supported by IBM Storage Scale.
- This is an optional step. Any customer applications can already be pre-installed; the cloudkit create image command will install all required IBM Storage Scale packages on top of this existing image.
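To find a supported Red Hat base image to start from, the public RHEL image families published by Google can be listed with gcloud; the `rhel-cloud` project is where Google publishes these images, and the family filter shown is an example.

```shell
# List current RHEL 8 and RHEL 9 images published in the rhel-cloud project.
gcloud compute images list \
    --project=rhel-cloud \
    --no-standard-images \
    --filter="family~'rhel-(8|9)'"
```
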
Planning for the IBM Storage Scale deployment architecture on GCP
Before the IBM Storage Scale cluster can be created on the cloud, the deployment architecture must be planned.
- Combined-compute-storage: A unified IBM Storage Scale cluster with both storage and compute nodes. It is recommended that any customer workload runs only on the compute nodes.
- Compute-only: An IBM Storage Scale cluster that only consists of compute nodes. This cluster does not have any local filesystem and therefore must remote mount the filesystem from an IBM Storage Scale storage cluster.
- Storage-only: This IBM Storage Scale cluster only consists of storage nodes which have access to storage where the filesystem is created. This filesystem is remote mounted to any number of compute only IBM Storage Scale clusters.
When you run the cloudkit create cluster command, the following prompt allows you to choose between the different offered deployment models:
? IBM Spectrum Scale deployment model:[Use arrows to move, type to filter, ? for more help]
> Storage-only
Compute-only
Combined-compute-storage
IBM Storage Scale cluster deployment profiles
- Throughput-Performance-Persistent-Storage: This profile uses a single availability zone with persistent storage, which means that the file system device retains data after the instance is stopped. In this mode, the number of storage instances is limited to 64 and the number of compute instances is limited to 65. This profile offers these disk types to choose from: pd-balanced, pd-standard, and pd-ssd.
Important: When you choose this profile, the file system is configured in such a manner that the data is not replicated; only the metadata is replicated across the instances.
- Throughput-Performance-Scratch-Storage: This profile uses a single availability zone and a placement group with the cluster policy, which means that it packs instances close together inside an availability zone. This strategy enables workloads to achieve the low-latency network performance necessary for tightly coupled node-to-node communication that is typical of high-performance computing (HPC) applications. It uses instance (temporary) storage, meaning that the file system device loses data after the instance is stopped. In this mode, the number of storage instances is limited to 10 and the number of compute instances is limited to 65.
Important: This profile uses local NVMe SSD or instance storage, which offers high performance and low latency for data-intensive workloads. However, the data stored in this mode is volatile and can be lost if the instance is stopped or terminated. Therefore, it is recommended to take frequent backups. This profile must not be used for long-term storage or if the data is not backed up elsewhere.
- Balanced: This profile uses multiple (3) availability zones to deploy the IBM Storage Scale cluster into. In this mode, the number of storage instances is limited to 64. It offers a choice of disk types: pd-balanced, pd-standard, and pd-ssd.
Important: The IBM Storage Scale file system is configured in such a way that data and metadata are replicated across availability zones. Instances are spread across the first two availability zones opted for in the selection, and the tie-breaker instance is provisioned in the third availability zone provided during the selection.
? Tuning profile: [Use arrows to move, type to filter, ? for more help]
  Throughput-Performance-Scratch-Storage
> Throughput-Performance-Persistent-Storage
  Balanced
Determining your performance, scalability, data availability, and data protection requirements
Before deploying the IBM Storage Scale cluster, it is important to understand your requirements in terms of performance, scalability, data availability, and data protection. These criteria determine which GCP instance types to use for the storage and compute nodes, as well as which disk types to use.
The cloudkit provides the following choices for VM instances types:
IBM Storage Scale compute nodes:
> n1-standard-2 | vCPU(2) | RAM (7.50 GB) | Egress Network Bandwidth (Up to 10 Gbps)
n2-standard-4 | vCPU(4) | RAM (16.0 GB) | Egress Network Bandwidth (Up to 10 Gbps)
e2-highmem-2 | vCPU(2) | RAM (16 GB) | Egress Network Bandwidth (Up to 4 Gbps)
e2-highmem-4 | vCPU(4) | RAM (32 GB) | Egress Network Bandwidth (Up to 8 Gbps)
e2-highmem-8 | vCPU(8) | RAM (64 GB) | Egress Network Bandwidth (Up to 16 Gbps)
e2-highcpu-8 | vCPU(8) | RAM (8 GB) | Egress Network Bandwidth (Up to 16 Gbps)
c2-standard-4 | vCPU(4) | RAM (16 GB) | Egress Network Bandwidth (Up to 10 Gbps)
IBM Storage Scale storage nodes:
- For Throughput-Performance-Persistent-Storage and Balanced profiles:
> n2-standard-2 | vCPU(2) | RAM (8.0 GB) | Egress Network Bandwidth (Up to 10 Gbps)
  n2-standard-4 | vCPU(4) | RAM (16.0 GB) | Egress Network Bandwidth (Up to 10 Gbps)
  n2-standard-8 | vCPU(8) | RAM (32.0 GB) | Egress Network Bandwidth (Up to 16 Gbps)
  n2-standard-16 | vCPU(16) | RAM (64.0 GB) | Egress Network Bandwidth (Up to 32 Gbps)
  n2-standard-32 | vCPU(32) | RAM (128.0 GB) | Egress Network Bandwidth (Up to 50 Gbps)
  n2-standard-48 | vCPU(48) | RAM (192.0 GB) | Egress Network Bandwidth (Up to 50 Gbps)
- For Throughput-Performance-Scratch-Storage profile:
> n2-standard-32 | vCPU(32) | RAM (128.0 GB) | Egress Network Bandwidth (Up to 50 Gbps)
  n2-standard-48 | vCPU(48) | RAM (192.0 GB) | Egress Network Bandwidth (Up to 50 Gbps)
For more information on choosing instance types, see Machine families resource and comparison guide in GCP documentation.
? Disk type: [Use arrows to move, type to filter, ? for more help]
> pd-balanced
pd-standard
pd-ssd
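To compare candidates beyond the listings above, the details of a machine type and the disk types available in a zone can be inspected directly with gcloud; the zone used here is an example.

```shell
# Inspect vCPU and memory details of a candidate storage node type.
gcloud compute machine-types describe n2-standard-32 --zone=us-central1-a

# List the disk types (pd-balanced, pd-standard, pd-ssd, local-ssd)
# available in a zone.
gcloud compute disk-types list --filter="zone:us-central1-a"
```
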
Planning for encryption at rest
The cloudkit offers an easy and simplified way to enable disk encryption used by IBM Storage Scale instances.
- Google-managed encryption key. No configuration is required.
- Customer-managed encryption key (CMEK). Managed via Google Cloud key management service.
- Customer-supplied encryption key (CSEK). Managed outside of Google Cloud.
? Data stored in boot and data disk(s) are encrypted automatically. Select an encryption key management solution: [Use arrows to move, type to filter]
> Google-managed-encryption-key
Customer-managed-encryption-key
You must select an encryption key management solution.
- If the key used for encrypting IBM Storage Scale disk volumes is deleted, data cannot be retrieved.
- An invalid key, or a user that was not configured to execute the cloudkit and lacks permissions to read the key, leads to failures.
- CSEK (managed outside of Google Cloud) is currently not supported.
The Google-managed-encryption-key solution does not require any further configuration.
- In these prompts, input a KMS key ring and a KMS key name:
? Data stored in boot and data disk(s) are encrypted automatically. Select an encryption key management solution: Customer-managed-encryption-key
? Google Cloud key ring: cloudkit-uscentral
? Customer-managed encryption key (CMEK): cloudkit-key
- To be able to read the key, ensure that the service user has the Cloud KMS viewer role and the cloudkms.cryptoKeyEncrypterDecrypter role.
- Apart from the service user, the default Compute Engine service account also needs to be granted the cloudkms.cryptoKeyEncrypterDecrypter role.
- When the cluster deployment configuration is storage only, both data and boot volumes are encrypted.
- When the cluster deployment configuration is compute only, only boot volumes are encrypted.
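As a sketch of preparing a CMEK before deployment, the following commands create the key ring and key used in the example prompts above and grant the encrypter/decrypter role to the default Compute Engine service account. The region and `PROJECT_NUMBER` are placeholders.

```shell
# Create the key ring and key referenced in the cloudkit prompts
# (names taken from the example above).
gcloud kms keyrings create cloudkit-uscentral --location=us-central1
gcloud kms keys create cloudkit-key \
    --keyring=cloudkit-uscentral --location=us-central1 --purpose=encryption

# Allow the default Compute Engine service account to use the key
# for disk encryption.
gcloud kms keys add-iam-policy-binding cloudkit-key \
    --keyring=cloudkit-uscentral --location=us-central1 \
    --member="serviceAccount:PROJECT_NUMBER-compute@developer.gserviceaccount.com" \
    --role="roles/cloudkms.cryptoKeyEncrypterDecrypter"
```

Note that the key ring must be created in the same region where the cluster will be deployed (see Limitations below).
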
Limitations
- Both boot and data-related volumes are encrypted.
- All data and boot volumes are encrypted using a single key.
- Scratch or instance storage (local-ssd) disks are hardware-encrypted internally. Hence, they do not require external encryption.
- CSEK is out of scope.
- The KMS key ring must be local to the region of operation.