Planning for AWS cloud
Learn about prerequisites, configurations, and other important planning information you must consider before deploying an IBM Storage Scale cluster on AWS.
- Preparing the AWS environment to deploy the IBM Storage Scale cluster.
- Planning the virtual private cloud (VPC) architecture.
- Creating an IBM Storage Scale Amazon Machine Image (AMI) in advance.
- Planning for the IBM Storage Scale deployment architecture on AWS.
- Planning IBM Storage Scale cluster deployment profiles.
- Determining your performance, scalability, data availability, and data protection requirements.
- Planning cloudkit notifications.
- Planning for encryption at rest.
Preparing the AWS environment
- Create an account in AWS, if you do not already have one.
- Create the necessary IAM users along with the access key required for cloudkit to make programmatic requests to the AWS API. For more information, see Managing access keys for IAM users.
- Use the region selector in the navigation bar to choose the AWS region to deploy the IBM Storage Scale cluster.
- Create a key pair in the preferred region. For more information, see Amazon EC2 Key Pairs.
- Verify that the desired AWS region and availability zone support the EC2 instance types that are chosen to be provisioned for the IBM Storage Scale storage and compute nodes. You can verify the AWS EC2 instance type from .
- Verify the AWS EC2 quota limits and ensure that enough quotas exist for the EC2 instance types that you intend to deploy. Verify the AWS EC2 quota limits from . If necessary, request a service limit increase for the EC2 instance types that you intend to deploy from the AWS website. To request a service limit increase, in the AWS Support Center, choose , and then complete the fields in the Limit Increase form.
- Prepare the installer node where the cloudkit will run. For more information, see Preparing the installer node.
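The instance type availability check in the list above can also be scripted. The following is a minimal sketch and is not part of cloudkit. It assumes that you pass in an object that behaves like a boto3 EC2 client (for example, one created with boto3.client("ec2", region_name=...)), and the function name unsupported_instance_types is hypothetical. It relies on the EC2 DescribeInstanceTypeOfferings operation to find which of the wanted instance types are offered in a given availability zone.

```python
def unsupported_instance_types(ec2_client, zone, wanted):
    """Return the instance types in `wanted` that are not offered in `zone`.

    `ec2_client` is assumed to behave like a boto3 EC2 client.
    """
    offered = set()
    kwargs = {
        "LocationType": "availability-zone",
        "Filters": [
            {"Name": "location", "Values": [zone]},
            {"Name": "instance-type", "Values": list(wanted)},
        ],
    }
    # Follow NextToken until every page of offerings has been read.
    while True:
        page = ec2_client.describe_instance_type_offerings(**kwargs)
        for offering in page.get("InstanceTypeOfferings", []):
            offered.add(offering["InstanceType"])
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token
    return sorted(set(wanted) - offered)
```

An empty result means that every requested instance type is offered in the zone. Quota limits are a separate check and are not covered by this sketch.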
Planning the virtual private cloud (VPC) architecture for AWS
When deploying resources in the cloud, the cloudkit can either create a new virtual private cloud (VPC) and provision the resources into it or make use of a previously created VPC.
When the cloudkit creates a new VPC, the cloudkit designs your network infrastructure from scratch. It chooses the subnets, network address ranges, and security groups that best suit the IBM Storage Scale deployment.
? VPC Mode:[Use arrows to move, type to filter]
> New
Existing
When you choose an existing VPC, the following resources must already be in place:
- This is a mandatory step. DHCP options set or DNS configured.
- This is a mandatory step. Private subnets (with allocatable IP address) in the availability zone (minimum 1 private subnet per availability zone).
- This is an optional step. A public subnet (a subnet with an internet gateway attached) is required only in either of the following cases:
  - The cluster is planned to be accessed via a jump host. For more information, see Other considerations.
  - There is a need to pre-create an IBM Storage Scale AMI. For more information, see Pre-creating an IBM Storage Scale AMI.
- This is an optional step. A NAT gateway attached to the private subnet is only required when the IBM Storage Scale instances need access to the internet (for operating system updates and security patches, and so on).
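The mandatory subnet requirement above (at least one private subnet with allocatable IP addresses per availability zone) can be checked before you run cloudkit. The following is a minimal sketch that works on a plain description of your subnets. The dictionary keys "zone", "public", and "free_ips" and the function name are hypothetical and are not a cloudkit or AWS data format.

```python
from collections import defaultdict


def zones_missing_private_subnet(subnets, required_zones):
    """Return the zones that lack a private subnet with a free IP address.

    Each subnet is a dict with the hypothetical keys
    "zone" (str), "public" (bool), and "free_ips" (int).
    """
    usable = defaultdict(int)
    for subnet in subnets:
        # Only private subnets with allocatable addresses count.
        if not subnet["public"] and subnet["free_ips"] > 0:
            usable[subnet["zone"]] += 1
    return [zone for zone in required_zones if usable[zone] == 0]
```

An empty list indicates that every zone you plan to deploy into has at least one usable private subnet.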
Pre-creating an IBM Storage Scale AMI
When the cloudkit is used to create an IBM Storage Scale cluster by using the cloudkit create cluster command, it can either automatically create a stock Amazon Machine Image (AMI) or use a previously created custom image that the customer provides.
The cloudkit can automatically create an image using Red Hat Enterprise Linux (RHEL) versions 8.8 and 9.2. When the option to use a stock image is chosen, the cloudkit automatically performs all required actions to create an IBM Storage Scale image and then creates the IBM Storage Scale cluster.
A custom image can be created using the cloudkit create image command. The create image command can accept an existing image as input or can create an image from a Red Hat OS image.
When you supply an existing image, it must contain the following:
- A Red Hat version that is supported by IBM Storage Scale.
- This is an optional step. Any customer applications already preinstalled. The cloudkit create image command installs all required IBM Storage Scale packages on top of this existing image.
Planning for the IBM Storage Scale deployment architecture on AWS
Before the IBM Storage Scale cluster can be created on the cloud, the deployment architecture must be planned.
- Combined-compute-storage: A unified IBM Storage Scale cluster with both storage and compute nodes. It is recommended that any customer workload runs only on the compute nodes.
- Compute-only: An IBM Storage Scale cluster that consists only of compute nodes. This cluster does not have any local filesystem and therefore must remote mount the filesystem from an IBM Storage Scale storage cluster.
- Storage-only: An IBM Storage Scale cluster that consists only of storage nodes, which have access to the storage where the filesystem is created. This filesystem is remote mounted to any number of compute-only IBM Storage Scale clusters.
When you run the cloudkit create cluster command, the following prompt allows you to choose between the different offered deployment models:
? IBM Spectrum Scale deployment model:[Use arrows to move, type to filter, ? for more help]
> Storage-only
Compute-only
Combined-compute-storage
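The relationship between the three deployment models can be captured in a small planning check. The following is a minimal sketch, assuming that a compute-only cluster always needs a storage-only cluster in the same plan to remote mount from, as described above. The names DeploymentModel and plan_problems are hypothetical and are not part of cloudkit.

```python
from enum import Enum


class DeploymentModel(Enum):
    # Values mirror the choices shown in the prompt above.
    STORAGE_ONLY = "Storage-only"
    COMPUTE_ONLY = "Compute-only"
    COMBINED = "Combined-compute-storage"


def plan_problems(models):
    """Return a list of problems for a plan given as a list of DeploymentModel."""
    problems = []
    # A compute-only cluster has no local filesystem, so it needs a
    # storage-only cluster whose filesystem it can remote mount.
    if (DeploymentModel.COMPUTE_ONLY in models
            and DeploymentModel.STORAGE_ONLY not in models):
        problems.append(
            "compute-only cluster has no storage-only cluster to remote mount from"
        )
    return problems
```

For example, plan_problems([DeploymentModel.COMPUTE_ONLY]) reports one problem, while a plan that also contains DeploymentModel.STORAGE_ONLY reports none.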
IBM Storage Scale cluster deployment profiles
- Throughput-Performance-Persistent-Storage: This profile uses a single availability zone with persistent storage, which means that the file system device retains data after the instance is stopped. In this mode, the number of storage nodes is calculated by cloudkit based on the provided filesystem capacity. This mode uses gp3 as the disk type.
  Important: When you choose this profile, the file system is configured in such a manner that the data is not replicated; only the metadata is replicated across the instances.
- Throughput-Performance-Scratch-Storage: This profile uses a single availability zone and a placement group with the cluster policy, which means that it packs instances close together inside an availability zone. This strategy enables workloads to achieve the low-latency network performance necessary for tightly-coupled node-to-node communication that is typical of high-performance computing (HPC) applications. The profile uses instance (temporary) storage, which means that the filesystem device loses data after the instance is stopped. In this mode, the number of storage instances is limited to 10 and the number of compute instances is limited to 65.
  Important: This profile uses local NVMe SSD or instance storage, which offers high performance and low latency for data-intensive workloads. However, the data stored in this mode is volatile and can be lost if the instance is stopped or terminated. Therefore, it is recommended to take frequent backups. This profile must not be used for long-term storage or if the data is not backed up elsewhere.
- Throughput-Advance-Persistent-Storage: This profile uses a single availability zone with persistent storage, which means that the file system device retains data after the instance is stopped. This mode is meant for storage capacity rather than performance; the number of storage instances is limited to 64 and the number of compute instances is limited to 65. This profile offers these disk types to choose from: gp2, gp3, and io1.
  Important: When you choose this profile, the file system is configured in such a manner that the data is not replicated; only the metadata is replicated across the instances.
- Balanced: This profile uses multiple (3) availability zones to deploy the IBM Storage Scale cluster into. In this mode, the number of storage instances is limited to 64. It offers a choice of disk types: gp2, gp3, and io1.
  Important: The IBM Storage Scale filesystem is configured in such a way that data and metadata are replicated across availability zones.
? Tuning profile: [Use arrows to move, type to filter, ? for more help]
Throughput-Performance-Scratch-Storage
> Throughput-Performance-Persistent-Storage
Throughput-Advance-Persistent-Storage
Balanced
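The limits of each profile can be collected in one place to sanity-check a plan before deployment. The following is a minimal sketch that encodes only the numbers stated in the profile descriptions above. A value of None means that this section states no limit; it does not mean that cloudkit imposes none. The names PROFILES and check_plan are hypothetical.

```python
EBS_TYPES = {"gp2", "gp3", "io1"}

# Limits as stated in the profile descriptions; None = not stated there.
PROFILES = {
    "Throughput-Performance-Persistent-Storage": {
        "zones": 1, "max_storage": None, "max_compute": None,
        "disk_types": {"gp3"},
    },
    "Throughput-Performance-Scratch-Storage": {
        "zones": 1, "max_storage": 10, "max_compute": 65,
        "disk_types": set(),  # uses instance storage, not EBS
    },
    "Throughput-Advance-Persistent-Storage": {
        "zones": 1, "max_storage": 64, "max_compute": 65,
        "disk_types": EBS_TYPES,
    },
    "Balanced": {
        "zones": 3, "max_storage": 64, "max_compute": None,
        "disk_types": EBS_TYPES,
    },
}


def check_plan(profile, storage_nodes, compute_nodes, disk_type=None):
    """Return a list of violations of the stated limits for `profile`."""
    limits = PROFILES[profile]
    problems = []
    if limits["max_storage"] is not None and storage_nodes > limits["max_storage"]:
        problems.append(f"storage nodes exceed {limits['max_storage']}")
    if limits["max_compute"] is not None and compute_nodes > limits["max_compute"]:
        problems.append(f"compute nodes exceed {limits['max_compute']}")
    if disk_type is not None and disk_type not in limits["disk_types"]:
        problems.append(f"disk type {disk_type} is not offered by this profile")
    return problems
```

For example, check_plan("Throughput-Performance-Scratch-Storage", 12, 10) reports that the storage node count exceeds 10.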
Determining your performance, scalability, data availability, and data protection requirements
Before deploying the IBM Storage Scale cluster, it is important to understand your requirements in terms of performance, scalability, data availability, and data protection. These criteria determine which AWS instance types to use for the storage and compute nodes, as well as which Elastic Block Store (EBS) volume types to use.
The cloudkit provides the following choices for VM instance types:
IBM Storage Scale compute nodes:
c4.large | vCPU(2) | RAM (3.75 GiB) | Dedicated EBS Bandwidth (500 Mbps) | Networking Bandwidth (Moderate)
c4.xlarge | vCPU(4) | RAM (7.5 GiB) | Dedicated EBS Bandwidth (750 Mbps) | Networking Bandwidth (High)
c4.2xlarge | vCPU(8) | RAM (15 GiB) | Dedicated EBS Bandwidth (1000 Mbps) | Networking Bandwidth (High)
c4.4xlarge | vCPU(16) | RAM (30 GiB) | Dedicated EBS Bandwidth (2000 Mbps) | Networking Bandwidth (High)
c4.8xlarge | vCPU(36) | RAM (60 GiB) | Dedicated EBS Bandwidth (4000 Mbps) | Networking Bandwidth (10Gbps)
c6in.xlarge | vCPU(4) | RAM (8 GiB) | Dedicated EBS Bandwidth (Up to 20 Gbps)
c6in.2xlarge | vCPU(8) | RAM (16 GiB) | Dedicated EBS Bandwidth (Up to 20 Gbps)
IBM Storage Scale storage nodes:
- For the Throughput-Performance-Persistent-Storage and Balanced profiles:
  m6in.xlarge | vCPU(4) | RAM (16 GiB) | Dedicated EBS Bandwidth (Up to 20 Gbps)
  m6in.2xlarge | vCPU(8) | RAM (32 GiB) | Dedicated EBS Bandwidth (Up to 20 Gbps)
  m6in.4xlarge | vCPU(16) | RAM (64 GiB) | Dedicated EBS Bandwidth (Up to 20 Gbps)
  m6in.8xlarge | vCPU(32) | RAM (128 GiB) | Dedicated EBS Bandwidth (20 Gbps)
  c6in.xlarge | vCPU(4) | RAM (8 GiB) | Dedicated EBS Bandwidth (Up to 20 Gbps)
  c6in.2xlarge | vCPU(8) | RAM (16 GiB) | Dedicated EBS Bandwidth (Up to 20 Gbps)
  c6in.4xlarge | vCPU(16) | RAM (32 GiB) | Dedicated EBS Bandwidth (Up to 20 Gbps)
- For the Throughput-Performance-Scratch-Storage profile:
  i3.large | vCPU(2) | RAM (15.25 GiB) | Instance Storage (1 x 475GB NVMe SSD) | Networking Bandwidth (Up to 10 Gbps)
  i3.xlarge | vCPU(4) | RAM (30.50 GiB) | Instance Storage (1 x 950GB NVMe SSD) | Networking Bandwidth (Up to 10 Gbps)
  i3.2xlarge | vCPU(8) | RAM (61.00 GiB) | Instance Storage (1 x 1900GB NVMe SSD) | Networking Bandwidth (Up to 10 Gbps)
  i3.4xlarge | vCPU(16) | RAM (122 GiB) | Instance Storage (2 x 1900GB NVMe SSD) | Networking Bandwidth (Up to 10 Gbps)
  i3.8xlarge | vCPU(32) | RAM (244 GiB) | Instance Storage (4 x 1900GB NVMe SSD) | Networking Bandwidth (Up to 10 Gbps)
  i3.16xlarge | vCPU(64) | RAM (488 GiB) | Instance Storage (8 x 1900GB NVMe SSD) | Networking Bandwidth (Up to 25 Gbps)
For more information on choosing instance types, see choosing instance types in AWS documentation.
The cloudkit provides the following choices for EBS volume types:
- gp2
- gp3
- io1
For more information on choosing appropriate EBS volumes, see Choosing the appropriate EBS volumes in AWS documentation.
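Choosing from the compute node instance types listed above can be reduced to a simple lookup once you know your minimum vCPU and memory needs. The following is a minimal sketch that copies the vCPU and RAM figures from the compute node list in this section. It ignores price, EBS bandwidth, and network bandwidth, which you should weigh separately. The names COMPUTE_TYPES and smallest_fit are hypothetical.

```python
# (instance type, vCPUs, RAM in GiB), taken from the compute node list above.
COMPUTE_TYPES = [
    ("c4.large", 2, 3.75),
    ("c4.xlarge", 4, 7.5),
    ("c4.2xlarge", 8, 15),
    ("c4.4xlarge", 16, 30),
    ("c4.8xlarge", 36, 60),
    ("c6in.xlarge", 4, 8),
    ("c6in.2xlarge", 8, 16),
]


def smallest_fit(min_vcpu, min_ram_gib, catalog=COMPUTE_TYPES):
    """Return the smallest instance type that meets both minimums, or None."""
    candidates = [
        entry for entry in catalog
        if entry[1] >= min_vcpu and entry[2] >= min_ram_gib
    ]
    if not candidates:
        return None
    # "Smallest" here means fewest vCPUs, then least RAM.
    return min(candidates, key=lambda entry: (entry[1], entry[2]))[0]
```

For example, smallest_fit(4, 8) returns "c6in.xlarge", because c4.xlarge has only 7.5 GiB of RAM.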
Planning cloudkit notifications
The following prompt offers the option of subscribing to Amazon Simple Notification Service (Amazon SNS) notifications. A confirmation message is sent to the provided email address; to receive the cloudkit notifications, the user must confirm the subscription.
? Operator Email (Optional): cloudkit@example.com
Planning for encryption at rest
The cloudkit offers a simplified way to enable encryption of the EBS volumes used by IBM Storage Scale instances. For more information, see Amazon EBS encryption in AWS documentation.
? EBS Disk type: gp2
? Do you wish to encrypt boot and data volumes: Yes
? Block device encryption key (Key ID/key ARN/Alias Name/Alias ARN) (Optional): xxxxxx
For the ? Do you wish to encrypt boot and data volumes: prompt, you can:
- Specify No. The EBS volumes corresponding to the root and data devices remain unencrypted.
- Specify Yes and do not provide a block device encryption key. The EBS volumes corresponding to the root and data devices are encrypted using the default key provided by AWS.
- Specify Yes and provide a block device encryption key. The EBS volumes corresponding to the root and data devices are encrypted using the key that you provided.
Note:
- If the key used for encrypting the IBM Storage Scale EBS volumes is deleted, data cannot be retrieved.
- If the key is invalid, or if the user configured to run cloudkit lacks permission to read the key, failures occur.
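The three possible answers to the encryption prompt correspond to three combinations of the Encrypted and KmsKeyId parameters that the EC2 API accepts when an EBS volume is created. The following is a minimal sketch of that mapping. Whether cloudkit builds its requests this way internally is an assumption, and the function name ebs_encryption_settings is hypothetical.

```python
def ebs_encryption_settings(encrypt, key=None):
    """Map the prompt answers to EBS volume creation parameters.

    `encrypt` is the Yes/No answer; `key` is the optional key ID,
    key ARN, alias name, or alias ARN.
    """
    if not encrypt:
        # Answer "No": volumes stay unencrypted and any key is ignored.
        return {"Encrypted": False}
    settings = {"Encrypted": True}
    if key:
        # Answer "Yes" with a key: encrypt with the supplied key.
        settings["KmsKeyId"] = key
    # Answer "Yes" without a key: AWS falls back to its default key.
    return settings
```

For example, ebs_encryption_settings(True) returns {"Encrypted": True}, which leaves the key choice to AWS.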
Limitations
- Volumes in the Throughput-Performance-Scratch-Storage profile are not encrypted via cloudkit. The instance storage disks provided by the scratch profile instance types are hardware encrypted internally using an XTS-AES-256 block cipher, so external encryption is not required.
- Encryption cannot be turned on or off after the EBS volume is created, which means that unencrypted volumes created through cloudkit cannot be encrypted later and vice versa.
- Key rotation and life cycle management are the responsibility of the users.
- If the key used for encrypting IBM Storage Scale is deleted, data cannot be retrieved.