Planning for Azure cloud
Learn about prerequisites, configurations, and other important planning information you must consider before deploying an IBM Storage Scale cluster on Azure.
- Preparing the Azure environment to deploy the IBM Storage Scale cluster.
- Planning the virtual network (VNET) architecture.
- Planning for domain name system (DNS).
- Planning for bastion.
- Creating an IBM Storage Scale virtual machine (VM) image in advance.
- Planning for the IBM Storage Scale deployment architecture on Azure.
- Planning for IBM Storage Scale cluster deployment profiles.
- Determining your performance, scalability, data availability, and data protection requirements.
- Planning for encryption at rest.
Preparing the Azure environment
- Create a service principal by using the az ad sp create-for-rbac command. To consult the credentials, issue the following command:
az ad sp create-for-rbac --query "{ client_id: appId, client_secret: password, tenant_id: tenant }"
The following output is an example of the previous command:
{ "client_id": "00000000-0000-0000-0000-34XXXXXXX", "client_secret": "00000000-0000-0000-0000-50XXXXXXX", "tenant_id": "00000000-0000-0000-0000-40XXXXXXX" }
To authenticate on Azure, you also need to obtain your Azure subscription ID. Issue the az account show command to consult this ID:
az account show --query "{ subscription_id: id }"
The following output is an example of the previous command:
{ "subscription_id": "00000000-aea2-4177-a0f1-7117aXXXX" }
Remember: Make a note of the following values because they are required during the configuration operations of cloudkit:
- client_id
- client_secret
- tenant_id
- subscription_id
- Make sure to configure at least the Contributor role. This role is required for the Azure service principal to run cloudkit.
- Verify the Azure quota limits and ensure that enough quota exists for the Azure instance types that you intend to deploy. To consult the quota limits of the Azure instance, see Azure Quotas documentation.
- Prepare the installer node where the cloudkit will run. For more information, see Preparing the installer node.
- Optionally, create a resource group that holds all Azure resources that are specific to the IBM Storage Scale cluster. For more information, see Create resource groups.
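As a rough illustration of the credential steps above, the following Python sketch parses the JSON printed by the two az commands and confirms that the four values needed for cloudkit configuration are present. The helper name collect_credentials and the placeholder strings are assumptions made for this example; they are not part of cloudkit or the Azure CLI.

```python
import json

# Values that the steps above say must be noted for cloudkit configuration.
REQUIRED_KEYS = ("client_id", "client_secret", "tenant_id", "subscription_id")


def collect_credentials(sp_output: str, account_output: str) -> dict:
    """Merge the JSON output of the two az commands and check completeness.

    Hypothetical helper: the arguments are the raw text printed by
    `az ad sp create-for-rbac --query ...` and `az account show --query ...`.
    """
    merged = {**json.loads(sp_output), **json.loads(account_output)}
    missing = [key for key in REQUIRED_KEYS if not merged.get(key)]
    if missing:
        raise ValueError(f"missing credential values: {', '.join(missing)}")
    return {key: merged[key] for key in REQUIRED_KEYS}


# Placeholder values in the same shape as the example output above.
sp_json = (
    '{"client_id": "id-placeholder", '
    '"client_secret": "secret-placeholder", '
    '"tenant_id": "tenant-placeholder"}'
)
account_json = '{"subscription_id": "subscription-placeholder"}'

credentials = collect_credentials(sp_json, account_json)
print(sorted(credentials))
```

Keeping the check separate from the az invocation means the same function can be reused on output that was saved to a file earlier.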
Planning the virtual network (VNET) architecture for Azure
When deploying resources in the cloud, the cloudkit can either create a new virtual network (VNET) and provision the resources into it or use a previously created VNET.
When the cloudkit creates a new VNET, the cloudkit designs your network infrastructure from scratch. It chooses the subnets, network address ranges, and security groups that best suit the IBM Storage Scale deployment.
A VNET can be created by using the cloudkit create network command, which creates only the VNET. Or you can use the cloudkit create cluster command.
? VPC Mode:[Use arrows to move, type to filter]
> New
Existing
- Deploy IBM Storage Scale into a new VNET with a single availability zone: This option builds a new Azure environment that consists of the VNET, private and public subnets, network security groups, application security groups, security rules, bastion hosts, and other infrastructure components; then, it deploys IBM Storage Scale into this new VNET with a single availability zone.
- Deploy IBM Storage Scale into an existing VNET: This option provisions IBM Storage Scale in your existing Azure infrastructure.
- This is a mandatory step. Private subnets (with allocatable IP address range) in the availability region (minimum 1 private subnet per availability region).
- This is an optional step. A public subnet is only required if either of the following cases is met:
- Where the cluster is planned to be accessed via a jump host. For more information, see Other considerations.
- If there is a need to precreate an IBM Storage Scale VM image. For more information, see Precreating an IBM Storage Scale VM image.
- This is an optional step. A cloud NAT attached to the private subnet is only required when the IBM Storage Scale instances need access to the internet (for operating system updates and security patches, and so on).
- If you configured an Azure Private DNS, you must provide the Azure Private DNS information while deploying the cluster.
? Do you wish to use Azure Cloud DNS: Yes
- Make sure to select the resource group in which you intend to deploy the IBM Storage Scale cluster.
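When you plan private subnets for an existing VNET, it can help to confirm that the allocatable IP address range is large enough for the instances you intend to deploy. The following Python sketch uses the standard ipaddress module for that arithmetic. It relies on the Azure behavior that five addresses are reserved in every subnet; the function names and the sample CIDR ranges are assumptions made for this example.

```python
import ipaddress

# Azure reserves five addresses in every subnet (the first four and the last).
AZURE_RESERVED_PER_SUBNET = 5


def allocatable_addresses(subnet_cidr: str) -> int:
    """Return how many addresses in the subnet can be assigned to instances."""
    network = ipaddress.ip_network(subnet_cidr, strict=True)
    return max(network.num_addresses - AZURE_RESERVED_PER_SUBNET, 0)


def subnet_fits(subnet_cidr: str, planned_instances: int) -> bool:
    """Check whether the planned instance count fits in the subnet."""
    return planned_instances <= allocatable_addresses(subnet_cidr)


# Hypothetical address ranges, used only to show the calculation.
print(allocatable_addresses("10.0.1.0/24"))  # 256 addresses minus 5 reserved
print(subnet_fits("10.0.1.0/28", 20))        # 11 allocatable addresses
```

The check covers only address capacity. It does not verify routing, security groups, or NAT configuration.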
Planning for domain name system (DNS)
To facilitate hostname resolution, use the ./cloudkit create dns command to create or associate a DNS.
? Do you wish to use an existing DNS zone: (y/N)
E: No existing DNS zones found. In order to configure, use 'cloudkit create dns'.
Planning for bastion
Use the cloudkit create jumphost command to create a jump host or bastion in the public subnet.
? Bastion OS: RHEL-HA | RedHat | 8_8-gen2 | 8.8.2023121916 | azureuser
? Bastion instance type: Standard_B2s | vCPU(2) | RAM (4.0 GB)
? Do you wish to continue by creation of jumphost ssh key pair: Yes
I: Bastion instance(s) private key path: /user/clouduser/bastion_ssh_keys/id_rsa
? Bastion CIDR allow list: xxx.xxx.xxx.20/32
Standard_B2s | vCPU(2) | RAM (4.0 GB)
Standard_B2s_v2 | vCPU(2) | RAM (8.0 GB)
Standard_B4s_v2 | vCPU(4) | RAM (16.0 GB)
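The Bastion CIDR allow list prompt expects entries in CIDR notation, where a /32 suffix covers exactly one source address. The following Python sketch, which uses the standard ipaddress module, shows which client addresses a given entry would cover. It only illustrates CIDR matching and does not represent how cloudkit or Azure applies the list; the function names and the sample addresses (taken from a documentation-reserved range) are assumptions.

```python
import ipaddress


def parse_allow_list(entries):
    """Parse CIDR strings; a malformed entry raises ValueError."""
    return [ipaddress.ip_network(entry, strict=False) for entry in entries]


def is_allowed(client_ip: str, allow_list) -> bool:
    """Return True if the client address falls inside any allowed range."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in allow_list)


# Hypothetical single-host entry, similar in shape to the prompt example.
allow_list = parse_allow_list(["203.0.113.20/32"])
print(is_allowed("203.0.113.20", allow_list))
print(is_allowed("203.0.113.21", allow_list))
```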
Precreating an IBM Storage Scale VM image
When the cloudkit is used to create an IBM Storage Scale cluster by using the cloudkit create cluster command, it can either automatically create a stock VM image or use a previously created custom image that the customer provides.
The cloudkit can automatically create an image by using the latest version of Red Hat Enterprise Linux (RHEL) 8 or RHEL 9. When the option to use a stock image is chosen, it automatically performs all required actions to create an IBM Storage Scale image and then creates the IBM Storage Scale cluster.
A custom image can be created by using the cloudkit create image command. The create image command can accept an existing image as input or create an image from a Red Hat operating system image.
- A Red Hat version that is supported by IBM Storage Scale.
- This is an optional step. Any customer applications already preinstalled. The cloudkit create image command installs all required IBM Storage Scale packages on top of this existing image.
- Up to two virtual machine images.
- An image that is a logical volume management (LVM) partition.
Planning for the IBM Storage Scale deployment architecture on Azure
Before the IBM Storage Scale cluster can be created on the cloud, the deployment architecture must be planned.
- Combined-compute-storage: A unified IBM Storage Scale cluster with both storage and compute nodes. It is recommended that any customer workload runs only on the compute nodes.
- Compute-only: An IBM Storage Scale cluster that consists only of compute nodes. This cluster does not have any local file system and therefore must remote mount the file system from an IBM Storage Scale storage cluster.
- Storage-only: This IBM Storage Scale cluster consists only of storage nodes that have access to storage where the file system is created. This file system is remotely mounted to any number of compute-only IBM Storage Scale clusters.
When you run the cloudkit create cluster command, the following prompt asks the user to select from among the different deployment models that are offered:
? IBM Spectrum Scale deployment model:[Use arrows to move, type to filter, ? for more help]
> Storage-only
Compute-only
Combined-compute-storage
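The differences between the three deployment models can be summarized as a small data structure. The following Python sketch records which node types each model contains and derives whether the cluster must remote mount a file system, as described above. The class and attribute names are assumptions made for this example and do not correspond to any cloudkit interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeploymentModel:
    name: str
    has_storage_nodes: bool
    has_compute_nodes: bool

    @property
    def needs_remote_mount(self) -> bool:
        # A cluster without storage nodes has no local file system.
        return not self.has_storage_nodes


# The three models offered by the deployment model prompt.
MODELS = {
    model.name: model
    for model in (
        DeploymentModel("Storage-only", True, False),
        DeploymentModel("Compute-only", False, True),
        DeploymentModel("Combined-compute-storage", True, True),
    )
}

for model in MODELS.values():
    print(model.name, model.needs_remote_mount)
```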
IBM Storage Scale cluster resource group
The Azure resource group plays an important role during the deployment of an IBM Storage Scale cluster, because cluster deployment depends on resources such as the VNET and DNS, which are tied to a single resource group. When you are deploying an IBM Storage Scale cluster, select the same resource group so that all the dependent resources can be accessed.
The cloudkit provides an option to choose among the existing resource groups when deploying an IBM Storage Scale cluster on Azure.
When you run the cloudkit create cluster command, the following prompt asks the user to select from among the available resource groups:
? Resource Group: scale-cluster
IBM Storage Scale cluster deployment profiles
- Throughput-Performance-Persistent-Storage: This profile uses a single availability zone with persistent storage, which means that the file system device retains data after the instance is stopped. In this mode, the number of storage nodes is calculated by cloudkit based on the provided file system capacity. The number of storage instances is limited to 64, and compute instances are limited to 65.
Important: When you choose this profile, the file system is configured in such a manner that the data is not replicated; only the metadata is replicated across the instances.
- Throughput-Performance-Scratch-Storage: This profile uses a single availability zone and a proximity placement group, which means that it packs instances close together inside an availability zone. This strategy enables workloads to achieve the low-latency network performance necessary for tightly coupled node-to-node communication that is typical of high-performance computing (HPC) applications. The profile uses instance (temporary) storage, meaning that the file system device loses data after the instance is stopped.
In this mode, the number of storage instances is limited to 10 and compute instances are limited to 65.
Important: This profile uses temporary storage (NVMe), which offers high performance and low latency for data-intensive workloads. However, the data that is stored in this mode is volatile and can be lost if the instance is stopped or terminated. Therefore, it is recommended to take frequent backups. This profile must not be used for long-term storage or if the data is not backed up elsewhere.
- Throughput-Advance-Persistent-Storage: This profile uses a single availability zone with persistent storage, which means that the file system device retains data after the instance is stopped. This mode is meant for storage capacity rather than performance; the number of storage instances is limited to 64, and compute instances are limited to 65. This profile offers these disk types to choose from: Standard_LRS, StandardSSD_LRS, Premium_LRS, and PremiumV2_LRS.
Important: When you choose this profile, the file system is configured in such a manner that the data is not replicated; only the metadata is replicated across the instances.
- Balanced: This profile uses multiple (3) availability zones to deploy the IBM Storage Scale cluster into. In this mode, the number of storage instances is limited to 64. It offers a choice of disk types: Standard_LRS, StandardSSD_LRS, Premium_LRS, and PremiumV2_LRS.
Important: The IBM Storage Scale file system is configured in such a way that data and metadata are replicated across availability zones. Instances are spread across the first two availability zones chosen in the selection, and the tie-breaker instance is provisioned in the third availability zone provided during the selection.
? Tuning profile: [Use arrows to move, type to filter, ? for more help]
Throughput-Performance-Scratch-Storage
> Throughput-Performance-Persistent-Storage
Throughput-Advance-Persistent-Storage
Balanced
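The instance limits stated for each profile can be checked against a planned cluster size before you run the deployment. The following Python sketch encodes the limits from the profile descriptions above and reports any planned count that exceeds them. No compute limit is stated for the Balanced profile, so the sketch leaves that value unset instead of guessing. The function name check_plan is an assumption made for this example.

```python
# Limits as stated in the profile descriptions; None means "not stated".
PROFILE_LIMITS = {
    "Throughput-Performance-Persistent-Storage": {"storage": 64, "compute": 65},
    "Throughput-Performance-Scratch-Storage": {"storage": 10, "compute": 65},
    "Throughput-Advance-Persistent-Storage": {"storage": 64, "compute": 65},
    "Balanced": {"storage": 64, "compute": None},
}


def check_plan(profile: str, storage_nodes: int, compute_nodes: int) -> list:
    """Return a list of problems; an empty list means the plan fits."""
    limits = PROFILE_LIMITS[profile]
    problems = []
    for role, planned in (("storage", storage_nodes), ("compute", compute_nodes)):
        limit = limits[role]
        if limit is not None and planned > limit:
            problems.append(f"{role} instances: planned {planned}, limit {limit}")
    return problems


print(check_plan("Throughput-Performance-Scratch-Storage", 12, 8))
print(check_plan("Balanced", 64, 100))
```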
Determining your performance, scalability, data availability, and data protection requirements
Before deploying the IBM Storage Scale cluster, it is important to understand your requirements in terms of performance, scalability, data availability, and data protection. These criteria determine which Azure instance types to use for the storage and compute nodes, as well as which disk types to use.
The cloudkit provides the following choices for VM instance types:
IBM Storage Scale compute nodes:
Standard_F8s_v2 | vCPU(8) | RAM (16.0 GB) | Max network bandwidth (12500 Mbps)
Standard_F16s_v2 | vCPU(16) | RAM (32.0 GB) | Max network bandwidth (12500 Mbps)
Standard_F32s_v2 | vCPU(32) | RAM (64.0 GB) | Max network bandwidth (16000 Mbps)
IBM Storage Scale storage nodes:
- For Throughput-Performance-Persistent-Storage and Balanced profiles:
Standard_F8s_v2 | vCPU(8) | RAM (16.0 GB) | Max network bandwidth (12500 Mbps)
Standard_F16s_v2 | vCPU(16) | RAM (32.0 GB) | Max network bandwidth (12500 Mbps)
Standard_F32s_v2 | vCPU(32) | RAM (64.0 GB) | Max network bandwidth (16000 Mbps)
- For Throughput-Performance-Scratch-Storage profile:
Standard_L8s_v3 | vCPU(8) | RAM (64) | Instance Storage (1 x 1.92 TB NVMe Disk) | Expected network bandwidth (12500 Mbps)
Standard_L16s_v3 | vCPU(16) | RAM (128) | Instance Storage (2 x 1.92 TB NVMe Disk) | Expected network bandwidth (12500 Mbps)
Standard_L32s_v3 | vCPU(32) | RAM (256) | Instance Storage (4 x 1.92 TB NVMe Disk) | Expected network bandwidth (16000 Mbps)
Standard_L48s_v3 | vCPU(48) | RAM (384) | Instance Storage (6 x 1.92 TB NVMe Disk) | Expected network bandwidth (24000 Mbps)
Standard_L64s_v3 | vCPU(64) | RAM (512) | Instance Storage (8 x 1.92 TB NVMe Disk) | Expected network bandwidth (30000 Mbps)
Standard_L80s_v3 | vCPU(80) | RAM (640) | Instance Storage (10 x 1.92 TB NVMe Disk) | Expected network bandwidth (32000 Mbps)
- For Throughput-Advance-Persistent-Storage:
Standard_F8s_v2 | vCPU(8) | RAM (16.0 GB) | Max network bandwidth (12500 Mbps)
Standard_F16s_v2 | vCPU(16) | RAM (32.0 GB) | Max network bandwidth (12500 Mbps)
Standard_F32s_v2 | vCPU(32) | RAM (64.0 GB) | Max network bandwidth (16000 Mbps)
For more information on choosing instance types, see Virtual Machine series in Azure documentation.
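One way to turn performance requirements into an instance choice is to pick the smallest listed type that satisfies every minimum. The following Python sketch does this for the compute node types listed above. The selection rule (smallest vCPU count that meets all minimums) and the function name smallest_fit are assumptions made for this example; cloudkit itself only presents the list for you to choose from.

```python
from typing import NamedTuple, Optional


class InstanceType(NamedTuple):
    name: str
    vcpus: int
    ram_gb: float
    max_bandwidth_mbps: int


# Compute node choices as listed in this section.
COMPUTE_TYPES = [
    InstanceType("Standard_F8s_v2", 8, 16.0, 12500),
    InstanceType("Standard_F16s_v2", 16, 32.0, 12500),
    InstanceType("Standard_F32s_v2", 32, 64.0, 16000),
]


def smallest_fit(
    min_vcpus: int, min_ram_gb: float, min_bandwidth_mbps: int = 0
) -> Optional[InstanceType]:
    """Return the smallest listed type that meets every minimum, or None."""
    candidates = [
        t
        for t in COMPUTE_TYPES
        if t.vcpus >= min_vcpus
        and t.ram_gb >= min_ram_gb
        and t.max_bandwidth_mbps >= min_bandwidth_mbps
    ]
    return min(candidates, key=lambda t: t.vcpus, default=None)


print(smallest_fit(12, 24))
print(smallest_fit(8, 16, min_bandwidth_mbps=15000))
print(smallest_fit(64, 1))
```

A result of None indicates that no listed type meets the stated minimums, in which case the workload requirements or the deployment profile might need to be revisited.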
? Disk type: [Use arrows to move, type to filter, ? for more help]
> Standard_LRS
StandardSSD_LRS
Premium_LRS
PremiumV2_LRS
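For the Throughput-Performance-Scratch-Storage profile, the raw NVMe capacity follows directly from the instance type and the number of storage instances. The following Python sketch performs that arithmetic by using the disk counts from the storage node list in this section and the limit of 10 storage instances for this profile. It reports raw device capacity only and makes no claim about usable file system capacity. The function name and the use of decimal gigabytes are assumptions made for this example.

```python
# NVMe disks per instance type, taken from the storage node list above.
NVME_DISKS = {
    "Standard_L8s_v3": 1,
    "Standard_L16s_v3": 2,
    "Standard_L32s_v3": 4,
    "Standard_L48s_v3": 6,
    "Standard_L64s_v3": 8,
    "Standard_L80s_v3": 10,
}
DISK_SIZE_GB = 1920  # 1.92 TB per disk, expressed in decimal gigabytes
MAX_SCRATCH_STORAGE_INSTANCES = 10  # limit stated for the scratch profile


def raw_scratch_capacity_gb(instance_type: str, storage_instances: int) -> int:
    """Return the total raw NVMe capacity in GB across all storage instances."""
    if not 1 <= storage_instances <= MAX_SCRATCH_STORAGE_INSTANCES:
        raise ValueError("storage instance count is outside the profile limit")
    return NVME_DISKS[instance_type] * DISK_SIZE_GB * storage_instances


print(raw_scratch_capacity_gb("Standard_L16s_v3", 4))
print(raw_scratch_capacity_gb("Standard_L80s_v3", 10))
```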
Planning for encryption at rest
The cloudkit offers a simplified way to enable encryption of the disks used by IBM Storage Scale instances.
- Platform-managed encryption key. No configuration is required. By default, disks are automatically encrypted at rest with platform-managed keys.
- Customer-managed encryption key (CMEK). Managed via Azure Key Vault.
To enable encryption at rest, you need Key Vault Crypto Service Encryption permissions, which rely on encryption set. Key vault and key creation steps are out of scope.
- Azure Virtual Machines for deployment
- Azure Resource Manager for template deployment
- Azure Disk Encryption for volume encryption
? Data stored in boot and data disk(s) are encrypted automatically. Select an encryption key management solution: [Use arrows to move, type to filter]
> Platform-managed-encryption-key
Customer-managed-encryption-key
You must select an encryption key management solution.
- If the key used for encrypting IBM Storage Scale disk volumes is deleted, data cannot be retrieved.
- An invalid key, or a user that is not configured to run cloudkit and lacks permission to read the key, causes failures.
The Platform-managed-encryption-key solution does not require any further configuration.
- In these prompts, input a key name for the Azure Key Vault:
? Data stored in boot and data disk(s) are encrypted automatically. Select an encryption key management solution: Customer-managed-encryption-key
? Cloud Vault Name: scalettest1
? Customer-managed encryption key (CMEK): scalekey1
- When the cluster deployment configuration is storage only, both data and boot volumes are encrypted.
- When the cluster deployment configuration is compute only, only boot volumes are encrypted.
Limitations
- Both boot and data-related volumes are encrypted.
- All data and boot volumes are encrypted using a single key.
- Temporary disks or scratch storage disks are not encrypted while using Customer-managed-encryption-key.
- The Azure Key Vault must share the same resource group that is used for IBM Storage Scale cluster creation.
- You cannot use the cloudkit command in the Azure Cloud VM.
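Two of the limitations above can be checked during planning, before a customer-managed key is selected. The following Python sketch compares the resource group of the Azure Key Vault with the resource group of the cluster and flags the scratch storage profile, whose temporary disks are not encrypted with a customer-managed key. The function name and the sample resource group names are assumptions made for this example; the comparison ignores case because Azure treats resource group names as case-insensitive.

```python
SCRATCH_PROFILE = "Throughput-Performance-Scratch-Storage"


def cmek_plan_warnings(
    cluster_resource_group: str,
    key_vault_resource_group: str,
    tuning_profile: str,
) -> list:
    """Return planning warnings for a customer-managed encryption key setup."""
    warnings = []
    # Resource group names are compared without regard to case.
    if key_vault_resource_group.casefold() != cluster_resource_group.casefold():
        warnings.append(
            "Azure Key Vault must be in the same resource group as the cluster"
        )
    if tuning_profile == SCRATCH_PROFILE:
        warnings.append(
            "temporary or scratch disks are not encrypted with a customer-managed key"
        )
    return warnings


# Hypothetical resource group names, used only to show the checks.
print(cmek_plan_warnings("scale-cluster", "scale-cluster", "Balanced"))
print(cmek_plan_warnings("scale-cluster", "shared-vaults", SCRATCH_PROFILE))
```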