Which storage option is best for Db2 in a cloud environment?
IBM Cloud provides elasticity and efficient resource utilization of the infrastructure in a multi-tenant or dedicated environment. IBM has the distinction of using the same framework to manage infrastructure and software both in the managed environment (IBM Cloud) and in the customer's own data center through IBM Cloud Private.
The three main storage options available are file (OS), block (SAN or raw), and object storage (API-based, scalable and extensible). File-based storage is the legacy option. It is suitable for small data and can be shared across a cluster using software-based sharing such as NFS (Linux), CIFS/Samba (Windows), AFP (Mac) and others. It is the most common option; however, it is neither extensible nor scalable.
The block storage solutions provided by SAN vendors (IBM, EMC, NetApp and others) are best suited for structured data where many low-latency read/write operations are required. Centralized storage systems provide striping, snapshots, mirroring, encryption, compression, replication, deduplication and similar functions, and are typically expensive. These block-based solutions are mostly used by Tier-1 applications that need 24×7 availability and by high-performance databases. SAN-based block storage will continue to dominate high-end databases and applications even when they move to a cloud environment, either in their own data center or at a provider that offers these capabilities.
The software-defined, VMware-based virtual SAN Hyper Converged Infrastructure (HCI) block storage solutions, which run on commodity hardware, have proliferated among mid-tier applications and databases and have proved to be cost-effective. The IBM Elastic Storage solution, based on IBM Spectrum Scale (aka GPFS), is a competitive and cost-effective software-defined storage solution that can scale to thousands of nodes and provide extreme capacity using very reliable IBM POWER systems. These solutions are used in top-tier industry markets.
Object storage solutions such as IBM Cloud Object Storage (ICOS), Amazon S3, and Azure Storage are a hierarchy-free method of storing data under a unique identifier in a flat address space, and are used at the application level through an API. These object storage solutions are mostly used for unstructured data that is written once and read many times. Their scalability is almost limitless. Object storage is best suited for unstructured cloud data spanning multiple data centers and geographies, and is ideal for stateless cloud-native applications at scale.
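The flat address space described above can be pictured with a small in-memory sketch. This is a toy stand-in written for illustration only, not the ICOS, S3 or Azure client API; the class name `FlatObjectStore` and its methods are hypothetical.

```python
import uuid


class FlatObjectStore:
    """Toy in-memory stand-in for an object store (hypothetical, not a real client)."""

    def __init__(self):
        # One flat dictionary: there are no directories, only identifiers.
        self._objects = {}

    def put(self, data, metadata=None):
        """Store a blob with optional metadata and return its unique identifier."""
        object_id = uuid.uuid4().hex
        self._objects[object_id] = (bytes(data), dict(metadata or {}))
        return object_id

    def get(self, object_id):
        """Return the blob stored under object_id."""
        return self._objects[object_id][0]

    def head(self, object_id):
        """Return only the metadata, without the blob itself."""
        return dict(self._objects[object_id][1])


store = FlatObjectStore()
key = store.put(b"scanned-invoice-bytes", {"content-type": "application/pdf"})

# Write once, read many: the same identifier serves every later read.
for _ in range(3):
    assert store.get(key) == b"scanned-invoice-bytes"
```

The application only ever holds an identifier and talks to an API. It never sees a mount point, a path hierarchy or a block device, which is why this model suits stateless applications better than a database engine issuing many small in-place updates.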
Block and object storage are converging: block storage now offers REST API support for applications, dynamic volume provisioning and similar features, in much the same way that object storage exposes REST API endpoints.
Provisioning an IBM Db2 database on a single node for a transactional system follows one of three paradigms.
Legacy Paradigm: This is the typical approach used across the enterprise, where a DBA requests a machine (virtual or bare metal) with SAN storage. In a large enterprise this process can take anywhere from 2 to 10 weeks. The DBA then installs the software, configures the database and creates the objects, after which the application team connects to the database. This is a legacy monolithic system, tightly coupled to hardware and application. Enterprises are slowly moving away from this paradigm.
Contemporary Paradigm: Virtualization with vSAN capabilities, provided by VMware tooling. A golden OS image with Db2 software already installed is provisioned on HCI vSAN running on commodity hardware in hours or days rather than weeks or months. These are typically not Tier-1 applications (which are still largely hosted in a monolithic environment), as the main premise of these solutions is hypervisor virtualization.
Cloud Native Paradigm: Kubernetes orchestration of Docker containers, for optimum resource utilization (30-40%), deploys Db2 in pods using Helm charts in a point-and-click fashion. The Kubernetes DNS service glues the published IP addresses of the databases to the internal IP addresses of the pods where they are deployed. In this paradigm, the plumbing of software installation, database creation, configuration and tuning is staged ahead of time through Helm charts and is ready to be deployed at the click of a button. Since databases are stateful workloads in a Kubernetes cluster, volumes can be assigned through GlusterFS or IBM Spectrum Scale, with snapshots, replication, mirroring, striping and encryption provided in software. Since this is still block storage serving many read/write operations, the back-end storage can be simple direct-attached flash drives in the servers or SAN-based volumes. For locally attached flash drives, a virtual SAN for clustering can be built with IBM Spectrum Scale, GlusterFS or VMware vSAN. If a pod fails, Kubernetes restarts the Docker container on any available worker node, so there must be a facility through which the storage can be shared transparently across worker nodes. With GlusterFS, this is achieved through REST API endpoints managed by Heketi and consumed by Kubernetes.
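To make the Heketi hand-off concrete, the sketch below builds the two Kubernetes objects involved: a StorageClass whose `resturl` parameter points at a Heketi endpoint, and a PersistentVolumeClaim that a Db2 pod could mount. It assumes the in-tree `kubernetes.io/glusterfs` provisioner that Kubernetes shipped in that period (newer releases no longer include it). The Heketi URL, object names and volume size are placeholders, not values from any real deployment.

```python
import json


def glusterfs_storage_class(name, heketi_url):
    """Build a StorageClass that points Kubernetes at a Heketi REST endpoint."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        # In-tree GlusterFS provisioner of that era (assumption: still present).
        "provisioner": "kubernetes.io/glusterfs",
        "parameters": {"resturl": heketi_url},
    }


def volume_claim(name, storage_class, size):
    """Build a claim; the provisioner creates the backing volume on demand."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


# Placeholder values, chosen only for illustration.
storage_class = glusterfs_storage_class("db2-gluster", "http://heketi.example.internal:8080")
claim = volume_claim("db2-data", "db2-gluster", "100Gi")

# kubectl accepts JSON manifests as well as YAML; a List bundles both objects.
manifest = {"apiVersion": "v1", "kind": "List", "items": [storage_class, claim]}
print(json.dumps(manifest, indent=2))
```

Because the claim, rather than a node-local path, is what the pod references, Kubernetes can reschedule the container on another worker node and re-attach the same volume there.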
The top three players, IBM, AWS and Microsoft Azure, use similar ingredients, but their preparation methods differ. Red Hat OpenShift, Nutanix, Rackspace, Kublr and many others bring similar environments to clients' doorsteps.
IBM is on the cusp of winning this game with the release of IBM Cloud Private in Oct 2017, which gives customers elasticity and speed in their own data centers. This is evident from the immense response IBM has received from its enterprise customers, who are now adopting IBM Cloud solutions as the main workhorse of the enterprise. The good news is that IBM middleware is going to be “Cloud Native” at a much faster speed than ever before.
Storage volume provisioning and growth management, whether through VMware vSphere Cloud Provider, GlusterFS or IBM Spectrum Scale (GPFS), is handled through endpoint management, which gives the speed and elasticity to deploy Db2 containers through Kubernetes in pods on any available worker node.
The shared storage can be used for a Db2 shared-everything (OLTP) or shared-nothing (Warehouse) architecture. It provides high availability and, in the shared-nothing case, avoids data redistribution. This concept is used in the IBM Db2 Warehouse offering, in the cloud or on premises.
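Why shared storage avoids redistribution in a shared-nothing layout can be shown with a small sketch. It assumes a fixed number of data partitions living on shared storage, with nodes merely owning partitions. The partition count, the CRC-based hash and the function names are illustrative assumptions and do not reflect Db2's actual internals.

```python
import zlib

NUM_PARTITIONS = 12  # fixed up front (illustrative value)


def partition_of(distribution_key):
    """Map a row's distribution key to a partition; independent of the node count."""
    return zlib.crc32(distribution_key.encode("utf-8")) % NUM_PARTITIONS


def assign_partitions(nodes):
    """Spread the fixed partitions round-robin over the given nodes."""
    return {p: nodes[p % len(nodes)] for p in range(NUM_PARTITIONS)}


def fail_over(owners, failed_node):
    """Hand only the failed node's partitions to the surviving nodes."""
    survivors = sorted(set(owners.values()) - {failed_node})
    return {
        p: (survivors[p % len(survivors)] if node == failed_node else node)
        for p, node in owners.items()
    }


keys = [f"customer-{i}" for i in range(1000)]
placement_before = {k: partition_of(k) for k in keys}

owners = assign_partitions(["node-a", "node-b", "node-c"])
owners_after = fail_over(owners, "node-c")

# Row-to-partition placement is untouched; only partition ownership changed.
placement_after = {k: partition_of(k) for k in keys}
assert placement_before == placement_after
moved = [p for p in owners if owners[p] != owners_after[p]]
print(f"{len(moved)} partitions changed owner; no rows were re-hashed")
```

Since every node can reach every partition's files on the shared storage, a failover amounts to a change in the ownership map. With node-local disks, the same event would require copying the data itself.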