Introducing hybrid cloud capability in IBM Spectrum Conductor with Spark 2.2.1
LiorAronovich
IBM Spectrum Conductor with Spark 2.2.1 introduces a new cloud bursting feature that enables you to dynamically burst workloads from an IBM Spectrum Conductor with Spark cluster to cloud hosts. When workload resource demand exceeds the capacity of resources in the cluster, additional cloud hosts are provisioned and added to the cluster to meet the resource demand. When there is excess capacity in allocated cloud hosts, this excess capacity is returned to the cloud providers.
The cloud bursting capability introduced in IBM Spectrum Conductor with Spark 2.2.1 provides several benefits:
IBM Spectrum Conductor with Spark 2.2.1 uses a system service called HostFactory to connect with cloud providers. This service runs within an IBM Spectrum Conductor with Spark cluster, and uses plug-ins to communicate with IBM Spectrum Conductor with Spark and with cloud providers.
The IBM Spectrum Conductor with Spark requestor plug-in monitors the workloads of Spark instance groups that are enabled for cloud bursting, and monitors the states and utilization of the cloud hosts in the cluster. Based on this information, the plug-in calculates scale-out and scale-in requests and passes them to the HostFactory service. Scale-out requests add cloud hosts to the cluster, and are generated when workload demand exceeds the resource capacity in the cluster. Scale-in requests return cloud hosts to the cloud providers, and are generated when there is excess resource capacity in the cluster. The provider plug-ins work with the cloud provider interfaces to provision and return hosts. In IBM Spectrum Conductor with Spark 2.2.1, the supported cloud providers are Amazon Web Services and IBM Cloud (SoftLayer).
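The scale-out and scale-in decision can be pictured with a small sketch. This is not the plug-in's actual algorithm; the `ClusterSnapshot` fields and the `plan_scaling` function are hypothetical names, and the sketch assumes that demand and capacity are both measured in slots and that every cloud host offers the same number of slots.

```python
import math
from dataclasses import dataclass


@dataclass
class ClusterSnapshot:
    # All fields are illustrative assumptions, not real plug-in attributes.
    demanded_slots: int        # slots needed by waiting and running workloads
    cluster_slots: int         # slots offered by all hosts currently in the cluster
    idle_cloud_hosts: int      # cloud hosts that run no workload right now
    slots_per_cloud_host: int  # assumed identical for every cloud host


def plan_scaling(snap):
    """Return ("scale_out", n), ("scale_in", n) or ("none", 0)."""
    deficit = snap.demanded_slots - snap.cluster_slots
    if deficit > 0:
        # Demand exceeds capacity: request enough hosts to cover the gap.
        return ("scale_out", math.ceil(deficit / snap.slots_per_cloud_host))
    # Excess capacity: return only whole hosts that are idle.
    surplus = -deficit
    returnable = min(snap.idle_cloud_hosts, surplus // snap.slots_per_cloud_host)
    if returnable > 0:
        return ("scale_in", returnable)
    return ("none", 0)
```

For example, a demand of 100 slots against 64 cluster slots, with 16 slots per cloud host, leaves a gap of 36 slots and therefore a scale-out request for three hosts.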
High level architecture of the IBM Spectrum Conductor with Spark requestor plug-in
The HostFactory service uses cloud provider plug-ins to communicate with Infrastructure as a Service (IaaS) cloud services, provision cloud hosts, and return cloud hosts.
The responsibilities of the HostFactory service are:
To obtain requests for provisioning of cloud hosts and returning of cloud hosts, the HostFactory service activates the IBM Spectrum Conductor with Spark requestor plug-in (every 30 seconds by default).
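The periodic activation described above amounts to a polling loop. The following is a minimal sketch under stated assumptions: `poll_requestor` and `handle_request` are hypothetical callables that stand in for the requestor plug-in and the HostFactory request handling, and only the 30-second default interval comes from the text.

```python
import time


def run_poll_loop(poll_requestor, handle_request, interval_s=30.0,
                  max_cycles=None, sleep=time.sleep):
    """Call the requestor every interval_s seconds and forward its requests.

    poll_requestor() returns a list of requests (hypothetical interface).
    max_cycles limits the number of iterations; None means run forever.
    sleep is injectable so the loop can be exercised without waiting.
    """
    cycle = 0
    while max_cycles is None or cycle < max_cycles:
        for request in poll_requestor():
            handle_request(request)
        sleep(interval_s)
        cycle += 1
    return cycle
```

Passing `max_cycles` and a substitute `sleep` function makes the loop easy to try out in isolation, for example `run_poll_loop(lambda: [], print, max_cycles=1, sleep=lambda s: None)`.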
The responsibilities of the IBM Spectrum Conductor with Spark requestor plug-in are:
The IBM Spectrum Conductor with Spark requestor plug-in communicates with the IBM Spectrum Conductor with Spark cluster services. It gets a list of Spark master services running in the cluster from the ASCD service of the IBM Spectrum Conductor with Spark cluster, and it gets lists and attributes of waiting and running Spark applications from the Spark master services running in the cluster. It also gets host level statistics from the resource management layer (EGO) of the IBM Spectrum Conductor with Spark cluster.
The IBM Spectrum Conductor with Spark requestor plug-in maintains workload profiles that facilitate the resource requirements calculations, and maintains host level information that facilitates the host return operations. This information is maintained in a metadata store.
The configuration file of the IBM Spectrum Conductor with Spark requestor plug-in enables you to customize the parameters that control the scale-out and scale-in calculations, as well as other administrative parameters. The configuration can be modified online.
Configuring and activating the cloud bursting feature
Scale-out and scale-in configurations
The method for selecting workloads for cloud bursting based on their state can be configured with the following options:
For each workload class, the following can be configured:
These parameters can be specified either for a workload class or globally for any workload that does not have these parameters specifically configured.
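The fallback from workload class settings to global settings can be illustrated as a two-level lookup. This is only a sketch of the idea: the parameter names, values, and the class name `etl-nightly` are invented for illustration and are not actual configuration keys of the plug-in.

```python
# Hypothetical global values, used when a workload class sets nothing itself.
GLOBAL_DEFAULTS = {"max_cloud_hosts": 10, "min_wait_seconds": 60}

# Hypothetical per-class overrides; a class may override only some parameters.
CLASS_OVERRIDES = {"etl-nightly": {"max_cloud_hosts": 25}}


def resolve_param(workload_class, name):
    """Return the class-specific value if configured, else the global one."""
    overrides = CLASS_OVERRIDES.get(workload_class, {})
    return overrides.get(name, GLOBAL_DEFAULTS[name])
```

Here `resolve_param("etl-nightly", "max_cloud_hosts")` uses the class override, while any parameter or class without an override resolves to the global value.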
A workload class is a defined group of workloads that have similar characteristics and processing behaviors. The IBM Spectrum Conductor with Spark requestor plug-in maintains a profiling record for each workload class. The record aggregates statistics from samples that are collected for workloads within that class, including the average processing duration and the average compute slot consumption. The bursting calculations use this information to estimate resource requirements.
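A profiling record of this kind can be sketched as a pair of running averages. The class and method names below are hypothetical, and the slot-seconds estimate is one simple way to turn the averages into a resource figure, not necessarily the calculation that the plug-in performs.

```python
from dataclasses import dataclass


@dataclass
class WorkloadProfile:
    """Aggregated statistics for one workload class (illustrative only)."""
    samples: int = 0
    avg_duration_s: float = 0.0  # average processing duration in seconds
    avg_slots: float = 0.0       # average compute slots consumed

    def add_sample(self, duration_s, slots):
        # Incremental mean: no need to store the individual samples.
        self.samples += 1
        self.avg_duration_s += (duration_s - self.avg_duration_s) / self.samples
        self.avg_slots += (slots - self.avg_slots) / self.samples

    def estimated_slot_seconds(self, waiting_workloads):
        # Rough demand estimate for workloads of this class that are waiting.
        return waiting_workloads * self.avg_duration_s * self.avg_slots
```

Keeping only the sample count and the current averages means the record stays the same size no matter how many workloads have been sampled.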
You can also configure billing cycle parameters, either for specific cloud providers or as a default. This information is used in the host return calculations to optimize resource utilization. It includes the duration of the billing cycle, and the start and end times of the return window relative to the start of the billing cycle. For cloud providers that have no billing cycle information specified, the default information is used.
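The return window check can be sketched as follows. The `BillingCycle` type, the provider key, and all numeric values are assumptions made for illustration. The sketch also assumes that a host's billing cycles start when the host is provisioned, so that its position in the current cycle is its uptime modulo the cycle duration.

```python
from dataclasses import dataclass


@dataclass
class BillingCycle:
    cycle_s: int         # duration of one billing cycle in seconds
    window_start_s: int  # return window start, relative to the cycle start
    window_end_s: int    # return window end, relative to the cycle start


# Hypothetical values: hourly billing, return in minutes 50 to 58 of the hour.
BILLING = {
    "default": BillingCycle(3600, 3000, 3480),
    "provider_a": BillingCycle(600, 480, 570),
}


def in_return_window(provider, uptime_s):
    """True if a host with this uptime is inside its provider's return window."""
    cycle = BILLING.get(provider, BILLING["default"])
    position = uptime_s % cycle.cycle_s
    return cycle.window_start_s <= position < cycle.window_end_s
```

Returning an idle host only near the end of a cycle that has already been paid for keeps the host available for new demand during the rest of that cycle.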
Monitoring and configuration from the cluster management console
You can view information on the cloud bursting activity and update configuration parameters by using the cluster management console:
For further information on the cloud bursting capabilities of IBM Spectrum Conductor with Spark 2.2.1, refer to the Knowledge Center documentation.
If you want to try out IBM Spectrum Conductor with Spark 2.2.1, you can download the evaluation version here! If you have any questions about cloud bursting with HostFactory, or any general inquiries, post them in our forum or join us on Slack!