I was recently asked to participate in a few high-performance computing (HPC) cloud projects, which brought me back to my HPC cluster days. With the cloud experience I have today, it is quite clear to me that HPC and cloud share many attributes. Frankly, HPC is cloud; no doubt about it.
HPC cluster implementations are large deployments of compute nodes, traditionally commodity x86 servers, that service large computational workloads found in the research, manufacturing and healthcare sectors, among others. All of this compute capacity is pooled together to offer supercomputing power to crunch huge datasets. Workloads that can take weeks to complete on stand-alone systems can be completed in days and sometimes hours.
As an example, let’s take a public research cluster and compare it to cloud (IaaS):
- Provisioning: Both cloud and HPC implementations need to offer rapid provisioning of resources to accommodate various and ever-changing workloads. HPC environments tend to be bare-metal deployments, whereas cloud commonly utilizes a virtualized environment. HPC is all about performance, and involving a hypervisor introduces an additional layer that can hinder performance. Both environments use tools to rapidly provision compute nodes or virtual servers.
- Multi-tenancy: Many research clusters will serve different tenants, each requiring specific compute capacity and specific software stacks. Job schedulers such as TORQUE and LSF manage job queues and provide the necessary capacity based on specific criteria. Cloud implementations also serve a multitude of clients and manage access to resources.
- Elasticity: Both HPC and cloud need to manage resource demands and provide a mechanism to release unused capacity to pending requests. Applications used to manage these environments may differ but the results are similar.
- Self-Service Portal: Cloud usually offers a well-defined web-based self-service portal that lets users request and ultimately provision capacity and issue service requests. HPC environments offer similar functionality; however, the standard there is a mix of custom-coded web portals and command-line interfaces.
I could go on; however, the biggest contrast is that HPC has been a niche environment. Much of the management was left to Linux administrators, who relied on ad-hoc tools or open source, community-based programs (xCAT, for example) to run their clusters. With the advent of cloud management applications, many of these applications can be adapted to handle HPC clusters as well.
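To make the multi-tenancy and elasticity points above a little more concrete, here is a minimal Python sketch of the idea: a fixed pool of nodes, a queue of pending jobs from different tenants, and capacity that goes back to the queue as soon as a job finishes. The `Job` and `ToyScheduler` names, the tenant names and the strict first-in, first-out policy are all assumptions made for illustration. This is not how TORQUE, LSF or any cloud platform is actually implemented; real schedulers add priorities, fair-share, backfill and much more.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    name: str    # unique job name (hypothetical)
    tenant: str  # which group or client submitted the job
    nodes: int   # number of compute nodes the job needs


class ToyScheduler:
    """Hypothetical FIFO scheduler over a fixed pool of identical nodes."""

    def __init__(self, total_nodes):
        self.free_nodes = total_nodes
        self.pending = deque()  # jobs waiting for capacity, oldest first
        self.running = {}       # job name -> Job currently holding nodes

    def submit(self, job):
        """Queue a job and start it right away if enough nodes are free."""
        self.pending.append(job)
        self._dispatch()

    def finish(self, name):
        """Release a finished job's nodes and hand them to pending jobs."""
        job = self.running.pop(name)
        self.free_nodes += job.nodes
        self._dispatch()

    def _dispatch(self):
        # Strict FIFO: keep starting the job at the head of the queue
        # for as long as it fits into the free capacity.
        while self.pending and self.pending[0].nodes <= self.free_nodes:
            job = self.pending.popleft()
            self.free_nodes -= job.nodes
            self.running[job.name] = job


sched = ToyScheduler(total_nodes=8)
sched.submit(Job("cfd-run", "lab-a", 6))  # starts immediately
sched.submit(Job("genome", "lab-b", 4))   # waits: only 2 nodes are free
sched.finish("cfd-run")                   # frees 6 nodes, "genome" starts
```

The same loop describes a cloud just as well if you read "nodes" as virtual servers and "jobs" as provisioning requests, which is really the point of the comparison.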
IBM acquired Platform Computing in January of 2012 and gained a rich suite of products that are targeted at the HPC community and other sectors. Building on these products, Platform has introduced various applications such as Platform Cluster Manager Advanced Edition (PCM/AE) and Platform Symphony.
PCM/AE manages multiple cluster environments and automates many of the administration functions. It offers a web portal to simplify the process of managing and requesting resources on one or many clusters. Symphony provides enterprise-class management for running distributed applications and big data analytics as well as a host of other functions.
These two products are a sample of the tools available to automate and optimize the management of HPC clusters and offer cloud-like functionality, providing an improved and agile environment. Many existing HPC environments serve a multitude of clients, and the need to simplify access is a priority. Another trend is to offer unused processing capacity to the private sector in order to recoup some of the high costs involved in building HPC clusters. This type of service offering requires a robust and secure strategy to ensure data confidentiality and a minimum service level, as clients are paying for the service.
With the advent of cloud-type HPC management tools, these types of services are now a possibility with much less effort than previously required.
For more information on IBM Platform products you can consult the following IBM Redbook: http://www.redbooks.ibm.com/redpieces/abstracts/sg248073.html?Open