March 14, 2012 | Written by: Ramón García Barberá
Today, we are used to speaking about desktop or test clouds, or maybe moving our email or CRM to cloud-based solutions. What do all these workloads have in common? They share the main characteristics of a good candidate for cloud computing, as pointed out by Luis Aguilar in his post Migration to cloud: It is all about workloads. Beyond that, they are all enterprise workloads with low or unpredictable demand for computing resources, that do not generate a significant amount of data during application runs, and that are loosely coupled to the infrastructure.
But what if we need to deal with a workload that looks like exactly the opposite?
A workload that seems to be more predictable, demands a large amount of resources, runs at an average utilization rate of 80 to 90 percent, and uses or creates huge amounts of data during application runs?
Well, at first sight it could seem that we are facing a workload not suitable for cloud computing. And we can find all of those characteristics in most high-performance computing (HPC) workloads.
Does it mean that HPC cannot be approached using a cloud computing model?
No. It rather means it should be approached with some differences, some specific considerations.
If we look at the evolution of HPC delivery, putting it very simply, it went from a single system to an HPC cluster, and then to an HPC grid. During the early conversations about cloud computing, there were many comparisons with the concept of "grid computing," largely because of the cloud's characteristic of having a large, heterogeneous pool of automated resources that provides huge scalability. Now we know that cloud computing means much more: on-demand self-service, standardization, and usage-based chargeback and billing. Cloud computing seems to represent a natural evolution of grid computing for HPC, and it could help address the ever-increasing flexibility and capacity demands of deep scientific, technical, and analytical tasks.
So, what would be the differences from a "general-purpose cloud?"
First of all, you might be thinking “what do you mean by a ‘general-purpose cloud’?”
A cloud comprised of general-purpose servers, using virtualization and general-purpose cloud management software to enable better flexibility and higher utilization, with an aim to hit the price/performance sweet spot.
Although one of these clouds could accommodate some of those workloads, in some cases it won't meet the special needs of HPC workloads.
Typically, HPC clusters are designed to use the full compute potential of the installed hardware, without the overhead introduced by a hypervisor, and most HPC applications require network-accessible storage. In other cases, additional or different elements are introduced to boost performance, such as low-latency networks, Cell processors, GPUs, or even FPGAs.
With that in mind, among the specific characteristics of HPC clouds, I would consider:
- Bare metal provisioning, which is the ability to provision physical machines (not only virtual machines, VMs)
- Higher vertical scalability
- High-performance software stacks, ready to use
- Network/clustered file systems support
- Accelerated clusters (Cell processors, GPUs, FPGAs, and others)
- Scale out to public clouds for certain workloads (embarrassingly parallel)
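The last point is worth illustrating. An embarrassingly parallel workload splits into chunks that need no communication with one another, which is exactly why it can burst out to a public cloud without a low-latency interconnect. As a minimal, hypothetical sketch (the classic Monte Carlo estimation of pi, not any specific HPC application):

```python
# Sketch of an embarrassingly parallel job: Monte Carlo estimation of pi.
# Each chunk is fully independent -- no communication between tasks -- so
# chunks could be dispatched to in-house nodes or burst to a public cloud,
# and the results combined with a trivial reduction (a sum).
import random

def estimate_chunk(seed, samples):
    """Count random points in the unit square that fall inside the quarter-circle."""
    rng = random.Random(seed)  # per-chunk seed keeps chunks independent and repeatable
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits

# Each (seed, samples) task could be sent to any worker, anywhere;
# here they simply run in sequence for illustration.
tasks = [(seed, 100_000) for seed in range(4)]
total_hits = sum(estimate_chunk(seed, n) for seed, n in tasks)
pi_estimate = 4.0 * total_hits / sum(n for _, n in tasks)
print(f"pi is approximately {pi_estimate:.4f}")
```

Contrast this with a tightly coupled workload (say, an MPI fluid-dynamics solver), where tasks exchange data at every step and a public-cloud network would become the bottleneck; that is the kind of workload the in-house, accelerated part of an HPC cloud is for.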
And what about the benefits?
The benefits of an HPC cloud compared to traditional HPC deployments would be those of cloud computing compared to traditional IT:
- Shared pool of resources: Ability to easily repurpose all in-house nodes
- Ease of manageability and access to HPC infrastructure through a self-service web portal
- Centralized user management, usage metering and accounting
- Single-point submission and monitoring for multiple job queues
- Greater user satisfaction: Reduced provisioning times
- Better accommodation of load peaks by temporarily moving some workloads to the public cloud
- Potential ability to become an HPC cloud service provider and sell unused compute power during load valleys
And what do you think? What other differences do you see in HPC clouds?