Five important considerations for HPC deployments

Architecting a high-performance computing (HPC) cluster requires a holistic approach that addresses performance at every level of the deployment.

HPC workloads put massive computational demands on your infrastructure: they involve analyzing huge data sets to solve complex problems, and that means they require supercomputing capabilities.

In short, an HPC cluster has many facets, and all of them must be installed and configured in a well-orchestrated manner before workloads can run effectively in the cluster.

The goal of the experts at IBM Systems Lab Services is to help clients quickly gain value from their HPC cluster on IBM Power Systems. We help them plan, install and configure the cluster, and we transfer knowledge to their teams along the way.

Are you looking to deploy a high-performance computing cluster in your infrastructure? If so, there are some important elements of an HPC deployment that you should consider.

  • Network integration: In our experience at Lab Services, network integration is critical to a successful HPC installation. We work with the client and the other teams involved to plan how the Ethernet switches and InfiniBand (IB) switches will be configured and integrated into the client environment. This is the base layer on which the other parts of the HPC cluster are built.
  • Choice of workloads: It is important to understand the workloads that clients are going to run in the HPC cluster, because they dictate which Linux distribution and software stack should be loaded onto the servers. For example, Red Hat Enterprise Linux is usually used for simulation or modeling workloads, whereas Ubuntu is normally used for PowerAI, deep learning or machine learning workloads.
  • Use of a deployment tool: Because an HPC cluster contains multiple servers, we usually install a deployment tool such as xCAT or Platform Compute Manager as part of the installation and use it to deploy the required OS image and HPC software to the servers. One advantage of a deployment tool is that it helps clients deploy and update the server nodes consistently. If clients already have deployment tools and want to use them for the HPC cluster, we work with them to integrate the cluster into those tools.
  • Storage: If IBM Elastic Storage Server (ESS) is part of the environment, we install and configure it and create the file systems that will be mounted on the server nodes. As a best practice, we usually create a remote General Parallel File System (GPFS) cluster on top of the servers in the HPC cluster and remote-mount the file systems from ESS or from an existing GPFS cluster at the client site.
  • Monitoring: At this point, the HPC cluster is up and ready to run workloads. Next, clients need to be able to monitor the resources in the cluster and schedule the jobs that run in it. IBM Spectrum LSF is the tool normally used for these tasks. It is usually installed in a shared file system that is mounted on all of the servers.
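
To make the list above easier to carry into a planning document, here is a minimal Python sketch that records the five considerations in the order described and applies the rule of thumb for choosing a Linux distribution. Everything named in it (DISTRO_BY_WORKLOAD, DEPLOYMENT_STEPS, ClusterPlan, plan_cluster) is hypothetical and exists only for illustration. It is not part of xCAT, IBM Spectrum LSF or any other IBM tool, and the mapping covers only the examples given in the text, so treat it as a starting point rather than a sizing or selection method.

```python
from dataclasses import dataclass

# Rule of thumb from the list above. Hypothetical table: a real choice also
# depends on the software stack the client needs.
DISTRO_BY_WORKLOAD = {
    "simulation": "Red Hat Enterprise Linux",
    "modeling": "Red Hat Enterprise Linux",
    "deep learning": "Ubuntu",
    "machine learning": "Ubuntu",
}

# The five considerations, in the order the list presents them.
DEPLOYMENT_STEPS = (
    "network integration",
    "choice of workloads",
    "deployment tool",
    "storage",
    "monitoring and job scheduling",
)


@dataclass(frozen=True)
class ClusterPlan:
    """Illustrative record of one planned deployment."""
    workload: str
    distribution: str
    steps: tuple


def plan_cluster(workload: str) -> ClusterPlan:
    """Return a plan for a workload type, or raise if no rule of thumb applies."""
    key = workload.strip().lower()
    if key not in DISTRO_BY_WORKLOAD:
        raise ValueError(f"no rule of thumb for workload: {workload!r}")
    return ClusterPlan(key, DISTRO_BY_WORKLOAD[key], DEPLOYMENT_STEPS)


if __name__ == "__main__":
    plan = plan_cluster("Deep Learning")
    print(f"{plan.workload}: {plan.distribution}")
    for number, step in enumerate(plan.steps, start=1):
        print(f"{number}. {step}")
```

Raising an error for an unknown workload, instead of falling back to a default distribution, mirrors the point made above: the workload has to be understood before the OS image is chosen.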

Attending to these five aspects of an HPC deployment will help ensure that your environment is optimized to run effectively, helping you to use HPC to gain powerful insights and solve your complex business challenges.

One last step: Transferring knowledge to the client team

Throughout the HPC cluster deployment process, we document the steps taken and provide this documentation to clients as part of the knowledge transfer. We want to ensure that client IT teams are self-sufficient in managing the HPC cluster and in extending it with additional IBM products.

Are you looking to consult with IT professionals who have expertise in HPC applications? IBM Systems Lab Services can help. Contact us today.

In the meantime, learn more about IBM HPC solutions that provide an integrated platform to optimize your HPC workflows, resulting in faster time to insights and value.
