IBM Systems Lab Services

The world’s smartest supercomputers, built on IBM Systems

Every year, computers get smarter and faster as they adapt to emerging technology challenges. Collaborating on these industry-leading computing solutions is an exciting frontier. As we begin a new year, IBM Systems Lab Services is wrapping up our largest contract to date on one such project — a collaborative high-performance computing (HPC) endeavor by the US Department of Energy (DOE) called CORAL.

The CORAL project includes some of the world’s smartest and most powerful computers, built on IBM Power Systems with IBM Elastic Storage Server, IBM Spectrum Scale and an IBM software stack. A large team effort by IBM Systems Lab Services was vital to its successful implementation.

What is the CORAL project?

CORAL stands for “Collaboration of Oak Ridge, Argonne and Livermore.” The project is a collaboration between the National Nuclear Security Administration’s Advanced Simulation and Computing (ASC) program and the Office of Science’s Advanced Scientific Computing Research (ASCR) program, and it culminates in high-performance supercomputers at Oak Ridge, Argonne and Lawrence Livermore National Laboratories.

Collaboration for success

CORAL was a huge project with many moving parts, and successful delivery required leadership from technical professionals with proven expertise in IT infrastructure design and implementation. Lab Services contributed roughly 40 technical consultants who put in approximately 20,000 hours of service over the last two and a half years, starting with the deployment of early-access systems based on IBM POWER8 to accelerate IBM POWER9 adoption.

We provided a wide range of services, including designing, planning and implementing the deployment. Specifically, Lab Services:

  • Delivered technical project management for the Oak Ridge and Livermore CORAL systems
  • Assisted IBM Development and Manufacturing with cluster infrastructure design and build
  • Provided detailed schedules, resource plans and costs for solution deployments
  • Leveraged sub-contractors for the labor-intensive physical build-outs of racks and HPC hardware
  • Performed hardware installation and system build-out of the HPC compute and Elastic Storage Server storage cluster systems
  • Provided hardware verification and cluster management verification prior to advanced cluster testing
  • Provided assistance to IBM Development during triage efforts, deployment fixes and final acceptance support
  • Interfaced with our NVIDIA, Mellanox, Seagate and Red Hat partners
  • Worked closely with Mellanox on InfiniBand and Ethernet network cabling, installation design and bring-up support

These contributions were vital to the successful implementation of CORAL, and now we have the opportunity to see the world’s most powerful computers put to work using artificial intelligence (AI) for scientific research.

About the supercomputers

Summit is the HPC system at Oak Ridge National Laboratory (currently ranked number 1 on the TOP500 list of the most powerful commercially available computer systems), and Sierra is the HPC system at Lawrence Livermore National Laboratory (ranked number 2).

For the tech-minded among us, these computers have theoretical peak performance of 200 and 125 petaflops, respectively. A petaflop is equal to a thousand trillion (10^15) floating-point operations per second, and if that sounds like a ridiculous number, it is. The performance of these supercomputers is akin to having hundreds of thousands of PCs working on a problem at the same time! Not only that, but the systems take up considerably less space and are at least five times more efficient than the previous system.
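To make the scale concrete, here is a rough back-of-the-envelope sketch of the arithmetic. The peak figures come from the paragraph above. The per-PC throughput of 1 teraflop is purely an illustrative assumption (real desktop performance varies widely), and the function names are hypothetical.

```python
PETA = 10**15  # floating-point operations per second in one petaflop
TERA = 10**12  # floating-point operations per second in one teraflop


def flops_from_petaflops(petaflops):
    """Convert a peak rating in petaflops to operations per second."""
    return petaflops * PETA


def equivalent_pcs(petaflops, pc_teraflops=1):
    """Estimate how many PCs would match a given theoretical peak.

    pc_teraflops is an assumed per-PC throughput, not a measured figure.
    """
    return flops_from_petaflops(petaflops) / (pc_teraflops * TERA)


if __name__ == "__main__":
    for name, peak in (("Summit", 200), ("Sierra", 125)):
        print(f"{name}: {int(equivalent_pcs(peak)):,} PCs at 1 teraflop each")
```

Under that assumption, the 200 and 125 petaflop peaks correspond to 200,000 and 125,000 PCs, which matches the "hundreds of thousands" comparison. A faster or slower assumed PC scales the estimate proportionally.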

Lab Services also worked on other large computers as part of the CORAL project, such as Lassen, also at Livermore (and ranked number 11 on the TOP500 list).

What CORAL aims to achieve

The supercomputing capabilities in Summit and Sierra will help the DOE labs embrace AI and deep learning capabilities to achieve their respective missions around open scientific research and enhancing national defense. Researchers will be able to work faster and smarter — creating more complex code and producing models and simulations with greater resolution and higher fidelity to fuel their scientific research.

Key solutions and partnerships for high-performance computing

AI, deep learning and data analytics are buzzwords in tech circles today. These technologies are driving the future of business, and HPC systems are evolving rapidly to help organizations build the infrastructure to support faster insights with every workload.

CORAL’s Summit and Sierra supercomputing systems are the direct result of extended partnerships with leading technology providers. Each IBM Power Systems AC922 pairs IBM POWER9 processors with NVIDIA Tesla GPU accelerators connected with next-generation NVIDIA NVLink, a multi-channel interconnect technology that provides more bandwidth than PCIe Gen 3 and supports both GPU-to-GPU and GPU-to-CPU communication. Mellanox’s partnership has brought key advances in high-speed InfiniBand connectivity to data storage, through a robust implementation of adaptive routing and the offloading of collective operations with its Scalable Hierarchical Aggregation and Reduction Protocol (SHARP). Lastly, Red Hat’s partnership provides the enterprise Linux distribution and the expertise in integrating complex software. This allows the HPC compute cluster applications to leverage these technologies for accessing the file systems and data storage hosted on a complete high-density, high-performance storage solution provided by IBM Elastic Storage Server, IBM Spectrum Scale and Spectrum Scale RAID software.
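For readers unfamiliar with collective operations, the toy sketch below illustrates the general idea of a hierarchical (tree) reduction, in which partial results are combined level by level and the total is then shared with every node. It is a conceptual illustration only, not Mellanox's SHARP implementation or API. The function names and the fan-out parameter are hypothetical.

```python
from typing import List


def tree_allreduce_sum(values: List[float], fanout: int = 2) -> List[float]:
    """Sum one value per node by aggregating up a tree, then share the total."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    if not values:
        return []
    level = list(values)
    # Reduce phase: each aggregation point sums up to `fanout` partial results.
    while len(level) > 1:
        level = [sum(level[i:i + fanout]) for i in range(0, len(level), fanout)]
    total = level[0]
    # Broadcast phase: every node receives the same final result.
    return [total] * len(values)


def reduction_levels(node_count: int, fanout: int = 2) -> int:
    """Count the aggregation levels needed to combine node_count values."""
    levels = 0
    remaining = node_count
    while remaining > 1:
        remaining = -(-remaining // fanout)  # ceiling division
        levels += 1
    return levels


if __name__ == "__main__":
    print(tree_allreduce_sum([1, 2, 3, 4, 5]))
    print(reduction_levels(8))
```

The number of aggregation levels grows logarithmically with the node count, which is why tree-shaped aggregation scales well to large clusters. Offloading moves this combining work out of the compute nodes.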

Proven IT infrastructure expertise for the cognitive era

The consultants in IBM Systems Lab Services have a wealth of experience delivering a wide range of IT infrastructure solutions. Our experience designing, building and delivering IBM Systems infrastructure solutions for HPC and AI helped us to play a critical role in building the most powerful computers on the planet today.

If you’re looking for support on an upcoming HPC or AI analytics project, contact us today.
