February 11, 2019 By Randy Ruhlow
Hillary Porter
3 min read

Every year, computers get smarter and faster as they adapt to emerging technology challenges. Collaborating on these industry-leading computing solutions is an exciting frontier. As we begin a new year, IBM Systems Lab Services is wrapping up our largest contract to date on one such project — a collaborative high-performance computing (HPC) endeavor by the US Department of Energy (DOE) called CORAL.

The CORAL project includes some of the world’s smartest and most powerful computers, built on IBM Power Systems with IBM Elastic Storage Server, IBM Spectrum Scale and an IBM software stack. A large team effort by IBM Systems Lab Services was vital to its successful implementation.

What is the CORAL project?

CORAL stands for “Collaboration of Oak Ridge, Argonne and Livermore.” The project is a collaboration between the National Nuclear Security Administration’s Advanced Simulation and Computing (ASC) Program and the Office of Science’s Advanced Scientific Computing Research (ASCR) program that culminates in high-performance supercomputers at Oak Ridge, Argonne and Lawrence Livermore National Laboratories.

Collaboration for success

CORAL was a huge project with many moving parts, and successful delivery required leadership from technical professionals with proven expertise in IT infrastructure design and implementation. Lab Services contributed roughly 40 technical consultants who put in approximately 20,000 hours of service over the last two and a half years, starting with the deployment of early-access IBM POWER8 systems to accelerate IBM POWER9 adoption.

We provided a wide range of services, including designing, planning and implementing the deployment. Specifically, Lab Services:

  • Delivered technical project management for the Oak Ridge and Livermore CORAL systems
  • Assisted IBM Development and Manufacturing with cluster infrastructure design and build
  • Provided detailed schedules, resource plans and costs for solution deployments
  • Leveraged sub-contractors for the labor-intensive physical build-outs of racks and HPC hardware
  • Performed hardware installation and system build-out of the HPC compute and Elastic Storage Server storage cluster systems
  • Provided hardware verification and cluster management verification prior to advanced cluster testing
  • Provided assistance to IBM Development during triage efforts, deployment fixes and final acceptance support
  • Interfaced with our NVIDIA, Mellanox, Seagate and Red Hat partners
  • Worked closely with Mellanox on InfiniBand and Ethernet network cabling installation design and bring-up support

These contributions were vital to the successful implementation of CORAL, and now we have the opportunity to see the world’s most powerful computers put to work using artificial intelligence (AI) for scientific research.

About the supercomputers

Summit is the HPC system at Oak Ridge National Laboratory (currently ranked number 1 on the TOP500 list of the most powerful commercially available computer systems), and Sierra is the HPC system at Lawrence Livermore National Laboratory (ranked number 2).

For the tech-minded among us, these computers have theoretical peaks of 200 and 125 petaflops, respectively. A petaflop is equal to a thousand trillion floating-point operations per second, and if that sounds like a ridiculous number, it is. The performance of these supercomputers is akin to having hundreds of thousands of PCs working on a problem at the same time! Not only that, but the systems take up considerably less space and are at least five times more efficient than their predecessors.
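As a rough sanity check on the PC comparison above, the arithmetic can be sketched in a few lines of Python. The peak values are the theoretical peaks quoted in this article; the per-PC figure of 500 gigaflops is an assumption chosen purely for illustration, and the names `equivalent_pcs` and `ASSUMED_PC_FLOPS` are hypothetical.

```python
# Back-of-the-envelope comparison of supercomputer peak performance to PCs.
PETAFLOP = 10**15  # a thousand trillion floating-point operations per second

# Theoretical peaks quoted in the article, in petaflops.
PEAK_PETAFLOPS = {"Summit": 200, "Sierra": 125}

# ASSUMPTION: one PC delivering 500 gigaflops; illustrative only,
# not a figure from the article.
ASSUMED_PC_FLOPS = 500 * 10**9


def equivalent_pcs(peak_petaflops, pc_flops=ASSUMED_PC_FLOPS):
    """Return how many PCs of the assumed speed match the given peak."""
    return peak_petaflops * PETAFLOP // pc_flops


for name, peak in PEAK_PETAFLOPS.items():
    print(f"{name}: {peak} petaflops ~ {equivalent_pcs(peak):,} assumed PCs")
```

Under that assumed PC speed, Summit's peak works out to 400,000 PCs and Sierra's to 250,000, which is in line with the "hundreds of thousands" comparison. A different assumed PC speed would scale these counts proportionally.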

Lab Services also worked on other large computers as part of the CORAL project, such as Lassen, also at Livermore (ranked number 11 on the TOP500 list).

What CORAL aims to achieve

The supercomputing capabilities in Summit and Sierra will help the DOE labs embrace AI and deep learning capabilities to achieve their respective missions around open scientific research and enhancing national defense. Researchers will be able to work faster and smarter — creating more complex code and producing models and simulations with greater resolution and higher fidelity to fuel their scientific research.

Key solutions and partnerships for high-performance computing

AI, deep learning and data analytics are buzzwords in tech circles today. These technologies are driving the future of business, and HPC systems are evolving rapidly to help organizations build the infrastructure to support faster insights with every workload.

CORAL’s Summit and Sierra supercomputing systems are the direct result of extended partnerships with leading technology providers. Each IBM Power Systems AC922 pairs IBM POWER9 processors with NVIDIA Tesla GPU accelerators, connected by next-generation NVIDIA NVLink, a multi-channel interconnect that provides more bandwidth than PCIe Gen 3 and supports both GPU-to-GPU and CPU-to-GPU communication. Mellanox’s partnership has brought key advances in high-speed InfiniBand connectivity to data storage, including a robust implementation of adaptive routing and the offloading of collective operations with its Scalable Hierarchical Aggregation and Reduction Protocol (SHARP). Lastly, Red Hat’s partnership provides the enterprise Linux distribution and expertise in integrating complex software. This allows the HPC compute cluster applications to use these technologies to access the file systems and data storage hosted on a complete high-density, high-performance storage solution provided by IBM Elastic Storage Server, IBM Spectrum Scale and Spectrum Scale RAID software.

Proven IT infrastructure expertise for the cognitive era

The consultants in IBM Systems Lab Services have a wealth of experience delivering a wide range of IT infrastructure solutions. Our experience designing, building and delivering IBM Systems infrastructure solutions for HPC and AI helped us to play a critical role in building the most powerful computers on the planet today.

If you’re looking for support on an upcoming HPC or AI analytics project, contact us today.
