February 11, 2019 By Randy Ruhlow
Hillary Porter

Every year, computers get smarter and faster as they adapt to emerging technology challenges. Collaborating on these industry-leading computing solutions is an exciting frontier. As we begin a new year, IBM Systems Lab Services is wrapping up our largest contract to date on one such project — a collaborative high-performance computing (HPC) endeavor by the US Department of Energy (DOE) called CORAL.

The CORAL project includes some of the world’s smartest and most powerful computers, built on IBM Power Systems with IBM Elastic Storage Server, IBM Spectrum Scale and an IBM software stack. A large team effort by IBM Systems Lab Services was vital to its successful implementation.

What is the CORAL project?

CORAL stands for “Collaboration of Oak Ridge, Argonne and Livermore.” The project is a collaboration between the National Nuclear Security Administration’s Advanced Simulation and Computing (ASC) Program and the Office of Science’s Advanced Scientific Computing Research (ASCR) program that culminates in high-performance supercomputers at Oak Ridge, Argonne and Lawrence Livermore National Laboratories.

Collaboration for success

CORAL was a huge project with many moving parts, and successful delivery required leadership from technical professionals with proven expertise in IT infrastructure design and implementation. Lab Services contributed roughly 40 technical consultants who put in approximately 20,000 hours of service over the last two and a half years, starting with the deployment of early-access systems built on IBM POWER8 to accelerate IBM POWER9 adoption.

We provided a wide range of services, including designing, planning and implementing the deployment. Specifically, Lab Services:

  • Delivered technical project management for the Oak Ridge and Livermore CORAL systems
  • Assisted IBM Development and Manufacturing with cluster infrastructure design and build
  • Provided detailed schedules, resource plans and costs for solution deployments
  • Leveraged subcontractors for the labor-intensive physical build-outs of racks and HPC hardware
  • Performed hardware installation and system build-out of the HPC compute and Elastic Storage Server storage cluster systems
  • Provided hardware verification and cluster management verification prior to advanced cluster testing
  • Provided assistance to IBM Development during triage efforts, deployment fixes and final acceptance support
  • Interfaced with our NVIDIA, Mellanox, Seagate and Red Hat partners
  • Worked closely with Mellanox on InfiniBand and Ethernet network cabling, installation design and bring-up support

These contributions were vital to the successful implementation of CORAL, and now we have the opportunity to see the world’s most powerful computers put to work using artificial intelligence (AI) for scientific research.

About the supercomputers

Summit is the HPC system at Oak Ridge National Laboratory, currently ranked number 1 on the TOP500 list of the most powerful commercially available computer systems. Sierra is the HPC system at Lawrence Livermore National Laboratory, ranked number 2.

For the tech-minded among us, these computers have 200 and 125 petaflop theoretical peaks, respectively. A petaflop is equal to a thousand trillion floating-point operations per second, and if that sounds like a ridiculous number, it is. The performance of these supercomputers is akin to having hundreds of thousands of PCs working on a problem at the same time! Not only that, but the systems take up considerably less space and are at least five times more efficient than the previous system.
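For readers who want to check the arithmetic, here is a minimal sketch in Python. The 200 and 125 petaflop peaks come from the figures above; the per-PC speed of 500 gigaflops is a purely hypothetical assumption chosen for illustration, not a number from IBM or the DOE labs.

```python
# One petaflop is a thousand trillion (10^15) floating-point operations per second.
FLOPS_PER_PETAFLOP = 10**15
FLOPS_PER_GIGAFLOP = 10**9

# Hypothetical speed of a single desktop PC (an assumption, not a sourced figure).
ASSUMED_PC_GIGAFLOPS = 500


def petaflops_to_flops(petaflops):
    """Convert a peak rating in petaflops to operations per second."""
    return petaflops * FLOPS_PER_PETAFLOP


def equivalent_pcs(system_petaflops, pc_gigaflops=ASSUMED_PC_GIGAFLOPS):
    """Number of PCs at the assumed speed needed to match the system's peak."""
    return petaflops_to_flops(system_petaflops) / (pc_gigaflops * FLOPS_PER_GIGAFLOP)


# Theoretical peaks quoted in the article.
for name, peak in (("Summit", 200), ("Sierra", 125)):
    print(f"{name}: {petaflops_to_flops(peak):.2e} operations per second, "
          f"roughly {equivalent_pcs(peak):,.0f} PCs at the assumed speed")
```

Under that assumption, both systems land in the "hundreds of thousands of PCs" range. A different assumed PC speed shifts the count proportionally.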

Lab Services also worked on other large computers as part of the CORAL project, such as Lassen, also at Livermore (and ranked number 11 on the TOP500 list).

What CORAL aims to achieve

The supercomputing capabilities in Summit and Sierra will help the DOE labs embrace AI and deep learning capabilities to achieve their respective missions around open scientific research and enhancing national defense. Researchers will be able to work faster and smarter — creating more complex code and producing models and simulations with greater resolution and higher fidelity to fuel their scientific research.

Key solutions and partnerships for high-performance computing

AI, deep learning and data analytics are buzzwords in tech circles today. These technologies are driving the future of business, and HPC systems are evolving rapidly to help organizations build the infrastructure to support faster insights with every workload.

CORAL’s Summit and Sierra supercomputing systems are the direct result of extended partnerships with leading technology providers. Each IBM Power Systems AC922 pairs IBM POWER9 processors with NVIDIA Tesla GPU accelerators connected with next-generation NVIDIA NVLink, a multi-channel interconnect technology that provides more bandwidth than PCIe Gen 3 and supports both GPU-to-GPU and GPU-to-CPU communication. Mellanox’s partnership has brought key advances in high-speed InfiniBand connectivity to data storage, through a robust implementation of adaptive routing and the offloading of collective operations with its Scalable Hierarchical Aggregation and Reduction Protocol (SHARP). Lastly, Red Hat’s partnership provides the enterprise Linux distribution and expertise in integrating complex software. This allows the HPC compute cluster applications to use these technologies to access the file systems and data storage hosted on a complete high-density, high-performance storage solution provided by IBM Elastic Storage Server, IBM Spectrum Scale and Spectrum Scale RAID software.

Proven IT infrastructure expertise for the cognitive era

The consultants in IBM Systems Lab Services have a wealth of experience delivering a wide range of IT infrastructure solutions. Our experience designing, building and delivering IBM Systems infrastructure solutions for HPC and AI helped us to play a critical role in building the most powerful computers on the planet today.

If you’re looking for support on an upcoming HPC or AI analytics project, contact us today.
