The scientific community depends increasingly on high-speed simulations and data processing. UK universities wanted the ability to tackle problems and data sets of unprecedented size and complexity.
The N8 Research Partnership worked with OCF and IBM to create Bede—an advanced computing facility based on IBM® Power Systems™ servers with IBM POWER9® processors and NVIDIA GPUs—housed at Durham.
Boosts performance through memory coherence across CPU and GPU
Increases size of problems that can be tackled via distributed GPU capability
Exceptional expertise and support from OCF is helping to create a sustainable facility
Business challenge story
Pushing the boundaries
High-performance computing (HPC) is an increasingly important element in driving new scientific breakthroughs. As the sensors and imaging equipment used to monitor scientific experiments become more sensitive and sophisticated, the volume of data that needs to be processed and analysed is growing rapidly. And at the same time, researchers are building ever more detailed models and simulations to complement and explain their experimental findings.
Dr Alan Real, Director of Advanced Research Computing at Durham University, says: “Across the university HPC community in the UK, we’re working together to bring computation and analysis closer to experimentation, so that data from observations and models can inform each other in support of scientific discoveries. In simple terms, the challenge is that the models and data sets no longer fit on the GPUs used for processing.”
Particularly in scientific imaging, where camera technology is following Moore’s law and doubling its resolution approximately every 18 months, tomorrow’s data pipelines threaten to overwhelm today’s accelerated HPC clusters.
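To make that growth rate concrete, the minimal Python sketch below works out how far resolution would grow under the 18-month doubling rate quoted above. The function name and the sample time horizons are illustrative assumptions, not figures from Durham or the N8.

```python
def growth_factor(years: float, doubling_period_months: float = 18.0) -> float:
    """Return the multiplicative growth after `years` for a quantity that
    doubles every `doubling_period_months` months."""
    if doubling_period_months <= 0:
        raise ValueError("doubling period must be positive")
    # Number of complete or partial doublings that fit into the time span.
    doublings = (years * 12.0) / doubling_period_months
    return 2.0 ** doublings


if __name__ == "__main__":
    # Illustrative horizons only; not measurements from any facility.
    for years in (1.5, 3, 6):
        print(f"after {years} years: x{growth_factor(years):.1f}")
```

Under this assumption, resolution grows 4-fold in three years and 16-fold in six, which is why a cluster sized for today's data can be outgrown within a single hardware generation.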
In response to a funding call from the UK Engineering and Physical Sciences Research Council (EPSRC), the N8 Research Partnership, a collaboration of the eight most research-intensive universities in the North of England (Durham, Lancaster, Leeds, Liverpool, Manchester, Newcastle, Sheffield and York), looked into scientific challenges that could be overcome with the help of a new Tier 2 HPC centre. Dr Alan Real says: “We asked ourselves: ‘What challenges do we need to overcome? Where are the capability gaps in our accelerated x86 clusters?’ And it was clear that the fundamental limitation is the speed at which we can move large sets of data between CPUs and GPUs. We set out to build a new shared resource for the community that would extend the boundaries of the possible.”
The N8 Research Partnership proposed a new UK National Tier 2 HPC facility, to be hosted at Durham University, that would unleash machine-learning capabilities on the vast sets of data being generated by experiments. The new facility would also empower researchers to take on larger scientific problems by enabling memory coherence—that is, sharing memory resources—between CPUs and GPUs, and by enabling parallel processing across multiple distributed GPUs.
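As a rough illustration of why these two capabilities matter, the Python sketch below classifies a working set by where it could reside. It is a capacity-planning toy under stated assumptions, not a description of Bede's actual configuration: the function name `placement`, the memory sizes and the node counts are all hypothetical, and the model ignores the headroom that real workloads need for buffers and communication.

```python
def placement(working_set_gb: float, gpu_mem_gb: float, host_mem_gb: float,
              gpus_per_node: int, nodes: int) -> str:
    """Classify where a working set could reside.

    Every capacity is a caller-supplied assumption (hypothetical values).
    """
    if min(working_set_gb, gpu_mem_gb, host_mem_gb) < 0:
        raise ValueError("sizes must be non-negative")
    if gpus_per_node < 1 or nodes < 1:
        raise ValueError("need at least one GPU and one node")

    # Case 1: the data fits in the memory of a single GPU.
    if working_set_gb <= gpu_mem_gb:
        return "single GPU"

    # Case 2 (simplified model): with coherent CPU-GPU memory, a GPU can also
    # address host memory, so one node offers host plus GPU memory as a pool.
    node_pool_gb = host_mem_gb + gpus_per_node * gpu_mem_gb
    if working_set_gb <= node_pool_gb:
        return "one node, coherent CPU+GPU memory"

    # Case 3: the problem must be split across GPUs on several nodes.
    if working_set_gb <= nodes * node_pool_gb:
        return "distributed across nodes"

    return "exceeds cluster memory"


if __name__ == "__main__":
    # Hypothetical cluster: 32 GB per GPU, 512 GB per host, 4 GPUs, 32 nodes.
    for size_gb in (24, 300, 5000):
        print(size_gb, "GB ->", placement(size_gb, 32, 512, 4, 32))
```

In this toy model, a working set larger than one GPU's memory is not automatically out of reach: it first spills into the node's coherent memory pool, and only beyond that does it need to be distributed across nodes.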
Unique needs call for a unique architecture
The N8 Centre of Excellence in Computationally Intensive Research (N8 CIR) was awarded GBP 3.1 million by the Engineering and Physical Sciences Research Council to establish the new Tier 2 computing facility in the north of England, hosted at Durham University and named Bede. The N8 Research Partnership is supporting all operational costs and providing dedicated engineering support at each institution for user training, code porting and optimisation.
“There was really only one architecture that met our demanding requirements: IBM Power Systems,” says Bede Director Dr Alan Real. “We needed Power Systems to get support for memory-coherence and multiple cross-node GPUs. And because this is a cutting-edge architecture, we needed the right partners to co-create the facility and keep it working: IBM and OCF. As a facilitator of the PowerAI user group and with a track record of multiple HPC implementations using IBM technology, OCF had the combination of community credibility and technical expertise we needed to make the most of the new facility.”
OCF has almost 20 years of experience in designing, building, optimising and managing HPC solutions for higher education, research and industry throughout the UK and Ireland. Although IBM Power Systems was a completely new architecture for Durham University, a pilot project conducted by the University of Liverpool with IBM and OCF gave the N8 full confidence in choosing the technology.
“The Bede facility is not just cutting-edge IBM hardware and software; it’s a community capability driven by common challenges and purposes,” comments Dr Alan Real. “OCF is helping us ensure that our research community can get the best out of the solution — that facilitation aspect is where OCF really excels. By bringing together IBM, OCF and the research community around this new facility, we’ve created a collaborative space to enable next-generation science.”
Deployed during the 2020 lockdown, the Bede facility comprises 34 IBM Power Systems AC922 servers complemented by four IBM Power Systems IC922 servers to support inference workloads for machine learning. Bede runs Red Hat Enterprise Linux and IBM Watson® Machine Learning Accelerator, a data-science-as-a-service environment that makes it faster and easier to bring AI applications, deep learning and machine learning within the grasp of non-specialist users.
“Bede does a great job of abstracting away the complexity of environment features with a fast-to-market approach that allows non-specialist users to ‘pick up and play’,” says Chris Coates, OCF Technological Innovation Lead. “Equally, it enables power-users to get the fullest benefits from the specifics of the IBM environment. And with the ability to expand out depending on the way the market moves, we’re providing great performance and flexibility both now and in the future. HPC systems today need to be highly flexible—our software-stack approach aims to address that new-world demand, whilst being backed up by the years of experience that OCF provides. We believe this puts us in the ideal position to deliver the most unique systems.”
The new Bede facility will give researchers across multiple domains and disciplines powerful tools for research, analysis, simulation, modelling, and inference. The facility is already being used for pandemic modelling, and future applications will include drug design, advanced material analysis, molecular modelling, and whole-cell simulation. Bede will also help extend HPC to use cases outside the traditional areas of cosmology, engineering, and physical sciences, for example in linguistics and the humanities. By bringing together the worlds of experimentation, modelling and machine learning, Bede will help researchers to better understand vast scientific data sets, making new scientific breakthroughs more likely.
“IBM Power Systems enables us to deal with data at a scale that other machines can’t for analysis and accelerated simulations,” says Dr Alan Real. “It’s not just far faster, but it enables us to tackle problems that were simply beyond our capabilities before, thanks to the CPU-GPU memory coherence and distributed-GPU computations with IBM’s Distributed Deep Learning software.”
He adds: “Everybody has something in their roadmap about memory coherence. Only IBM lets us have that today in the Power Systems architecture with NVIDIA: we’re giving our research communities a preview of the future!”
Bede is on the same network as Durham University’s DiRAC memory-intensive machine, and has a high-capacity connection to another national HPC facility in Cambridge. The future vision is that Bede will be a hub supporting workflows across multiple tiers and sites of computing, to enable next-generation experiments.
Bede is not just about providing the greatest possible performance to seasoned users of HPC. The N8 also want to open HPC to new users in emerging computationally intensive domains such as digital health and digital humanities. This makes it vital to provide a comprehensive and intuitive software stack that offers easy access.
“In the end, HPC services are judged by the quality and impact of the research that they facilitate, and usability is key,” concludes Dr Alan Real. “We’re working closely with IBM and OCF, drawing on their expertise and experience to hide some of the complexity and subtleties of this unique architecture—but at the same time make sure that advanced users can access the full capabilities. The strength of this three-way partnership is critical to the ongoing success and sustainability of Bede as a world-class computational resource for our research community.”
“The success of any deployment is how well it is adopted by users,” says Chris Coates. “We have worked closely with our partners IBM to get the technical details right so that the platform is sound, whilst also working very hard with Durham University, the N8, IBM and NVIDIA to ensure that the adoption piece here was as smooth a process as possible. The Hackathon we ran on Bede as part of our three-way partnership, also involving NVIDIA, has helped deliver significant adoption and improvements in code. It is also a great example of how OCF is going the extra mile to offer next-level support and ensure platforms are adopted to their best potential by a growing and diversifying community of researchers.”
Durham University, ranked one of the top 100 universities in the QS World University Rankings 2021, is a collegiate university in the North East of England with more than 18,000 students.
The N8 Research Partnership is a collaboration of the eight most research-intensive universities in the North of England: Durham, Lancaster, Leeds, Liverpool, Manchester, Newcastle, Sheffield and York. N8 aims to maximise the impact of this research base by promoting collaboration, establishing innovative research capabilities, and driving economic growth.
Take the next step
To learn more about IBM Power Systems offerings for high-performance computing, please contact your IBM representative or IBM Business Partner, or visit the following website: /it-infrastructure/power/power9
OCF offers bespoke solutions and services to meet the most demanding HPC, Storage, Cloud and AI workloads. Best known for its work in higher education, the company has also delivered HPC solutions to many other industries, including engineering, life sciences, motorsport, public health, defence, and oil & gas. The company offers entire lifecycle support on its solutions to ensure adoption and a quality of service to underpin any solution. To learn more about OCF, visit: ocf.co.uk