Big Data Analytics

Designing new materials with data-centric systems

For decades, researchers have used high performance computing (HPC) to simulate systems at ever-growing speeds and scales. Recently, the design of HPC systems has started to evolve to handle and exploit the vast amounts of data now produced by both models and real-world data sources, a paradigm IBM calls Data-Centric Systems (DCS). DCS provides a flexible computing architecture focused on co-locating computation and data. The aim is to reduce data movement, allowing us to apply complex multi-stage analysis processes to petabytes of data at unprecedented speeds. I believe the DCS paradigm will allow scientists and industry to effectively use computers to accelerate the discovery, design and testing of new materials.

I am a research engineer at IBM with a background in computational biophysics and biochemistry, HPC and application design. I'm a member of the Data Centric Computing and Cloud Systems team at IBM's Dublin Research Lab, and for the last four years I have worked with the Hartree Centre, IBM's collaboration with the UK's Science and Technology Facilities Council (STFC), on a range of projects. These have included the Square Kilometre Array (SKA) telescope, aimed at studying the universe in unprecedented detail, and Oasis, a tool for estimating the cost of catastrophes. My main focus, however, is on "computational materials design" and, since 2015, I've co-led a team of IBM and STFC Hartree Centre researchers on this topic.

Computational materials design replicates on a computer the type of experiments chemists in white coats perform in a lab: for example, seeing what happens when various chemicals are mixed together in water. Our aim is to help industry harness the potential of computers for designing new molecules and chemical mixtures (or formulations). Formulations are all around us, ranging from household items like laundry detergent to the latest fuel additives for improving engine performance. By using computers, companies can avoid expensive and time-consuming lab experiments, and chemists can explore many more design possibilities than would be feasible in a lab.

All this relies heavily on harnessing IBM technology. We use GPUs, OpenPOWER and other elements of IBM's data-centric architecture to accelerate the simulations and analysis. We also engage with cognitive and data-analytics experts in IBM to develop better analysis techniques for our data, and to improve the materials design process itself. A concrete example of the latter is using cognitive algorithms to guide which computational experiments to run.
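To give a flavour of that last idea, the sketch below shows one simple way an algorithm can guide which computational experiment to run next. Everything here is an invented illustration, not IBM's actual cognitive algorithms: the toy property curve stands in for an expensive simulation, and the selection rule is plain uncertainty sampling (run the candidate farthest from anything already simulated).

```python
def run_simulation(x):
    # Stand-in for an expensive HPC simulation of one candidate mixture:
    # a toy property curve peaking at x = 0.3 (purely illustrative).
    return -(x - 0.3) ** 2

def next_candidate(candidates, observed):
    # Uncertainty sampling: prefer the candidate farthest from every
    # already-simulated point, so each run adds the most new information.
    return max(candidates, key=lambda x: min(abs(x - xo) for xo, _ in observed))

# Candidate design space, e.g. the fraction of one ingredient.
candidates = [i / 20 for i in range(21)]

# Seed the loop with two cheap boundary experiments.
observed = [(0.0, run_simulation(0.0)), (1.0, run_simulation(1.0))]

# Run a small budget of guided experiments instead of a full sweep.
for _ in range(5):
    x = next_candidate(candidates, observed)
    observed.append((x, run_simulation(x)))

best_x, best_y = max(observed, key=lambda point: point[1])
print(f"best candidate so far: {best_x}")
```

With only seven simulations out of 21 candidates, the loop homes in on the neighbourhood of the optimum; in practice the scoring rule would be a proper surrogate model rather than a distance heuristic, but the shape of the loop is the same.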

When designing the projects we work on, we are guided by the needs of our industrial clients, and we hold workshops with them to identify common problems.

For example, we’ve worked closely with Unilever, a leader in food, drink and soap products, to accelerate their product discovery process. We’ve developed a number of “computational appliances,” black boxes that allow experimental scientists to run versions of their real-world experiments on supercomputers via a simple mobile interface and without knowledge of HPC. The aim is to help them pre-screen ingredients so that they can focus on fewer and better experiments when designing a new product. We also worked with Unilever domain specialists to advance the science behind their systems. This work resulted in a joint IBM/STFC/Unilever publication in the Journal of Physical Chemistry earlier this year.

Our work is not just outward facing, but also feeds back to IBM. Part of the mission of the Hartree Centre is to help drive development of data-centric computing technology by learning from real-world use cases. So as we use these new technologies, we provide feedback to the technology designers, giving them insight into what works well and what new features could make it easier for us to achieve our aims. This process will result in next-generation systems optimized to tackle a host of future industry and academic problems.


A proof-of-concept computational appliance being accessed from a mobile device:

This appliance examines mixtures made of different proportions of a selected molecule with methanol and water and determines if the molecule dissolves or separates out. The result is shown on a ternary phase diagram.

Pic 1: Choosing the data-centric system to run the experiment on.
Pic 2: Choosing the molecule to test and the type of experiment.
Pic 3: An example result. The experiment has determined that mixtures below the red line will separate.
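A minimal sketch of what such an appliance computes, under loudly stated assumptions: the `separates` rule below is an invented placeholder for the real simulation, and the grid sweep simply enumerates (molecule, methanol, water) fractions summing to one, i.e. the points of a ternary phase diagram.

```python
def compositions(step=0.1):
    # All (molecule, methanol, water) fractions summing to 1 — the grid
    # of points on a ternary phase diagram.
    n = round(1 / step)
    for i in range(n + 1):
        for j in range(n + 1 - i):
            yield i / n, j / n, (n - i - j) / n

def separates(molecule, methanol, water):
    # Invented placeholder rule: assume the molecule stays dissolved only
    # when enough methanol is present relative to the molecule fraction.
    # A real appliance would run a simulation here instead.
    return methanol < 0.5 * molecule

# Label every grid point; separated points fall below the boundary line
# that the appliance would draw on the ternary diagram.
phase_map = {c: ("separated" if separates(*c) else "mixed")
             for c in compositions()}
print(f"{sum(v == 'separated' for v in phase_map.values())} "
      f"of {len(phase_map)} compositions separate")
```

The dictionary of labelled compositions is exactly the data needed to shade the two regions of the ternary diagram shown in the screenshots.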

Related Links:

Hartree Insights Blog: “Introducing the Hartree Centre Chemistry and Materials Programme”

IBM Research Blog: “IBM and Hartree Centre collaboration makes significant progress in first year”

Unilever Case Study: “Accelerating the product discovery process at Unilever”

“Toward a Standard Protocol for Micelle Simulation” published in Journal of Physical Chemistry
